51
|
Stenton SL, Pejaver V, Bergquist T, Biesecker LG, Byrne AB, Nadeau E, Greenblatt MS, Harrison S, Tavtigian S, Radivojac P, Brenner SE, O’Donnell-Luria A. Assessment of the evidence yield for the calibrated PP3/BP4 computational recommendations. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.05.24303807. [PMID: 38496501 PMCID: PMC10942508 DOI: 10.1101/2024.03.05.24303807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/19/2024]
Abstract
Purpose To investigate the number of rare missense variants observed in human genome sequences by ACMG/AMP PP3/BP4 evidence strength, following the calibrated PP3/BP4 computational recommendations. Methods Missense variants from the genome sequences of 300 probands from the Rare Genomes Project with suspected rare disease were analyzed using computational prediction tools able to reach PP3_Strong and BP4_Moderate evidence strengths (BayesDel, MutPred2, REVEL, and VEST4). The numbers of variants at each evidence strength were analyzed across disease-associated genes and genome-wide. Results From a median of 75.5 rare (≤1% allele frequency) missense variants in disease-associated genes per proband, a median of one reached PP3_Strong, 3-5 PP3_Moderate, and 3-5 PP3_Supporting. Most were allocated BP4 evidence (median 41-49 per proband) or were indeterminate (median 17.5-19 per proband). Extending the analysis to all protein-coding genes genome-wide, the number of PP3_Strong variants increased approximately 2.6-fold compared to disease-associated genes, with a median per proband of 1-3 PP3_Strong, 8-16 PP3_Moderate, and 10-17 PP3_Supporting. Conclusion A small number of variants per proband reached PP3_Strong and PP3_Moderate in 3,424 disease-associated genes, and though not the intended use of the recommendations, also genome-wide. Use of PP3/BP4 evidence as recommended from calibrated computational prediction tools in the clinical diagnostic laboratory is unlikely to inappropriately contribute to the classification of an excessive number of variants as Pathogenic or Likely Pathogenic by ACMG/AMP rules.
Collapse
Affiliation(s)
- Sarah L. Stenton
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| | - Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Timothy Bergquist
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Leslie G. Biesecker
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Alicia B. Byrne
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Emily Nadeau
- Department of Medicine and University of Vermont Cancer Center, University of Vermont, Larner College of Medicine, Burlington, VT 05405, USA
| | - Marc S. Greenblatt
- Department of Medicine and University of Vermont Cancer Center, University of Vermont, Larner College of Medicine, Burlington, VT 05405, USA
| | - Steven Harrison
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ambry Genetics, Aliso Viejo, CA, USA
| | - Sean Tavtigian
- Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Steven E. Brenner
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
| |
Collapse
|
52
|
Wu H, Lin JH, Tang XY, Marenne G, Zou WB, Schutz S, Masson E, Génin E, Fichou Y, Le Gac G, Férec C, Liao Z, Chen JM. Combining full-length gene assay and SpliceAI to interpret the splicing impact of all possible SPINK1 coding variants. Hum Genomics 2024; 18:21. [PMID: 38414044 PMCID: PMC10898081 DOI: 10.1186/s40246-024-00586-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2023] [Accepted: 02/13/2024] [Indexed: 02/29/2024] Open
Abstract
BACKGROUND Single-nucleotide variants (SNVs) within gene coding sequences can significantly impact pre-mRNA splicing, bearing profound implications for pathogenic mechanisms and precision medicine. In this study, we aim to harness the well-established full-length gene splicing assay (FLGSA) in conjunction with SpliceAI to prospectively interpret the splicing effects of all potential coding SNVs within the four-exon SPINK1 gene, a gene associated with chronic pancreatitis. RESULTS Our study began with a retrospective analysis of 27 SPINK1 coding SNVs previously assessed using FLGSA, proceeded with a prospective analysis of 35 new FLGSA-tested SPINK1 coding SNVs, followed by data extrapolation, and ended with further validation. In total, we analyzed 67 SPINK1 coding SNVs, which account for 9.3% of the 720 possible coding SNVs. Among these 67 FLGSA-analyzed SNVs, 12 were found to impact splicing. Through detailed comparison of FLGSA results and SpliceAI predictions, we inferred that the remaining 653 untested coding SNVs in the SPINK1 gene are unlikely to significantly affect splicing. Of the 12 splice-altering events, nine produced both normally spliced and aberrantly spliced transcripts, while the remaining three only generated aberrantly spliced transcripts. These splice-impacting SNVs were found solely in exons 1 and 2, notably at the first and/or last coding nucleotides of these exons. Among the 12 splice-altering events, 11 were missense variants (2.17% of 506 potential missense variants), and one was synonymous (0.61% of 164 potential synonymous variants). Notably, adjusting the SpliceAI cut-off to 0.30 instead of the conventional 0.20 would improve specificity without reducing sensitivity. CONCLUSIONS By integrating FLGSA with SpliceAI, we have determined that less than 2% (1.67%) of all possible coding SNVs in SPINK1 significantly influence splicing outcomes. Our findings emphasize the critical importance of conducting splicing analysis within the broader genomic sequence context of the study gene and highlight the inherent uncertainties associated with intermediate SpliceAI scores (0.20 to 0.80). This study contributes to the field by being the first to prospectively interpret all potential coding SNVs in a disease-associated gene with a high degree of accuracy, representing a meaningful attempt at shifting from retrospective to prospective variant analysis in the era of exome and genome sequencing.
Collapse
Affiliation(s)
- Hao Wu
- Department of Gastroenterology, Changhai Hospital, Naval Medical University, 168 Changhai Road, Shanghai, 200433, China
- Shanghai Institute of Pancreatic Diseases, Shanghai, China
| | - Jin-Huan Lin
- Department of Gastroenterology, Changhai Hospital, Naval Medical University, 168 Changhai Road, Shanghai, 200433, China
- Shanghai Institute of Pancreatic Diseases, Shanghai, China
| | - Xin-Ying Tang
- Shanghai Institute of Pancreatic Diseases, Shanghai, China
- Department of Prevention and Health Care, Eastern Hepatobiliary Surgery Hospital, Naval Medical University, Shanghai, China
| | - Gaëlle Marenne
- Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
| | - Wen-Bin Zou
- Department of Gastroenterology, Changhai Hospital, Naval Medical University, 168 Changhai Road, Shanghai, 200433, China
- Shanghai Institute of Pancreatic Diseases, Shanghai, China
| | - Sacha Schutz
- Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- Service de Génétique Médicale et de Biologie de La Reproduction, CHRU Brest, Brest, France
| | - Emmanuelle Masson
- Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- Service de Génétique Médicale et de Biologie de La Reproduction, CHRU Brest, Brest, France
| | | | - Yann Fichou
- Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
| | - Gerald Le Gac
- Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
- Service de Génétique Médicale et de Biologie de La Reproduction, CHRU Brest, Brest, France
| | - Claude Férec
- Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France
| | - Zhuan Liao
- Department of Gastroenterology, Changhai Hospital, Naval Medical University, 168 Changhai Road, Shanghai, 200433, China.
- Shanghai Institute of Pancreatic Diseases, Shanghai, China.
| | - Jian-Min Chen
- Univ Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France.
| |
Collapse
|
53
|
Hoge C, de Manuel M, Mahgoub M, Okami N, Fuller Z, Banerjee S, Baker Z, McNulty M, Andolfatto P, Macfarlan TS, Schumer M, Tzika AC, Przeworski M. Patterns of recombination in snakes reveal a tug-of-war between PRDM9 and promoter-like features. Science 2024; 383:eadj7026. [PMID: 38386752 DOI: 10.1126/science.adj7026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 01/04/2024] [Indexed: 02/24/2024]
Abstract
In some mammals, notably humans, recombination occurs almost exclusively where the protein PRDM9 binds, whereas in vertebrates lacking an intact PRDM9, such as birds and canids, recombination rates are elevated near promoter-like features. To determine whether PRDM9 directs recombination in nonmammalian vertebrates, we focused on an exemplar species with a single, intact PRDM9 ortholog, the corn snake (Pantherophis guttatus). Analyzing historical recombination rates along the genome and crossovers in pedigrees, we found evidence that PRDM9 specifies the location of recombination events, but we also detected a separable effect of promoter-like features. These findings reveal that the uses of PRDM9 and promoter-like features need not be mutually exclusive and instead reflect a tug-of-war that is more even in some species than others.
Collapse
Affiliation(s)
- Carla Hoge
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Marc de Manuel
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Mohamed Mahgoub
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
| | - Naima Okami
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Zachary Fuller
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Shreya Banerjee
- Department of Biology, Stanford University, Stanford, CA, USA
| | - Zachary Baker
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Morgan McNulty
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Peter Andolfatto
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Todd S Macfarlan
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
| | - Molly Schumer
- Department of Biology, Stanford University, Stanford, CA, USA
- Howard Hughes Medical Institute, Stanford, CA, USA
| | - Athanasia C Tzika
- Laboratory of Artificial & Natural Evolution (LANE), Department of Genetics & Evolution, University of Geneva, Geneva, Switzerland
| | - Molly Przeworski
- Department of Biological Sciences, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| |
Collapse
|
54
|
Nakamura T, Ueda J, Mizuno S, Honda K, Kazuno AA, Yamamoto H, Hara T, Takata A. Topologically associating domains define the impact of de novo promoter variants on autism spectrum disorder risk. CELL GENOMICS 2024; 4:100488. [PMID: 38280381 PMCID: PMC10879036 DOI: 10.1016/j.xgen.2024.100488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/14/2023] [Revised: 08/24/2023] [Accepted: 01/02/2024] [Indexed: 01/29/2024]
Abstract
Whole-genome sequencing (WGS) studies of autism spectrum disorder (ASD) have demonstrated the roles of rare promoter de novo variants (DNVs). However, most promoter DNVs in ASD are not located immediately upstream of known ASD genes. In this study analyzing WGS data of 5,044 ASD probands, 4,095 unaffected siblings, and their parents, we show that promoter DNVs within topologically associating domains (TADs) containing ASD genes are significantly and specifically associated with ASD. An analysis considering TADs as functional units identified specific TADs enriched for promoter DNVs in ASD and indicated that common variants in these regions also confer ASD heritability. Experimental validation using human induced pluripotent stem cells (iPSCs) showed that likely deleterious promoter DNVs in ASD can influence multiple genes within the same TAD, resulting in overall dysregulation of ASD-associated genes. These results highlight the importance of TADs and gene-regulatory mechanisms in better understanding the genetic architecture of ASD.
Collapse
Affiliation(s)
- Takumi Nakamura
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Junko Ueda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
| | - Shota Mizuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Kurara Honda
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - An-A Kazuno
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
| | - Hirona Yamamoto
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
| | - Tomonori Hara
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Department of Organ Anatomy, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
| | - Atsushi Takata
- Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan.
| |
Collapse
|
55
|
Omenn GS, Lane L, Overall CM, Lindskog C, Pineau C, Packer NH, Cristea IM, Weintraub ST, Orchard S, Roehrl MHA, Nice E, Guo T, Van Eyk JE, Liu S, Bandeira N, Aebersold R, Moritz RL, Deutsch EW. The 2023 Report on the Proteome from the HUPO Human Proteome Project. J Proteome Res 2024; 23:532-549. [PMID: 38232391 PMCID: PMC11026053 DOI: 10.1021/acs.jproteome.3c00591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2024]
Abstract
Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of protein-based methods used. Additionally there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ∼30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.
Collapse
Affiliation(s)
- Gilbert S. Omenn
- University of Michigan, Ann Arbor, Michigan 48109, United States
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Lydie Lane
- CALIPHO Group, SIB Swiss Institute of Bioinformatics and University of Geneva, 1015 Lausanne, Switzerland
| | - Christopher M. Overall
- University of British Columbia, Vancouver, BC V6T 1Z4, Canada, Yonsei University Republic of Korea
| | | | - Charles Pineau
- University Rennes, Inserm U1085, Irset, 35042 Rennes, France
| | | | | | - Susan T. Weintraub
- University of Texas Health Science Center-San Antonio, San Antonio, Texas 78229-3900, United States
| | | | - Michael H. A. Roehrl
- Department of Pathology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, United States
| | | | - Tiannan Guo
- Westlake Center for Intelligent Proteomics, Westlake Laboratory, Westlake University, Hangzhou 310024, Zhejiang Province, China
| | - Jennifer E. Van Eyk
- Advanced Clinical Biosystems Research Institute, Smidt Heart Institute, Cedars-Sinai Medical Center, 127 South San Vicente Boulevard, Pavilion, 9th Floor, Los Angeles, CA, 90048, United States
| | - Siqi Liu
- BGI Group, Shenzhen 518083, China
| | - Nuno Bandeira
- University of California, San Diego, La Jolla, CA, 92093, United States
| | - Ruedi Aebersold
- Institute of Molecular Systems Biology in ETH Zurich, 8092 Zurich, Switzerland
- University of Zurich, 8092 Zurich, Switzerland
| | - Robert L. Moritz
- Institute for Systems Biology, Seattle, Washington 98109, United States
| | - Eric W. Deutsch
- Institute for Systems Biology, Seattle, Washington 98109, United States
| |
Collapse
|
56
|
Lei H, Li J, Zhao B, Kou SH, Xiao F, Chen T, Wang SM. Evolutionary origin of germline pathogenic variants in human DNA mismatch repair genes. Hum Genomics 2024; 18:5. [PMID: 38287404 PMCID: PMC10823654 DOI: 10.1186/s40246-024-00573-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2023] [Accepted: 01/17/2024] [Indexed: 01/31/2024] Open
Abstract
BACKGROUND Mismatch repair (MMR) system is evolutionarily conserved for genome stability maintenance. Germline pathogenic variants (PVs) in MMR genes that lead to MMR functional deficiency are associated with high cancer risk. Knowing the evolutionary origin of germline PVs in human MMR genes will facilitate understanding the biological base of MMR deficiency in cancer. However, systematic knowledge is lacking to address the issue. In this study, we performed a comprehensive analysis to know the evolutionary origin of human MMR PVs. METHODS We retrieved MMR gene variants from the ClinVar database. The genomes of 100 vertebrates were collected from the UCSC genome browser and ancient human sequencing data were obtained through comprehensive data mining. Cross-species conservation analysis was performed based on the phylogenetic relationship among 100 vertebrates. Rescaled ancient sequencing data were used to perform variant calling for archeological analysis. RESULTS Using the phylogenetic approach, we traced the 3369 MMR PVs identified in modern humans in 99 non-human vertebrate genomes but found no evidence for cross-species conservation as the source for human MMR PVs. Using the archeological approach, we searched the human MMR PVs in over 5000 ancient human genomes dated from 45,045 to 100 years before present and identified a group of MMR PVs shared between modern and ancient humans mostly within 10,000 years with similar quantitative patterns. CONCLUSION Our study reveals that MMR PVs in modern humans were arisen within the recent human evolutionary history.
Collapse
Affiliation(s)
- Huijun Lei
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
- Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China
- Department of Cancer Prevention, Zhejiang Cancer Hospital, Hangzhou, 310022, Zhejiang, China
| | - Jiaheng Li
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Bojin Zhao
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Si Hoi Kou
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Fengxia Xiao
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China
| | - Tianhui Chen
- Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, 310018, Zhejiang, China.
- Department of Cancer Prevention, Zhejiang Cancer Hospital, Hangzhou, 310022, Zhejiang, China.
| | - San Ming Wang
- Ministry of Education Frontiers Science Center for Precision Oncology, Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Taipa, Macau SAR, 999078, China.
| |
Collapse
|
57
|
Yu K, Deuitch N, Merguerian M, Cunningham L, Davis J, Bresciani E, Diemer J, Andrews E, Young A, Donovan F, Sood R, Craft K, Chong S, Chandrasekharappa S, Mullikin J, Liu PP. Genomic landscape of patients with germline RUNX1 variants and familial platelet disorder with myeloid malignancy. Blood Adv 2024; 8:497-511. [PMID: 38019014 PMCID: PMC10837196 DOI: 10.1182/bloodadvances.2023011165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Revised: 11/07/2023] [Accepted: 11/07/2023] [Indexed: 11/30/2023] Open
Abstract
ABSTRACT Familial platelet disorder with associated myeloid malignancies (FPDMM) is caused by germline RUNX1 mutations and characterized by thrombocytopenia and increased risk of hematologic malignancies. We recently launched a longitudinal natural history study for patients with FPDMM. Among 27 families with research genomic data by the end of 2021, 26 different germline RUNX1 variants were detected. Besides missense mutations enriched in Runt homology domain and loss-of-function mutations distributed throughout the gene, splice-region mutations and large deletions were detected in 6 and 7 families, respectively. In 25 of 51 (49%) patients without hematologic malignancy, somatic mutations were detected in at least 1 of the clonal hematopoiesis of indeterminate potential (CHIP) genes or acute myeloid leukemia (AML) driver genes. BCOR was the most frequently mutated gene (in 9 patients), and multiple BCOR mutations were identified in 4 patients. Mutations in 6 other CHIP- or AML-driver genes (TET2, DNMT3A, KRAS, LRP1B, IDH1, and KMT2C) were also found in ≥2 patients without hematologic malignancy. Moreover, 3 unrelated patients (1 with myeloid malignancy) carried somatic mutations in NFE2, which regulates erythroid and megakaryocytic differentiation. Sequential sequencing data from 19 patients demonstrated dynamic changes of somatic mutations over time, and stable clones were more frequently found in older adult patients. In summary, there are diverse types of germline RUNX1 mutations and high frequency of somatic mutations related to clonal hematopoiesis in patients with FPDMM. Monitoring changes in somatic mutations and clinical manifestations prospectively may reveal mechanisms for malignant progression and inform clinical management. This trial was registered at www.clinicaltrials.gov as #NCT03854318.
Collapse
Affiliation(s)
- Kai Yu
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Natalie Deuitch
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Matthew Merguerian
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Department of Pediatrics, Johns Hopkins University School of Medicine, Balltimore, MD
| | - Lea Cunningham
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Immune Deficiency Cellular Therapy Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
| | - Joie Davis
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Erica Bresciani
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Jamie Diemer
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Elizabeth Andrews
- Immune Deficiency Cellular Therapy Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD
| | - Alice Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Frank Donovan
- Genomics Core, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Raman Sood
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Kathleen Craft
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Shawn Chong
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Settara Chandrasekharappa
- Genomics Core, Division of Intramural Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Jim Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| | - Paul P. Liu
- Oncogenesis and Development Section, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
| |
Collapse
|
58
|
Roy G, Syed R, Lazaro O, Robertson S, McCabe SD, Rodriguez D, Mawla AM, Johnson TS, Kalwat MA. Identification of type 2 diabetes- and obesity-associated human β-cells using deep transfer learning. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.18.576260. [PMID: 38328172 PMCID: PMC10849510 DOI: 10.1101/2024.01.18.576260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/09/2024]
Abstract
Diabetes affects >10% of adults worldwide and is caused by impaired production or response to insulin, resulting in chronic hyperglycemia. Pancreatic islet β-cells are the sole source of endogenous insulin and our understanding of β-cell dysfunction and death in type 2 diabetes (T2D) is incomplete. Single-cell RNA-seq data supports heterogeneity as an important factor in β-cell function and survival. However, it is difficult to identify which β-cell phenotypes are critical for T2D etiology and progression. Our goal was to prioritize specific disease-related β-cell subpopulations to better understand T2D pathogenesis and identify relevant genes for targeted therapeutics. To address this, we applied a deep transfer learning tool, DEGAS, which maps disease associations onto single-cell RNA-seq data from bulk expression data. Independent runs of DEGAS using T2D or obesity status identified distinct β-cell subpopulations. A singular cluster of T2D-associated β-cells was identified; however, β-cells with high obese-DEGAS scores contained two subpopulations derived largely from either non-diabetic or T2D donors. The obesity-associated non-diabetic cells were enriched for translation and unfolded protein response genes compared to T2D cells. We selected DLK1 for validation by immunostaining in human pancreas sections from healthy and T2D donors. DLK1 was heterogeneously expressed among β-cells and appeared depleted from T2D islets. In conclusion, DEGAS has the potential to advance our holistic understanding of the β-cell transcriptomic phenotypes, including features that distinguish β-cells in obese non-diabetic or lean T2D states. Future work will expand this approach to additional human islet omics datasets to reveal the complex multicellular interactions driving T2D.
Collapse
|
59
|
Barbitoff YA, Ushakov MO, Lazareva TE, Nasykhova YA, Glotov AS, Predeus AV. Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges. Brief Bioinform 2024; 25:bbad508. [PMID: 38271481 PMCID: PMC10810331 DOI: 10.1093/bib/bbad508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2023] [Revised: 11/18/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024] Open
Abstract
Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.
Collapse
Affiliation(s)
- Yury A Barbitoff
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
| | - Mikhail O Ushakov
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Tatyana E Lazareva
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Yulia A Nasykhova
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Andrey S Glotov
- Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
| | - Alexander V Predeus
- Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
| |
Collapse
|
60
|
Hofman DA, Ruiz-Orera J, Yannuzzi I, Murugesan R, Brown A, Clauser KR, Condurat AL, van Dinter JT, Engels SAG, Goodale A, van der Lugt J, Abid T, Wang L, Zhou KN, Vogelzang J, Ligon KL, Phoenix TN, Roth JA, Root DE, Hubner N, Golub TR, Bandopadhayay P, van Heesch S, Prensner JR. Translation of non-canonical open reading frames as a cancer cell survival mechanism in childhood medulloblastoma. Mol Cell 2024; 84:261-276.e18. [PMID: 38176414 PMCID: PMC10872554 DOI: 10.1016/j.molcel.2023.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Revised: 08/30/2023] [Accepted: 12/01/2023] [Indexed: 01/06/2024]
Abstract
A hallmark of high-risk childhood medulloblastoma is the dysregulation of RNA translation. Currently, it is unknown whether medulloblastoma dysregulates the translation of putatively oncogenic non-canonical open reading frames (ORFs). To address this question, we performed ribosome profiling of 32 medulloblastoma tissues and cell lines and observed widespread non-canonical ORF translation. We then developed a stepwise approach using multiple CRISPR-Cas9 screens to elucidate non-canonical ORFs and putative microproteins implicated in medulloblastoma cell survival. We determined that multiple lncRNA-ORFs and upstream ORFs (uORFs) exhibited selective functionality independent of main coding sequences. A microprotein encoded by one of these ORFs, ASNSD1-uORF or ASDURF, was upregulated, associated with MYC-family oncogenes, and promoted medulloblastoma cell survival through engagement with the prefoldin-like chaperone complex. Our findings underscore the fundamental importance of non-canonical ORF translation in medulloblastoma and provide a rationale to include these ORFs in future studies seeking to define new cancer targets.
Collapse
Affiliation(s)
- Damon A Hofman
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
| | - Jorge Ruiz-Orera
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany
| | - Ian Yannuzzi
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | | | - Adam Brown
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Karl R Clauser
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Alexandra L Condurat
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Jip T van Dinter
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
| | - Sem A G Engels
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
| | - Amy Goodale
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Jasper van der Lugt
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands
| | - Tanaz Abid
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Li Wang
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Kevin N Zhou
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Jayne Vogelzang
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02215, USA
| | - Keith L Ligon
- Department of Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02215, USA; Department of Pathology, Boston Children's Hospital, Boston MA 02115, USA
| | - Timothy N Phoenix
- Division of Pharmaceutical Sciences, James L. Winkle College of Pharmacy, University of Cincinnati, Cincinnati, OH 45229, USA
| | - Jennifer A Roth
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - David E Root
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Norbert Hubner
- Cardiovascular and Metabolic Sciences, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), 13125 Berlin, Germany; Charité-Universitätsmedizin, 10117 Berlin, Germany; German Centre for Cardiovascular Research, Partner Site Berlin, 13347 Berlin, Germany
| | - Todd R Golub
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Pediatric Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
| | - Pratiti Bandopadhayay
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Division of Pediatric Hematology/Oncology, Boston Children's Hospital, Boston, MA 02115, USA
| | - Sebastiaan van Heesch
- Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS Utrecht, the Netherlands.
| | - John R Prensner
- Department of Pediatrics, Division of Pediatric Hematology/Oncology and Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
| |
Collapse
|
61
|
Dueñas Rey A, Del Pozo Valero M, Bouckaert M, Wood KA, Van den Broeck F, Daich Varela M, Thomas HB, Van Heetvelde M, De Bruyne M, Van de Sompele S, Bauwens M, Lenaerts H, Mahieu Q, Josifova D, Rivolta C, O'Keefe RT, Ellingford J, Webster AR, Arno G, Ayuso C, De Zaeytijd J, Leroy BP, De Baere E, Coppieters F. Combining a prioritization strategy and functional studies nominates 5'UTR variants underlying inherited retinal disease. Genome Med 2024; 16:7. [PMID: 38184646 PMCID: PMC10771650 DOI: 10.1186/s13073-023-01277-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Accepted: 12/15/2023] [Indexed: 01/08/2024] Open
Abstract
BACKGROUND 5' untranslated regions (5'UTRs) are essential modulators of protein translation. Predicting the impact of 5'UTR variants is challenging and rarely performed in routine diagnostics. Here, we present a combined approach of a comprehensive prioritization strategy and functional assays to evaluate 5'UTR variation in two large cohorts of patients with inherited retinal diseases (IRDs). METHODS We performed an isoform-level re-analysis of retinal RNA-seq data to identify the protein-coding transcripts of 378 IRD genes with highest expression in retina. We evaluated the coverage of their 5'UTRs by different whole exome sequencing (WES) kits. The selected 5'UTRs were analyzed in whole genome sequencing (WGS) and WES data from IRD sub-cohorts from the 100,000 Genomes Project (n = 2397 WGS) and an in-house database (n = 1682 WES), respectively. Identified variants were annotated for 5'UTR-relevant features and classified into seven categories based on their predicted functional consequence. We developed a variant prioritization strategy by integrating population frequency, specific criteria for each category, and family and phenotypic data. A selection of candidate variants underwent functional validation using diverse approaches. RESULTS Isoform-level re-quantification of retinal gene expression revealed 76 IRD genes with a non-canonical retina-enriched isoform, of which 20 display a fully distinct 5'UTR compared to that of their canonical isoform. Depending on the probe design, 3-20% of IRD genes have 5'UTRs fully captured by WES. After analyzing these regions in both cohorts, we prioritized 11 (likely) pathogenic variants in 10 genes (ARL3, MERTK, NDP, NMNAT1, NPHP4, PAX6, PRPF31, PRPF4, RDH12, RD3), of which 7 were novel. Functional analyses further supported the pathogenicity of three variants. Mis-splicing was demonstrated for the PRPF31:c.-9+1G>T variant. The MERTK:c.-125G>A variant, overlapping a transcriptional start site, was shown to significantly reduce both luciferase mRNA levels and activity. The RDH12:c.-123C>T variant was found in cis with the hypomorphic RDH12:c.701G>A (p.Arg234His) variant in 11 patients. This 5'UTR variant, predicted to introduce an upstream open reading frame, was shown to result in reduced RDH12 protein but unaltered mRNA levels. CONCLUSIONS This study demonstrates the importance of 5'UTR variants implicated in IRDs and provides a systematic approach for 5'UTR annotation and validation that is applicable to other inherited diseases.
Collapse
Affiliation(s)
- Alfredo Dueñas Rey
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Marta Del Pozo Valero
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz, University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
| | - Manon Bouckaert
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Katherine A Wood
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
| | - Filip Van den Broeck
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
| | - Malena Daich Varela
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
| | - Huw B Thomas
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
| | - Mattias Van Heetvelde
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Marieke De Bruyne
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Stijn Van de Sompele
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Miriam Bauwens
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Hanne Lenaerts
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Quinten Mahieu
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | | | - Carlo Rivolta
- Department of Ophthalmology, University of Basel, Basel, Switzerland
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | - Raymond T O'Keefe
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
| | - Jamie Ellingford
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
- Genomics England, London, UK
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, UK
| | - Andrew R Webster
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
| | - Gavin Arno
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
| | - Carmen Ayuso
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz, University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
| | - Julie De Zaeytijd
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
| | - Bart P Leroy
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
- Division of Ophthalmology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Elfride De Baere
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Frauke Coppieters
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium.
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium.
- Department of Pharmaceutics, Ghent University, Ghent, Belgium.
| |
Collapse
|
62
|
Behera S, Catreux S, Rossi M, Truong S, Huang Z, Ruehle M, Visvanath A, Parnaby G, Roddey C, Onuchic V, Cameron DL, English A, Mehtalia S, Han J, Mehio R, Sedlazeck FJ. Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.02.573821. [PMID: 38260545 PMCID: PMC10802302 DOI: 10.1101/2024.01.02.573821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]
Abstract
Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN that utilizes novel methods based on multigenomes, hardware acceleration, and machine learning based variant detection to provide novel insights into individual genomes with ~30min computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.
Collapse
Affiliation(s)
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | | | | | | | | | | | | | | | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, TX, USA
- Department of Computer Science, Rice University, TX, USA
| |
Collapse
|
63
|
Harrison PW, Amode MR, Austine-Orimoloye O, Azov A, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju S, Campbell L, Martinez MC, Charkhchi M, Chougule K, Cockburn A, Davidson C, De Silva N, Dodiya K, Donaldson S, El Houdaigui B, Naboulsi T, Fatima R, Giron CG, Genez T, Grigoriadis D, Ghattaoraya G, Martinez JG, Gurbich T, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Lodha D, Marques-Coelho D, Maslen G, Merino G, Mirabueno L, Mushtaq A, Hossain S, Ogeh D, Sakthivel MP, Parker A, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Pérez-Silva J, Salam A, Saraf S, Saraiva-Agostinho N, Sheppard D, Sinha S, Sipos B, Sitnik V, Stark W, Steed E, Suner MM, Surapaneni L, Sutinen K, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Ware D, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley G, Keatley J, Loveland J, Moore B, Mudge J, Naamati G, Tate J, Trevanion S, Winterbottom A, Frankish A, Hunt SE, Cunningham F, Dyer S, Finn R, Martin F, Yates A. Ensembl 2024. Nucleic Acids Res 2024; 52:D891-D899. [PMID: 37953337 PMCID: PMC10767893 DOI: 10.1093/nar/gkad1049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/14/2023] Open
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Collapse
Affiliation(s)
- Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthieu Barba
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Arne Becker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Simarpreet Kaur Bhurji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Paulo R Branco Lins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shashank Budhanuru Ramaraju
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Lahcen I Campbell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manuel Carbajo Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
| | - Alexander Cockburn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nishadi H De Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dionysios Grigoriadis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gurpreet S Ghattaoraya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tatiana A Gurbich
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vinay Kaykala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Disha Lodha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Diego Marques-Coelho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gareth Maslen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Gabriela Alejandra Merino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Louisse Paola Mirabueno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Syed Nakib Hossain
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Denye N Ogeh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Manoj Pandian Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Daniel Poppleton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Shradha Saraf
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Nuno Saraiva-Agostinho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Swati Sinha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Botond Sipos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Vasily Sitnik
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - William Stark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Doreen Ware
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Natalie L Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jon Keatley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
| |
Collapse
|
64
|
Sayers E, Beck J, Bolton E, Brister J, Chan J, Comeau D, Connor R, DiCuccio M, Farrell C, Feldgarden M, Fine A, Funk K, Hatcher E, Hoeppner M, Kane M, Kannan S, Katz K, Kelly C, Klimke W, Kim S, Kimchi A, Landrum M, Lathrop S, Lu Z, Malheiro A, Marchler-Bauer A, Murphy T, Phan L, Prasad A, Pujar S, Sawyer A, Schmieder E, Schneider V, Schoch C, Sharma S, Thibaud-Nissen F, Trawick B, Venkatapathi T, Wang J, Pruitt K, Sherry S. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2024; 52:D33-D43. [PMID: 37994677 PMCID: PMC10767890 DOI: 10.1093/nar/gkad1044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/24/2023] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
Collapse
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jeff Beck
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jessica Chan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Donald C Comeau
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Ryan Connor
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Michael DiCuccio
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Michael Feldgarden
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Anna M Fine
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kathryn Funk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Eneida Hatcher
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Marilu Hoeppner
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Megan Kane
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sivakumar Kannan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kenneth S Katz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Christopher Kelly
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - William Klimke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Avi Kimchi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Melissa Landrum
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Stacy Lathrop
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Adriana Malheiro
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Lon Phan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Arjun B Prasad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Amanda Sawyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Erin Schmieder
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Conrad L Schoch
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Shobha Sharma
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Barton W Trawick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Thilakam Venkatapathi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jiyao Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| |
Collapse
|
65
|
Varadi M, Bertoni D, Magana P, Paramval U, Pidruchna I, Radhakrishnan M, Tsenkov M, Nair S, Mirdita M, Yeo J, Kovalevskiy O, Tunyasuvunakool K, Laydon A, Žídek A, Tomlinson H, Hariharan D, Abrahamson J, Green T, Jumper J, Birney E, Steinegger M, Hassabis D, Velankar S. AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Res 2024; 52:D368-D375. [PMID: 37933859 PMCID: PMC10767828 DOI: 10.1093/nar/gkad1011] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2023] [Revised: 10/13/2023] [Accepted: 10/18/2023] [Indexed: 11/08/2023] Open
Abstract
The AlphaFold Database Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.
Collapse
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Damian Bertoni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Paulyna Magana
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Urmila Paramval
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Ivanna Pidruchna
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | | | - Maxim Tsenkov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Sreenath Nair
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, South Korea
| | - Jingi Yeo
- School of Biological Sciences, Seoul National University, Seoul, South Korea
| | | | | | | | | | | | | | | | | | | | - Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
| | | | - Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| |
Collapse
|
66
|
Song Y, Zhang C, Omenn GS, O’Meara MJ, Welch JD. Predicting the Structural Impact of Human Alternative Splicing. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.21.572928. [PMID: 38187531 PMCID: PMC10769328 DOI: 10.1101/2023.12.21.572928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/09/2024]
Abstract
Protein structure prediction with neural networks is a powerful new method for linking protein sequence, structure, and function, but structures have generally been predicted for only a single isoform of each gene, neglecting splice variants. To investigate the structural implications of alternative splicing, we used AlphaFold2 to predict the structures of more than 11,000 human isoforms. We employed multiple metrics to identify splicing-induced structural alterations, including template matching score, secondary structure composition, surface charge distribution, radius of gyration, accessibility of post-translational modification sites, and structure-based function prediction. We identified examples of how alternative splicing induced clear changes in each of these properties. Structural similarity between isoforms largely correlated with degree of sequence identity, but we identified a subset of isoforms with low structural similarity despite high sequence similarity. Exon skipping and alternative last exons tended to increase the surface charge and radius of gyration. Splicing also buried or exposed numerous post-translational modification sites, most notably among the isoforms of BAX. Functional prediction nominated numerous functional differences among isoforms of the same gene, with loss of function compared to the reference predominating. Finally, we used single-cell RNA-seq data from the Tabula Sapiens to determine the cell types in which each structure is expressed. Our work represents an important resource for studying the structure and function of splice isoforms across the cell types of the human body.
Collapse
Affiliation(s)
- Yuxuan Song
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Matthew J. O’Meara
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Joshua D. Welch
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA
| |
Collapse
|
67
|
Smith C, Kitzman JO. Benchmarking splice variant prediction algorithms using massively parallel splicing assays. Genome Biol 2023; 24:294. [PMID: 38129864 PMCID: PMC10734170 DOI: 10.1186/s13059-023-03144-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult. Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes. RESULTS We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms' concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Deep learning-based predictors trained on gene model annotations achieve the best overall performance at distinguishing disruptive and neutral variants, and controlling for overall call rate genome-wide, SpliceAI and Pangolin have superior sensitivity. Finally, our results highlight two practical considerations when scoring variants genome-wide: finding an optimal score cutoff, and the substantial variability introduced by differences in gene model annotation, and we suggest strategies for optimal splice effect prediction in the face of these issues. CONCLUSION SpliceAI and Pangolin show the best overall performance among predictors tested, however, improvements in splice effect prediction are still needed especially within exons.
Collapse
Affiliation(s)
- Cathy Smith
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
| | - Jacob O Kitzman
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA.
| |
Collapse
|
68
|
Horste EL, Fansler MM, Cai T, Chen X, Mitschka S, Zhen G, Lee FCY, Ule J, Mayr C. Subcytoplasmic location of translation controls protein output. Mol Cell 2023; 83:4509-4523.e11. [PMID: 38134885 PMCID: PMC11146010 DOI: 10.1016/j.molcel.2023.11.025] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Revised: 08/15/2023] [Accepted: 11/21/2023] [Indexed: 12/24/2023]
Abstract
The cytoplasm is highly compartmentalized, but the extent and consequences of subcytoplasmic mRNA localization in non-polarized cells are largely unknown. We determined mRNA enrichment in TIS granules (TGs) and the rough endoplasmic reticulum (ER) through particle sorting and isolated cytosolic mRNAs by digitonin extraction. When focusing on genes that encode non-membrane proteins, we observed that 52% have transcripts enriched in specific compartments. Compartment enrichment correlates with a combinatorial code based on mRNA length, exon length, and 3' UTR-bound RNA-binding proteins. Compartment-biased mRNAs differ in the functional classes of their encoded proteins: TG-enriched mRNAs encode low-abundance proteins with strong enrichment of transcription factors, whereas ER-enriched mRNAs encode large and highly expressed proteins. Compartment localization is an important determinant of mRNA and protein abundance, which is supported by reporter experiments showing that redirecting cytosolic mRNAs to the ER increases their protein expression. In summary, the cytoplasm is functionally compartmentalized by local translation environments.
Collapse
Affiliation(s)
- Ellen L Horste
- Gerstner Sloan Kettering Graduate School of Biomedical Sciences, New York, NY 10065, USA; Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
| | - Mervin M Fansler
- Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA; Tri-Institutional Training Program in Computational Biology and Medicine, Weill-Cornell Graduate College, New York, NY 10021, USA
| | - Ting Cai
- Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
| | - Xiuzhen Chen
- Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
| | - Sibylle Mitschka
- Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
| | - Gang Zhen
- Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA
| | - Flora C Y Lee
- UK Dementia Research Institute, King's College London, London SE5 9NU, UK; The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
| | - Jernej Ule
- UK Dementia Research Institute, King's College London, London SE5 9NU, UK; The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
| | - Christine Mayr
- Gerstner Sloan Kettering Graduate School of Biomedical Sciences, New York, NY 10065, USA; Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY 10065, USA; Tri-Institutional Training Program in Computational Biology and Medicine, Weill-Cornell Graduate College, New York, NY 10021, USA.
| |
Collapse
|
69
|
Ma JG, O’Neill MJ, Richardson E, Thomson KL, Ingles J, Muhammad A, Solus JF, Davogustto G, Anderson KC, Benjamin Shoemaker M, Stergachis AB, Floyd BJ, Dunn K, Parikh VN, Chubb H, Perrin MJ, Roden DM, Vandenberg JI, Ng CA, Glazer AM. Multi-site validation of a functional assay to adjudicate SCN5A Brugada Syndrome-associated variants. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.19.23299592. [PMID: 38196587 PMCID: PMC10775332 DOI: 10.1101/2023.12.19.23299592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/11/2024]
Abstract
Brugada Syndrome (BrS) is an inheritable arrhythmia condition that is associated with rare, loss-of-function variants in the cardiac sodium channel gene, SCN5A. Interpreting the pathogenicity of SCN5A missense variants is challenging and ~79% of SCN5A missense variants in ClinVar are currently classified as Variants of Uncertain Significance (VUS). An in vitro SCN5A-BrS automated patch clamp assay was generated for high-throughput functional studies of NaV1.5. The assay was independently studied at two separate research sites - Vanderbilt University Medical Center and Victor Chang Cardiac Research Institute - revealing strong correlations, including peak INa density (R2=0.86). The assay was calibrated according to ClinGen Sequence Variant Interpretation recommendations using high-confidence variant controls (n=49). Normal and abnormal ranges of function were established based on the distribution of benign variant assay results. The assay accurately distinguished benign controls (24/25) from pathogenic controls (23/24). Odds of Pathogenicity values derived from the experimental results yielded 0.042 for normal function (BS3 criterion) and 24.0 for abnormal function (PS3 criterion), resulting in up to strong evidence for both ACMG criteria. The calibrated assay was then used to study SCN5A VUS observed in four families with BrS and other arrhythmia phenotypes associated with SCN5A loss-of-function. The assay revealed loss-of-function for three of four variants, enabling reclassification to likely pathogenic. This validated APC assay provides clinical-grade functional evidence for the reclassification of current VUS and will aid future SCN5A-BrS variant classification.
Collapse
Affiliation(s)
- Joanne G. Ma
- Mark Cowley Lidwill Research Program in Cardiac Electrophysiology, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Darlinghurst, NSW, Australia
| | | | - Ebony Richardson
- Clinical Genomics Laboratory, Centre for Population Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia and Murdoch Children Research Institute, Melbourne, Australia
| | - Kate L. Thomson
- Oxford Genetics Laboratories, Churchill Hospital, Oxford, UK
| | - Jodie Ingles
- Clinical Genomics Laboratory, Centre for Population Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia and Murdoch Children Research Institute, Melbourne, Australia
| | - Ayesha Muhammad
- Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Joseph F. Solus
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Giovanni Davogustto
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Katherine C. Anderson
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - M. Benjamin Shoemaker
- Division of Cardiovascular Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Andrew B. Stergachis
- University of Washington School of Medicine, Department of Medicine, Seattle, WA, USA
| | - Brendan J. Floyd
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
| | - Kyla Dunn
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
| | - Victoria N. Parikh
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
| | - Henry Chubb
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, USA
| | - Mark J. Perrin
- Department of Genomic Medicine, Royal Melbourne Hospital, Victoria, Australia
| | - Dan M. Roden
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Departments of Medicine, Pharmacology, and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Jamie I. Vandenberg
- Mark Cowley Lidwill Research Program in Cardiac Electrophysiology, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Darlinghurst, NSW, Australia
| | - Chai-Ann Ng
- Mark Cowley Lidwill Research Program in Cardiac Electrophysiology, Victor Chang Cardiac Research Institute, Darlinghurst, NSW, Australia
- School of Clinical Medicine, UNSW Sydney, Darlinghurst, NSW, Australia
| | - Andrew M. Glazer
- Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART), Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| |
Collapse
|
70
|
McCarley SC, Murphy DA, Thompson J, Shovlin CL. Pharmacogenomic Considerations for Anticoagulant Prescription in Patients with Hereditary Haemorrhagic Telangiectasia. J Clin Med 2023; 12:7710. [PMID: 38137783 PMCID: PMC10744266 DOI: 10.3390/jcm12247710] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 12/10/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023] Open
Abstract
Hereditary haemorrhagic telangiectasia (HHT) is a vascular dysplasia that commonly results in bleeding but with frequent indications for therapeutic anticoagulation. Our aims were to advance the understanding of drug-specific intolerance and evaluate if there was an indication for pharmacogenomic testing. Genes encoding proteins involved in the absorption, distribution, metabolism, and excretion of warfarin, heparin, and direct oral anticoagulants (DOACs) apixaban, rivaroxaban, edoxaban, and dabigatran were identified and examined. Linkage disequilibrium with HHT genes was excluded, before variants within these genes were examined following whole genome sequencing of general and HHT populations. The 44 genes identified included 5/17 actionable pharmacogenes with guidelines. The 76,156 participants in the Genome Aggregation Database v3.1.2 had 28,446 variants, including 9668 missense substitutions and 1076 predicted loss-of-function (frameshift, nonsense, and consensus splice site) variants, i.e., approximately 1 in 7.9 individuals had a missense substitution, and 1 in 71 had a loss-of-function variant. Focusing on the 17 genes relevant to usually preferred DOACs, similar variant profiles were identified in HHT patients. With HHT patients at particular risk of haemorrhage when undergoing anticoagulant treatment, we explore how pre-emptive pharmacogenomic testing, alongside HHT gene testing, may prove beneficial in reducing the risk of bleeding and conclude that HHT patients are well placed to be at the vanguard of personalised prescribing.
Collapse
Affiliation(s)
- Sarah C. McCarley
- National Heart and Lung Institute, Imperial College London, London W12 0NN, UK; (S.C.M.); (J.T.)
| | - Daniel A. Murphy
- Pharmacy Department, Imperial College Healthcare NHS Trust, London W2 1NY, UK;
- Social, Genetic and Envionmental Determinants of Health Theme, NIHR Imperial Biomedical Research Centre, London W2 1NY, UK
| | - Jack Thompson
- National Heart and Lung Institute, Imperial College London, London W12 0NN, UK; (S.C.M.); (J.T.)
| | - Claire L. Shovlin
- National Heart and Lung Institute, Imperial College London, London W12 0NN, UK; (S.C.M.); (J.T.)
- Social, Genetic and Envionmental Determinants of Health Theme, NIHR Imperial Biomedical Research Centre, London W2 1NY, UK
- Specialist Medicine, Hammersmith Hospital, Imperial College Healthcare NHS Trust, London W12 0HS, UK
| |
Collapse
|
71
|
Radford EJ, Tan HK, Andersson MHL, Stephenson JD, Gardner EJ, Ironfield H, Waters AJ, Gitterman D, Lindsay S, Abascal F, Martincorena I, Kolesnik-Taylor A, Ng-Cordell E, Firth HV, Baker K, Perry JRB, Adams DJ, Gerety SS, Hurles ME. Saturation genome editing of DDX3X clarifies pathogenicity of germline and somatic variation. Nat Commun 2023; 14:7702. [PMID: 38057330 PMCID: PMC10700591 DOI: 10.1038/s41467-023-43041-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2022] [Accepted: 10/30/2023] [Indexed: 12/08/2023] Open
Abstract
Loss-of-function of DDX3X is a leading cause of neurodevelopmental disorders (NDD) in females. DDX3X is also a somatically mutated cancer driver gene proposed to have tumour promoting and suppressing effects. We perform saturation genome editing of DDX3X, testing in vitro the functional impact of 12,776 nucleotide variants. We identify 3432 functionally abnormal variants, in three distinct classes. We train a machine learning classifier to identify functionally abnormal variants of NDD-relevance. This classifier has at least 97% sensitivity and 99% specificity to detect variants pathogenic for NDD, substantially out-performing in silico predictors, and resolving up to 93% of variants of uncertain significance. Moreover, functionally-abnormal variants can account for almost all of the excess nonsynonymous DDX3X somatic mutations seen in DDX3X-driven cancers. Systematic maps of variant effects generated in experimentally tractable cell types have the potential to transform clinical interpretation of both germline and somatic disease-associated variation.
Collapse
Affiliation(s)
- Elizabeth J Radford
- Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Department of Paediatrics, University of Cambridge, Level 8, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
| | - Hong-Kee Tan
- Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
| | | | | | - Eugene J Gardner
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
| | | | | | | | | | | | | | | | - Elise Ng-Cordell
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Department of Psychology, University of British Columbia, Vancouver, Canada
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
| | - Kate Baker
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Department of Medical Genetics, University of Cambridge, Cambridge, UK
| | - John R B Perry
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
| | | | | | | |
Collapse
|
72
|
Zhang Q, Shao M. Transcript assembly and annotations: Bias and adjustment. PLoS Comput Biol 2023; 19:e1011734. [PMID: 38127855 PMCID: PMC10769104 DOI: 10.1371/journal.pcbi.1011734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2023] [Revised: 01/05/2024] [Accepted: 12/04/2023] [Indexed: 12/23/2023] Open
Abstract
Transcript annotations play a critical role in gene expression analysis as they serve as a reference for quantifying isoform-level expression. The two main sources of annotations are RefSeq and Ensembl/GENCODE, but discrepancies between their methodologies and information resources can lead to significant differences. It has been demonstrated that the choice of annotation can have a significant impact on gene expression analysis. Furthermore, transcript assembly is closely linked to annotations, as assembling large-scale available RNA-seq data is an effective data-driven way to construct annotations, and annotations are often served as benchmarks to evaluate the accuracy of assembly methods. However, the influence of different annotations on transcript assembly is not yet fully understood. We investigate the impact of annotations on transcript assembly. Surprisingly, we observe that opposite conclusions can arise when evaluating assemblers with different annotations. To understand this striking phenomenon, we compare the structural similarity of annotations at various levels and find that the primary structural difference across annotations occurs at the intron-chain level. Next, we examine the biotypes of annotated and assembled transcripts and uncover a significant bias towards annotating and assembling transcripts with intron retentions, which explains above the contradictory conclusions. We develop a standalone tool, available at https://github.com/Shao-Group/irtool, that can be combined with an assembler to generate an assembly without intron retentions. We evaluate the performance of such a pipeline and offer guidance to select appropriate assembling tools for different application scenarios.
Collapse
Affiliation(s)
- Qimin Zhang
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Mingfu Shao
- Department of Computer Science and Engineering, School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| |
Collapse
|
73
|
Dondi A, Lischetti U, Jacob F, Singer F, Borgsmüller N, Coelho R, Heinzelmann-Schwarz V, Beisel C, Beerenwinkel N. Detection of isoforms and genomic alterations by high-throughput full-length single-cell RNA sequencing in ovarian cancer. Nat Commun 2023; 14:7780. [PMID: 38012143 PMCID: PMC10682465 DOI: 10.1038/s41467-023-43387-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2023] [Accepted: 11/07/2023] [Indexed: 11/29/2023] Open
Abstract
Understanding the complex background of cancer requires genotype-phenotype information in single-cell resolution. Here, we perform long-read single-cell RNA sequencing (scRNA-seq) on clinical samples from three ovarian cancer patients presenting with omental metastasis and increase the PacBio sequencing depth to 12,000 reads per cell. Our approach captures 152,000 isoforms, of which over 52,000 were not previously reported. Isoform-level analysis accounting for non-coding isoforms reveals 20% overestimation of protein-coding gene expression on average. We also detect cell type-specific isoform and poly-adenylation site usage in tumor and mesothelial cells, and find that mesothelial cells transition into cancer-associated fibroblasts in the metastasis, partly through the TGF-β/miR-29/Collagen axis. Furthermore, we identify gene fusions, including an experimentally validated IGF2BP2::TESPA1 fusion, which is misclassified as high TESPA1 expression in matched short-read data, and call mutations confirmed by targeted NGS cancer gene panel results. With these findings, we envision long-read scRNA-seq to become increasingly relevant in oncology and personalized medicine.
Collapse
Affiliation(s)
- Arthur Dondi
- ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
| | - Ulrike Lischetti
- ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland.
- University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland.
| | - Francis Jacob
- University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland
| | - Franziska Singer
- SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
- ETH Zurich, NEXUS Personalized Health Technologies, Wagistrasse 18, 8952, Schlieren, Switzerland
| | - Nico Borgsmüller
- ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland
| | - Ricardo Coelho
- University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland
| | - Viola Heinzelmann-Schwarz
- University Hospital Basel and University of Basel, Ovarian Cancer Research, Department of Biomedicine, Hebelstrasse 20, 4031, Basel, Switzerland
- University Hospital Basel, Gynecological Cancer Center, Spitalstrasse 21, 4031, Basel, Switzerland
| | - Christian Beisel
- ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland.
| | - Niko Beerenwinkel
- ETH Zurich, Department of Biosystems Science and Engineering, Mattenstrasse 26, 4058, Basel, Switzerland.
- SIB Swiss Institute of Bioinformatics, Mattenstrasse 26, 4058, Basel, Switzerland.
| |
Collapse
|
74
|
Sanchez-Mete L, Mosciatti L, Casadio M, Vittori L, Martayan A, Stigliano V. MUTYH-associated polyposis: Is it time to change upper gastrointestinal surveillance? A single-center case series and a literature overview. World J Gastrointest Oncol 2023; 15:1891-1899. [DOI: 10.4251/wjgo.v15.i11.1891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/13/2023] [Revised: 05/28/2023] [Accepted: 06/13/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND The presence of Spigelman stage (SS) IV duodenal polyposis is considered the most significant risk factor for duodenal cancer in patients with MUTYH-associated polyposis (MAP). However, advanced SS disease is rarely reported in MAP patients, and no clear recommendations on small bowel (SB) surveillance have been proposed in this patient setting.
AIM To research more because that case reports of duodenal cancers in MAP suggest that they may develop in the absence of advanced benign SS disease and often involve the distal portion of the duodenum.
METHODS We describe a series of MAP patients followed up at the Regina Elena National Cancer Institute of Rome (Italy). A literature overview on previously reported SB cancers in MAP is also provided.
RESULTS We identified two (6%) SB adenocarcinomas with no previous history of duodenal polyposis. Our observations, supported by literature evidence, suggest that the formula for staging duodenal polyposis and predicting risk factors for distal duodenum and jejunal cancer may need to be adjusted to take this into account rather than focusing solely on the presence or absence of SS IV disease.
CONCLUSION Our study emphasizes the need for further studies to define appropriate upper gastrointestinal surveillance programs in MAP patients.
Collapse
Affiliation(s)
- Lupe Sanchez-Mete
- Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
| | - Lorenzo Mosciatti
- Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
| | - Marco Casadio
- Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
| | - Luigi Vittori
- Department of Radiological, Oncological and Pathological Sciences, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
| | - Aline Martayan
- Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
| | - Vittoria Stigliano
- Gastroenterology and Digestive Endoscopy, Regina Elena National Cancer Institute, IRCCS, Rome 00144, Italy
| |
Collapse
|
75
|
Zhang P, Chaldebas M, Ogishi M, Al Qureshah F, Ponsin K, Feng Y, Rinchai D, Milisavljevic B, Han JE, Moncada-Vélez M, Keles S, Schröder B, Stenson PD, Cooper DN, Cobat A, Boisson B, Zhang Q, Boisson-Dupuis S, Abel L, Casanova JL. Genome-wide detection of human intronic AG-gain variants located between splicing branchpoints and canonical splice acceptor sites. Proc Natl Acad Sci U S A 2023; 120:e2314225120. [PMID: 37931111 PMCID: PMC10655562 DOI: 10.1073/pnas.2314225120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2023] [Accepted: 10/02/2023] [Indexed: 11/08/2023] Open
Abstract
Human genetic variants that introduce an AG into the intronic region between the branchpoint (BP) and the canonical splice acceptor site (ACC) of protein-coding genes can disrupt pre-mRNA splicing. Using our genome-wide BP database, we delineated the BP-ACC segments of all human introns and found extreme depletion of AG/YAG in the [BP+8, ACC-4] high-risk region. We developed AGAIN as a genome-wide computational approach to systematically and precisely pinpoint intronic AG-gain variants within the BP-ACC regions. AGAIN identified 350 AG-gain variants from the Human Gene Mutation Database, all of which alter splicing and cause disease. Among them, 74% created new acceptor sites, whereas 31% resulted in complete exon skipping. AGAIN also predicts the protein-level products resulting from these two consequences. We performed AGAIN on our exome/genomes database of patients with severe infectious diseases but without known genetic etiology and identified a private homozygous intronic AG-gain variant in the antimycobacterial gene SPPL2A in a patient with mycobacterial disease. AGAIN also predicts a retention of six intronic nucleotides that encode an in-frame stop codon, turning AG-gain into stop-gain. This allele was then confirmed experimentally to lead to loss of function by disrupting splicing. We further showed that AG-gain variants inside the high-risk region led to misspliced products, while those outside the region did not, by two case studies in genes STAT1 and IRF7. We finally evaluated AGAIN on our 14 paired exome-RNAseq samples and found that 82% of AG-gain variants in high-risk regions showed evidence of missplicing. AGAIN is publicly available from https://hgidsoft.rockefeller.edu/AGAIN and https://github.com/casanova-lab/AGAIN.
Collapse
Affiliation(s)
- Peng Zhang
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Matthieu Chaldebas
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Masato Ogishi
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Fahd Al Qureshah
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Khoren Ponsin
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Yi Feng
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Darawan Rinchai
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Baptiste Milisavljevic
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Ji Eun Han
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Marcela Moncada-Vélez
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
| | - Sevgi Keles
- Division of Pediatric Allergy and Immunology, Necmettin Erbakan University, Meram Medical Faculty, Konya42080, Turkey
| | - Bernd Schröder
- Institute of Physiological Chemistry, Technische Universität Dresden, Dresden01307, Germany
| | - Peter D. Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, CardiffCF14 4XN, United Kingdom
| | - David N. Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, CardiffCF14 4XN, United Kingdom
| | - Aurélie Cobat
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris75015, France
- Paris Cité University, Imagine Institute, Paris75015, France
| | - Bertrand Boisson
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris75015, France
- Paris Cité University, Imagine Institute, Paris75015, France
| | - Qian Zhang
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris75015, France
- Paris Cité University, Imagine Institute, Paris75015, France
| | - Stéphanie Boisson-Dupuis
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris75015, France
- Paris Cité University, Imagine Institute, Paris75015, France
| | - Laurent Abel
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris75015, France
- Paris Cité University, Imagine Institute, Paris75015, France
| | - Jean-Laurent Casanova
- St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY10065
- Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Paris75015, France
- Paris Cité University, Imagine Institute, Paris75015, France
- Department of Pediatrics, Necker Hospital for Sick Children, Paris75015, France
- HHMI, New York, NY10065
| |
Collapse
|
76
|
Shinder I, Hu R, Ji HJ, Chao KH, Pertea M. EASTR: Identifying and eliminating systematic alignment errors in multi-exon genes. Nat Commun 2023; 14:7223. [PMID: 37940654 PMCID: PMC10632439 DOI: 10.1038/s41467-023-43017-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2023] [Accepted: 10/30/2023] [Indexed: 11/10/2023] Open
Abstract
Accurate alignment of transcribed RNA to reference genomes is a critical step in the analysis of gene expression, which in turn has broad applications in biomedical research and in the basic sciences. We reveal that widely used splice-aware aligners, such as STAR and HISAT2, can introduce erroneous spliced alignments between repeated sequences, leading to the inclusion of falsely spliced transcripts in RNA-seq experiments. In some cases, the 'phantom' introns resulting from these errors make their way into widely-used genome annotation databases. To address this issue, we present EASTR (Emending Alignments of Spliced Transcript Reads), a software tool that detects and removes falsely spliced alignments or transcripts from alignment and annotation files. EASTR improves the accuracy of spliced alignments across diverse species, including human, maize, and Arabidopsis thaliana, by detecting sequence similarity between intron-flanking regions. We demonstrate that applying EASTR before transcript assembly substantially reduces false positive introns, exons, and transcripts, improving the overall accuracy of assembled transcripts. Additionally, we show that EASTR's application to reference annotation databases can detect and correct likely cases of mis-annotated transcripts.
Collapse
Affiliation(s)
- Ida Shinder
- Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
| | - Richard Hu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Hyun Joo Ji
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Kuan-Hao Chao
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| |
Collapse
|
77
|
Sun KY, Bai X, Chen S, Bao S, Kapoor M, Zhang C, Backman J, Joseph T, Maxwell E, Mitra G, Gorovits A, Mansfield A, Boutkov B, Gokhale S, Habegger L, Marcketta A, Locke A, Kessler MD, Sharma D, Staples J, Bovijn J, Gelfman S, Gioia AD, Rajagopal V, Lopez A, Varela JR, Alegre J, Berumen J, Tapia-Conyer R, Kuri-Morales P, Torres J, Emberson J, Collins R, Cantor M, Thornton T, Kang HM, Overton J, Shuldiner AR, Cremona ML, Nafde M, Baras A, Abecasis G, Marchini J, Reid JG, Salerno W, Balasubramanian S. A deep catalog of protein-coding variation in 985,830 individuals. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.09.539329. [PMID: 37214792 PMCID: PMC10197621 DOI: 10.1101/2023.05.09.539329] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this Exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
Collapse
Affiliation(s)
| | | | - Siying Chen
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Suying Bao
- Regeneron Genetics Center, Tarrytown, NY, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | - Adam Locke
- Regeneron Genetics Center, Tarrytown, NY, USA
| | | | | | | | | | | | | | | | | | | | - Jesus Alegre
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Jaime Berumen
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Roberto Tapia-Conyer
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Pablo Kuri-Morales
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM)
| | - Jason Torres
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Jonathan Emberson
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Rory Collins
- Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | | | | | | | | | | | | | | | | | - Mona Nafde
- Regeneron Genetics Center, Tarrytown, NY, USA
| | - Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
| | | | | | | | | | | |
Collapse
|
78
|
Cross NCP, Ernst T, Branford S, Cayuela JM, Deininger M, Fabarius A, Kim DDH, Machova Polakova K, Radich JP, Hehlmann R, Hochhaus A, Apperley JF, Soverini S. European LeukemiaNet laboratory recommendations for the diagnosis and management of chronic myeloid leukemia. Leukemia 2023; 37:2150-2167. [PMID: 37794101 PMCID: PMC10624636 DOI: 10.1038/s41375-023-02048-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2023] [Revised: 09/13/2023] [Accepted: 09/20/2023] [Indexed: 10/06/2023]
Abstract
From the laboratory perspective, effective management of patients with chronic myeloid leukemia (CML) requires accurate diagnosis, assessment of prognostic markers, sequential assessment of levels of residual disease and investigation of possible reasons for resistance, relapse or progression. Our scientific and clinical knowledge underpinning these requirements continues to evolve, as do laboratory methods and technologies. The European LeukemiaNet convened an expert panel to critically consider the current status of genetic laboratory approaches to help diagnose and manage CML patients. Our recommendations focus on current best practice and highlight the strengths and pitfalls of commonly used laboratory tests.
Collapse
Affiliation(s)
| | - Thomas Ernst
- Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
| | - Susan Branford
- Centre for Cancer Biology and SA Pathology, Adelaide, SA, Australia
| | - Jean-Michel Cayuela
- Laboratory of Hematology, University Hospital Saint-Louis, AP-HP and EA3518, Université Paris Cité, Paris, France
| | | | - Alice Fabarius
- III. Medizinische Klinik, Medizinische Fakultät Mannheim, Universität Heidelberg, Mannheim, Germany
| | - Dennis Dong Hwan Kim
- Department of Medical Oncology and Hematology, Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, Canada
| | | | | | - Rüdiger Hehlmann
- III. Medizinische Klinik, Medizinische Fakultät Mannheim, Universität Heidelberg, Mannheim, Germany
- ELN Foundation, Weinheim, Germany
| | - Andreas Hochhaus
- Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
| | - Jane F Apperley
- Centre for Haematology, Imperial College London, London, UK
- Department of Clinical Haematology, Imperial College Healthcare NHS Trust, London, UK
| | - Simona Soverini
- Department of Medical and Surgical Sciences, Institute of Hematology "Lorenzo e Ariosto Seràgnoli", University of Bologna, Bologna, Italy
| |
Collapse
|
79
|
Varabyou A, Sommer MJ, Erdogdu B, Shinder I, Minkin I, Chao KH, Park S, Heinz J, Pockrandt C, Shumate A, Rincon N, Puiu D, Steinegger M, Salzberg SL, Pertea M. CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure. Genome Biol 2023; 24:249. [PMID: 37904256 PMCID: PMC10614308 DOI: 10.1186/s13059-023-03088-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
CHESS 3 represents an improved human gene catalog based on nearly 10,000 RNA-seq experiments across 54 body sites. It significantly improves current genome annotation by integrating the latest reference data and algorithms, machine learning techniques for noise filtering, and new protein structure prediction methods. CHESS 3 contains 41,356 genes, including 19,839 protein-coding genes and 158,377 transcripts, with 14,863 protein-coding transcripts not in other catalogs. It includes all MANE transcripts and at least one transcript for most RefSeq and GENCODE genes. On the CHM13 human genome, the CHESS 3 catalog contains an additional 129 protein-coding genes. CHESS 3 is available at http://ccb.jhu.edu/chess .
Collapse
Affiliation(s)
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
| | - Markus J Sommer
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Beril Erdogdu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Ida Shinder
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Cross Disciplinary Graduate Program in Biomedical Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA
| | - Ilia Minkin
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Kuan-Hao Chao
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Sukhwan Park
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
| | - Jakob Heinz
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Christopher Pockrandt
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Alaina Shumate
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Natalia Rincon
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Daniela Puiu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
| | - Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, MD, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| |
Collapse
|
80
|
Ljungdahl A, Kohani S, Page NF, Wells ES, Wigdor EM, Dong S, Sanders SJ. AlphaMissense is better correlated with functional assays of missense impact than earlier prediction algorithms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.24.562294. [PMID: 37961354 PMCID: PMC10634779 DOI: 10.1101/2023.10.24.562294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/15/2023]
Abstract
Missense variants that alter a single amino acid in the encoded protein contribute to many human disorders but pose a substantial challenge in interpretation. Though these variants can be reliably identified through sequencing, distinguishing the clinically significant ones remains difficult, such that "Variants of Unknown Significance" outnumber those classified as "Pathogenic" or "Likely Pathogenic." Numerous in silico approaches have been developed to predict the functional impact of missense variants to inform clinical interpretation, the latest being AlphaMissense, which uses artificial intelligence methods trained on predicted protein structure. To independently assess the performance of AlphaMissense and 38 other predictors of missense severity, we compared predictions to data from multiplexed assays of variant effect (MAVE). MAVE experiments generate almost every possible individual amino acid change in a gene and measure their functional impact using a high-throughput assay. Assessing 17,696 variants across five genes (DDX3X, MSH2, PTEN, KCNQ4, and BRCA1), we find that AlphaMissense is consistently one of the top five algorithms based on correlation with functional impact and is the best-correlated algorithm for two genes. We conclude that AlphaMissense represents the current best-in-class predictor by this metric; however, the improvement over other algorithms is modest. We note that multiple missense predictors, including AlphaMissense, appear to overcall variants as pathogenic despite minimal functional impact and that substantially more high-quality training data, including consistently analyzed patient cohorts and MAVE analyses, are required to improve accuracy.
Collapse
Affiliation(s)
- Alicia Ljungdahl
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Sayeh Kohani
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Nicholas F. Page
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Eloise S. Wells
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Emilie M. Wigdor
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
| | - Shan Dong
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Stephan J. Sanders
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- New York Genome Center, New York, NY 10013, USA
| |
Collapse
|
81
|
Kubota N, Takeda R, Kobayashi J, Hidaka E, Nishi E, Takano K, Wakui K. Reanalysis of Chromosomal Microarray Data Using a Smaller Copy Number Variant Call Threshold Identifies Four Cases with Heterozygous Multiexon Deletions of ARID1B, EHMT1, and FOXP1 Genes. Mol Syndromol 2023; 14:394-404. [PMID: 37901861 PMCID: PMC10601822 DOI: 10.1159/000530252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2022] [Accepted: 03/16/2023] [Indexed: 10/31/2023] Open
Abstract
Introduction Chromosomal microarray (CMA) is a highly accurate and established method for detecting copy number variations (CNVs) in clinical genetic testing. CNVs are important etiological factors for disorders such as intellectual disability, developmental delay, and multiple congenital anomalies. Recently developed analytical methods have facilitated the identification of smaller CNVs. Therefore, reanalyzing CMA data using a smaller CNV calling threshold may yield useful information. However, this method was left to the discretion of each institution. Methods We reanalyzed the CMA data of 131 patients using a smaller CNV call threshold: 50 kb 50 probes for gain and 25 kb 25 probes for loss. We interpreted the reanalyzed CNVs based on the most recently available information. In the reanalysis, we filtered the data using the Clinical Genome Resource dosage sensitivity gene list as an index to quickly and efficiently check morbid genes. Results The number of copy number loss was approximately 20 times greater, and copy number gain was approximately three times greater compared to those in the previous analysis. We detected new likely pathogenic CNVs in four participants: a 236.5 kb loss within ARID1B, a 50.6 kb loss including EHMT1, a 46.5 kb loss including EHMT1, and an 89.1 kb loss within the FOXP1 gene. Conclusion The method employed in this study is simple and effective for CMA data reanalysis using a smaller CNV call threshold. Thus, this method is efficient for both ongoing and repeated analyses. This study may stimulate further discussion of reanalysis methodology in clinical laboratories.
Collapse
Affiliation(s)
- Noriko Kubota
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
| | - Ryojun Takeda
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
- Division of Medical Genetics, Nagano Children’s Hospital, Azumino, Japan
| | - Jun Kobayashi
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
| | - Eiko Hidaka
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
| | - Eriko Nishi
- Division of Medical Genetics, Nagano Children’s Hospital, Azumino, Japan
| | - Kyoko Takano
- Division of Medical Genetics, Nagano Children’s Hospital, Azumino, Japan
- Department of Medical Genetics, Shinshu University School of Medicine, Matsumoto, Japan
- Center for Medical Genetics, Shinshu University Hospital, Matsumoto, Japan
| | - Keiko Wakui
- Life Science Research Center, Nagano Children’s Hospital, Azumino, Japan
- Department of Medical Genetics, Shinshu University School of Medicine, Matsumoto, Japan
- Center for Medical Genetics, Shinshu University Hospital, Matsumoto, Japan
| |
Collapse
|
82
|
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. Nature 2023; 622:41-47. [PMID: 37794265 PMCID: PMC10575709 DOI: 10.1038/s41586-023-06490-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2023] [Accepted: 07/27/2023] [Indexed: 10/06/2023]
Abstract
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.
Collapse
Affiliation(s)
- Paulo Amaral
- INSPER Institute of Education and Research, Sao Paulo, Brazil
| | | | - Francisco M De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
- Tempus Labs, Chicago, IL, USA
| | | | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Thomas Gingeras
- Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Roderic Guigo
- Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Jennifer L Harrow
- Centre for Genomics Research, Discovery Sciences, AstraZeneca, Royston, UK
| | - Artemis G Hatzigeorgiou
- Department of Computer Science and Biomedical Informatics, Universithy of Thessaly, Lamia, Greece
- Hellenic Pasteur Institute, Athens, Greece
| | - Rory Johnson
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
- Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
- Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Department for BioMedical Research, University of Bern, Bern, Switzerland
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hazuki Takahashi
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | - Igor Ulitsky
- Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot, Israel
- Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
| | - Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Christine A Wells
- Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
| | - Mark Yandell
- Departent of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
| | - Piero Carninci
- Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Human Technopole, Milan, Italy.
| | - Steven L Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.
| |
Collapse
|
83
|
Lang M, Kazdal D, Mohr I, Anamaterou C. Differences and similarities of GTF2I mutated thymomas in different Eurasian ethnic groups. Transl Lung Cancer Res 2023; 12:1842-1844. [PMID: 37854159 PMCID: PMC10579828 DOI: 10.21037/tlcr-23-396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Accepted: 09/06/2023] [Indexed: 10/20/2023]
Affiliation(s)
- Matthias Lang
- Department of General, Visceral, and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
| | - Daniel Kazdal
- Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany
- Translational Lung Research Center (TLRC) Heidelberg, German Center for Lung Research (DZL), Heidelberg, Germany
| | - Isabelle Mohr
- Department of Internal Medicine IV, Department of Gastroenterology, University Hospital Heidelberg, Heidelberg, Germany
| | | |
Collapse
|
84
|
Martin-Geary AC, Blakes AJM, Dawes R, Findlay SD, Lord J, Walker S, Talbot-Martin J, Wieder N, D’Souza EN, Fernandes M, Hilton S, Lahiri N, Campbell C, Jenkinson S, DeGoede CGEL, Anderson ER, Burge CB, Sanders SJ, Ellingford J, Baralle D, Banka S, Whiffin N. Systematic identification of disease-causing promoter and untranslated region variants in 8,040 undiagnosed individuals with rare disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.09.12.23295416. [PMID: 37745552 PMCID: PMC10516070 DOI: 10.1101/2023.09.12.23295416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]
Abstract
Background Both promoters and untranslated regions (UTRs) have critical regulatory roles, yet variants in these regions are largely excluded from clinical genetic testing due to difficulty in interpreting pathogenicity. The extent to which these regions may harbour diagnoses for individuals with rare disease is currently unknown. Methods We present a framework for the identification and annotation of potentially deleterious proximal promoter and UTR variants in known dominant disease genes. We use this framework to annotate de novo variants (DNVs) in 8,040 undiagnosed individuals in the Genomics England 100,000 genomes project, which were subject to strict region-based filtering, clinical review, and validation studies where possible. In addition, we performed region and variant annotation-based burden testing in 7,862 unrelated probands against matched unaffected controls. Results We prioritised eleven DNVs and identified an additional variant overlapping one of the eleven. Ten of these twelve variants (82%) are in genes that are a strong match to the individual's phenotype and six had not previously been identified. Through burden testing, we did not observe a significant enrichment of potentially deleterious promoter and/or UTR variants in individuals with rare disease collectively across any of our region or variant annotations. Conclusions Overall, we demonstrate the value of screening promoters and UTRs to uncover additional diagnoses for previously undiagnosed individuals with rare disease and provide a framework for doing so without dramatically increasing interpretation burden.
Collapse
Affiliation(s)
- Alexandra C Martin-Geary
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Alexander J M Blakes
- Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
| | - Ruebena Dawes
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Scott D Findlay
- Department of Biology, Massachusetts Institute of Technology, Cambridge, USA
| | | | | | | | - Nechama Wieder
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Elston N D’Souza
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Maria Fernandes
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
| | - Sarah Hilton
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Nayana Lahiri
- St George’s, University of London & St George’s University Hospitals NHS Foundation Trust, Institute of Molecular and Clinical Sciences, London, SW17 0QT, UK
| | - Christopher Campbell
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Sarah Jenkinson
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Christian G E L DeGoede
- Department of Paediatric Neurology, Clinical research Facility, Lancashire Teaching Hospitals NHS Trust
- Manchester Metropolitan University
| | - Emily R Anderson
- Liverpool Centre for Genomic Medicine, Liverpool Women’s Hospital, Liverpool, UK
| | | | - Stephan J Sanders
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA 94158, USA
- New York Genome Center, New York, NY, USA
| | - Jamie Ellingford
- Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Diana Baralle
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, United Kingdom
| | - Siddharth Banka
- Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester M13 9WL, UK
| | - Nicola Whiffin
- Big Data Institute, University of Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, UK
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| |
Collapse
|
85
|
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
Collapse
Affiliation(s)
- Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Sam Barkal
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Tian Yu
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | | | - Yunzhuo Zhou
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Matthew Francoeur
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Quang Vinh Phan
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Zhijian Li
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Manuel Tognon
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science Department, University of Verona, Verona, Italy
| | - Lara Brown
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Michael I. Love
- Department of Genetics, Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC
| | - Guillaume Lettre
- Montreal Heart Institute, Montréal, QC H1T 1C8, Canada
- Faculté de Médecine, Université de Montréal, Montréal, QC H3T 1J4, Canada
| | - David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Christopher A. Cassa
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Richard I. Sherwood
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
| | - Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
| |
Collapse
|
86
|
Bohn E, Lau TTY, Wagih O, Masud T, Merico D. A curated census of pathogenic and likely pathogenic UTR variants and evaluation of deep learning models for variant effect prediction. Front Mol Biosci 2023; 10:1257550. [PMID: 37745687 PMCID: PMC10517338 DOI: 10.3389/fmolb.2023.1257550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023] Open
Abstract
Introduction: Variants in 5' and 3' untranslated regions (UTR) contribute to rare disease. While predictive algorithms to assist in classifying pathogenicity can potentially be highly valuable, the utility of these tools is often unclear, as it depends on carefully selected training and validation conditions. To address this, we developed a high confidence set of pathogenic (P) and likely pathogenic (LP) variants and assessed deep learning (DL) models for predicting their molecular effects. Methods: 3' and 5' UTR variants documented as P or LP (P/LP) were obtained from ClinVar and refined by reviewing the annotated variant effect and reassessing evidence of pathogenicity following published guidelines. Prediction scores from sequence-based DL models were compared between three groups: P/LP variants acting though the mechanism for which the model was designed (model-matched), those operating through other mechanisms (model-mismatched), and putative benign variants. PhyloP was used to compare conservation scores between P/LP and putative benign variants. Results: 295 3' and 188 5' UTR variants were obtained from ClinVar, of which 26 3' and 68 5' UTR variants were classified as P/LP. Predictions by DL models achieved statistically significant differences when comparing modelmatched P/LP variants to both putative benign variants and modelmismatched P/LP variants, as well as when comparing all P/LP variants to putative benign variants. PhyloP conservation scores were significantly higher among P/LP compared to putative benign variants for both the 3' and 5' UTR. Discussion: In conclusion, we present a high-confidence set of P/LP 3' and 5' UTR variants spanning a range of mechanisms and supported by detailed pathogenicity and molecular mechanism evidence curation. Predictions from DL models further substantiate these classifications. These datasets will support further development and validation of DL algorithms designed to predict the functional impact of variants that may be implicated in rare disease.
Collapse
Affiliation(s)
- Emma Bohn
- Deep Genomics Inc., Toronto, ON, Canada
| | | | | | | | - Daniele Merico
- Deep Genomics Inc., Toronto, ON, Canada
- The Centre for Applied Genomics, Hospital for Sick Children, Toronto, ON, Canada
| |
Collapse
|
87
|
Kerimov N, Tambets R, Hayhurst JD, Rahu I, Kolberg P, Raudvere U, Kuzmin I, Chowdhary A, Vija A, Teras HJ, Kanai M, Ulirsch J, Ryten M, Hardy J, Guelfi S, Trabzuni D, Kim-Hellmuth S, Rayner W, Finucane H, Peterson H, Mosaku A, Parkinson H, Alasoo K. eQTL Catalogue 2023: New datasets, X chromosome QTLs, and improved detection and visualisation of transcript-level QTLs. PLoS Genet 2023; 19:e1010932. [PMID: 37721944 PMCID: PMC10538656 DOI: 10.1371/journal.pgen.1010932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 09/28/2023] [Accepted: 08/22/2023] [Indexed: 09/20/2023] Open
Abstract
The eQTL Catalogue is an open database of uniformly processed human molecular quantitative trait loci (QTLs). We are continuously updating the resource to further increase its utility for interpreting genetic associations with complex traits. Over the past two years, we have increased the number of uniformly processed studies from 21 to 31 and added X chromosome QTLs for 19 compatible studies. We have also implemented Leafcutter to directly identify splice-junction usage QTLs in all RNA sequencing datasets. Finally, to improve the interpretability of transcript-level QTLs, we have developed static QTL coverage plots that visualise the association between the genotype and average RNA sequencing read coverage in the region for all 1.7 million fine mapped associations. To illustrate the utility of these updates to the eQTL Catalogue, we performed colocalisation analysis between vitamin D levels in the UK Biobank and all molecular QTLs in the eQTL Catalogue. Although most GWAS loci colocalised both with eQTLs and transcript-level QTLs, we found that visual inspection could sometimes be used to distinguish primary splicing QTLs from those that appear to be secondary consequences of large-effect gene expression QTLs. While these visually confirmed primary splicing QTLs explain just 6/53 of the colocalising signals, they are significantly less pleiotropic than eQTLs and identify a prioritised causal gene in 4/6 cases.
Collapse
Affiliation(s)
- Nurlan Kerimov
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Ralf Tambets
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - James D. Hayhurst
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Ida Rahu
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Peep Kolberg
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Uku Raudvere
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Ivan Kuzmin
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Anshika Chowdhary
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Andreas Vija
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Hans J. Teras
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Jacob Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Mina Ryten
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - John Hardy
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Sebastian Guelfi
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Daniah Trabzuni
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, United Kingdom
| | - Sarah Kim-Hellmuth
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
- Department of Pediatrics, Dr. von Hauner Children’s Hospital, University Hospital LMU Munich, Munich, Germany
| | - William Rayner
- Institute of Translational Genomics, Helmholtz Munich, Neuherberg, Germany
| | - Hilary Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Hedi Peterson
- Institute of Computer Science, University of Tartu, Tartu, Estonia
| | - Abayomi Mosaku
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Helen Parkinson
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Kaur Alasoo
- Institute of Computer Science, University of Tartu, Tartu, Estonia
- Open Targets, South Building, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| |
Collapse
|
88
|
Hayesmoore JB, Bhuiyan ZA, Coviello DA, du Sart D, Edwards M, Iascone M, Morris-Rosendahl DJ, Sheils K, van Slegtenhorst M, Thomson KL. EMQN: Recommendations for genetic testing in inherited cardiomyopathies and arrhythmias. Eur J Hum Genet 2023; 31:1003-1009. [PMID: 37443332 PMCID: PMC10474043 DOI: 10.1038/s41431-023-01421-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2023] [Revised: 06/21/2023] [Accepted: 06/22/2023] [Indexed: 07/15/2023] Open
Abstract
Inherited cardiomyopathies and arrhythmias (ICAs) are a prevalent and clinically heterogeneous group of genetic disorders that are associated with increased risk of sudden cardiac death and heart failure. Making a genetic diagnosis can inform the management of patients and their at-risk relatives and, as such, molecular genetic testing is now considered an integral component of the clinical care pathway. However, ICAs are characterised by high genetic and allelic heterogeneity, incomplete / age-related penetrance, and variable expressivity. Therefore, despite our improved understanding of the genetic basis of these conditions, and significant technological advances over the past two decades, identifying and recognising the causative genotype remains challenging. As clinical genetic testing for ICAs becomes more widely available, it is increasingly important for clinical laboratories to consolidate existing knowledge and experience to inform and improve future practice. These recommendations have been compiled to help clinical laboratories navigate the challenges of ICAs and thereby facilitate best practice and consistency in genetic test provision for this group of disorders. General recommendations on internal and external quality control, referral, analysis, result interpretation, and reporting are described. Also included are appendices that provide specific information pertinent to genetic testing for hypertrophic, dilated, and arrhythmogenic right ventricular cardiomyopathies, long QT syndrome, Brugada syndrome, and catecholaminergic polymorphic ventricular tachycardia.
Collapse
Affiliation(s)
- Jesse B Hayesmoore
- Oxford Regional Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
| | - Zahurul A Bhuiyan
- Division of Genetic Medicine, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne, Switzerland
| | | | - Desirée du Sart
- Biological Sciences and Genomics, Monash University, Melbourne, VIC, Australia
| | - Matthew Edwards
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
| | - Maria Iascone
- Laboratorio di Genetica Medica, ASST Papa Giovanni XXIII, Bergamo, Italy
| | - Deborah J Morris-Rosendahl
- Clinical Genetics and Genomics Laboratory, Royal Brompton and Harefield Hospitals, Guy's and St. Thomas' NHS Foundation Trust, London, UK
| | | | | | - Kate L Thomson
- Oxford Regional Genetics Laboratories, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
| |
Collapse
|
89
|
Lee H, Greer SU, Pavlichin DS, Zhou B, Urban AE, Weissman T, Ji HP. Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. CELL REPORTS METHODS 2023; 3:100543. [PMID: 37671027 PMCID: PMC10475782 DOI: 10.1016/j.crmeth.2023.100543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/17/2022] [Revised: 04/14/2023] [Accepted: 07/06/2023] [Indexed: 09/07/2023]
Abstract
The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
Collapse
Affiliation(s)
- HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Stephanie U. Greer
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Dmitri S. Pavlichin
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Bo Zhou
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Alexander E. Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
| | - Hanlee P. Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Electrical Engineering, Stanford University, Palo Alto, CA 94304, USA
| |
Collapse
|
90
|
Korbecki J, Bosiacki M, Chlubek D, Baranowska-Bosiacka I. Bioinformatic Analysis of the CXCR2 Ligands in Cancer Processes. Int J Mol Sci 2023; 24:13287. [PMID: 37686093 PMCID: PMC10487711 DOI: 10.3390/ijms241713287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2023] [Revised: 08/23/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023] Open
Abstract
Human CXCR2 has seven ligands, i.e., CXCL1, CXCL2, CXCL3, CXCL5, CXCL6, CXCL7, and CXCL8/IL-8-chemokines with nearly identical properties. However, no available study has compared the contribution of all CXCR2 ligands to cancer progression. That is why, in this study, we conducted a bioinformatic analysis using the GEPIA, UALCAN, and TIMER2.0 databases to investigate the role of CXCR2 ligands in 31 different types of cancer, including glioblastoma, melanoma, and colon, esophageal, gastric, kidney, liver, lung, ovarian, pancreatic, and prostate cancer. We focused on the differences in the regulation of expression (using the Tfsitescan and miRDB databases) and analyzed mutation types in CXCR2 ligand genes in cancers (using the cBioPortal). The data showed that the effect of CXCR2 ligands on prognosis depends on the type of cancer. CXCR2 ligands were associated with EMT, angiogenesis, recruiting neutrophils to the tumor microenvironment, and the count of M1 macrophages. The regulation of the expression of each CXCR2 ligand was different and, thus, each analyzed chemokine may have a different function in cancer processes. Our findings suggest that each type of cancer has a unique pattern of CXCR2 ligand involvement in cancer progression, with each ligand having a unique regulation of expression.
Collapse
Affiliation(s)
- Jan Korbecki
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland; (J.K.); (M.B.); (D.C.)
- Department of Anatomy and Histology, Collegium Medicum, University of Zielona Góra, Zyty 28 St., 65-046 Zielona Góra, Poland
| | - Mateusz Bosiacki
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland; (J.K.); (M.B.); (D.C.)
- Department of Functional Diagnostics and Physical Medicine, Faculty of Health Sciences, Pomeranian Medical University in Szczecin, Żołnierska Str. 54, 71-210 Szczecin, Poland
| | - Dariusz Chlubek
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland; (J.K.); (M.B.); (D.C.)
| | - Irena Baranowska-Bosiacka
- Department of Biochemistry and Medical Chemistry, Pomeranian Medical University in Szczecin, Powstańców Wlkp. 72, 70-111 Szczecin, Poland; (J.K.); (M.B.); (D.C.)
| |
Collapse
|
91
|
Abstract
DNA sequencing has revolutionized medicine over recent decades. However, analysis of large structural variation and repetitive DNA, a hallmark of human genomes, has been limited by short-read technology, with read lengths of 100-300 bp. Long-read sequencing (LRS) permits routine sequencing of human DNA fragments tens to hundreds of kilobase pairs in size, using both real-time sequencing by synthesis and nanopore-based direct electronic sequencing. LRS permits analysis of large structural variation and haplotypic phasing in human genomes and has enabled the discovery and characterization of rare pathogenic structural variants and repeat expansions. It has also recently enabled the assembly of a complete, gapless human genome that includes previously intractable regions, such as highly repetitive centromeres and homologous acrocentric short arms. With the addition of protocols for targeted enrichment, direct epigenetic DNA modification detection, and long-range chromatin profiling, LRS promises to launch a new era of understanding of genetic diversity and pathogenic mutations in human populations.
Collapse
Affiliation(s)
- Peter E Warburton
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; ,
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Robert P Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; ,
- Center for Advanced Genomics Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
Collapse
|
92
|
Foreman J, Perrett D, Mazaika E, Hunt SE, Ware JS, Firth HV. DECIPHER: Improving Genetic Diagnosis Through Dynamic Integration of Genomic and Clinical Data. Annu Rev Genomics Hum Genet 2023; 24:151-176. [PMID: 37285546 PMCID: PMC7615097 DOI: 10.1146/annurev-genom-102822-100509] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) shares candidate diagnostic variants and phenotypic data from patients with genetic disorders to facilitate research and improve the diagnosis, management, and therapy of rare diseases. The platform sits at the boundary between genomic research and the clinical community. DECIPHER aims to ensure that the most up-to-date data are made rapidly available within its interpretation interfaces to improve clinical care. Newly integrated cardiac case-control data that provide evidence of gene-disease associations and inform variant interpretation exemplify this mission. New research resources are presented in a format optimized for use by a broad range of professionals supporting the delivery of genomic medicine. The interfaces within DECIPHER integrate and contextualize variant and phenotypic data, helping to determine a robust clinico-molecular diagnosis for rare-disease patients, which combines both variant classification and clinical fit. DECIPHER supports discovery research, connecting individuals within the rare-disease community to pursue hypothesis-driven research.
Collapse
Affiliation(s)
- Julia Foreman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom; ,
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Daniel Perrett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom; ,
- Wellcome Sanger Institute, Hinxton, United Kingdom
| | - Erica Mazaika
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom; ,
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom; ,
| | - James S Ware
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Imperial College London, London, United Kingdom; ,
- Royal Brompton and Harefield Hospitals, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom
| | - Helen V Firth
- Wellcome Sanger Institute, Hinxton, United Kingdom
- East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge, United Kingdom;
| |
Collapse
|
93
|
Brovkina MV, Chapman MA, Holding ML, Clowney EJ. Emergence and influence of sequence bias in evolutionarily malleable, mammalian tandem arrays. BMC Biol 2023; 21:179. [PMID: 37612705 PMCID: PMC10463633 DOI: 10.1186/s12915-023-01673-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Accepted: 08/01/2023] [Indexed: 08/25/2023] Open
Abstract
BACKGROUND The radiation of mammals at the extinction of the dinosaurs produced a plethora of new forms-as diverse as bats, dolphins, and elephants-in only 10-20 million years. Behind the scenes, adaptation to new niches is accompanied by extensive innovation in large families of genes that allow animals to contact the environment, including chemosensors, xenobiotic enzymes, and immune and barrier proteins. Genes in these "outward-looking" families are allelically diverse among humans and exhibit tissue-specific and sometimes stochastic expression. RESULTS Here, we show that these tandem arrays of outward-looking genes occupy AT-biased isochores and comprise the "tissue-specific" gene class that lack CpG islands in their promoters. Models of mammalian genome evolution have not incorporated the sharply different functions and transcriptional patterns of genes in AT- versus GC-biased regions. To examine the relationship between gene family expansion, sequence content, and allelic diversity, we use population genetic data and comparative analysis. First, we find that AT bias can emerge during evolutionary expansion of gene families in cis. Second, human genes in AT-biased isochores or with GC-poor promoters experience relatively low rates of de novo point mutation today but are enriched for non-synonymous variants. Finally, we find that isochores containing gene clusters exhibit low rates of recombination. CONCLUSIONS Our analyses suggest that tolerance of non-synonymous variation and low recombination are two forces that have produced the depletion of GC bases in outward-facing gene arrays. In turn, high AT content exerts a profound effect on their chromatin organization and transcriptional regulation.
Collapse
Affiliation(s)
- Margarita V Brovkina
- Graduate Program in Cellular and Molecular Biology, University of Michigan Medical School, Ann Arbor, MI, USA
| | - Margaret A Chapman
- Neurosciences Graduate Program, University of Michigan Medical School, Ann Arbor, MI, USA
| | | | - E Josephine Clowney
- Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA.
- Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA.
| |
Collapse
|
94
|
Ahles A, Engelhardt S. Genetic Variants of Adrenoceptors. Handb Exp Pharmacol 2023. [PMID: 37578621 DOI: 10.1007/164_2023_676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]
Abstract
Adrenoceptors are class A G-protein-coupled receptors grouped into three families (α1-, α2-, and β-adrenoceptors), each one including three members. All nine corresponding adrenoceptor genes display genetic variation in their coding and adjacent non-coding genomic region. Coding variants, i.e., nucleotide exchanges within the transcribed and translated receptor sequence, may result in a difference in amino acid sequence thus altering receptor function and signaling. Such variants have been intensely studied in vitro in overexpression systems and addressed in candidate-gene studies for distinct clinical parameters. In recent years, large cohorts were analyzed in genome-wide association studies (GWAS), where variants are detected as significant in context with specific traits. These studies identified two of the in-depth characterized 18 coding variants in adrenoceptors as repeatedly statistically significant genetic risk factors - p.Arg389Gly in the β1- and p.Thr164Ile in the β2-adrenoceptor, along with 56 variants in the non-coding regions adjacent to the adrenoceptor gene loci, the functional role of which is largely unknown at present. This chapter summarizes current knowledge on the two coding variants in adrenoceptors that have been consistently validated in GWAS and provides a prospective overview on the numerous non-coding variants more recently attributed to adrenoceptor gene loci.
Collapse
Affiliation(s)
- Andrea Ahles
- Institute of Pharmacology and Toxicology, Technical University of Munich (TUM), Munich, Germany
| | - Stefan Engelhardt
- Institute of Pharmacology and Toxicology, Technical University of Munich (TUM), Munich, Germany.
- DZHK (German Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, Munich, Germany.
| |
Collapse
|
95
|
Abstract
Within the next decade, the genomes of 1.8 million eukaryotic species will be sequenced. Identifying genes in these sequences is essential to understand the biology of the species. This is challenging due to the transcriptional complexity of eukaryotic genomes, which encode hundreds of thousands of transcripts of multiple types. Among these, a small set of protein-coding mRNAs play a disproportionately large role in defining phenotypes. Due to their sequence conservation, orthology can be established, making it possible to define the universal catalog of eukaryotic protein-coding genes. This catalog should substantially contribute to uncovering the genomic events underlying the emergence of eukaryotic phenotypes. This piece briefly reviews the basics of protein-coding gene prediction, discusses challenges in finalizing annotation of the human genome, and proposes strategies for producing annotations across the eukaryotic Tree of Life. This lays the groundwork for obtaining the catalog of all genes-the Earth's code of life.
Collapse
Affiliation(s)
- Roderic Guigó
- Bioinformatics and Genomics, Center for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology (BIST), Dr. Aiguader 88, 08003 Barcelona, Catalonia
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia
| |
Collapse
|
96
|
Wang L. Reference-guided search for open reading frames. NATURE COMPUTATIONAL SCIENCE 2023; 3:667-668. [PMID: 38177317 DOI: 10.1038/s43588-023-00497-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/06/2024]
Affiliation(s)
- Liguo Wang
- Division of Computational Biology, Mayo Clinic College of Medicine and Science, Rochester, MN, USA.
| |
Collapse
|
97
|
Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023; 22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open
Abstract
The human proteome comprises of all of the proteins produced by the sequences translated from the human genome with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.
Collapse
Affiliation(s)
- E H Bowler-Barnett
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Fan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - J Luo
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M Magrane
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
| | - S Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom.
| |
Collapse
|
98
|
Varabyou A, Erdogdu B, Salzberg SL, Pertea M. Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage. NATURE COMPUTATIONAL SCIENCE 2023; 3:700-708. [PMID: 38098813 PMCID: PMC10718564 DOI: 10.1038/s43588-023-00496-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Accepted: 07/05/2023] [Indexed: 12/17/2023]
Abstract
ORFanage is a system designed to assign open reading frames (ORFs) to known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
Collapse
Affiliation(s)
- Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
| | - Beril Erdogdu
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Steven L. Salzberg
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| |
Collapse
|
99
|
Chao KH, Mao A, Salzberg SL, Pertea M. Splam: a deep-learning-based splice site predictor that improves spliced alignments. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.27.550754. [PMID: 37546880 PMCID: PMC10402160 DOI: 10.1101/2023.07.27.550754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/08/2023]
Abstract
The process of splicing messenger RNA to remove introns plays a central role in creating genes and gene variants. Here we describe Splam, a novel method for predicting splice junctions in DNA based on deep residual convolutional neural networks. Unlike some previous models, Splam looks at a relatively limited window of 400 base pairs flanking each splice site, motivated by the observation that the biological process of splicing relies primarily on signals within this window. Additionally, Splam introduces the idea of training the network on donor and acceptor pairs together, based on the principle that the splicing machinery recognizes both ends of each intron at once. We compare Splam's accuracy to recent state-of-the-art splice site prediction methods, particularly SpliceAI, another method that uses deep neural networks. Our results show that Splam is consistently more accurate than SpliceAI, with an overall accuracy of 96% at predicting human splice junctions. Splam generalizes even to non-human species, including distant ones like the flowering plant Arabidopsis thaliana. Finally, we demonstrate the use of Splam on a novel application: processing the spliced alignments of RNA-seq data to identify and eliminate errors. We show that when used in this manner, Splam yields substantial improvements in the accuracy of downstream transcriptome analysis of both poly(A) and ribo-depleted RNA-seq libraries. Overall, Splam offers a faster and more accurate approach to detecting splice junctions, while also providing a reliable and efficient solution for cleaning up erroneous spliced alignments.
Collapse
Affiliation(s)
- Kuan-Hao Chao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Alan Mao
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Steven L Salzberg
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
| | - Mihaela Pertea
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
Collapse
|
100
|
Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE, De María M, Adams MS, Balderrama-Gutierrez G, Behera AK, Gonzalez JM, Hunt T, Lagarde J, Liang CE, Li H, Jerryd Meade M, Moraga Amador DA, Prjibelski AD, Birol I, Bostan H, Brooks AM, Hasan Çelik M, Chen Y, Du MR, Felton C, Göke J, Hafezqorani S, Herwig R, Kawaji H, Lee J, Liang Li J, Lienhard M, Mikheenko A, Mulligan D, Ming Nip K, Pertea M, Ritchie ME, Sim AD, Tang AD, Kei Wan Y, Wang C, Wong BY, Yang C, Barnes I, Berry A, Capella S, Dhillon N, Fernandez-Gonzalez JM, Ferrández-Peral L, Garcia-Reyero N, Goetz S, Hernández-Ferrer C, Kondratova L, Liu T, Martinez-Martin A, Menor C, Mestre-Tomás J, Mudge JM, Panayotova NG, Paniagua A, Repchevsky D, Rouchka E, Saint-John B, Sapena E, Sheynkman L, Laird Smith M, Suner MM, Takahashi H, Youngworth IA, Carninci P, Denslow ND, Guigó R, Hunter ME, Tilgner HU, Wold BJ, Vollmers C, Frankish A, Fai Au K, Sheynkman GM, Mortazavi A, Conesa A, Brooks AN. Systematic assessment of long-read RNA-seq methods for transcript identification and quantification. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.25.550582. [PMID: 37546854 PMCID: PMC10402094 DOI: 10.1101/2023.07.25.550582] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/08/2023]
Abstract
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples are advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
Collapse
Affiliation(s)
- Francisco J. Pardo-Palacios
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- These authors contributed equally to this work
| | - Dingjie Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Fairlie Reese
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Sílvia Carbonell-Sala
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- These authors contributed equally to this work
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
- These authors contributed equally to this work
| | - Jane E. Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Maite De María
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Matthew S. Adams
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Gabriela Balderrama-Gutierrez
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
- These authors contributed equally to this work
| | - Amit K. Behera
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Jose M. Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- These authors contributed equally to this work
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Flomics Biotech, Dr Aiguader 88, Barcelona 08003, Spain
- These authors contributed equally to this work
| | - Cindy E. Liang
- Molecular Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, USA
- These authors contributed equally to this work
| | - Haoran Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
- These authors contributed equally to this work
| | - Marcus Jerryd Meade
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- These authors contributed equally to this work
| | - David A. Moraga Amador
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
- These authors contributed equally to this work
| | - Andrey D. Prjibelski
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Center for Bioinformatics and Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- These authors contributed equally to this work
| | - Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Hamed Bostan
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Ashley M. Brooks
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Muhammed Hasan Çelik
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ying Chen
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Mei R,M. Du
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Colette Felton
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Jonathan Göke
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
| | - Saber Hafezqorani
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Ralf Herwig
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Hideya Kawaji
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
| | - Joseph Lee
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Jian Liang Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, USA
| | - Matthias Lienhard
- Department Computational Molecular Biology, Max-Planck-Institute for Molecular Genetics, Berlin, Germany
| | - Alla Mikheenko
- Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK
| | - Dennis Mulligan
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Matthew E. Ritchie
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
- Department of Medical Biology, The University of Melbourne, Parkville, Australia
| | - Andre D. Sim
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Alison D. Tang
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Yuk Kei Wan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Changqing Wang
- Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
| | - Brandon Y. Wong
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, USA
| | - Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, Canada
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | | | - Namrita Dhillon
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | | | - Luis Ferrández-Peral
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Natàlia Garcia-Reyero
- Environmental Laboratory, US Army Engineer Research & Development Center, Vicksburg, USA
| | | | | | | | | | | | | | - Jorge Mestre-Tomás
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | - Jonathan M. Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nedka G. Panayotova
- Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, USA
| | - Alejandro Paniagua
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
| | | | - Eric Rouchka
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Brandon Saint-John
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Enrique Sapena
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK, UK
| | - Leon Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
| | - Melissa Laird Smith
- Department of Biochemistry & Molecular Genetics, University of Louisville, Louisville, USA
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Hazuki Takahashi
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
| | | | - Piero Carninci
- Center for Integrative Medical Sciences, Laboratory for Transcriptome Technology, RIKEN, Yokohama, Japan
- Human Technopole, Milano, Italy
| | - Nancy D. Denslow
- Department of Physiological Sciences, College of Veterinary Medicine, University of Florida, Gainesville, USA
- Center for Environmental and Human Toxicology, Department of Physiological Sciences,, University of Florida, Gainesville, USA
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
| | - Margaret E. Hunter
- U.S. Geological Survey, Wetland and Aquatic Research Center, Gainesville, USA
| | - Hagen U. Tilgner
- Brain and Mind Research Institute and Center for Neurogenetics, Weill Cornell Medicine, New York City, USA
| | - Barbara J. Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, USA
| | - Christopher Vollmers
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Kin Fai Au
- Department of Biomedical Informatics, The Ohio State University, Columbus, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, USA
| | - Gloria M. Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, USA
- Center for Public Health Genomics
- UVA Cancer Center, University of Virginia, Charlottesville, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California, Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California, Irvine, Irvine, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
- Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, USA
| | - Angela N. Brooks
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, USA
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, USA
| |
Collapse
|