1
Kashyap SS, Kaur S, Devgan RK, Singh S, Singh J, Kaur M. Impact of 5' Near Gene Variants of Mannose Binding Lectin (MBL2) on Breast Cancer Risk. Biochem Genet 2024:10.1007/s10528-024-10894-3. [PMID: 39060643 DOI: 10.1007/s10528-024-10894-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Accepted: 07/18/2024] [Indexed: 07/28/2024]
Abstract
The immune system plays a dual role in tumour development through modulation of inflammation. MBL binds to damage-associated molecular patterns and induces inflammation through activation of the complement pathway. Dysregulated inflammation plays a major role in breast cancer pathogenesis, thereby suggesting its contribution towards breast cancer risk. The literature reports that single-nucleotide polymorphisms (SNPs) modulate serum MBL levels. Therefore, studying MBL2 SNPs in breast cancer might provide valuable insight into the disease pathogenesis. The present case-control association study aimed to elucidate the association between MBL2 5' near gene SNPs and breast cancer risk. Breast cancer patients were recruited from Government Medical College, G.N.D. Hospital, Amritsar. Age- and gender-matched, genetically unrelated healthy individuals from adjoining regions, with no history of malignancy up to three generations, were recruited as controls. The SNPs of MBL2 from the 5' near gene region with putative functional significance were selected based upon in silico analysis and literature review. The studied variants were genotyped by ARMS-PCR and PCR-RFLP, and their genotypic, allelic and haplotype frequencies were compared between the study groups. No difference in allelic, genotypic or haplotype frequencies was observed for rs7096206, rs7084554 and rs11003125 between the two participant groups. rs7084554 (CC) was found to confer risk towards hormone receptor-positive breast cancer. An intermediate LD was observed between rs7084554 and rs11003125. The study reports an association between the MBL2 variant rs7084554 and hormone receptor-positive breast cancer risk. Further research in this direction might validate the findings.
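The abstract reports an intermediate LD between two SNPs without giving the statistic. As a minimal sketch of how pairwise linkage disequilibrium is conventionally summarised, the snippet below computes D, D' and r² for two biallelic loci from one haplotype frequency and two allele frequencies. The function name and the input values are illustrative assumptions, not the authors' software or data.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # Raw disequilibrium: observed haplotype frequency minus its
    # expectation under independence of the two loci.
    d = p_ab - p_a * p_b

    # D' rescales |D| by the largest value it could take given the
    # allele frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Illustrative frequencies only (hypothetical, not taken from the study).
    print(linkage_disequilibrium(p_ab=0.30, p_a=0.40, p_b=0.50))
```

In practice the haplotype frequency would be estimated from genotype data (for example by an EM algorithm), which this sketch does not attempt.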
Affiliation(s)
- Shreya Singh Kashyap
- Department of Human Genetics, Guru Nanak Dev University, Amritsar, Punjab, 143005, India
- Surmeet Kaur
- Department of Human Genetics, Guru Nanak Dev University, Amritsar, Punjab, 143005, India
- Rajiv Kumar Devgan
- Department of Radiotherapy and Oncology, Government Medical College, G.N.D. Hospital, Amritsar, Punjab, India
- Sumitoj Singh
- Surgery Unit II, Government Medical College, G.N.D. Hospital, Amritsar, Punjab, 143001, India
- Jatinder Singh
- Department of Molecular Biology and Biochemistry, Guru Nanak Dev University, Amritsar, Punjab, 143005, India
- Manpreet Kaur
- Department of Human Genetics, Guru Nanak Dev University, Amritsar, Punjab, 143005, India.
2
Kin K, Bhogale S, Zhu L, Thomas D, Bertol J, Zheng WJ, Sinha S, Fakhouri WD. Sequence-to-expression approach to identify etiological non-coding DNA variations in P53 and cMYC-driven diseases. Hum Mol Genet 2024:ddae109. [PMID: 39017605 DOI: 10.1093/hmg/ddae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 06/08/2024] [Accepted: 07/11/2024] [Indexed: 07/18/2024] Open
Abstract
Disease risk prediction based on genomic sequence and transcriptional profile can improve disease screening and prevention. Despite identifying many disease-associated DNA variants, distinguishing deleterious non-coding DNA variations remains poor for most common diseases. In this study, we designed in vitro experiments to uncover the significance of occupancy and competitive binding between P53 and cMYC on common target genes. Analyzing publicly available ChIP-seq data for P53 and cMYC in embryonic stem cells showed that ~344-366 regions are co-occupied, and on average, two cis-overlapping motifs (CisOMs) per region were identified, suggesting that co-occupancy is evolutionarily conserved. Using U2OS and Raji cells untreated and treated with doxorubicin to increase P53 protein level while potentially reducing cMYC level, ChIP-seq analysis illustrated that around 16 to 922 genomic regions were co-occupied by P53 and cMYC, and substitutions of cMYC signals by P53 were detected post doxorubicin treatment. Around 187 expressed genes near co-occupied regions were altered at mRNA level according to RNA-seq data analysis. We utilized a computational motif-matching approach to illustrate that changes in predicted P53 binding affinity in CisOMs of co-occupied elements significantly correlate with alterations in reporter gene expression. We performed a similar analysis using SNPs mapped in CisOMs for P53 and cMYC from ChIP-seq data, and expression of target genes from GTEx portal. We found significant correlation between change in cMYC-motif binding affinity in CisOMs and altered expression. Our study brings us closer to developing a generally applicable approach to filter etiological non-coding variations associated with common diseases.
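The abstract describes a computational motif-matching approach that relates changes in predicted binding affinity to expression changes, but does not spell out the scoring. A common generic way to approximate such a change is to score the reference and alternate sequences against a position weight matrix (PWM) and take the difference. The sketch below follows that generic scheme; the toy three-position matrix, the uniform background, and all function names are assumptions for illustration and are not the authors' model.

```python
import math

# Uniform background nucleotide probability (an assumption of this sketch).
BACKGROUND = 0.25

# Toy 3-position PWM with per-base probabilities (hypothetical motif).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]


def pwm_score(window, pwm):
    """Log2-odds score of one window whose length equals the motif width."""
    return sum(math.log2(pwm[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_score(seq, pwm):
    """Highest log-odds score over all windows of the sequence."""
    width = len(pwm)
    return max(pwm_score(seq[i:i + width], pwm) for i in range(len(seq) - width + 1))


def delta_affinity(ref_seq, alt_seq, pwm):
    """Change in best motif score caused by a variant (alt minus ref)."""
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)


if __name__ == "__main__":
    # A C>T substitution in the middle of the toy motif lowers the score.
    print(delta_affinity("TACGT", "TATGT", TOY_PWM))
```

A negative value indicates that the alternate allele is predicted to weaken the motif match, a positive value that it strengthens it.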
Affiliation(s)
- Katherine Kin
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston, 7500 Cambridge St, Houston, TX 77054, United States
- Shounak Bhogale
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, 600 S Mathews Ave, Urbana, IL 61801, United States
- Lisha Zhu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St #600, Houston, TX 77030, United States
- Derrick Thomas
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston, 7500 Cambridge St, Houston, TX 77054, United States
- Jessica Bertol
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston, 7500 Cambridge St, Houston, TX 77054, United States
- W Jim Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin St #600, Houston, TX 77030, United States
- Saurabh Sinha
- Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, 600 S Mathews Ave, Urbana, IL 61801, United States
- Wallace H. Coulter Department of Biomedical Engineering at Georgia Tech and Emory University, Georgia Institute of Technology, North Avenue Atlanta, GA 30332, United States
- Walid D Fakhouri
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston, 7500 Cambridge St, Houston, TX 77054, United States
- Department of Pediatrics, McGovern Medical School, University of Texas Health Science Center at Houston, 6431 Fannin St, Houston, TX 77030, United States
3
Giovannetti A, Lazzari S, Mangoni M, Traversa A, Mazza T, Parisi C, Caputo V. Exploring non-coding genetic variability in ACE2: Functional annotation and in vitro validation of regulatory variants. Gene 2024; 915:148422. [PMID: 38570058 DOI: 10.1016/j.gene.2024.148422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 02/23/2024] [Accepted: 03/13/2024] [Indexed: 04/05/2024]
Abstract
The surge in human whole-genome sequencing data has facilitated the study of non-coding region variations, yet understanding their biological significance remains a challenge. We used a computational workflow to assess the regulatory potential of non-coding variants, with a particular focus on the Angiotensin Converting Enzyme 2 (ACE2) gene. This gene is crucial in physiological processes and serves as the entry point for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus causing coronavirus disease 19 (COVID-19). In our analysis, using data from the gnomAD population database and functional annotation, we identified 17 significant Single Nucleotide Variants (SNVs) in ACE2, particularly in its enhancers, promoters, and 3' untranslated regions (UTRs). We found preliminary evidence supporting the regulatory impact of some of these variants on ACE2 expression. Our detailed examination of two SNVs, rs147718775 and rs140394675, in the ACE2 promoter revealed that these co-occurring SNVs, when mutated, significantly enhance promoter activity, suggesting a possible increase in specific ACE2 isoform expression. This method proves effective in identifying and interpreting impactful non-coding variants, aiding in further studies and enhancing understanding of molecular bases of monogenic and complex traits.
Affiliation(s)
- Agnese Giovannetti
- Clinical Genomics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Sara Lazzari
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
- Manuel Mangoni
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Alice Traversa
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy; Dipartimento di Scienze della Vita, della Salute e delle Professioni Sanitarie, Università degli Studi "Link Campus University", Via del Casale di San Pio V 44, 00165 Roma, Italy.
- Tommaso Mazza
- Bioinformatics Laboratory, Fondazione IRCCS Casa Sollievo della Sofferenza, Viale Cappuccini, snc, 71013 S. Giovanni Rotondo (FG), Italy.
- Chiara Parisi
- Institute of Biochemistry and Cell Biology, CNR-National Research Council, Via Ercole Ramarini, 32, 00015 Monterotondo Scalo (RM), Italy.
- Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Viale Regina Elena, 324, 00161 Rome, Italy.
4
Jin W, Xia Y, Thela SR, Liu Y, Chen L. In silico generation and augmentation of regulatory variants from massively parallel reporter assay using conditional variational autoencoder. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.25.600715. [PMID: 38979263 PMCID: PMC11230389 DOI: 10.1101/2024.06.25.600715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. Massively parallel reporter assays (MPRAs), an in vitro high-throughput method, can simultaneously test thousands of variants by evaluating the existence of allele-specific regulatory activity. Nevertheless, the labelled variants identified by MPRAs, which show differential allelic regulatory effects on gene expression, are usually limited to the scale of hundreds, limiting their potential to be used as a training set for achieving robust genome-wide prediction. To address this limitation, we propose a deep generative model, MpraVAE, to generate labelled variants in silico and augment the training sample size. By benchmarking on several MPRA datasets, we demonstrate that MpraVAE significantly improves the prediction performance for MPRA regulatory variants compared to the baseline method, conventional data augmentation approaches, and existing variant scoring methods. Taking autoimmune diseases as one example, we apply MpraVAE to perform a genome-wide prediction of regulatory variants and find that predicted regulatory variants are more enriched than background variants in enhancers, active histone marks, open chromatin regions in immune-related cell types, and chromatin states associated with promoter and enhancer activity and with binding sites of cMyC and Pol II that regulate gene expression. Importantly, predicted regulatory variants are found to link to immune-related genes by leveraging chromatin loops and accessible chromatin, demonstrating the importance of MpraVAE in genetic and gene discovery for complex traits.
Affiliation(s)
- Weijia Jin
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yi Xia
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Sai Ritesh Thela
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
- Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL, 32603, USA
5
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
- Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
- Center for Computational Biology, University of California, Berkeley, California 94720, USA
- College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
6
Iñiguez-Muñoz S, Llinàs-Arias P, Ensenyat-Mendez M, Bedoya-López AF, Orozco JIJ, Cortés J, Roy A, Forsberg-Nilsson K, DiNome ML, Marzese DM. Hidden secrets of the cancer genome: unlocking the impact of non-coding mutations in gene regulatory elements. Cell Mol Life Sci 2024; 81:274. [PMID: 38902506 DOI: 10.1007/s00018-024-05314-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 12/07/2023] [Accepted: 06/06/2024] [Indexed: 06/22/2024]
Abstract
Discoveries in the field of genomics have revealed that non-coding genomic regions are not merely "junk DNA", but rather comprise critical elements involved in gene expression. These gene regulatory elements (GREs) include enhancers, insulators, silencers, and gene promoters. Notably, new evidence shows how mutations within these regions substantially influence gene expression programs, especially in the context of cancer. Advances in high-throughput sequencing technologies have accelerated the identification of somatic and germline single nucleotide mutations in non-coding genomic regions. This review provides an overview of somatic and germline non-coding single nucleotide alterations affecting transcription factor binding sites in GREs, specifically involved in cancer biology. It also summarizes the technologies available for exploring GREs and the challenges associated with studying and characterizing non-coding single nucleotide mutations. Understanding the role of GRE alterations in cancer is essential for improving diagnostic and prognostic capabilities in the precision medicine era, leading to enhanced patient-centered clinical outcomes.
Affiliation(s)
- Sandra Iñiguez-Muñoz
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Pere Llinàs-Arias
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Miquel Ensenyat-Mendez
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Andrés F Bedoya-López
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain
- Javier I J Orozco
- Saint John's Cancer Institute, Providence Saint John's Health Center, Santa Monica, CA, USA
- Javier Cortés
- International Breast Cancer Center (IBCC), Pangaea Oncology, Quiron Group, 08017, Barcelona, Spain
- Medica Scientia Innovation Research SL (MEDSIR), 08018, Barcelona, Spain
- Faculty of Biomedical and Health Sciences, Department of Medicine, Universidad Europea de Madrid, 28670, Madrid, Spain
- Ananya Roy
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- University of Nottingham Biodiscovery Institute, Nottingham, UK
- Maggie L DiNome
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA
- Diego M Marzese
- Cancer Epigenetics Laboratory at the Cancer Cell Biology Group, Institut d'Investigació Sanitària Illes Balears (IdISBa), Palma, Spain.
- Department of Surgery, Duke University School of Medicine, Durham, NC, USA.
7
Wong EWP, Sahin M, Yang R, Lee U, Zhan YA, Misra R, Tomas F, Alomran N, Polyzos A, Lee CJ, Trieu T, Fundichely AM, Wiesner T, Rosowicz A, Cheng S, Liu C, Lallo M, Merghoub T, Hamard PJ, Koche R, Khurana E, Apostolou E, Zheng D, Chen Y, Leslie CS, Chi P. TAD hierarchy restricts poised LTR activation and loss of TAD hierarchy promotes LTR co-option in cancer. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.31.596845. [PMID: 38895201 PMCID: PMC11185511 DOI: 10.1101/2024.05.31.596845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Transposable elements (TEs) are abundant in the human genome, and they provide the sources for genetic and functional diversity. The regulation of TEs expression and their functional consequences in physiological conditions and cancer development remain to be fully elucidated. Previous studies suggested TEs are repressed by DNA methylation and chromatin modifications. The effect of 3D chromatin topology on TE regulation remains elusive. Here, by integrating transcriptome and 3D genome architecture studies, we showed that haploinsufficient loss of NIPBL selectively activates alternative promoters at the long terminal repeats (LTRs) of the TE subclasses. This activation occurs through the reorganization of topologically associating domain (TAD) hierarchical structures and recruitment of proximal enhancers. These observations indicate that TAD hierarchy restricts transcriptional activation of LTRs that already possess open chromatin features. In cancer, perturbation of the hierarchical chromatin topology can lead to co-option of LTRs as functional alternative promoters in a context-dependent manner and drive aberrant transcriptional activation of novel oncogenes and other divergent transcripts. These data uncovered a new layer of regulatory mechanism of TE expression beyond DNA and chromatin modification in human genome. They also posit the TAD hierarchy dysregulation as a novel mechanism for alternative promoter-mediated oncogene activation and transcriptional diversity in cancer, which may be exploited therapeutically.
8
Kim Y, Jeong M, Koh IG, Kim C, Lee H, Kim JH, Yurko R, Kim IB, Park J, Werling DM, Sanders SJ, An JY. CWAS-Plus: estimating category-wide association of rare noncoding variation from whole-genome sequencing data with cell-type-specific functional data. Brief Bioinform 2024; 25:bbae323. [PMID: 38966948 PMCID: PMC11224609 DOI: 10.1093/bib/bbae323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/06/2024] Open
Abstract
Variants in cis-regulatory elements link the noncoding genome to human pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), enhances noncoding variant analysis by integrating both whole-genome sequencing (WGS) and user-provided functional data. With simplified parameter settings and an efficient multiple testing correction method, CWAS-Plus conducts the CWAS workflow 50 times faster than CWAS, making it more accessible and user-friendly for researchers. Here, we used a single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder WGS data (n = 7280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease WGS data (n = 1087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale WGS data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
Affiliation(s)
- Yujin Kim
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Minwoo Jeong
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- In Gyeong Koh
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Chanhee Kim
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Hyeji Lee
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Jae Hyun Kim
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- Ronald Yurko
- Department of Statistics and Data Science, Carnegie Mellon University, 5000 Forbes Avenue, Squirrel Hill North, Pittsburgh, PA 15213, United States
- Il Bin Kim
- Department of Psychiatry, CHA Gangnam Medical Center, CHA University School of Medicine, 566 Nonhyon-ro, Gangnam-gu, Seoul 06135, Republic of Korea
- Jeongbin Park
- School of Biomedical Convergence Engineering, Pusan National University, 49 Busandaehak-ro, Mulgeum-eup, Yangsan-si, Gyeongsangnam-do, 50612, Republic of Korea
- Donna M Werling
- Laboratory of Genetics, University of Wisconsin-Madison, 425-g Henry Mall, Madison, WI 53706, United States
- Stephan J Sanders
- Department of Paediatrics, Institute of Developmental and Regenerative Medicine, University of Oxford, Old Road Campus, Roosevelt Dr, Headington, Oxford OX3 7TY, United Kingdom
- Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, 1651 4th Street, San Francisco, CA 94158, United States
- Joon-Yong An
- Department of Integrated Biomedical and Life Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
- School of Biosystem and Biomedical Science, College of Health Science, Korea University, 145 Anam-ro, Seongbuk-ku, Seoul 02841, Republic of Korea
9
Li G, Wu J, Wang X. Predicting functional UTR variants by integrating region-specific features. Brief Bioinform 2024; 25:bbae248. [PMID: 38783704 PMCID: PMC11116830 DOI: 10.1093/bib/bbae248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 03/30/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024] Open
Abstract
The untranslated region (UTR) of messenger ribonucleic acid (mRNA), including the 5'UTR and 3'UTR, plays a critical role in regulating gene expression and translation. Variants within the UTR can lead to changes associated with human traits and diseases; however, computational prediction of UTR variant effect is challenging. Current noncoding variant prediction mainly focuses on the promoters and enhancers, neglecting the unique sequence of the UTR and thereby limiting their predictive accuracy. In this study, using consolidated datasets of UTR variants from disease databases and large-scale experimental data, we systematically analyzed more than 50 region-specific features of UTR, including functional elements, secondary structure, sequence composition and site conservation. Our analysis reveals that certain features, such as C/G-related sequence composition in 5'UTR and A/T-related sequence composition in 3'UTR, effectively differentiate between nonfunctional and functional variant sets, unveiling potential sequence determinants of functional UTR variants. Leveraging these insights, we developed two classification models to predict functional UTR variants using machine learning, achieving an area under the curve (AUC) value of 0.94 for 5'UTR and 0.85 for 3'UTR, outperforming all existing methods. Our models will be valuable for enhancing clinical interpretation of genetic variants, facilitating the prediction and management of disease risk.
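The abstract summarises classifier performance as an area under the curve (AUC). For readers unfamiliar with the metric, the sketch below computes ROC AUC from its rank interpretation: the probability that a randomly chosen functional variant receives a higher score than a randomly chosen nonfunctional one, with ties counted as one half. The function name and the example labels and scores are illustrative assumptions and are unrelated to the authors' models or data.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 1 (functional) or 0 (nonfunctional)
    scores: iterable of predicted scores, higher meaning more likely functional
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and scores for four variants.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; library implementations use sorting-based algorithms for large datasets.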
Affiliation(s)
- Guangyu Li
- State Key Laboratory of Common Mechanism Research for Major Diseases; Center for bioinformatics, National Infrastructures for Translational Medicine, Institute of Clinical Medicine and Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuai Fu Yuan, Dongcheng District, Beijing 100005, China
- Jiayu Wu
- State Key Laboratory of Common Mechanism Research for Major Diseases; Center for bioinformatics, National Infrastructures for Translational Medicine, Institute of Clinical Medicine and Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuai Fu Yuan, Dongcheng District, Beijing 100005, China
- Xiaoyue Wang
- State Key Laboratory of Common Mechanism Research for Major Diseases; Center for bioinformatics, National Infrastructures for Translational Medicine, Institute of Clinical Medicine and Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, 1 Shuai Fu Yuan, Dongcheng District, Beijing 100005, China
10
Muhammad SS, Shoaib M, Pervez MT. An Integrated Framework for Analysis and Prediction of Impact of Single Nucleotide Polymorphism Associated with Human Diseases. Evol Bioinform Online 2024; 20:11769343241249916. [PMID: 38737438 PMCID: PMC11088291 DOI: 10.1177/11769343241249916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 04/10/2024] [Indexed: 05/14/2024] Open
Abstract
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human genome. Analyzing genetic variants can help us better understand the genetic basis of diseases and develop predictive models that identify individuals at increased risk for certain diseases. Several SNP analysis tools have already been developed. To run these tools, the user needs to collect data from various databases. In addition, researchers often have to use multiple variant analysis tools to cross-validate their results and increase confidence in their findings. Extracting data from multiple databases and running multiple tools at a time increases the complexity and time required for analysis. There are some web-based tools that integrate multiple genetic variant databases and provide variant annotations for a few tools. These approaches have some limitations, such as in retrieving annotation information and filtering common pathogenic variants. The proposed web-based tool, IPSNP: An Integrated Platform for Predicting Impact of SNPs, is written in Django, a Python-based framework. It uses the RESTful API of MyVariant.info to extract annotation information for variants associated with a given gene, rsID, or HGVS-format variants specified in a VCF file, for 29 tools. The results are provided as (1) a CSV file of predictions derived from the consensus decision, (2) a file having annotations for the variants associated with the given gene, (3) a file showing variants declared as pathogenic commonly by the selected tools, and (4) a CSV file containing chromosome coordinates based on the GRCh37 and GRCh38 genome assemblies, rsIDs and proteomic data, so that users may use tools of their choice and avoid manual parameter collection for each tool. IPSNP is a valuable resource for researchers and clinicians, and it can help save time and effort in discovering novel disease-associated variants and developing personalized treatments.
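The abstract mentions predictions derived from a consensus decision across tools but does not define the rule. One simple possibility is a strict majority vote over the tools that returned a call, sketched below. The function name, the tool names, the label strings, and the threshold are all hypothetical and are not taken from IPSNP.

```python
from collections import Counter


def consensus_call(predictions, min_fraction=0.5):
    """Majority-vote consensus over per-tool predictions.

    predictions:  dict mapping tool name -> label (e.g. "pathogenic",
                  "benign"), or None when the tool gave no prediction
    min_fraction: the winning label must exceed this share of the
                  available calls, otherwise the result is "uncertain"
    """
    # Ignore tools that returned no prediction for this variant.
    calls = [label for label in predictions.values() if label is not None]
    if not calls:
        return "unknown"

    label, count = Counter(calls).most_common(1)[0]
    return label if count / len(calls) > min_fraction else "uncertain"


if __name__ == "__main__":
    # Hypothetical per-tool calls for a single variant.
    example = {"tool_a": "pathogenic", "tool_b": "pathogenic",
               "tool_c": "benign", "tool_d": None}
    print(consensus_call(example))
```

Requiring a strict majority makes ties resolve to "uncertain" rather than to an arbitrary label; a weighted vote or a stricter threshold would be equally plausible designs.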
Affiliation(s)
- Syed Shah Muhammad
  - Department of Computer Science, University of Engineering & Technology, Lahore, Punjab, Pakistan
- Muhammad Shoaib
  - Department of Computer Science, University of Engineering & Technology, Lahore, Punjab, Pakistan
- Muhammad Tariq Pervez
  - Department of Biological Sciences, Virtual University of Pakistan, Lahore, Punjab, Pakistan
11
Rabi LT, Valente DZ, de Souza Teixeira E, Peres KC, de Oliveira Almeida M, Bufalo NE, Ward LS. Potential new cancer biomarkers revealed by quantum chemistry associated with bioinformatics in the study of selectin polymorphisms. Heliyon 2024; 10:e28830. [PMID: 38586333 PMCID: PMC10998122 DOI: 10.1016/j.heliyon.2024.e28830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 03/22/2024] [Accepted: 03/26/2024] [Indexed: 04/09/2024] Open
Abstract
Understanding the complex mechanisms involved in diseases caused by or related to important genetic variants has led to the development of clinically useful biomarkers. However, the increasing number of described variants makes it difficult to identify variants worthy of investigation, and poses challenges to their validation. We combined publicly available datasets and robust open-source bioinformatics tools with molecular quantum chemistry methods to investigate the involvement of selectins, important molecules in the cell adhesion process that play a fundamental role in cancer metastasis. We applied this strategy to investigate single nucleotide variants (SNPs) in the intronic and UTR regions and missense SNPs with amino acid changes in the SELL, SELP, SELE, and SELPLG genes. We then focused on thyroid cancer, seeking these SNPs' potential to identify biomarkers for susceptibility, diagnosis, prognosis, and therapeutic targets. We demonstrated that SELL gene polymorphisms rs2229569, rs1131498, rs4987360, rs4987301 and rs2205849; SELE gene polymorphisms rs1534904 and rs5368; rs3917777, rs2205894 and rs2205893 of the SELP gene; and the rs7138370, rs7300972 and rs2228315 variants of the SELPLG gene may produce important alterations in the DNA structure and consequent changes in the morphology and function of the corresponding proteins. In conclusion, we developed a strategy that may save valuable time and resources in future investigations, as we were able to provide a solid foundation for the selection of selectin gene variants that may become important biomarkers and deserve further investigation in cancer patients. Large-scale clinical studies in different ethnic populations and laboratory experiments are needed to validate our results.
Affiliation(s)
- Larissa Teodoro Rabi
  - Laboratory of Cancer Molecular Genetics, Faculty of Medical Sciences, State University of Campinas (UNI-CAMP), Campinas, SP, Brazil
  - Department of Biomedicine, Nossa Senhora do Patrocínio University Center (CEUNSP), Itu, SP, Brazil
  - Institute of Health Sciences, Paulista University (UNIP), Campinas, SP, Brazil
- Davi Zanoni Valente
  - Laboratory of Cancer Molecular Genetics, Faculty of Medical Sciences, State University of Campinas (UNI-CAMP), Campinas, SP, Brazil
- Elisangela de Souza Teixeira
  - Laboratory of Cancer Molecular Genetics, Faculty of Medical Sciences, State University of Campinas (UNI-CAMP), Campinas, SP, Brazil
- Karina Colombera Peres
  - Laboratory of Cancer Molecular Genetics, Faculty of Medical Sciences, State University of Campinas (UNI-CAMP), Campinas, SP, Brazil
  - Department of Medicine, Max Planck University Center, Campinas, SP, Brazil
- Natassia Elena Bufalo
  - Laboratory of Cancer Molecular Genetics, Faculty of Medical Sciences, State University of Campinas (UNI-CAMP), Campinas, SP, Brazil
  - Department of Medicine, Max Planck University Center, Campinas, SP, Brazil
  - Department of Medicine, São Leopoldo Mandic and Research Center, Campinas, SP, Brazil
- Laura Sterian Ward
  - Laboratory of Cancer Molecular Genetics, Faculty of Medical Sciences, State University of Campinas (UNI-CAMP), Campinas, SP, Brazil
12
Kim Y, Jeong M, Koh IG, Kim C, Lee H, Kim JH, Yurko R, Kim IB, Park J, Werling DM, Sanders SJ, An JY. CWAS-Plus: Estimating category-wide association of rare noncoding variation from whole-genome sequencing data with cell-type-specific functional data. medRxiv 2024:2024.04.15.24305828. [PMID: 38699372 PMCID: PMC11065022 DOI: 10.1101/2024.04.15.24305828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Variants in cis-regulatory elements link the noncoding genome to human brain pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), employs both whole-genome sequencing and user-provided functional data to enhance noncoding variant analysis, with a faster and more efficient execution of the CWAS workflow. Here, we used single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder whole-genome sequencing data (n = 7,280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease whole-genome sequencing data (n = 1,087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale whole-genome sequencing data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.
Affiliation(s)
- Yujin Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Minwoo Jeong
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, 02841, Republic of Korea
- In Gyeong Koh
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Chanhee Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Hyeji Lee
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Jae Hyun Kim
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
- Ronald Yurko
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Il Bin Kim
  - Department of Psychiatry, CHA Gangnam Medical Center, CHA University School of Medicine, Seoul, 06135, Republic of Korea
- Jeongbin Park
  - School of Biomedical Convergence Engineering, Pusan National University, Busan, 50612, Republic of Korea
- Donna M. Werling
  - Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Stephan J. Sanders
  - Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, OX3 7TY, UK
  - Department of Psychiatry and Behavioral Sciences, UCSF Weill Institute for Neurosciences, University of California, San Francisco, CA 94158, USA
- Joon-Yong An
  - Department of Integrated Biomedical and Life Science, Korea University, Seoul, 02841, Republic of Korea
  - L-HOPE Program for Community-Based Total Learning Health Systems, Korea University, Seoul, 02841, Republic of Korea
  - School of Biosystem and Biomedical Science, College of Health Science, Korea University, Seoul, 02841, Republic of Korea
13
Nizomov J, Jin W, Xia Y, Liu Y, Li Z, Chen L. MPRAVarDB: an online database and web server for exploring regulatory effects of genetic variants. bioRxiv 2024:2024.04.02.587790. [PMID: 38617248 PMCID: PMC11014600 DOI: 10.1101/2024.04.02.587790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Massively parallel reporter assay (MPRA) is an important technology to evaluate the impact of genetic variants on gene regulation. Here, we present MPRAVarDB, an online database and web server, for exploring regulatory effects of genetic variants. MPRAVarDB harbors 18 MPRA experiments designed to assess the regulatory effects of genetic variants associated with GWAS loci, eQTLs and various genomic features, resulting in a total of 242,818 variants tested across more than 30 cell lines and 30 human diseases or traits. MPRAVarDB empowers the query of MPRA variants by genomic region, disease and cell line or by any combination of these query terms. Notably, MPRAVarDB offers a suite of pretrained machine learning models tailored to the specific disease and cell line, facilitating the genome-wide prediction of regulatory variants. MPRAVarDB is friendly to use, and users only need a few clicks to receive query and prediction results.
Affiliation(s)
- Javlon Nizomov
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Weijia Jin
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Yi Xia
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Yunlong Liu
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202
- Zhigang Li
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
- Li Chen
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32603
14
Nakamura T, Ueda J, Mizuno S, Honda K, Kazuno AA, Yamamoto H, Hara T, Takata A. Topologically associating domains define the impact of de novo promoter variants on autism spectrum disorder risk. Cell Genomics 2024; 4:100488. [PMID: 38280381 PMCID: PMC10879036 DOI: 10.1016/j.xgen.2024.100488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/24/2023] [Accepted: 01/02/2024] [Indexed: 01/29/2024]
Abstract
Whole-genome sequencing (WGS) studies of autism spectrum disorder (ASD) have demonstrated the roles of rare promoter de novo variants (DNVs). However, most promoter DNVs in ASD are not located immediately upstream of known ASD genes. In this study analyzing WGS data of 5,044 ASD probands, 4,095 unaffected siblings, and their parents, we show that promoter DNVs within topologically associating domains (TADs) containing ASD genes are significantly and specifically associated with ASD. An analysis considering TADs as functional units identified specific TADs enriched for promoter DNVs in ASD and indicated that common variants in these regions also confer ASD heritability. Experimental validation using human induced pluripotent stem cells (iPSCs) showed that likely deleterious promoter DNVs in ASD can influence multiple genes within the same TAD, resulting in overall dysregulation of ASD-associated genes. These results highlight the importance of TADs and gene-regulatory mechanisms in better understanding the genetic architecture of ASD.
Affiliation(s)
- Takumi Nakamura
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Junko Ueda
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Shota Mizuno
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Kurara Honda
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- An-A Kazuno
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
- Hirona Yamamoto
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
  - Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8654, Japan
- Tomonori Hara
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
  - Department of Organ Anatomy, Tohoku University Graduate School of Medicine, 2-1 Seiryo-machi, Aoba-ku, Sendai, Miyagi 980-8575, Japan
- Atsushi Takata
  - Laboratory for Molecular Pathology of Psychiatric Disorders, RIKEN Center for Brain Science, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan
  - Research Institute for Diseases of Old Age, Juntendo University Graduate School of Medicine, 2-1-1 Hongo, Bunkyo-ku, Tokyo 113-8421, Japan
15
Brown EA, Kales S, Boyle MJ, Vitti J, Kotliar D, Schaffner S, Tewhey R, Sabeti PC. Three linked variants have opposing regulatory effects on isovaleryl-CoA dehydrogenase gene expression. Hum Mol Genet 2024; 33:270-283. [PMID: 37930192 PMCID: PMC10800014 DOI: 10.1093/hmg/ddad177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 10/03/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023] Open
Abstract
While genome-wide association studies (GWAS) and positive selection scans identify genomic loci driving human phenotypic diversity, functional validation is required to discover the variant(s) responsible. We dissected the IVD gene locus (which encodes the isovaleryl-CoA dehydrogenase enzyme), implicated by selection statistics, multiple GWAS, and clinical genetics as important to function and fitness. We combined luciferase assays, CRISPR/Cas9 genome-editing, massively parallel reporter assays (MPRA), and a deletion tiling MPRA strategy across regulatory loci. We identified three regulatory variants, including an indel, that may underpin GWAS signals for pulmonary fibrosis and testosterone, and that are linked on a positively selected haplotype in the Japanese population. These regulatory variants exhibit synergistic and opposing effects on IVD expression experimentally. Alleles at these variants lie on a haplotype tagged by the variant most strongly associated with IVD expression and metabolites, but with no functional evidence itself. This work demonstrates how comprehensive functional investigation and multiple technologies are needed to discover the true genetic drivers of phenotypic diversity.
Affiliation(s)
- Elizabeth A Brown
  - The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
  - Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
- Susan Kales
  - The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, United States
- Michael James Boyle
  - The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
- Joseph Vitti
  - The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
  - Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
- Dylan Kotliar
  - The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
  - Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
- Steve Schaffner
  - Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
- Ryan Tewhey
  - The Jackson Laboratory, 600 Main St, Bar Harbor, ME 04609, United States
- Pardis C Sabeti
  - The Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
  - Broad Institute of MIT and Harvard, 75 Ames Street, Cambridge, MA 02142, United States
  - Howard Hughes Medical Institute, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
16
Wang Z, Zhao G, Zhu Z, Wang Y, Xiang X, Zhang S, Luo T, Zhou Q, Qiu J, Tang B, Xia K, Li B, Li J. VarCards2: an integrated genetic and clinical database for ACMG-AMP variant-interpretation guidelines in the human whole genome. Nucleic Acids Res 2024; 52:D1478-D1489. [PMID: 37956311 PMCID: PMC10767961 DOI: 10.1093/nar/gkad1061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/21/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
VarCards, an online database, combines comprehensive variant- and gene-level annotation data to streamline genetic counselling for coding variants. Recognising the increasing clinical relevance of non-coding variations, there has been an accelerated development of bioinformatics tools dedicated to interpreting non-coding variations, including single-nucleotide variants and copy number variations. Regrettably, most tools remain as either locally installed databases or command-line tools dispersed across diverse online platforms. Such a landscape poses inconveniences and challenges for genetic counsellors seeking to utilise these resources without advanced bioinformatics expertise. Consequently, we developed VarCards2, which incorporates nearly nine billion artificially generated single-nucleotide variants (including those from mitochondrial DNA) and compiles vital annotation information for genetic counselling based on ACMG-AMP variant-interpretation guidelines. These annotations include (I) functional effects; (II) minor allele frequencies; (III) comprehensive function and pathogenicity predictions covering all potential variants, such as non-synonymous substitutions, non-canonical splicing variants, and non-coding variations and (IV) gene-level information. Furthermore, VarCards2 incorporates 368 820 266 documented short insertions and deletions and 2 773 555 documented copy number variations, complemented by their corresponding annotation and prediction tools. In conclusion, VarCards2, by integrating over 150 variant- and gene-level annotation sources, significantly enhances the efficiency of genetic counselling and can be freely accessed at http://www.genemed.tech/varcards2/.
Affiliation(s)
- Zheng Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Guihu Zhao
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Zhaopo Zhu
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Yijing Wang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Xudong Xiang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Shiyu Zhang
  - Xiangya School of Medicine, Central South University, Changsha, Hunan 410013, China
- Tengfei Luo
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Qiao Zhou
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jian Qiu
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Beisha Tang
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, & Multi-Omics Research Center for Brain Disorders, The First Affiliated Hospital, University of South China, Hengyang, Hunan, China
- Kun Xia
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Bin Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Jinchen Li
  - National Clinical Research Center for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Center for Medical Genetics & Hunan Key Laboratory, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
  - Bioinformatics Center, Furong Laboratory & Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
17
Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, Castro-Mondragon J, Ferenc K, Kumar V, Lemma RB, Lucas J, Chèneby J, Baranasic D, Khan A, Fornes O, Gundersen S, Johansen M, Hovig E, Lenhard B, Sandelin A, Wasserman W, Parcy F, Mathelier A. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2024; 52:D174-D182. [PMID: 37962376 PMCID: PMC10767809 DOI: 10.1093/nar/gkad1059] [Citation(s) in RCA: 40] [Impact Index Per Article: 40.0] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/31/2023] [Indexed: 11/15/2023] Open
Abstract
JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.
Affiliation(s)
- Ieva Rauluseviciute
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Rafael Riudavets-Puig
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Romain Blanc-Mathieu
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jaime A Castro-Mondragon
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Katalin Ferenc
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Vipin Kumar
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Roza Berhanu Lemma
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Jérémy Lucas
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Jeanne Chèneby
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Damir Baranasic
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta, 10000 Zagreb, Croatia
- Aziz Khan
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Sveinung Gundersen
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Morten Johansen
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Eivind Hovig
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- Boris Lenhard
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Albin Sandelin
  - Department of Biology and Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, DK2200 Copenhagen N, Denmark
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- François Parcy
  - Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
- Anthony Mathelier
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Center for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
18
Hariprakash JM, Salviato E, La Mastra F, Sebestyén E, Tagliaferri I, Silva RS, Lucini F, Farina L, Cinquanta M, Rancati I, Riboni M, Minardi SP, Roz L, Gorini F, Lanzuolo C, Casola S, Ferrari F. Leveraging Tissue-Specific Enhancer-Target Gene Regulatory Networks Identifies Enhancer Somatic Mutations That Functionally Impact Lung Cancer. Cancer Res 2024; 84:133-153. [PMID: 37855660 PMCID: PMC10758689 DOI: 10.1158/0008-5472.can-23-1129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 08/29/2023] [Accepted: 10/17/2023] [Indexed: 10/20/2023]
Abstract
Enhancers are noncoding regulatory DNA regions that modulate the transcription of target genes, often over large distances along with the genomic sequence. Enhancer alterations have been associated with various pathological conditions, including cancer. However, the identification and characterization of somatic mutations in noncoding regulatory regions with a functional effect on tumorigenesis and prognosis remain a major challenge. Here, we present a strategy for detecting and characterizing enhancer mutations in a genome-wide analysis of patient cohorts, across three lung cancer subtypes. Lung tissue-specific enhancers were defined by integrating experimental data and public epigenomic profiles, and the genome-wide enhancer-target gene regulatory network of lung cells was constructed by integrating chromatin three-dimensional architecture data. Lung cancers possessed a similar mutation burden at tissue-specific enhancers and exons but with differences in their mutation signatures. Functionally relevant alterations were prioritized on the basis of the pathway-level integration of the effect of a mutation and the frequency of mutations on individual enhancers. The genes enriched for mutated enhancers converged on the regulation of key biological processes and pathways relevant to tumor biology. Recurrent mutations in individual enhancers also affected the expression of target genes, with potential relevance for patient prognosis. Together, these findings show that noncoding regulatory mutations have a potential relevance for cancer pathogenesis and can be exploited for patient classification. SIGNIFICANCE Mapping enhancer-target gene regulatory interactions and analyzing enhancer mutations at the level of their target genes and pathways reveal convergence of recurrent enhancer mutations on biological processes involved in tumorigenesis and prognosis.
Affiliation(s)
- Elisa Salviato
- IFOM-ETS, the AIRC Institute of Molecular Oncology, Milan, Italy
- Endre Sebestyén
- IFOM-ETS, the AIRC Institute of Molecular Oncology, Milan, Italy
- Federica Lucini
- IFOM-ETS, the AIRC Institute of Molecular Oncology, Milan, Italy
- Lorenzo Farina
- IFOM-ETS, the AIRC Institute of Molecular Oncology, Milan, Italy
- Ilaria Rancati
- IFOM-ETS, the AIRC Institute of Molecular Oncology, Milan, Italy
- Luca Roz
- Fondazione IRCCS—Istituto Nazionale Tumori, Milan, Italy
- Francesca Gorini
- INGM, National Institute of Molecular Genetics “Romeo ed Enrica Invernizzi,” Milan, Italy
- Chiara Lanzuolo
- INGM, National Institute of Molecular Genetics “Romeo ed Enrica Invernizzi,” Milan, Italy
- Institute of Biomedical Technologies, National Research Council (ITB-CNR), Segrate, Italy
- Stefano Casola
- IFOM-ETS, the AIRC Institute of Molecular Oncology, Milan, Italy
- Francesco Ferrari
- IFOM-ETS, the AIRC Institute of Molecular Oncology, Milan, Italy
- Institute of Molecular Genetics “Luigi Luca Cavalli-Sforza,” National Research Council (IGM-CNR), Pavia, Italy
|
19
|
Godoy PM, Oyedeji A, Mudd JL, Morikis VA, Zarov AP, Longmore GD, Fields RC, Kaufman CK. Functional analysis of recurrent CDC20 promoter variants in human melanoma. Commun Biol 2023; 6:1216. [PMID: 38030698 PMCID: PMC10686982 DOI: 10.1038/s42003-023-05526-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023] Open
Abstract
Small nucleotide variants in non-coding regions of the genome can alter transcriptional regulation, leading to changes in gene expression which can activate oncogenic gene regulatory networks. Melanoma is heavily burdened by non-coding variants, representing over 99% of total genetic variation, including the well-characterized TERT promoter mutation. However, the compendium of regulatory non-coding variants is likely still functionally under-characterized. We developed a pipeline to identify hotspots, i.e. recurrently mutated regions, in melanoma containing putatively functional non-coding somatic variants that are located within predicted melanoma-specific regulatory regions. We identified hundreds of statistically significant hotspots, including the hotspot containing the TERT promoter variants, and focused on a hotspot in the promoter of CDC20. We found that variants in the promoter of CDC20, which putatively disrupt an ETS motif, lead to lower transcriptional activity in reporter assays. Using CRISPR/Cas9, we generated an indel in the CDC20 promoter in human A375 melanoma cell lines and observed decreased expression of CDC20, changes in migration capabilities, increased growth of xenografts, and an altered transcriptional state previously associated with a more proliferative and less migratory state. Overall, our analysis prioritized several recurrent functional non-coding variants that, through downregulation of CDC20, led to perturbation of key melanoma phenotypes.
Affiliation(s)
- Paula M Godoy
- Division of Medical Oncology, Department of Medicine and Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Abimbola Oyedeji
- Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University in Saint Louis, St. Louis, MO, USA
- Jacqueline L Mudd
- Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University in Saint Louis, St. Louis, MO, USA
- Vasilios A Morikis
- Departments of Medicine (Oncology) and Cell Biology and Physiology and the ICCE Institute, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Anna P Zarov
- Division of Medical Oncology, Department of Medicine and Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA
- Gregory D Longmore
- Siteman Cancer Center, Washington University in Saint Louis, St. Louis, MO, USA
- Departments of Medicine (Oncology) and Cell Biology and Physiology and the ICCE Institute, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Ryan C Fields
- Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University in Saint Louis, St. Louis, MO, USA
- Charles K Kaufman
- Division of Medical Oncology, Department of Medicine and Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, USA.
- Siteman Cancer Center, Washington University in Saint Louis, St. Louis, MO, USA.
|
20
|
Zhang MC, Tian S, Fu D, Wang L, Cheng S, Yi HM, Jiang XF, Song Q, Zhao Y, He Y, Li JF, Mu RJ, Fang H, Yu H, Xiong H, Li B, Chen SJ, Xu PP, Zhao WL. Genetic subtype-guided immunochemotherapy in diffuse large B cell lymphoma: The randomized GUIDANCE-01 trial. Cancer Cell 2023; 41:1705-1716.e5. [PMID: 37774697 DOI: 10.1016/j.ccell.2023.09.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/28/2023] [Revised: 06/25/2023] [Accepted: 09/05/2023] [Indexed: 10/01/2023]
Abstract
We report the results of GUIDANCE-01 (NCT04025593), a randomized, phase II trial of R-CHOP alone or combined with targeted agents (R-CHOP-X) guided by genetic subtyping of newly diagnosed, intermediate-risk, or high-risk diffuse large B cell lymphoma (DLBCL). A total of 128 patients were randomized 1:1 to receive R-CHOP-X or R-CHOP. The study achieved the primary endpoint, showing significantly higher complete response rate with R-CHOP-X than R-CHOP (88% vs. 66%, p = 0.003), with overall response rate of 92% vs. 73% (p = 0.005). Two-year progression-free survival rates were 88% vs. 63% (p < 0.001), and 2-year overall survival rates were 94% vs. 77% (p = 0.001). Meanwhile, post hoc RNA-sequencing validated our simplified genetic subtyping algorithm and previously established lymphoma microenvironment subtypes. Our findings highlight the efficacy and safety of R-CHOP-X, a mechanism-based tailored therapy, which dually targeted genetic and microenvironmental alterations in patients with newly diagnosed DLBCL.
Affiliation(s)
- Mu-Chen Zhang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shuang Tian
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Di Fu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Li Wang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China; Pôle de Recherches Sino-Français en Science du Vivant et Génomique, Laboratory of Molecular Pathology, Shanghai, China
- Shu Cheng
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Hong-Mei Yi
- Department of Pathology, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xu-Feng Jiang
- Department of Nuclear Medicine, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Qi Song
- Department of Radiology, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yan Zhao
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yang He
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jian-Feng Li
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Rong-Ji Mu
- Clinical Research Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Hai Fang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Hao Yu
- Department of Research and Development, Shanghai Righton Biotechnology Co. Ltd, Shanghai, China
- Hui Xiong
- Department of Research and Development, Shanghai Righton Biotechnology Co. Ltd, Shanghai, China
- Biao Li
- Department of Nuclear Medicine, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Sai-Juan Chen
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Peng-Peng Xu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China.
- Wei-Li Zhao
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China; Pôle de Recherches Sino-Français en Science du Vivant et Génomique, Laboratory of Molecular Pathology, Shanghai, China.
|
21
|
Tan W, Shen Y. Multimodal learning of noncoding variant effects using genome sequence and chromatin structure. Bioinformatics 2023; 39:btad541. [PMID: 37669132 PMCID: PMC10502240 DOI: 10.1093/bioinformatics/btad541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 08/28/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023] Open
Abstract
MOTIVATION: A growing number of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation is relatively limited and could benefit from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events. RESULTS: We find that the unexpectedly high similarity in epigenetic profiles among noncoding variants, given their relatively low similarity in local sequences, can be largely attributed to their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both 1D genome sequence and 3D chromatin structure data for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles and that their use of long-range interactions complements sequence-only models in extracting regulatory motifs. They prove to be excellent predictors for noncoding variant effects in gene expression and pathogenicity, whether in unsupervised "zero-shot" learning or supervised "few-shot" learning. AVAILABILITY AND IMPLEMENTATION: Code and data can be accessed at https://github.com/Shen-Lab/ncVarPred-1D3D and https://zenodo.org/record/7975777.
Affiliation(s)
- Wuwei Tan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, United States
- Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, TX 77030, United States
|
22
|
Xu S, Cai G, Zhu Y, Gu X, Wu J, Cheng X, Bao J, Yu H, Zhang L. A Combination of BRAF and EZH1/SPOP/ZNF148 Three-Gene Mutational Classifier Improves Benign Call Rate in Indeterminate Thyroid Nodules. Endocr Pathol 2023; 34:323-332. [PMID: 37572175 DOI: 10.1007/s12022-023-09782-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/05/2023] [Indexed: 08/14/2023]
Abstract
Reliable preoperative diagnosis of thyroid nodules remains challenging because fine-needle aspiration (FNA) cytology is often inconclusive. In the present study, 583 formalin-fixed paraffin-embedded (FFPE) thyroid nodule tissues and 161 FNA specimens were collected retrospectively. BRAF V600E, EZH1 Q571R, SPOP P94R, and ZNF148 mutations in these samples were identified using Sanger sequencing. Based on this four-gene genomic classifier, we proposed a two-step approach to differentiate benign from malignant thyroid nodules. In the FFPE group, thyroid cancers were effectively diagnosed in 37.7% (220/583) of neoplasms by the primary BRAF V600E testing, and 15.7% (57/363) of the remaining thyroid nodules could be further determined as benign by subsequent EZH1 Q571R, SPOP P94R, and ZNF148 (collectively termed "ESZ") mutation testing. In the FNA group, 161 BRAF wild-type specimens were classified according to The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC). A total of 7 mutated samples fell within Bethesda categories III-IV, and the mutation rate of "ESZ" in Bethesda III-IV categories was 8.4%. The two-step genomic classifier could further improve thyroid nodule diagnosis, which may inform more optimal patient management.
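The FFPE percentages quoted in this abstract can be re-derived from the counts it gives. The short sketch below is only an arithmetic check, not the authors' analysis; the variable names are illustrative, and the combined fraction at the end is derived here from the stated counts rather than reported in the abstract.

```python
# Counts taken from the abstract (FFPE group only).
ffpe_total = 583       # FFPE thyroid nodule tissues
braf_positive = 220    # called malignant by step 1 (BRAF V600E)
esz_positive = 57      # called benign by step 2 (EZH1/SPOP/ZNF148)

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Step 2 is applied only to nodules left unresolved by step 1.
braf_wild_type = ffpe_total - braf_positive

step1_rate = percent(braf_positive, ffpe_total)     # share called by step 1
step2_rate = percent(esz_positive, braf_wild_type)  # share of the remainder called by step 2

# Derived here, not stated in the abstract: share of all FFPE nodules
# resolved by either step.
combined_rate = percent(braf_positive + esz_positive, ffpe_total)

print(braf_wild_type, step1_rate, step2_rate, combined_rate)
```

The denominator of the second step (363) is the number of nodules remaining after the first step, which is why the two percentages cannot simply be added.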
Affiliation(s)
- Shichen Xu
- NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, 20 Qian Rong Road, Wuxi, Jiangsu, 214063, China
- Gangming Cai
- Clinical Molecular Biology Laboratory, Jiangyuan Hospital Affiliated to Jiangsu Institute of Nuclear Medicine, Wuxi, Jiangsu, 214063, China
- Yun Zhu
- Department of Pathology, Jiangyuan Hospital Affiliated to Jiangsu Institute of Nuclear Medicine, Wuxi, Jiangsu, 214063, China
- Xiaobo Gu
- Clinical Molecular Biology Laboratory, Jiangyuan Hospital Affiliated to Jiangsu Institute of Nuclear Medicine, Wuxi, Jiangsu, 214063, China
- Jing Wu
- NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, 20 Qian Rong Road, Wuxi, Jiangsu, 214063, China
- Xian Cheng
- NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, 20 Qian Rong Road, Wuxi, Jiangsu, 214063, China
- Jiandong Bao
- NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, 20 Qian Rong Road, Wuxi, Jiangsu, 214063, China
- Huixin Yu
- NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, 20 Qian Rong Road, Wuxi, Jiangsu, 214063, China
- Li Zhang
- NHC Key Laboratory of Nuclear Medicine, Jiangsu Key Laboratory of Molecular Nuclear Medicine, Jiangsu Institute of Nuclear Medicine, 20 Qian Rong Road, Wuxi, Jiangsu, 214063, China.
- Department of Radiopharmaceuticals, School of Pharmacy, Nanjing Medical University, Nanjing, 211166, China.
- School of Life Science and Technology, Southeast University, Nanjing, 210096, China.
|
23
|
Yang M, Ali O, Bjørås M, Wang J. Identifying functional regulatory mutation blocks by integrating genome sequencing and transcriptome data. iScience 2023; 26:107266. [PMID: 37520692 PMCID: PMC10371843 DOI: 10.1016/j.isci.2023.107266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 04/05/2023] [Accepted: 06/28/2023] [Indexed: 08/01/2023] Open
Abstract
Millions of single nucleotide variants (SNVs) exist in the human genome; however, it remains challenging to identify functional SNVs associated with diseases. We propose a non-coding SNV analysis tool, bpb3 (BayesPI-BAR version 3), which aims to identify functional mutation blocks (FMBs) by integrating genome sequencing and transcriptome data. The identified FMBs display high-frequency SNVs and significant changes in transcription factor (TF) binding affinity, and lie near the regulatory regions of differentially expressed genes. A two-level Bayesian approach with a biophysical model for protein-DNA interactions is implemented to compute TF-DNA binding affinity changes based on clustered position weight matrices (PWMs) from over 1700 TF motifs. Epigenetic data, such as the DNA methylome, can also be integrated to scan FMBs. In tests on datasets from follicular lymphoma and melanoma, bpb3 automatically and robustly identifies FMBs, demonstrating that it can provide insight into patho-mechanisms and therapeutic targets from transcriptomic and genomic data.
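To make the idea of a PWM-based binding change concrete, the sketch below scores a reference and a mutated site against a small log-odds matrix and reports the difference. It is a deliberately simplified stand-in: it uses a plain log-odds PWM score, not the two-level Bayesian biophysical model the abstract describes, and the count matrix, pseudocount, uniform background and function names are all hypothetical illustrations rather than part of bpb3.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 1, "G": 0, "T": 8},
]

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def score(pwm, site):
    """Sum the log-odds of each base in `site`; the site must match the motif length."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(column[base] for column, base in zip(pwm, site))

def delta_score(pwm, ref_site, alt_site):
    """Score change caused by the variant (negative means predicted weaker binding)."""
    return score(pwm, alt_site) - score(pwm, ref_site)

PWM = log_odds(COUNTS)
# A G>T substitution at the third motif position of the consensus site.
delta = delta_score(PWM, "ACGT", "ACTT")
print(round(delta, 3))
```

A negative `delta` means the variant moves the site away from the motif consensus; a tool of the kind described would additionally assess whether such changes are significant and recurrent across patients.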
Affiliation(s)
- Mingyi Yang
- Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Medical Biochemistry, Oslo University Hospital and University of Oslo, Oslo, Norway
- Omer Ali
- Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Oslo, Norway
- Faculty of Medicine, University of Oslo, Oslo, Norway
- Magnar Bjørås
- Department of Microbiology, Oslo University Hospital and University of Oslo, Oslo, Norway
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Junbai Wang
- Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway
|
24
|
Vellichirammal NN, Tan YD, Xiao P, Eudy J, Shats O, Kelly D, Desler M, Cowan K, Guda C. The mutational landscape of a US Midwestern breast cancer cohort reveals subtype-specific cancer drivers and prognostic markers. Hum Genomics 2023; 17:64. [PMID: 37454130 PMCID: PMC10349437 DOI: 10.1186/s40246-023-00511-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 07/11/2023] [Indexed: 07/18/2023] Open
Abstract
BACKGROUND: Female breast cancer remains the second leading cause of cancer-related death in the USA. The heterogeneity in tumor morphology across the cohort and within patients can lead to unpredictable therapy resistance, metastasis, and clinical outcome. Hence, supplementing classic pathological markers with intrinsic tumor molecular markers can help identify novel molecular subtypes and discover actionable biomarkers. METHODS: We conducted a large multi-institutional genomic analysis of paired normal and tumor samples from breast cancer patients to profile the complex genomic architecture of breast tumors. Long-term patient follow-up, therapeutic regimens, and treatment response for this cohort are documented using the Breast Cancer Collaborative Registry. The majority of the patients in this study were at tumor stage 1 (51.4%) and stage 2 (36.3%) at the time of diagnosis. Whole-exome sequencing data from 554 patients were used for mutational profiling and identifying cancer drivers. RESULTS: We identified 54 tumors having at least 1000 mutations and 185 tumors with fewer than 100 mutations. Tumor mutational burden varied across the classified subtypes, and the top ten mutated genes include MUC4, MUC16, PIK3CA, TTN, TP53, NBPF10, NBPF1, CDC27, AHNAK2, and MUC2. Patients were classified based on seven biological and tumor-specific parameters (grade, stage, hormone receptor status, histological subtype, Ki67 expression, lymph node status, and race), and mutational profiles were compared across the different subtypes. Mutual exclusion of mutations in PIK3CA and TP53 was pronounced across different tumor grades. Cancer drivers specific to each subtype include TP53, PIK3CA, CDC27, CDH1, STK39, CBFB, MAP3K1, and GATA3, and mutations associated with patient survival were identified in our cohort. CONCLUSIONS: This extensive study has revealed tumor burden, driver genes, co-occurrence, mutual exclusivity, and survival effects of mutations in a US Midwestern breast cancer cohort, paving the way for developing personalized therapeutic strategies.
Affiliation(s)
- Yuan-De Tan
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA
- Peng Xiao
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA
- James Eudy
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA
- Oleg Shats
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, USA
- David Kelly
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, USA
- Michelle Desler
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, USA
- Kenneth Cowan
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, USA
- Chittibabu Guda
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA.
- Center for Biomedical Informatics Research and Innovation, University of Nebraska Medical Center, Omaha, NE, 68198, USA.
- Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, USA.
|
25
|
Kin K, Bhogale S, Zhu L, Thomas D, Bertol J, Zheng WJ, Sinha S, Fakhouri WD. Sequence-to-expression approach to identify etiological non-coding DNA variations in P53 and cMYC-driven diseases. RESEARCH SQUARE 2023:rs.3.rs-3037310. [PMID: 37503250 PMCID: PMC10371153 DOI: 10.21203/rs.3.rs-3037310/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Background and methods: Disease risk prediction based on DNA sequence and transcriptional profile can improve disease screening, prevention, and potential therapeutic approaches by revealing contributing genetic factors and altered regulatory networks. Although genome-wide association studies have identified many disease-associated DNA variants, the ability to distinguish deleterious non-coding DNA variations remains poor for most common diseases. We previously reported that non-coding variations disrupting cis-overlapping motifs (CisOMs) of opposing transcription factors significantly affect enhancer activity. We designed in vitro experiments to uncover the significance of the co-occupancy and competitive binding and inhibition between P53 and cMYC on common target gene expression. Results: Analyzing publicly available ChIP-seq data for P53 and cMYC in human embryonic stem cells and mouse embryonic cells showed that ~344-366 genomic regions are co-occupied by P53 and cMYC. We identified, on average, two CisOMs per region, suggesting that co-occupancy is evolutionarily conserved in vertebrates. Our data showed that treating U2OS cells with doxorubicin increased P53 protein level while reducing cMYC level. In contrast, no change in protein levels was observed in Raji cells. ChIP-seq analysis illustrated that 16-922 genomic regions were co-occupied by P53 and cMYC before and after treatment, and substitutions of cMYC signals by P53 were detected after doxorubicin treatment in U2OS. Around 187 expressed genes near co-occupied regions were altered at mRNA level according to RNA-seq data. We utilized a computational motif-matching approach to determine that changes in predicted P53 binding affinity by DNA variations in CisOMs of co-occupied elements significantly correlate with alterations in reporter gene expression. We performed a similar analysis using SNPs mapped in CisOMs for P53 and cMYC from ChIP-seq data in U2OS and Raji, and expression of target genes from the GTEx portal. Conclusions: We found a significant correlation between change in motif-predicted cMYC binding affinity by SNPs in CisOMs and altered gene expression. Our study brings us closer to developing a generally applicable approach to filter etiological non-coding variations associated with P53 and cMYC-dependent diseases.
Affiliation(s)
- Katherine Kin
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
- Lisha Zhu
- School of Biomedical Informatics, University of Texas Health Science Center at Houston
- Derrick Thomas
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
- Jessica Bertol
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
- W Jim Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston
- Saurabh Sinha
- The Wallace H. Coulter Department of Biomedical Engineering
- Walid D Fakhouri
- Department of Diagnostic and Biomedical Sciences, Center for Craniofacial Research, School of Dentistry, University of Texas Health Science Center at Houston
|
26
|
Chen Z, Yang Z, Zhu L, Gao P, Matsubara T, Kanaya S, Altaf-Ul-Amin M. Learning vector quantized representation for cancer subtypes identification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2023; 236:107543. [PMID: 37100024 DOI: 10.1016/j.cmpb.2023.107543] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/27/2022] [Revised: 02/13/2023] [Accepted: 04/07/2023] [Indexed: 05/21/2023]
Abstract
BACKGROUND AND OBJECTIVE: Defining and separating cancer subtypes is essential for facilitating personalized therapy and prognosis of patients. The definition of subtypes has been constantly recalibrated as a result of our deepened understanding. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data, such as transcriptomics, that have strong correlations to the underlying biological mechanism. However, while existing studies have shown promising results, they suffer from issues associated with omics data, namely sample scarcity and high dimensionality, and they impose unrealistic assumptions in order to extract useful features from the data while avoiding overfitting to spurious correlations. METHODS: This paper proposes to leverage a recent strong generative model, the Vector-Quantized Variational AutoEncoder, to tackle the data issues and extract discrete representations that are crucial to the quality of subsequent clustering by retaining only information relevant to reconstructing the input. RESULTS: Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate that the proposed clustering results can significantly and robustly improve prognosis over prevalent subtyping systems. CONCLUSION: Our proposal does not impose strict assumptions on data distribution, while its latent features are better representations of the transcriptomic data in different cancer subtypes, capable of yielding superior clustering performance with any mainstream clustering method.
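The "discrete representations" mentioned in this abstract come from the vector quantization step of a VQ-VAE, in which each continuous latent vector is replaced by its nearest entry in a learned codebook. The sketch below shows only that lookup, under stated assumptions: the codebook and latent vectors are made-up two-dimensional values, and the encoder, decoder and training losses of the actual model are omitted entirely.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """Return (index, entry) of the codebook entry nearest to `vector`."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]

# Hypothetical learned codebook with three 2-D entries.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Hypothetical encoder outputs for three samples.
latents = [(0.2, -0.1), (0.9, 1.2), (3.5, 0.4)]

# Each sample is summarized by the index of its nearest codebook entry.
codes = [quantize(z, codebook)[0] for z in latents]
print(codes)
```

The resulting integer codes (or the corresponding codebook vectors) are what a downstream clustering method would operate on in place of the raw high-dimensional expression profile.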
Affiliation(s)
- Zheng Chen
- Graduate School of Engineering Science, Osaka University, Japan.
- Ziwei Yang
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan
- Lingwei Zhu
- Department of Computing Science, University of Alberta, Canada
- Peng Gao
- Institute for Quantitative Biosciences, University of Tokyo, Japan
- Shigehiko Kanaya
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan; Data Science Center, Nara Institute of Science and Technology, Japan
- Md Altaf-Ul-Amin
- Graduate School of Science and Technology, Nara Institute of Science and Technology, Japan
|
27
|
Kumar S, Gerstein M. Unified views on variant impact across many diseases. Trends Genet 2023; 39:442-450. [PMID: 36858880 PMCID: PMC10192142 DOI: 10.1016/j.tig.2023.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/14/2022] [Revised: 02/02/2023] [Accepted: 02/02/2023] [Indexed: 03/03/2023]
Abstract
Genomic studies of human disorders are often performed by distinct research communities (i.e., focused on rare diseases, common diseases, or cancer). Despite underlying differences in the mechanistic origin of different disease categories, these studies share the goal of identifying causal genomic events that are critical for the clinical manifestation of the disease phenotype. Moreover, these studies face common challenges, including understanding the complex genetic architecture of the disease, deciphering the impact of variants on multiple scales, and interpreting noncoding mutations. Here, we highlight these challenges in depth and argue that properly addressing them will require a more unified vocabulary and approach across disease communities. Toward this goal, we present a unified perspective on relating variant impact to various genomic disorders.
Affiliation(s)
- Sushant Kumar
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada; Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA; Department of Statistics & Data Science, Yale University, New Haven, CT 06520, USA.
|
28
|
Shi FY, Wang Y, Huang D, Liang Y, Liang N, Chen XW, Gao G. Computational Assessment of the Expression-modulating Potential for Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:662-673. [PMID: 34890839 PMCID: PMC10787178 DOI: 10.1016/j.gpb.2021.10.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 06/13/2023]
Abstract
Large-scale genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) studies have identified multiple non-coding variants associated with genetic diseases by affecting gene expression. However, pinpointing causal variants effectively and efficiently remains a serious challenge. Here, we developed CARMEN, a novel algorithm to identify functional non-coding expression-modulating variants. Multiple evaluations demonstrated CARMEN's superior performance over state-of-the-art tools. Applying CARMEN to GWAS and eQTL datasets further pinpointed several causal variants other than the reported lead single-nucleotide polymorphisms (SNPs). CARMEN scales well with the massive datasets, and is available online as a web server at http://carmen.gao-lab.org.
Affiliation(s)
- Fang-Yuan Shi
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Yu Wang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Dong Huang
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China
| | - Yu Liang
- Human Aging Research Institute, School of Life Science, Nanchang University, Nanchang 330031, China
| | - Nan Liang
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China
| | - Xiao-Wei Chen
- State Key Laboratory of Membrane Biology, Institute of Molecular Medicine, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
| | - Ge Gao
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center (BIOPIC) & Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), Peking University, Beijing 100871, China.
|
29
|
Wang Z, Zhao G, Li B, Fang Z, Chen Q, Wang X, Luo T, Wang Y, Zhou Q, Li K, Xia L, Zhang Y, Zhou X, Pan H, Zhao Y, Wang Y, Wang L, Guo J, Tang B, Xia K, Li J. Performance Comparison of Computational Methods for the Prediction of the Function and Pathogenicity of Non-coding Variants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:649-661. [PMID: 35272052 PMCID: PMC10787016 DOI: 10.1016/j.gpb.2022.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/26/2021] [Revised: 12/28/2021] [Accepted: 02/27/2022] [Indexed: 06/14/2023]
Abstract
Non-coding variants in the human genome significantly influence human traits and complex diseases via their regulation and modification effects. Hence, an increasing number of computational methods are developed to predict the effects of variants in human non-coding sequences. However, it is difficult for inexperienced users to select appropriate computational methods from dozens of available methods. To solve this issue, we assessed 12 performance metrics of 24 methods on four independent non-coding variant benchmark datasets: (1) rare germline variants from clinical relevant sequence variants (ClinVar), (2) rare somatic variants from Catalogue Of Somatic Mutations In Cancer (COSMIC), (3) common regulatory variants from curated expression quantitative trait locus (eQTL) data, and (4) disease-associated common variants from curated genome-wide association studies (GWAS). All 24 tested methods performed differently under various conditions, indicating varying strengths and weaknesses under different scenarios. Importantly, the performance of existing methods was acceptable for rare germline variants from ClinVar with the area under the receiver operating characteristic curve (AUROC) of 0.4481-0.8033 and poor for rare somatic variants from COSMIC (AUROC = 0.4984-0.7131), common regulatory variants from curated eQTL data (AUROC = 0.4837-0.6472), and disease-associated common variants from curated GWAS (AUROC = 0.4766-0.5188). We also compared the prediction performance of 24 methods for non-coding de novo mutations in autism spectrum disorder, and found that the combined annotation-dependent depletion (CADD) and context-dependent tolerance score (CDTS) methods showed better performance. Summarily, we assessed the performance of 24 computational methods under diverse scenarios, providing preliminary advice for proper tool selection and guiding the development of new techniques in interpreting non-coding variants.
Affiliation(s)
- Zheng Wang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Guihu Zhao
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Bin Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Zhenghuan Fang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qian Chen
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xiaomeng Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Tengfei Luo
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yijing Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Qiao Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kuokuo Li
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Lu Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Yi Zhang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Xun Zhou
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Hongxu Pan
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yuwen Zhao
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Yige Wang
- Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Lin Wang
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China; Reproductive Medicine Center, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Jifeng Guo
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Beisha Tang
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China
| | - Kun Xia
- Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China
| | - Jinchen Li
- National Clinical Research Centre for Geriatric Disorders, Department of Geriatrics, Xiangya Hospital, Central South University, Changsha 410008, China; Department of Neurology, Xiangya Hospital, Central South University, Changsha 410008, China; Centre for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha 410008, China.
|
30
|
Fu C, Ngo J, Zhang S, Lu L, Miron A, Schafer S, Gage FH, Jin F, Schumacher FR, Wynshaw-Boris A. Novel correlative analysis identifies multiple genomic variations impacting ASD with macrocephaly. Hum Mol Genet 2023; 32:1589-1606. [PMID: 36519762 PMCID: PMC10162433 DOI: 10.1093/hmg/ddac300] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/06/2022] [Revised: 12/06/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022] Open
Abstract
Autism spectrum disorders (ASD) display both phenotypic and genetic heterogeneity, impeding the understanding of ASD and development of effective means of diagnosis and potential treatments. Genes affected by genomic variations for ASD converge in dozens of gene ontologies (GOs), but the relationship between the variations at the GO level has not been well elucidated. In the current study, multiple types of genomic variations were mapped to GOs and correlations among GOs were measured in ASD and control samples. Several ASD-unique GO correlations were found, suggesting the importance of co-occurrence of genomic variations in genes from different functional categories in ASD etiology. Combined with experimental data, several variations related to WNT signaling, neuron development, synapse morphology/function and organ morphogenesis were found to be important for ASD with macrocephaly, and novel co-occurrence patterns of them in ASD patients were found. Furthermore, we applied this gene ontology correlation analysis method to find genomic variations that contribute to ASD etiology in combination with changes in gene expression and transcription factor binding, providing novel insights into ASD with macrocephaly and a new methodology for the analysis of genomic variation.
Affiliation(s)
- Chen Fu
- Department of Genetics and Genomic Science, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Justine Ngo
- Department of Genetics and Genomic Science, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Shanshan Zhang
- Department of Genetics and Genomic Science, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Leina Lu
- Department of Genetics and Genomic Science, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Alexander Miron
- Department of Genetics and Genomic Science, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Simon Schafer
- The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Fred H Gage
- The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Fulai Jin
- Department of Genetics and Genomic Science, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Fredrick R Schumacher
- Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Anthony Wynshaw-Boris
- Department of Genetics and Genomic Science, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
|
31
|
Rozowsky J, Gao J, Borsari B, Yang YT, Galeev T, Gürsoy G, Epstein CB, Xiong K, Xu J, Li T, Liu J, Yu K, Berthel A, Chen Z, Navarro F, Sun MS, Wright J, Chang J, Cameron CJF, Shoresh N, Gaskell E, Drenkow J, Adrian J, Aganezov S, Aguet F, Balderrama-Gutierrez G, Banskota S, Corona GB, Chee S, Chhetri SB, Cortez Martins GC, Danyko C, Davis CA, Farid D, Farrell NP, Gabdank I, Gofin Y, Gorkin DU, Gu M, Hecht V, Hitz BC, Issner R, Jiang Y, Kirsche M, Kong X, Lam BR, Li S, Li B, Li X, Lin KZ, Luo R, Mackiewicz M, Meng R, Moore JE, Mudge J, Nelson N, Nusbaum C, Popov I, Pratt HE, Qiu Y, Ramakrishnan S, Raymond J, Salichos L, Scavelli A, Schreiber JM, Sedlazeck FJ, See LH, Sherman RM, Shi X, Shi M, Sloan CA, Strattan JS, Tan Z, Tanaka FY, Vlasova A, Wang J, Werner J, Williams B, Xu M, Yan C, Yu L, Zaleski C, Zhang J, Ardlie K, Cherry JM, Mendenhall EM, Noble WS, Weng Z, Levine ME, Dobin A, Wold B, Mortazavi A, Ren B, Gillis J, Myers RM, Snyder MP, Choudhary J, Milosavljevic A, Schatz MC, Bernstein BE, Guigó R, Gingeras TR, Gerstein M. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 2023; 186:1493-1511.e40. [PMID: 37001506 PMCID: PMC10074325 DOI: 10.1016/j.cell.2023.02.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 04/03/2022] [Revised: 10/16/2022] [Accepted: 02/10/2023] [Indexed: 04/03/2023]
Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
Affiliation(s)
- Joel Rozowsky
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
| | - Yucheng T Yang
- Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Gamze Gürsoy
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Kun Xiong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jinrui Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Tianxiao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jason Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Keyang Yu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Ana Berthel
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Zhanlin Chen
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
| | - Fabio Navarro
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Maxwell S Sun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Justin Chang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Christopher J F Cameron
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Noam Shoresh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Jorg Drenkow
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jessika Adrian
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Sergey Aganezov
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | | | | | | | | | - Sora Chee
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Gabriel Conte Cortez Martins
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Cassidy Danyko
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Carrie A Davis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Daniel Farid
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Idan Gabdank
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Yoel Gofin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - David U Gorkin
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Mengting Gu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Vivian Hecht
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin C Hitz
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Robbyn Issner
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Melanie Kirsche
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Xiangmeng Kong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Bonita R Lam
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Bian Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Xiqi Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Khine Zin Lin
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, CHN
| | - Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Ran Meng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Jonathan Mudge
- European Bioinformatics Institute, Cambridge, Cambridgeshire, GB
| | | | - Chad Nusbaum
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ioann Popov
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Henry E Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Yunjiang Qiu
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Srividya Ramakrishnan
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Joe Raymond
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Leonidas Salichos
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Biological and Chemical Sciences, New York Institute of Technology, Old Westbury, NY, USA
| | - Alexandra Scavelli
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jacob M Schreiber
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Fritz J Sedlazeck
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Lei Hoon See
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Rachel M Sherman
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Xu Shi
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Minyi Shi
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Cricket Alicia Sloan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - J Seth Strattan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Zhen Tan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Forrest Y Tanaka
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Anna Vlasova
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Comparative Genomics Group, Life Science Programme, Barcelona Supercomputing Centre, Barcelona, Spain; Institute of Research in Biomedicine, Barcelona, Spain
| | - Jun Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jonathan Werner
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Min Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Chengfei Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Lu Yu
- Institute of Cancer Research, London, UK
| | - Christopher Zaleski
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA, USA
| | | | - J Michael Cherry
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | | | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Morgan E Levine
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
| | - Alexander Dobin
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Jesse Gillis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Physiology, University of Toronto, Toronto, ON, Canada
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Michael P Snyder
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | | | | | - Michael C Schatz
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Bradley E Bernstein
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
| | - Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
| | - Thomas R Gingeras
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Mark Gerstein
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Statistics and Data Science, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA.
|
32
|
Kirsche M, Prabhu G, Sherman R, Ni B, Battle A, Aganezov S, Schatz MC. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat Methods 2023; 20:408-417. [PMID: 36658279 PMCID: PMC10006329 DOI: 10.1038/s41592-022-01753-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 05/27/2021] [Accepted: 12/15/2022] [Indexed: 01/21/2023]
Abstract
The availability of long reads is revolutionizing studies of structural variants (SVs). However, because SVs vary across individuals and are discovered through imprecise read technologies and methods, they can be difficult to compare. Addressing this, we present Jasmine and Iris (https://github.com/mkirsche/Jasmine/), for fast and accurate SV refinement, comparison and population analysis. Using an SV proximity graph, Jasmine outperforms six widely used comparison methods, including reducing the rate of Mendelian discordance in trio datasets by more than fivefold, and reveals a set of high-confidence de novo SVs confirmed by multiple technologies. We also present a unified callset of 122,813 SVs and 82,379 indels from 31 samples of diverse ancestry sequenced with long reads. We genotype these variants in 1,317 samples from the 1000 Genomes Project and the Genotype-Tissue Expression project with DNA and RNA-sequencing data and assess their widespread impact on gene expression, including within medically relevant genes.
Affiliation(s)
- Melanie Kirsche
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Gautam Prabhu
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Rachel Sherman
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Bohan Ni
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA.
|
33
|
Tomkova M, Tomek J, Chow J, McPherson JD, Segal DJ, Hormozdiari F. Dr.Nod: computational framework for discovery of regulatory non-coding drivers in tissue-matched distal regulatory elements. Nucleic Acids Res 2023; 51:e23. [PMID: 36625266 PMCID: PMC9976879 DOI: 10.1093/nar/gkac1251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 12/07/2022] [Accepted: 12/19/2022] [Indexed: 01/11/2023] Open
Abstract
The discovery of cancer driver mutations is a fundamental goal in cancer research. While many cancer driver mutations have been discovered in the protein-coding genome, research into potential cancer drivers in the non-coding regions showed limited success so far. Here, we present a novel comprehensive framework Dr.Nod for detection of non-coding cis-regulatory candidate driver mutations that are associated with dysregulated gene expression using tissue-matched enhancer-gene annotations. Applying the framework to data from over 1500 tumours across eight tissues revealed a 4.4-fold enrichment of candidate driver mutations in regulatory regions of known cancer driver genes. An overarching conclusion that emerges is that the non-coding driver mutations contribute to cancer by significantly altering transcription factor binding sites, leading to upregulation of tissue-matched oncogenes and down-regulation of tumour-suppressor genes. Interestingly, more than half of the detected cancer-promoting non-coding regulatory driver mutations are over 20 kb distant from the cancer-associated genes they regulate. Our results show the importance of tissue-matched enhancer-gene maps, functional impact of mutations, and complex background mutagenesis model for the prediction of non-coding regulatory drivers. In conclusion, our study demonstrates that non-coding mutations in enhancers play a previously underappreciated role in cancer and dysregulation of clinically relevant target genes.
Affiliation(s)
- Marketa Tomkova
- Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA; Ludwig Cancer Research, University of Oxford, Oxford, OX3 7DQ, UK; UC Davis Genome Center, University of California, Davis, CA 95616, USA
| | - Jakub Tomek
- Department of Pharmacology, University of California, Davis, CA 95616, USA
| | - Julie Chow
- Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA
| | - John D McPherson
- Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA
| | - David J Segal
- Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA; UC Davis Genome Center, University of California, Davis, CA 95616, USA; UC Davis MIND Institute, University of California, Davis, CA 95616, USA
| | - Fereydoun Hormozdiari
- Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA; UC Davis Genome Center, University of California, Davis, CA 95616, USA; UC Davis MIND Institute, University of California, Davis, CA 95616, USA
|
34
|
Larson NB, Oberg AL, Adjei AA, Wang L. A Clinician's Guide to Bioinformatics for Next-Generation Sequencing. J Thorac Oncol 2023; 18:143-157. [PMID: 36379355 PMCID: PMC9870988 DOI: 10.1016/j.jtho.2022.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/03/2021] [Revised: 10/31/2022] [Accepted: 11/05/2022] [Indexed: 11/15/2022]
Abstract
Next-generation sequencing (NGS) technologies are high-throughput methods for DNA sequencing and have become a widely adopted tool in cancer research. The sheer amount and variety of data generated by NGS assays require sophisticated computational methods and bioinformatics expertise. In this review, we provide background details of NGS technology and basic bioinformatics concepts for the clinician investigator interested in cancer research applications, with a focus on DNA-based approaches. We introduce the general principles of presequencing library preparation, postsequencing alignment, and variant calling. We also highlight the common variant annotations and NGS applications for other molecular data types. Finally, we briefly discuss the revealed utility of NGS methods in NSCLC research and study design considerations for research studies that aim to leverage NGS technologies for clinical care.
Affiliation(s)
- Nicholas Bradley Larson
- Division of Clinical Trials and Biostatistics, Department of Quantitative Health Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota.
| | - Ann L Oberg
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota
| | - Alex A Adjei
- Taussig Cancer Institute, Cleveland Clinic, Cleveland, Ohio
| | - Liguo Wang
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota
|
35
|
Labani M, Beheshti A, Argha A, Alinejad-Rokny H. A Comprehensive Investigation of Genomic Variants in Prostate Cancer Reveals 30 Putative Regulatory Variants. Int J Mol Sci 2023; 24:ijms24032472. [PMID: 36768794 PMCID: PMC9916892 DOI: 10.3390/ijms24032472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 01/18/2023] [Accepted: 01/23/2023] [Indexed: 01/31/2023] Open
Abstract
Prostate cancer (PC) is the most frequently diagnosed non-skin cancer in the world. Previous studies have shown that genomic alterations represent the most common mechanism for molecular alterations responsible for the development and progression of PC. This highlights the importance of identifying functional genomic variants for early detection in high-risk PC individuals. Great efforts have been made to identify common protein-coding genetic variations; however, the impact of non-coding variations, including regulatory genetic variants, is not well understood. Identification of these variants and the underlying target genes will be a key step in improving the detection and treatment of PC. To gain an understanding of the functional impact of genetic variants, and in particular, regulatory variants in PC, we developed an integrative pipeline (AGV) that uses whole genome/exome sequences, GWAS SNPs, chromosome conformation capture data, and ChIP-Seq signals to investigate the potential impact of genomic variants on the underlying target genes in PC. We identified 646 putative regulatory variants, of which 30 significantly altered the expression of at least one protein-coding gene. Our analysis of chromatin interactions data (Hi-C) revealed that the 30 putative regulatory variants could affect 131 coding and non-coding genes. Interestingly, our study identified the 131 protein-coding genes that are involved in disease-related pathways, including Reactome and MSigDB, for most of which targeted treatment options are currently available. Notably, our analysis revealed several non-coding RNAs, including RP11-136K7.2 and RAMP2-AS1, as potential enhancer elements of the protein-coding genes CDH12 and EZH1, respectively. Our results provide a comprehensive map of genomic variants in PC and reveal their potential contribution to prostate cancer progression and development.
Affiliation(s)
- Mahdieh Labani
- BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
- Data Analytic Lab, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
| | - Amin Beheshti
- Data Analytic Lab, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
| | - Ahmadreza Argha
- The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
| | - Hamid Alinejad-Rokny
- BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
- UNSW Data Science Hub, The University of New South Wales, Sydney, NSW 2052, Australia
- Health Data Analytics Program, Centre for Applied AI, Macquarie University, Sydney, NSW 2109, Australia
|
36
|
Zhou H, Arapoglou T, Li X, Li Z, Zheng X, Moore J, Asok A, Kumar S, Blue E, Buyske S, Cox N, Felsenfeld A, Gerstein M, Kenny E, Li B, Matise T, Philippakis A, Rehm HL, Sofia HJ, Snyder G, Weng Z, Neale B, Sunyaev S, Lin X. FAVOR: functional annotation of variants online resource and annotator for variation across the human genome. Nucleic Acids Res 2023; 51:D1300-D1311. [PMID: 36350676 PMCID: PMC9825437 DOI: 10.1093/nar/gkac966] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Received: 08/15/2022] [Revised: 09/25/2022] [Accepted: 10/14/2022] [Indexed: 11/11/2022] Open
Abstract
Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.
Affiliation(s)
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Theodore Arapoglou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Xiuwen Zheng
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Jill Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Sushant Kumar
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Elizabeth E Blue
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Steven Buyske
- Department of Statistics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Nancy Cox
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Eimear Kenny
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Tara Matise
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Anthony Philippakis
- Data Science Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Heidi J Sofia
- National Human Genome Research Institute, Bethesda, MD, USA
| | - Grace Snyder
- National Human Genome Research Institute, Bethesda, MD, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Benjamin Neale
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Shamil R Sunyaev
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
|
37
|
Niazi Y, Paramasivam N, Blocka J, Kumar A, Huhn S, Schlesner M, Weinhold N, Sijmons R, De Jong M, Durie B, Goldschmidt H, Hemminki K, Försti A. Investigation of Rare Non-Coding Variants in Familial Multiple Myeloma. Cells 2022; 12:cells12010096. [PMID: 36611892 PMCID: PMC9818386 DOI: 10.3390/cells12010096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Revised: 12/16/2022] [Accepted: 12/22/2022] [Indexed: 12/28/2022] Open
Abstract
Multiple myeloma (MM) is a plasma cell malignancy whereby a single clone of plasma cells over-propagates in the bone marrow, resulting in the increased production of monoclonal immunoglobulin. While the complex genetic architecture of MM is well characterized, much less is known about germline variants predisposing to MM. Genome-wide sequencing approaches in MM families have started to identify rare high-penetrance coding risk alleles. In addition, genome-wide association studies have discovered several common low-penetrance risk alleles, which are mainly located in the non-coding genome. Here, we further explored the genetic basis in familial MM within the non-coding genome in whole-genome sequencing data. We prioritized and characterized 150 upstream, 5' untranslated region (UTR) and 3' UTR variants from 14 MM families, including 20 top-scoring variants. These variants confirmed previously implicated biological pathways in MM development. Most importantly, protein network and pathway enrichment analyses also identified 10 genes involved in mitogen-activated protein kinase (MAPK) signaling pathways, which have previously been established as important MM pathways.
Affiliation(s)
- Yasmeen Niazi
- Hopp Children’s Cancer Center (KiTZ), 69120 Heidelberg, Germany
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
- Correspondence: (Y.N.); (K.H.)
| | - Nagarajan Paramasivam
- Computational Oncology, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany
| | - Joanna Blocka
- Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
- Department of Medical Oncology, Jerome Lipper Multiple Myeloma Center, Dana-Farber Cancer Institute, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Abhishek Kumar
- Institute of Bioinformatics, International Technology Park, Bangalore 560066, India
- Manipal Academy of Higher Education (MAHE), Manipal 576104, India
| | - Stefanie Huhn
- Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
- National Center for Tumor Diseases Heidelberg (NCT), 69120 Heidelberg, Germany
| | - Matthias Schlesner
- Bioinformatics and Omics Data Analytics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Niels Weinhold
- Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
| | - Rolf Sijmons
- University Medical Center Groningen, University of Groningen, 9712 Groningen, The Netherlands
| | - Mirjam De Jong
- University Medical Center Groningen, University of Groningen, 9712 Groningen, The Netherlands
| | - Brian Durie
- Cedars Sinai Cancer Center, Los Angeles, CA 90048, USA
| | - Hartmut Goldschmidt
- Computational Oncology, Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT), 69120 Heidelberg, Germany
- Department of Internal Medicine V, University of Heidelberg, 69120 Heidelberg, Germany
| | - Kari Hemminki
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Faculty of Medicine and Biomedical Center in Pilsen, Charles University in Prague, 323 00 Pilsen, Czech Republic
- Correspondence: (Y.N.); (K.H.)
| | - Asta Försti
- Hopp Children’s Cancer Center (KiTZ), 69120 Heidelberg, Germany
- Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), 69120 Heidelberg, Germany
|
38
|
Integrative Meta-Analysis of Huntington's Disease Transcriptome Landscape. Genes (Basel) 2022; 13:genes13122385. [PMID: 36553652 PMCID: PMC9777612 DOI: 10.3390/genes13122385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 11/24/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022] Open
Abstract
Huntington's disease (HD) is a neurodegenerative disorder with autosomal dominant inheritance caused by glutamine expansion in the Huntingtin gene (HTT). Striatal projection neurons (SPNs) in HD are more vulnerable to cell death. The executive striatal population is directly connected with the Brodmann Area (BA9), which is mainly involved in motor functions. Analyzing the disease samples from BA9 from the SRA database provides insights related to neuron degeneration, which helps to identify a promising therapeutic strategy. Most gene expression studies examine the changes in expression and associated biological functions. In this study, we elucidate the relationship between variants and their effect on gene/downstream transcript expression. We computed gene and transcript abundance and identified variants from RNA-seq data using various pipelines. We predicted the effect of genome-wide association studies (GWAS)/novel variants on regulatory functions. We found that many variants affect the histone acetylation pattern in HD, thereby perturbing the transcription factor networks. Interestingly, some variants affect miRNA binding as well as their downstream gene expression. Tissue-specific network analysis showed that mitochondrial, neuroinflammation, vasculature, and angiogenesis-related genes are disrupted in HD. From this integrative omics analysis, we propose that abnormal neuroinflammation acts as a two-edged sword that indirectly affects the vasculature and associated energy metabolism. Rehabilitation of blood-brain barrier functionality and energy metabolism may secure the neuron from cell death.
|
39
|
Wang Y, Chen L. DeepPerVar: a multi-modal deep learning framework for functional interpretation of genetic variants in personal genome. Bioinformatics 2022; 38:5340-5351. [PMID: 36271868 PMCID: PMC9750124 DOI: 10.1093/bioinformatics/btac696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2022] [Revised: 09/04/2022] [Accepted: 10/20/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Understanding the functional consequence of genetic variants, especially the non-coding ones, is important but particularly challenging. Genome-wide association studies (GWAS) or quantitative trait locus analyses may be subject to limited statistical power and linkage disequilibrium, and thus are less optimal to pinpoint the causal variants. Moreover, most existing machine-learning approaches, which exploit the functional annotations to interpret and prioritize putative causal variants, cannot accommodate the heterogeneity of personal genetic variations and traits in a population study, targeting a specific disease. RESULTS By leveraging paired whole-genome sequencing data and epigenetic functional assays in a population study, we propose a multi-modal deep learning framework to predict genome-wide quantitative epigenetic signals by considering both personal genetic variations and traits. The proposed approach can further evaluate the functional consequence of non-coding variants on an individual level by quantifying the allelic difference of predicted epigenetic signals. By applying the approach to the ROSMAP cohort studying Alzheimer's disease (AD), we demonstrate that the proposed approach can accurately predict quantitative genome-wide epigenetic signals and in key genomic regions of AD causal genes, learn canonical motifs reported to regulate gene expression of AD causal genes, improve the partitioning heritability analysis and prioritize putative causal variants in a GWAS risk locus. Finally, we release the proposed deep learning model as a stand-alone Python toolkit and a web server. AVAILABILITY AND IMPLEMENTATION https://github.com/lichen-lab/DeepPerVar. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ye Wang
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
| | - Li Chen
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
|
40
|
Castro-Mondragon JA, Aure M, Lingjærde O, Langerød A, Martens JWM, Børresen-Dale AL, Kristensen V, Mathelier A. Cis-regulatory mutations associate with transcriptional and post-transcriptional deregulation of gene regulatory programs in cancers. Nucleic Acids Res 2022; 50:12131-12148. [PMID: 36477895 PMCID: PMC9757053 DOI: 10.1093/nar/gkac1143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 11/03/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022] Open
Abstract
Most cancer alterations occur in the noncoding portion of the human genome, where regulatory regions control gene expression. The discovery of noncoding mutations altering the cells' regulatory programs has been limited to few examples with high recurrence or high functional impact. Here, we show that transcription factor binding sites (TFBSs) have similar mutation loads to those in protein-coding exons. By combining cancer somatic mutations in TFBSs and expression data for protein-coding and miRNA genes, we evaluate the combined effects of transcriptional and post-transcriptional alterations on the regulatory programs in cancers. The analysis of seven TCGA cohorts culminates with the identification of protein-coding and miRNA genes linked to mutations at TFBSs that are associated with a cascading trans-effect deregulation on the cells' regulatory programs. Our analyses of cis-regulatory mutations associated with miRNAs recurrently predict 12 mature miRNAs (derived from 7 precursors) associated with the deregulation of their target gene networks. The predictions are enriched for cancer-associated protein-coding and miRNA genes and highlight cis-regulatory mutations associated with the dysregulation of key pathways associated with carcinogenesis. By combining transcriptional and post-transcriptional regulation of gene expression, our method predicts cis-regulatory mutations related to the dysregulation of key gene regulatory networks in cancer patients.
Affiliation(s)
- Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Miriam Ragle Aure
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway; Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
| | - Ole Christian Lingjærde
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway; Centre for Bioinformatics, Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway; KG Jebsen Centre for B-cell malignancies, Institute for Clinical Medicine, University of Oslo, Ullernchausseen 70, N-0372 Oslo, Norway
| | - Anita Langerød
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
| | - John W M Martens
- Erasmus MC Cancer Institute and Cancer Genomics Netherlands, University Medical Center Rotterdam, Department of Medical Oncology, 3015GD Rotterdam, The Netherlands
| | - Anne-Lise Børresen-Dale
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
| | - Vessela N Kristensen
- Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway; Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
|
41
|
He Z, Liu L, Belloy ME, Le Guen Y, Sossin A, Liu X, Qi X, Ma S, Gyawali PK, Wyss-Coray T, Tang H, Sabatti C, Candès E, Greicius MD, Ionita-Laza I. GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nat Commun 2022; 13:7209. [PMID: 36418338 PMCID: PMC9684164 DOI: 10.1038/s41467-022-34932-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Accepted: 11/09/2022] [Indexed: 11/27/2022] Open
Abstract
Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.
Affiliation(s)
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA; Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA, 94305, USA
| | - Linxi Liu
- Department of Statistics, University of Pittsburgh, Pittsburgh, PA, 15260, USA
| | - Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA; Institut du Cerveau - Paris Brain Institute - ICM, Paris, 75013, France
| | - Aaron Sossin
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Xinran Qi
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Shiyang Ma
- Department of Biostatistics, Columbia University, New York, NY, 10032, USA
| | - Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Tony Wyss-Coray
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
| | - Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
| | - Chiara Sabatti
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA, 94305, USA; Department of Mathematics, Stanford University, Stanford, CA, 94305, USA
| | - Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, 94305, USA
|
42
|
Lee J, Lee J, Jeon S, Lee J, Jang I, Yang JO, Park S, Lee B, Choi J, Choi BO, Gee HY, Oh J, Jang IJ, Lee S, Baek D, Koh Y, Yoon SS, Kim YJ, Chae JH, Park WY, Bhak JH, Choi M. A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East Asian population. Exp Mol Med 2022; 54:1862-1871. [PMID: 36323850 PMCID: PMC9628380 DOI: 10.1038/s12276-022-00871-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/30/2022] [Revised: 07/21/2022] [Accepted: 08/08/2022] [Indexed: 11/29/2022] Open
Abstract
Despite substantial advances in disease genetics, studies to date have largely focused on individuals of European descent. This limits further discoveries of novel functional genetic variants in other ethnic groups. To alleviate the paucity of East Asian population genome resources, we established the Korean Variant Archive 2 (KOVA 2), which is composed of 1896 whole-genome sequences and 3409 whole-exome sequences from healthy individuals of Korean ethnicity. This is the largest genome database from the ethnic Korean population to date, surpassing the 1909 Korean individuals deposited in gnomAD. The variants in KOVA 2 displayed all the known genetic features of those from previous genome databases, and we compiled data from Korean-specific runs of homozygosity, positively selected intervals, and structural variants. In doing so, we found loci, such as the loci of ADH1A/1B and UHRF1BP1, that are strongly selected in the Korean population relative to other East Asian populations. Our analysis of allele ages revealed a correlation between variant functionality and evolutionary age. The data can be browsed and downloaded from a public website ( https://www.kobic.re.kr/kova/ ). We anticipate that KOVA 2 will serve as a valuable resource for genetic studies involving East Asian populations.
Affiliation(s)
- Jeongeun Lee
- Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, 03080 Republic of Korea
| | - Jean Lee
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Sungwon Jeon
- Department of Biomedical Engineering, College of Information and Biotechnology, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919 Republic of Korea
| | - Jeongha Lee
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Insu Jang
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea
| | - Jin Ok Yang
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea; Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Republic of Korea
| | - Soojin Park
- Department of Pediatrics, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Byungwook Lee
- Korea BioInformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141 Republic of Korea
| | - Jinwook Choi
- Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, 03080 Republic of Korea; Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| | - Byung-Ok Choi
- Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, 06351 Republic of Korea
| | - Heon Yung Gee
- Department of Pharmacology, Brain Korea 21 PLUS Project for Medical Sciences, Yonsei University College of Medicine, Seoul, 03722 Republic of Korea
| | - Jaeseong Oh
- Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul, 03080 Republic of Korea
| | - In-Jin Jang
- Department of Clinical Pharmacology and Therapeutics, Seoul National University College of Medicine and Hospital, Seoul, 03080 Republic of Korea
| | - Sanghyuk Lee
- Department of Bio-Information Science, Ewha Womans University, Seoul, 03760 Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, 08826 Republic of Korea
| | - Youngil Koh
- Department of Internal Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Sung-Soo Yoon
- Department of Internal Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Young-Joon Kim
- Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, 03722 Republic of Korea
| | - Jong-Hee Chae
- Department of Pediatrics, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea; Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080 Republic of Korea
| | - Woong-Yang Park
- Samsung Genome Institute, Samsung Medical Center, Seoul, 06351 Republic of Korea
| | - Jong Hwa Bhak
- Department of Biomedical Engineering, College of Information and Biotechnology, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919 Republic of Korea
| | - Murim Choi
- Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, 03080 Republic of Korea
| |
|
43
|
Prediction of Regulatory SNPs in Putative Minor Genes of the Neuro-Cardiovascular Variant in Fabry Reveals Insights into Autophagy/Apoptosis and Fibrosis. BIOLOGY 2022; 11:biology11091287. [PMID: 36138766 PMCID: PMC9495465 DOI: 10.3390/biology11091287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 07/30/2022] [Accepted: 08/03/2022] [Indexed: 11/17/2022]
Abstract
Even though a mutation in monogenic diseases leads to a “classic” manifestation, many disorders exhibit great clinical variability that could be due to modifying genes, also called minor genes. Fabry disease (FD) is an X-linked inborn error resulting from the deficient or absent activity of the alpha-galactosidase A (α-GAL) enzyme, which leads to deposits of globotriaosylceramide. With our proprietary software SNPclinic v.1.0, we analyzed 110 single nucleotide polymorphisms (SNPs) in the proximal promoter of 14 genes that could modify the FD phenotype. We found seven regulatory SNPs (rSNPs) in three genes (IL10, TGFB1 and EDN1) in five cell lines relevant to FD (Cardiac myocytes and fibroblasts, Astrocytes-cerebellar, endothelial cells and T helper cells 1-TH1). Each SNP was confirmed as a true rSNP in public eQTL databases, and additional software suggested the prediction of variants. The two proposed rSNPs in IL10 could explain components for the regulation of active B cells that influence the fibrosis process. The three predicted rSNPs in TGFB1 could act in apoptosis-autophagy regulation. The two putative rSNPs in EDN1 putatively regulate chronic inflammation. The seven rSNPs described here could act to modulate Fabry’s clinical phenotype, so we propose that IL10, TGFB1 and EDN1 be considered minor genes in FD.
|
44
|
Schipper M, Posthuma D. Demystifying non-coding GWAS variants: an overview of computational tools and methods. Hum Mol Genet 2022; 31:R73-R83. [PMID: 35972862 PMCID: PMC9585674 DOI: 10.1093/hmg/ddac198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/11/2022] [Accepted: 08/11/2022] [Indexed: 02/01/2023] Open
Abstract
Genome-wide association studies (GWAS) have found the majority of disease-associated variants to be non-coding. Major efforts into the charting of the non-coding regulatory landscapes have allowed for the development of tools and methods which aim to aid in the identification of causal variants and their mechanism of action. In this review, we give an overview of current tools and methods for the analysis of non-coding GWAS variants in disease. We provide a workflow that allows for the accumulation of in silico evidence to generate novel hypotheses on mechanisms underlying disease and prioritize targets for follow-up study using non-coding GWAS variants. Lastly, we discuss the need for comprehensive benchmarks and novel tools for the analysis of non-coding variants.
Affiliation(s)
- Marijn Schipper
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105 1081HV Amsterdam, The Netherlands
| | - Danielle Posthuma
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105 1081HV Amsterdam, The Netherlands
| |
|
45
|
An attention-based hybrid deep neural networks for accurate identification of transcription factor binding sites. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07502-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
46
|
scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data. Biomolecules 2022; 12:biom12070874. [PMID: 35883430 PMCID: PMC9312957 DOI: 10.3390/biom12070874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 06/16/2022] [Accepted: 06/16/2022] [Indexed: 02/04/2023] Open
Abstract
Recent advances in single-cell transposase-accessible chromatin using a sequencing assay (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction with an unprecedented resolution. However, compared to bulk-sequencing, its ultra-high missingness remarkably reduces usable reads in each cell type, resulting in broader, fuzzier peak boundary definitions and limiting our ability to pinpoint functional regions and interpret variant impacts precisely. We propose a weakly supervised learning method, scEpiLock, to directly identify core functional regions from coarse peak labels and quantify variant impacts in a cell-type-specific manner. First, scEpiLock uses a multi-label classifier to predict chromatin accessibility via a deep convolutional neural network. Then, its weakly supervised object detection module further refines the peak boundary definition using gradient-weighted class activation mapping (Grad-CAM). Finally, scEpiLock provides cell-type-specific variant impacts within a given peak region. We applied scEpiLock to various scATAC-seq datasets and found that it achieves an area under receiver operating characteristic curve (AUC) of ~0.9 and an area under precision recall (AUPR) above 0.7. Besides, scEpiLock’s object detection condenses coarse peaks to only ⅓ of their original size while still reporting higher conservation scores. In addition, we applied scEpiLock on brain scATAC-seq data and reported several genome-wide association studies (GWAS) variants disrupting regulatory elements around known risk genes for Alzheimer’s disease, demonstrating its potential to provide cell-type-specific biological insights in disease studies.
|
47
|
PTBP2 - a gene with relevance for both Anorexia nervosa and body weight regulation. Transl Psychiatry 2022; 12:241. [PMID: 35680849 PMCID: PMC9184595 DOI: 10.1038/s41398-022-02018-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/03/2022] [Revised: 05/23/2022] [Accepted: 06/01/2022] [Indexed: 12/14/2022] Open
Abstract
Genetic factors are relevant for both eating disorders and body weight regulation. A recent genome-wide association study (GWAS) for anorexia nervosa (AN) detected eight genome-wide significant chromosomal loci. One of these loci, rs10747478, was also genome-wide and significantly associated with body mass index (BMI). The nearest coding gene is the Polypyrimidine Tract Binding Protein 2 gene (PTBP2). To detect mutations in PTBP2, Sanger sequencing of the coding region was performed in 192 female patients with AN (acute or recovered) and 191 children or adolescents with (extreme) obesity. Twenty-five variants were identified. Twenty-three of these were predicted to be pathogenic or functionally relevant in at least one in silico tool. Two novel synonymous variants (p.Ala77Ala and p.Asp195Asp), one intronic SNP (rs188987764), and the intronic deletion (rs561340981) located in the highly conserved region of PTBP2 may have functional consequences. Ten of 20 genes interacting with PTBP2 were studied for their impact on body weight regulation based on either previous functional studies or GWAS hits for body weight or BMI. In a GWAS for BMI (Pulit et al. 2018), the number of genome-wide significant associations at the PTBP2 locus was different between males (60 variants) and females (two variants, one of these also significant in males). More than 65% of these 61 variants showed differences in the effect size pertaining to BMI between sexes (absolute value of Z-score >2, two-sided p < 0.05). One LD block overlapping 5'UTR and all coding regions of PTBP2 comprises 56 significant variants in males. The analysis based on sex-stratified BMI GWAS summary statistics implies that PTBP2 may have a more pronounced effect on body weight regulation in males than in females.
|
48
|
Evaluation of cfDNA as an early detection assay for dense tissue breast cancer. Sci Rep 2022; 12:8458. [PMID: 35589867 PMCID: PMC9120463 DOI: 10.1038/s41598-022-12457-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/03/2021] [Accepted: 05/06/2022] [Indexed: 11/23/2022] Open
Abstract
A cell-free DNA (cfDNA) assay would be a promising approach to early cancer diagnosis, especially for patients with dense tissues. Consistent cfDNA signatures have been observed for many carcinogens. Recently, investigations of cfDNA as a reliable early detection bioassay have presented a powerful opportunity for detecting dense tissue screening complications early. We performed a prospective study to evaluate the potential of characterizing cfDNA as a central element in the early detection of dense tissue breast cancer (BC). Plasma samples were collected from 32 consenting subjects with dense tissue and positive mammograms, 20 with positive biopsies and 12 with negative biopsies. After screening and before biopsy, cfDNA was extracted, and whole-genome next-generation sequencing (NGS) was performed on all samples. Copy number alteration (CNA) and single nucleotide polymorphism (SNP)/insertion/deletion (Indel) analyses were performed to characterize cfDNA. In the positive-positive subjects (cases), a total of 5 CNAs overlapped with 5 previously reported BC-related oncogenes (KSR2, MAP2K4, MSI2, CANT1 and MSI2). In addition, 1 SNP was detected in KMT2C, a BC oncogene, and 9 others were detected in or near 10 genes (SERAC1, DAGLB, MACF1, NVL, FBXW4, FANK1, KCTD4, CAVIN1; ATP6V0A1 and ZBTB20-AS1) previously associated with non-BC cancers. For the positive–negative subjects (screening), 3 CNAs were detected in BC genes (ACVR2A, CUL3 and PIK3R1), and 5 SNPs were identified in 6 non-BC cancer genes (SNIP1, TBC1D10B, PANK1, PRKCA and RUNX2; SUPT3H). This study presents evidence of the potential of using cfDNA somatic variants as dense tissue BC biomarkers from a noninvasive liquid bioassay for early cancer detection.
|
49
|
Giovannetti A, Bianco SD, Traversa A, Panzironi N, Bruselles A, Lazzari S, Liorni N, Tartaglia M, Carella M, Pizzuti A, Mazza T, Caputo V. MiRLog and dbmiR: prioritization and functional annotation tools to study human microRNA sequence variants. Hum Mutat 2022; 43:1201-1215. [PMID: 35583122 PMCID: PMC9546175 DOI: 10.1002/humu.24399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 05/03/2022] [Accepted: 05/11/2022] [Indexed: 11/22/2022]
Abstract
The recent identification of noncoding variants with pathogenic effects suggests that these variations could underlie a significant number of undiagnosed cases. Several computational methods have been developed to predict the functional impact of noncoding variants, but they exhibit only partial concordance and are not integrated with functional annotation resources, making the interpretation of these variants still challenging. MicroRNAs (miRNAs) are small noncoding RNA molecules that act as fine regulators of gene expression and play crucial functions in several biological processes, such as cell proliferation and differentiation. An increasing number of studies demonstrate a significant impact of miRNA single nucleotide variants (SNVs) both in Mendelian diseases and complex traits. To predict the functional effect of miRNA SNVs, we implemented a new meta‐predictor, MiRLog, and we integrated it into a comprehensive database, dbmiR, which includes a precompiled list of all possible miRNA allelic SNVs, providing their biological annotations at nucleotide and miRNA levels. MiRLog and dbmiR were used to explore the genetic variability of miRNAs in 15,708 human genomes included in the gnomAD project, finding several ultra‐rare SNVs with a potentially deleterious effect on miRNA biogenesis and function representing putative contributors to human phenotypes.
Affiliation(s)
- Agnese Giovannetti
- Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Salvatore Daniele Bianco
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy; Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Alice Traversa
- Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Noemi Panzironi
- Laboratory of Clinical Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Alessandro Bruselles
- Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
| | - Sara Lazzari
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Niccolò Liorni
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy; Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Marco Tartaglia
- Genetics and Rare Diseases Research Division, Ospedale Pediatrico Bambino Gesù, IRCCS, Rome, Italy
| | - Massimo Carella
- Medical Genetics Unit, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Antonio Pizzuti
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Tommaso Mazza
- Unit of Bioinformatics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo (FG), Italy
| | - Viviana Caputo
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| |
|
50
|
Yang M, Huang L, Huang H, Tang H, Zhang N, Yang H, Wu J, Mu F. Integrating convolution and self-attention improves language model of human genome for interpreting non-coding regions at base-resolution. Nucleic Acids Res 2022; 50:e81. [PMID: 35536244 PMCID: PMC9371931 DOI: 10.1093/nar/gkac326] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/22/2021] [Revised: 02/22/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022] Open
Abstract
Interpretation of non-coding genome remains an unsolved challenge in human genetics due to impracticality of exhaustively annotating biochemically active elements in all conditions. Deep learning based computational approaches emerge recently to help interpret non-coding regions. Here, we present LOGO (Language of Genome), a self-attention based contextualized pre-trained language model containing only two self-attention layers with 1 million parameters as a substantially light architecture that applies self-supervision techniques to learn bidirectional representations of the unlabelled human reference genome. LOGO is then fine-tuned for sequence labelling task, and further extended to variant prioritization task via a special input encoding scheme of alternative alleles followed by adding a convolutional module. Experiments show that LOGO achieves 15% absolute improvement for promoter identification and up to 4.5% absolute improvement for enhancer-promoter interaction prediction. LOGO exhibits state-of-the-art multi-task predictive power on thousands of chromatin features with only 3% parameterization benchmarking against the fully supervised model, DeepSEA and 1% parameterization against a recent BERT-based DNA language model. For allelic-effect prediction, locality introduced by one dimensional convolution shows improved sensitivity and specificity for prioritizing non-coding variants associated with human diseases. In addition, we apply LOGO to interpret type 2 diabetes (T2D) GWAS signals and infer underlying regulatory mechanisms. We make a conceptual analogy between natural language and human genome and demonstrate LOGO is an accurate, fast, scalable, and robust framework to interpret non-coding regions for global sequence labeling as well as for variant prioritization at base-resolution.
Affiliation(s)
- Meng Yang
- MGI, BGI-Shenzhen, Shenzhen 518083, China; Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark
| | | | | | - Hui Tang
- MGI, BGI-Shenzhen, Shenzhen 518083, China
| | - Nan Zhang
- MGI, BGI-Shenzhen, Shenzhen 518083, China
| | - Huanming Yang
- BGI-Shenzhen, Shenzhen 518083, China; Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen, 518120, China
| | - Jihong Wu
- Department of Ophthalmology, Eye & ENT Hospital, Shanghai Medical College, Fudan University, Shanghai, China; Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China; Key Laboratory of Myopia (Fudan University), Chinese Academy of Medical Sciences, National Health Commission, Shanghai, China
| | - Feng Mu
- MGI, BGI-Shenzhen, Shenzhen 518083, China
| |
|