1
Briones MRS, Campos JH, Ferreira RC, Schneper L, Santos IM, Antoneli FM, Broach JR. Mitochondrial genome variants associated with amyotrophic lateral sclerosis and their haplogroup distribution. Muscle Nerve 2024. [PMID: 39126144 DOI: 10.1002/mus.28230] [Received: 03/15/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/12/2024]
Abstract
INTRODUCTION/AIMS: Amyotrophic lateral sclerosis (ALS) may be familial or sporadic, and twin studies have revealed that even sporadic forms have a significant genetic component. Variants in 55 nuclear genes have been associated with ALS, and although mitochondrial dysfunction is observed in ALS, variants in mitochondrial genomes (mitogenomes) have not yet been tested for association with ALS. The aim of this study was to determine whether mitogenome variants are associated with ALS. METHODS: We conducted a genome-wide association study (GWAS) of the mitogenomes of 1965 ALS patients and 2547 controls. RESULTS: We identified 51 mitogenome variants with p values < 10⁻⁷. Of these, 13 had odds ratios (ORs) > 1, in genes RNR1, ND1, CO1, CO3, ND5, ND6, and CYB, while 38 had ORs < 1, in genes RNR1, RNR2, ND1, ND2, CO2, ATP8, ATP6, CO3, ND3, ND4, ND5, ND6, and CYB. The frequencies of haplogroups H, U, and L, the most frequent in our ALS data set, were the same across onset sites (bulbar, limb, spinal, and axial). In addition, intra-haplogroup GWAS revealed unique ALS-associated variants in haplogroups L and U. DISCUSSION: Our study shows that mitogenome single nucleotide variants (SNVs) are associated with ALS and suggests that these SNVs could be included in routine genetic testing for ALS and that mitochondrial replacement therapy has the potential to serve as a basis for ALS treatment.
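An OR above 1 means a variant is carried more often by patients than by controls; an OR below 1 means less often. A minimal sketch of that comparison, assuming a plain 2×2 table of carriers versus non-carriers with no covariates and a Woolf (log-scale) 95% confidence interval. The abstract does not describe the study's actual association model, and the function name `carrier_odds_ratio` and the carrier counts in the usage line are hypothetical.

```python
import math


def carrier_odds_ratio(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 carrier table.

    Hypothetical helper for illustration; not the study's GWAS pipeline.
    """
    a = case_carriers                  # cases carrying the variant
    b = n_cases - case_carriers        # cases without the variant
    c = control_carriers               # controls carrying the variant
    d = n_controls - control_carriers  # controls without the variant
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell if any cell is empty
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


# Hypothetical carrier counts, using the cohort sizes reported above
or_, lower, upper = carrier_odds_ratio(120, 1965, 90, 2547)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A real mitogenome GWAS would typically adjust for ancestry or haplogroup and apply a multiple-testing threshold such as the p < 10⁻⁷ cut-off above. This sketch shows only the raw odds ratio.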
Affiliation(s)
- Marcelo R S Briones
- Center for Medical Bioinformatics, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, São Paulo, Brazil
- João H Campos
- Graduate Program in Microbiology and Immunology, Federal University of São Paulo, São Paulo, São Paulo, Brazil
- Renata C Ferreira
- Department of Neurology and Neurosurgery, Federal University of São Paulo, São Paulo, São Paulo, Brazil
- Bridges Genomics, M.E., São Paulo, São Paulo, Brazil
- Lisa Schneper
- Department of Biochemistry, Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
- Ilda M Santos
- Graduate Program in Microbiology and Immunology, Federal University of São Paulo, São Paulo, São Paulo, Brazil
- Fernando M Antoneli
- Center for Medical Bioinformatics, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo, São Paulo, Brazil
- James R Broach
- Department of Biochemistry, Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
2
Wang Z, Li S, Cai G, Gao Y, Yang H, Li Y, Liang J, Zhang S, Hu J, Zheng J. Mendelian randomization analysis identifies druggable genes and drugs repurposing for chronic obstructive pulmonary disease. Front Cell Infect Microbiol 2024; 14:1386506. [PMID: 38660492 PMCID: PMC11039854 DOI: 10.3389/fcimb.2024.1386506] [Received: 02/15/2024] [Accepted: 03/22/2024] [Indexed: 04/26/2024]
Abstract
Background: Chronic obstructive pulmonary disease (COPD) is a prevalent condition that significantly impacts public health, yet few effective treatment options are available. Mendelian randomization (MR) has been used to repurpose existing drugs and identify new therapeutic targets. The objective of this study was to identify novel therapeutic targets for COPD. Methods: Cis-expression quantitative trait loci (cis-eQTL) for 4,317 identified druggable genes were extracted from genomics and proteomics data of whole blood (eQTLGen) and lung tissue (GTEx Consortium). Genome-wide association study (GWAS) data for doctor-diagnosed COPD, spirometry-defined COPD (forced expiratory volume in one second [FEV1]/forced vital capacity [FVC] < 0.7), and FEV1 were obtained from the FinnGen, UK Biobank, and SpiroMeta consortium cohorts. We employed summary-data-based Mendelian randomization (SMR), the HEIDI test, and colocalization analysis to assess the causal effects of druggable gene expression on COPD and lung function. The reliability of these druggable genes was confirmed by eQTL two-sample MR and by protein quantitative trait loci (pQTL) SMR. The potential effects of druggable genes were assessed through a phenome-wide association study (PheWAS). Information on drug repurposing for COPD was collected from multiple databases. Results: A total of 31 potential druggable genes associated with doctor-diagnosed COPD, spirometry-defined COPD, and FEV1 were identified through SMR, the HEIDI test, and colocalization analysis. Of these, 22 genes (e.g., MMP15, PSMA4, ERBB3, and LMCD1) were further confirmed by eQTL two-sample MR and protein SMR analyses. Gene-level PheWAS revealed that ERBB3 expression might reduce inflammation, while GP9 and MRC2 were associated with other traits. The drugs montelukast (targeting MMP15) and marizomib (targeting PSMA4) may reduce the risk of spirometry-defined COPD. Additionally, an existing small-molecule inhibitor of APH1A has the potential to increase FEV1. Conclusions: Our findings identified 22 potential drug targets for COPD and lung function. Prioritizing clinical trials that target these druggable genes with existing drugs or novel medications will be beneficial for the development of COPD treatments.
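SMR asks whether a cis-eQTL SNP's effect on the outcome runs through expression of its gene. A minimal sketch of the single-SNP SMR estimate and test statistic as SMR is commonly defined. It assumes the effect size and standard error of one top cis-eQTL SNP on expression (b_zx, se_zx) and on the outcome (b_zy, se_zy) are already available, and all input values are hypothetical. The HEIDI heterogeneity test and colocalization analysis mentioned above are not shown.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and test for one cis-eQTL SNP (z).

    b_zx, se_zx: SNP effect on gene expression (eQTL summary statistics).
    b_zy, se_zy: SNP effect on the outcome, e.g. COPD (GWAS summary statistics).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # Wald-ratio estimate of the expression -> outcome effect
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Under the null, T_SMR is approximately chi-square with 1 df;
    # its upper-tail probability equals erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_value


# Hypothetical summary statistics for a single druggable-gene eQTL
b_xy, t_smr, p = smr_test(b_zx=0.50, se_zx=0.05, b_zy=0.04, se_zy=0.01)
print(f"b_xy = {b_xy:.3f}, T_SMR = {t_smr:.2f}, p = {p:.2e}")
```

The statistic is limited by the weaker of the two z-scores. A very strong eQTL therefore cannot make up for a weak GWAS signal, and a very strong GWAS signal cannot make up for a weak eQTL.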
Affiliation(s)
- Jinping Zheng
- National Center for Respiratory Medicine, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou Medical University, Guangzhou, China
3
Deng MG, Liu F, Wang K, Liang Y, Nie JQ, Chai C. Genetic association between coffee/caffeine consumption and the risk of obstructive sleep apnea in the European population: a two-sample Mendelian randomization study. Eur J Nutr 2023; 62:3423-3431. [PMID: 37668652 DOI: 10.1007/s00394-023-03239-0] [Received: 11/17/2022] [Accepted: 08/16/2023] [Indexed: 09/06/2023]
Abstract
BACKGROUND: The association between coffee/caffeine consumption and obstructive sleep apnea (OSA) risk remains unclear. PURPOSE: To determine the relationship between coffee/caffeine consumption and the risk of OSA in the European population using the Mendelian randomization (MR) method. METHODS: Two sets of coffee consumption-associated genetic variants were extracted from the recent genome-wide meta-analysis (GWMA) and genome-wide association study (GWAS) of coffee consumption, respectively. To account for other caffeine sources, genetic variants associated with caffeine consumption from tea and with plasma caffeine (reflecting total caffeine intake) were also obtained. The inverse variance weighted (IVW) technique was used as the primary analysis, supplemented by the MR-Egger, weighted-median, and MR Pleiotropy RESidual Sum and Outlier (MR-PRESSO) techniques. Leave-one-out (LOO) analysis was performed to assess whether the overall causal estimates were driven by a single SNP. Additional sensitivity analyses repeated these methods after excluding genetic variants associated with confounders, e.g., body mass index and hypertension. RESULTS: The IVW method demonstrated that coffee consumption GWMA (OR: 1.065, 95% CI 0.927-1.224, p = 0.376), coffee consumption GWAS (OR: 1.665, 95% CI 0.932-2.977, p = 0.086), caffeine from tea (OR: 1.198, 95% CI 0.936-1.534, p = 0.151), and blood caffeine levels (OR: 1.054, 95% CI 0.902-1.231, p = 0.508) were unlikely to be associated with the risk of OSA. The other three methods yielded similar results, with no significant associations. The LOO analysis showed that no single genetic variant drove the overall estimates. These findings were also supported by the sensitivity analyses excluding confounding genetic variants. CONCLUSION: Our study found no association between coffee/caffeine consumption and the risk of OSA.
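The IVW estimate combines per-SNP Wald ratios into one causal estimate. Each SNP is weighted by its precision, and the estimate equals a weighted regression of SNP-outcome effects on SNP-exposure effects through the origin. A minimal sketch of the fixed-effect form, together with the OR conversion and the leave-one-out loop described above. The function names and inputs are hypothetical. Many MR packages instead default to a multiplicative random-effects IVW, which inflates the standard error when SNP estimates are heterogeneous. MR-Egger, weighted median, and MR-PRESSO are not shown.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    beta_exp: per-SNP effects on the exposure (e.g. coffee consumption).
    beta_out: per-SNP effects on the outcome (e.g. OSA, log-odds scale).
    se_out:   standard errors of beta_out.
    """
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out)]
    numerator = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    beta = numerator / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return beta, se


def to_odds_ratio(beta, se, z=1.96):
    """Convert a log-odds estimate to an OR with a 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def leave_one_out(beta_exp, beta_out, se_out):
    """Re-run IVW with each SNP removed in turn (leave-one-out analysis)."""
    estimates = []
    for i in range(len(beta_exp)):
        keep = [j for j in range(len(beta_exp)) if j != i]
        estimates.append(ivw_estimate([beta_exp[j] for j in keep],
                                      [beta_out[j] for j in keep],
                                      [se_out[j] for j in keep]))
    return estimates
```

In an LOO analysis like the one reported, the question is whether any single dropped SNP moves the estimate, or its confidence interval, substantially away from the full-set result.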
Affiliation(s)
- Ming-Gang Deng
- Department of Psychiatry, Wuhan Mental Health Center, Wuhan, 430012, Hubei, China.
- Department of Psychiatry, Wuhan Hospital for Psychotherapy, Wuhan, 430012, Hubei, China.
- Fang Liu
- School of Public Health, Wuhan University, Wuhan, 430071, Hubei, China
- Kai Wang
- Department of Public Health, Wuhan Fourth Hospital, Wuhan, 430033, Hubei, China
- Yuehui Liang
- School of Public Health, Wuhan University, Wuhan, 430071, Hubei, China
- Jia-Qi Nie
- Xiaogan Center for Disease Control and Prevention, Xiaogan, 432000, Hubei, China
- Chen Chai
- Emergency Center, Hubei Clinical Research Center for Emergency and Resuscitation, Zhongnan Hospital of Wuhan University, Wuhan, 430071, Hubei, China
4
Ng JK, Vats P, Fritz-Waters E, Sarkar S, Sams EI, Padhi EM, Payne ZL, Leonard S, West MA, Prince C, Trani L, Jansen M, Vacek G, Samadi M, Harkins TT, Pohl C, Turner TN. de novo variant calling identifies cancer mutation signatures in the 1000 Genomes Project. Hum Mutat 2022; 43:1979-1993. [PMID: 36054329 PMCID: PMC9771978 DOI: 10.1002/humu.24455] [Received: 12/30/2021] [Revised: 07/22/2022] [Accepted: 08/29/2022] [Indexed: 01/25/2023]
Abstract
Detection of de novo variants (DNVs) is critical for studies of disease-related variation and mutation rates. To accelerate DNV calling, we developed a graphics processing unit (GPU)-based workflow. We applied our workflow to whole-genome sequencing data from three parent-child cohorts, the Simons Simplex Collection (SSC), Simons Foundation Powering Autism Research (SPARK), and the 1000 Genomes Project (1000G), which were sequenced using DNA from blood, saliva, and lymphoblastoid cell lines (LCLs), respectively. The SSC and SPARK DNV callsets were within expectations for the number of DNVs, percent at CpG sites, phasing to the paternal chromosome of origin, and average allele balance. However, the 1000G DNV callset was not within expectations and contained an excess of DNVs that are likely cell line artifacts. Mutation signature analysis revealed that 30% of 1000G DNV signatures matched B-cell lymphoma. Furthermore, we found variants in DNA repair genes and at ClinVar pathogenic or likely pathogenic sites, as well as a significant excess of protein-coding DNVs in IGLL5, a gene known to be involved in B-cell lymphomas. Our study provides a new rapid DNV caller for the field and highlights important implications of using sequencing data from LCLs for reference building and disease-related projects.
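At its simplest, a trio DNV call means the child is heterozygous at a site where both parents are confidently homozygous reference. A minimal, CPU-only sketch of that logic plus a CpG-transition check, one of the callset sanity checks listed above. This is not the authors' GPU workflow. The `Call` class, the function names, and the thresholds (minimum depth 10, child allele balance 0.3 to 0.7, zero parental alt reads) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """A biallelic genotype call at one site for one individual (hypothetical type)."""
    alt_count: int   # 0 = hom-ref, 1 = het, 2 = hom-alt
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def allele_balance(self) -> float:
        # Fraction of reads supporting the alternate allele
        return self.alt_reads / self.depth if self.depth else 0.0


def is_candidate_dnv(child, mother, father, min_depth=10,
                     ab_range=(0.3, 0.7), max_parent_alt_reads=0):
    """Heuristic trio filter; all thresholds are illustrative assumptions."""
    if child.alt_count != 1:
        return False  # a new germline mutation is expected to be heterozygous
    if mother.alt_count != 0 or father.alt_count != 0:
        return False  # both parents must be called homozygous reference
    if min(child.depth, mother.depth, father.depth) < min_depth:
        return False
    if max(mother.alt_reads, father.alt_reads) > max_parent_alt_reads:
        return False  # parental alt reads suggest inheritance or parental mosaicism
    low, high = ab_range
    return low <= child.allele_balance <= high


def is_cpg_transition(reference, pos, ref, alt):
    """True for C>T at a C followed by G, or G>A at a G preceded by C (0-based pos)."""
    if reference[pos] != ref:
        raise ValueError("ref allele does not match the reference sequence")
    if ref == "C" and alt == "T":
        return pos + 1 < len(reference) and reference[pos + 1] == "G"
    if ref == "G" and alt == "A":
        return pos > 0 and reference[pos - 1] == "C"
    return False
```

Summaries over the passing calls, such as the fraction at CpG sites and the mean allele balance, are the kind of metrics the abstract uses to judge whether a callset is within expectations. An LCL-derived callset with many somatic or culture-acquired variants would be expected to drift on these metrics.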
Affiliation(s)
- Jeffrey K. Ng
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
- Pankaj Vats
- NVIDIA Corporation, Santa Clara, California, USA
- Elyn Fritz-Waters
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
- Stephanie Sarkar
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
- Eleanor I. Sams
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
- Evin M. Padhi
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
- Zachary L. Payne
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
- Shawn Leonard
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
- Marc A. West
- NVIDIA Corporation, Santa Clara, California, USA
- Chandler Prince
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
- Lee Trani
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA
- Marshall Jansen
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
- George Vacek
- NVIDIA Corporation, Santa Clara, California, USA
- Craig Pohl
- Research Infrastructure Services, Washington University School of Medicine, St. Louis, Missouri, USA
- Tychele N. Turner
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA
5
Song M, Greenbaum J, Luttrell J, Zhou W, Wu C, Luo Z, Qiu C, Zhao LJ, Su KJ, Tian Q, Shen H, Hong H, Gong P, Shi X, Deng HW, Zhang C. An autoencoder-based deep learning method for genotype imputation. Front Artif Intell 2022; 5:1028978. [PMID: 36406474 PMCID: PMC9671213 DOI: 10.3389/frai.2022.1028978] [Received: 08/26/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022]
Abstract
Genotype imputation has a wide range of applications in genome-wide association studies (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL) based methods, such as the sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, optimizing the learning process of DL-based methods to achieve high imputation accuracy remains challenging. To address this challenge, we developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop that uses a single-batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). On all three datasets, our modified AE imputation model achieved comparable or better performance than the existing SCDA model on evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS). On the HLA data, for example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. On the LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at a missing ratio of 20%. In summary, our proposed genotype imputation method has great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.
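The "missing ratio" evaluation above works by hiding a fraction of known genotypes, imputing them, and scoring the recovered values. A minimal sketch of the masking step and the concordance rate, assuming CR is the fraction of masked genotypes (coded 0/1/2) imputed exactly. The paper's precise CR definition, the autoencoder itself, and the Hellinger, SEN, and IQS metrics are not reproduced, and the function names are hypothetical.

```python
import random


def mask_genotypes(genotypes, missing_ratio, seed=0):
    """Pick positions to hide from the imputation model, uniformly at random."""
    rng = random.Random(seed)
    n_mask = round(len(genotypes) * missing_ratio)
    return sorted(rng.sample(range(len(genotypes)), n_mask))


def concordance_rate(true_genotypes, imputed_genotypes, masked_idx):
    """Fraction of masked genotypes (coded 0/1/2) that were imputed exactly right."""
    if not masked_idx:
        raise ValueError("no masked positions to evaluate")
    matches = sum(true_genotypes[i] == imputed_genotypes[i] for i in masked_idx)
    return matches / len(masked_idx)
```

Only masked positions are scored, because unmasked genotypes are copied through and would inflate the rate. CR also rewards always guessing the major genotype at common variants, which is one reason chance-corrected scores such as IQS are reported alongside it.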
Affiliation(s)
- Meng Song
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Jonathan Greenbaum
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Joseph Luttrell
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
- Weihua Zhou
- College of Computing, Michigan Technological University, Houghton, MI, United States
- Chong Wu
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Zhe Luo
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Chuan Qiu
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Lan Juan Zhao
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Kuan-Jui Su
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Qing Tian
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Hui Shen
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, United States
- Ping Gong
- Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS, United States
- Xinghua Shi
- Department of Computer & Information Sciences, Temple University, Philadelphia, PA, United States
- Hong-Wen Deng
- Tulane Center of Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA, United States
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, United States
6
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, Fairley S, Runnels A, Winterkorn L, Lowy E, Flicek P, Germer S, Brand H, Hall IM, Talkowski ME, Narzisi G, Zody MC. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 2022; 185:3426-3440.e19. [PMID: 36055201 PMCID: PMC9439720 DOI: 10.1016/j.cell.2022.08.004] [Received: 11/12/2021] [Revised: 06/21/2022] [Accepted: 08/03/2022] [Indexed: 01/05/2023]
Abstract
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in the sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs and among INDELs and SVs across the frequency spectrum. We also generated an improved reference imputation panel, making the variants discovered here accessible for association studies.
Affiliation(s)
- Xuefang Zhao
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Haley J Abel
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Wayne E Clarke
- New York Genome Center, New York, NY 10013, USA; Outlier Informatics Inc., Saskatoon, SK S7H 1L4, Canada
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Harrison Brand
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ira M Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA; Center for Genomic Health, Yale University School of Medicine, New Haven, CT 06510, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Michael E Talkowski
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
7
Halabian R, Makałowski W. A Map of 3' DNA Transduction Variants Mediated by Non-LTR Retroelements on 3202 Human Genomes. Biology 2022; 11:1032. [PMID: 36101413 PMCID: PMC9311842 DOI: 10.3390/biology11071032] [Received: 06/07/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 05/03/2023]
Abstract
Mobile elements are one of the major structural constituents of the human genome, comprising more than half of it; among them, Alu, L1, and SVA elements are still active and continue to generate new copies. A major characteristic of L1 and SVA elements is their ability to co-mobilize adjacent downstream sequences to new loci in a process called 3' DNA transduction. Transductions influence the structure and content of the genome in different ways, such as increasing genome variation, exon shuffling, and gene duplication. Moreover, given their mutagenic potential, 3' transductions are often involved in tumorigenesis or in the development of some diseases. In this study, we analyzed 3202 genomes sequenced at high coverage by the New York Genome Center to catalog and characterize putative 3' transduced segments mediated by L1s and SVAs. Here, we present a genome-wide map of inter- and intrachromosomal 3' transduction variants, including their genomic and functional location, length, progenitor location, and allele frequency across 26 populations. In total, we identified 7103 polymorphic L1s and 3040 polymorphic SVAs. Of these, 268 and 162 variants were annotated as high-confidence L1 and SVA 3' transductions, respectively, with lengths ranging from 7 to 997 nucleotides. Among others, we identified specific loci on chromosomes X, 6, and 7 and on the alternate contig 6_GL000253v2_alt as master L1s and SVAs that had yielded more transductions. Together, our results demonstrate the dynamic nature of transduction events within the genome and among individuals, and their contribution to the structural variation of the human genome.
Affiliation(s)
- Wojciech Makałowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, 48149 Münster, Germany
8
Lowy E, Fairley S, Flicek P. Variant calling across 505 openly consented samples from four Gambian populations on GRCh38. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.17001.1] [Indexed: 11/20/2022]
Abstract
The International Genome Sample Resource (IGSR) repository was established to maximise the utility of human genetic data derived from openly consented samples within the research community. Here we describe variant detection in 505 samples from four populations in The Gambia, using the GRCh38 reference genome, adding to the range of populations for which this has been done and, importantly, making allele frequencies available. A multi-caller site discovery process was applied along with imputation and phasing to produce a phased biallelic single nucleotide variant (SNV) and insertion/deletion (INDEL) call set. Variation had not previously been explored on the GRCh38 human genome assembly for 387 of the samples. Compared to our previous work with the 1000 Genomes Project data on GRCh38, we identified over nine million novel SNVs and over 870 thousand novel INDELs.
9
Charon C, Allodji R, Meyer V, Deleuze JF. Impact of pre- and post-variant filtration strategies on imputation. Sci Rep 2021; 11:6214. [PMID: 33737531 PMCID: PMC7973508 DOI: 10.1038/s41598-021-85333-z] [Received: 12/14/2020] [Accepted: 02/22/2021] [Indexed: 01/04/2023]
Abstract
Quality control (QC) methods for genome-wide association studies and fine mapping are commonly applied before imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied its direct effects on the number of markers, their allele frequencies, imputation quality scores, and post-filtration events. We pre-phased 1031 genotyped individuals of diverse ethnicities and, for additional validation, compared the imputed variants with those of 1089 individuals recorded at NCBI. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas pre-filtration caused an overall loss of information. Significant differences between allele frequencies with and without pre-filtration were found only for very rare (5E-04 to 1E-03) and rare variants (1E-03 to 5E-03) (p < 1E-04). Raising the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with allele frequency < 0.001 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Thus, to maintain confidence while retaining enough SNVs, we propose a two-step filtering procedure that applies less stringent filtering both before and after imputation, increasing the number of very rare and rare variants compared with conservative filtration methods.
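The post-imputation step above keeps variants whose imputation quality score clears a threshold, then tallies how many very rare and rare variants survive. A minimal sketch of that tally, assuming each imputed variant carries a minor allele frequency and a single quality score (for example an INFO or R² value; which metric the study used is not stated here). The bin edges come from the abstract, but treating them as half-open intervals is an assumption, and the variant records are hypothetical.

```python
from collections import Counter

# Frequency ranges quoted in the abstract; half-open [low, high) bins are an assumption.
FREQUENCY_BINS = [
    ("very rare", 5e-4, 1e-3),
    ("rare", 1e-3, 5e-3),
]


def frequency_bin(maf):
    """Label a minor allele frequency with the first matching bin, else 'other'."""
    for name, low, high in FREQUENCY_BINS:
        if low <= maf < high:
            return name
    return "other"


def post_imputation_filter(variants, min_quality):
    """Keep imputed variants whose imputation quality score is >= min_quality."""
    return [v for v in variants if v["quality"] >= min_quality]


def count_by_bin(variants):
    return Counter(frequency_bin(v["maf"]) for v in variants)


# Hypothetical imputed variants: minor allele frequency and imputation quality score
imputed = [
    {"maf": 6e-4, "quality": 0.45},
    {"maf": 7e-4, "quality": 0.85},
    {"maf": 2e-3, "quality": 0.92},
    {"maf": 0.12, "quality": 0.99},
]
for threshold in (0.3, 0.8):
    print(threshold, dict(count_by_bin(post_imputation_filter(imputed, threshold))))
```

Comparing the per-bin counts at 0.3 and 0.8 mirrors the trade-off described in the abstract. A stricter threshold raises confidence but disproportionately removes very rare variants, which tend to be imputed with lower quality.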
Affiliation(s)
- Céline Charon
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France.
- Rodrigue Allodji
- Radiation Epidemiology Group CESP, Inserm Unit 1018, Gustave Roussy Université Paris Saclay, 114 rue Edouard Vaillant, Villejuif, 94805, France
- Vincent Meyer
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Jean-François Deleuze
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
10
Sun J, Zhang Y, Wang M, Guan Q, Yang X, Ou JX, Yan M, Wang C, Zhang Y, Li ZH, Lan C, Mao C, Zhou HW, Hao B, Zhang Z. The Biological Significance of Multi-copy Regions and Their Impact on Variant Discovery. Genomics Proteomics Bioinformatics 2020; 18:516-524. [PMID: 32827758 PMCID: PMC8377240 DOI: 10.1016/j.gpb.2019.05.004] [Received: 03/21/2019] [Revised: 05/07/2019] [Accepted: 06/06/2019] [Indexed: 11/23/2022]
Abstract
Identification of genetic variants via high-throughput sequencing (HTS) technologies has been essential for both fundamental and clinical studies. However, the extent to which genome sequence composition affects variant calling remains unclear. In this study, we identified 63,897 multi-copy sequences (MCSs) with a minimum length of 300 bp, each of which occurs at least twice in the human genome. The 151,749 genomic loci harboring these MCSs (multi-copy regions, or MCRs) account for 1.98% of the genome and are distributed unevenly across chromosomes. MCRs containing the same MCS tend to be located on the same chromosome. Gene Ontology (GO) analyses revealed that the 3800 genes whose UTRs or exons overlap with MCRs are enriched for Golgi-related cellular component terms and for various enzymatic activities in the GO molecular function category. MCRs are also enriched for loci that are sensitive to neocarzinostatin-induced double-strand breaks. Moreover, genetic variants discovered by genome-wide association studies and recorded in dbSNP are significantly underrepresented in MCRs. Using simulated HTS datasets, we show that false variant discovery rates are significantly higher in MCRs than in other genomic regions. These results suggest that extra caution must be taken when identifying genetic variants in MCRs via HTS technologies.
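An MCS is defined by two conditions: a minimum length and at least two occurrences in the genome. A toy sketch of one way to find such regions: index every length-k window, keep the windows seen at least twice, and merge their positions into regions. The abstract does not describe the authors' detection method, so this is not it. The sketch counts only exact matches on the forward strand, ignores reverse complements, and uses hypothetical function names. With k = 300 it would approximate the 300-bp minimum length, but a genome-scale search would need an index such as a suffix array rather than an in-memory dictionary.

```python
from collections import defaultdict


def repeated_windows(sequence, k):
    """Map each length-k substring that occurs at least twice to its 0-based starts."""
    starts = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        starts[sequence[i:i + k]].append(i)
    return {window: pos for window, pos in starts.items() if len(pos) >= 2}


def merge_windows(starts, k):
    """Merge overlapping or touching [start, start + k) windows into (start, end) regions."""
    regions = []
    for s in sorted(set(starts)):
        end = s + k
        if regions and s <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((s, end))
    return regions


seq = "ACGTTTACGTGG"
repeats = repeated_windows(seq, k=4)
all_starts = [p for positions in repeats.values() for p in positions]
print(repeats, merge_windows(all_starts, k=4))
```

Short reads drawn from any of these merged regions can align equally well to another copy. This is the mechanism behind the higher false discovery rates the abstract reports for MCRs.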
Affiliation(s)
- Jing Sun
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China; Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China; Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China
| | - Yanfang Zhang
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China; Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
| | - Minhui Wang
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
| | - Qian Guan
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China
- Xiujia Yang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China
- Jin Xia Ou
- Microbiome Medicine Center, Division of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510282, China
- Mingchen Yan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Chengrui Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Yan Zhang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Zhi-Hao Li
- Division of Epidemiology, School of Public Health, Southern Medical University, Guangzhou 510515, China
- Chunhong Lan
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China; Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China; Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China
- Chen Mao
- Division of Epidemiology, School of Public Health, Southern Medical University, Guangzhou 510515, China
- Hong-Wei Zhou
- Microbiome Medicine Center, Division of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510282, China
- Bingtao Hao
- Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China.
- Zhenhai Zhang
- State Key Laboratory of Organ Failure Research, National Clinical Research Center for Kidney Disease, Division of Nephrology, Nanfang Hospital, Southern Medical University, Guangzhou 510515, China; Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China; Key Laboratory of Mental Health of the Ministry of Education, Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence, Southern Medical University, Guangzhou 510515, China; Center for Precision Medicine, Shunde Hospital of Southern Medical University, Foshan 528399, China.
11
Swart Y, van Eeden G, Sparks A, Uren C, Möller M. Prospective avenues for human population genomics and disease mapping in southern Africa. Mol Genet Genomics 2020; 295:1079-1089. [PMID: 32440765 PMCID: PMC7240165 DOI: 10.1007/s00438-020-01684-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Accepted: 05/06/2020] [Indexed: 12/22/2022]
Abstract
Population substructure within human populations is globally evident and a well-known confounding factor in many genetic studies. In contrast, admixture mapping exploits population stratification to detect genotype-phenotype correlations in admixed populations. Southern Africa has untapped potential for disease mapping of ancestry-specific disease risk alleles due to the distinct genetic diversity in its populations compared to other populations worldwide. This diversity contributes to a number of phenotypes, including ancestry-specific disease risk and response to pathogens. Although the 1000 Genomes Project significantly improved our understanding of genetic variation globally, southern African populations are still severely underrepresented in biomedical and human genetic studies due to insufficient large-scale publicly available data. In addition to a lack of genetic data in public repositories, existing software, algorithms and resources used for imputation and phasing of genotypic data (amongst others) are largely ineffective for populations with a complex genetic architecture such as that seen in southern Africa. This review article, therefore, aims to summarise the current limitations of conducting genetic studies on populations with a complex genetic architecture to identify potential areas for further research and development.
Affiliation(s)
- Yolandi Swart
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Gerald van Eeden
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Anel Sparks
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Caitlin Uren
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Marlo Möller
- DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa.
12
Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res 2020; 48:D941-D947. [PMID: 31584097 PMCID: PMC6943028 DOI: 10.1093/nar/gkz836] [Citation(s) in RCA: 185] [Impact Index Per Article: 46.3] [Received: 08/13/2019] [Revised: 09/06/2019] [Accepted: 09/30/2019] [Indexed: 12/21/2022]
Abstract
To sustain and develop the largest fully open human genomic resources, the International Genome Sample Resource (IGSR) (https://www.internationalgenome.org) was established. It is built on the foundation of the 1000 Genomes Project, which created the largest openly accessible catalogue of human genomic variation developed from samples spanning five continents. IGSR (i) maintains access to 1000 Genomes Project resources, (ii) updates 1000 Genomes Project resources to the GRCh38 human reference assembly, (iii) adds new data generated on 1000 Genomes Project cell lines, (iv) shares data from samples with a similarly open consent to increase the number of samples and populations represented in the resources and (v) provides support to users of these resources. Among recent updates are the release of variation calls from 1000 Genomes Project data calculated directly on GRCh38 and the addition of high coverage sequence data for the 2504 samples in the 1000 Genomes Project phase three panel. The data portal, which facilitates web-based exploration of the IGSR resources, has been updated to include samples which were not part of the 1000 Genomes Project and now presents a unified view of data and samples across almost 5000 samples from multiple studies. All data are fully open and publicly accessible.
Affiliation(s)
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ernesto Lowy-Gallego
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emily Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
13
Hong JY, Kim JH. PG-path: Modeling and personalizing pharmacogenomics-based pathways. PLoS One 2020; 15:e0230950. [PMID: 32365122 PMCID: PMC7197763 DOI: 10.1371/journal.pone.0230950] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/12/2019] [Accepted: 03/11/2020] [Indexed: 11/18/2022]
Abstract
A pharmacogenomics-based pathway represents a series of reactions that occur between drugs and genes in the human body after drug administration. PG-path is a pharmacogenomics-based pathway that standardizes and visualizes the components (nodes) and actions (edges) involved in pharmacokinetic and pharmacodynamic processes. It provides an intuitive understanding of the drug response in the human body. A pharmacokinetic pathway visualizes the absorption, distribution, metabolism, and excretion (ADME) at the systemic level, and a pharmacodynamic pathway shows the action of the drug in the target cell at the cellular-molecular level. The genes in the pathway are displayed in locations similar to those inside the body. PG-path allows personalized pathways to be created by annotating each gene with the overall impact degree of deleterious variants in the gene. These personalized pathways play a role in assisting tailored individual prescriptions by predicting changes in the drug concentration in the plasma. PG-path also supports counseling for personalized drug therapy by providing visualization and documentation.
Affiliation(s)
- Joo Young Hong
- Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea
- Cipherome. Inc., Seoul, Korea
- Ju Han Kim
- Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea
- Division of Biomedical Informatics, Seoul National University Biomedical Informatics (SNUBI), Seoul National University College of Medicine, Seoul, Korea
14
Liu Y, Yan C, Yin Z, Wan Z, Xia W, Kantarcioglu M, Vorobeychik Y, Clayton EW, Malin BA. Biomedical Research Cohort Membership Disclosure on Social Media. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020; 2019:607-616. [PMID: 32308855 PMCID: PMC7153128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
To accelerate medical knowledge discovery, an increasing number of research programs are gathering and sharing data on a large number of participants. Due to privacy concerns and legal restrictions on data sharing, these programs apply various strategies to mitigate privacy risk. However, the activities of participants and research program sponsors, particularly on social media, might reveal an individual's membership in a study, making it easier to recognize participants' records and uncover information they have not yet disclosed. Such behavior can jeopardize the privacy of the participants themselves as well as the reputation of the projects, their sponsors, and the research enterprise. To investigate the dangers of self-disclosure, we gathered and analyzed 4,020 tweets and uncovered over 100 tweets disclosing individuals' memberships in over 15 programs. Our investigation showed that self-disclosure on social media can reveal participants' membership in research cohorts, and such activity might lead to the leakage of a person's identity, genomic, and other sensitive health information.
Affiliation(s)
- Chao Yan
- Vanderbilt University, Nashville, TN
- Zhiyu Wan
- Vanderbilt University, Nashville, TN
- Weiyi Xia
- Vanderbilt University, Nashville, TN
- Bradley A Malin
- Vanderbilt University, Nashville, TN
- Vanderbilt University Medical Center, Nashville, TN
15
Lowy-Gallego E, Fairley S, Zheng-Bradley X, Ruffier M, Clarke L, Flicek P. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project. Wellcome Open Res 2019; 4:50. [PMID: 32175479 PMCID: PMC7059836 DOI: 10.12688/wellcomeopenres.15126.2] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Accepted: 12/17/2019] [Indexed: 12/20/2022]
Abstract
We present a set of biallelic SNVs and INDELs, from 2,548 samples spanning 26 populations from the 1000 Genomes Project, called de novo on GRCh38. We believe this will be a useful reference resource for those using GRCh38. It represents an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date by encompassing all of the GRCh38 primary assembly autosomes and pseudo-autosomal regions, including novel, medically relevant loci. Here, we describe how the data set was created and benchmark our call set against that produced by the final phase of the 1000 Genomes Project on GRCh37 and the lift-over of that data to GRCh38.
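For readers working with a call set like this one, a record-level filter that separates biallelic SNVs from biallelic INDELs might look like the sketch below. It assumes tab-separated VCF data lines with REF and ALT in columns 4 and 5, as in the VCF specification. The classification rules and the function name are illustrative assumptions, not the project's actual filtering pipeline.

```python
from typing import Optional

BASES = set("ACGT")


def classify_biallelic(vcf_line: str) -> Optional[str]:
    """Return "SNV" or "INDEL" for a biallelic VCF data line, else None.

    Multi-allelic sites (comma-separated ALT), symbolic alleles such as <DEL>,
    spanning deletions (*), and equal-length multi-base substitutions are rejected.
    """
    if vcf_line.startswith("#"):
        return None  # header line
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3].upper(), fields[4].upper()
    if "," in alt or not ref or not alt:
        return None  # multi-allelic or malformed
    if not set(ref) <= BASES or not set(alt) <= BASES:
        return None  # symbolic, missing (".") or spanning-deletion allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref != alt else None
    return "INDEL" if len(ref) != len(alt) else None


records = [
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t50\tPASS\t.",
    "chr1\t300\t.\tA\tG,T\t50\tPASS\t.",
    "chr1\t400\t.\tA\t<DEL>\t50\tPASS\t.",
]
for rec in records:
    print(rec.split("\t")[1], classify_biallelic(rec))
```

Production pipelines usually normalize and left-align INDELs before classifying them, which this sketch does not attempt.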
Affiliation(s)
- Ernesto Lowy-Gallego
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Xiangqun Zheng-Bradley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
16
Zanfardino M, Franzese M, Pane K, Cavaliere C, Monti S, Esposito G, Salvatore M, Aiello M. Bringing radiomics into a multi-omics framework for a comprehensive genotype-phenotype characterization of oncological diseases. J Transl Med 2019; 17:337. [PMID: 31590671 PMCID: PMC6778975 DOI: 10.1186/s12967-019-2073-2] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 07/22/2019] [Accepted: 09/18/2019] [Indexed: 02/07/2023]
Abstract
Genomic and radiomic data integration, namely radiogenomics, can provide meaningful knowledge in cancer diagnosis, prognosis and treatment. Although several data structures based on a multi-layer architecture have been proposed to combine multi-omic biological information, none of them has been designed and assessed to include radiomic data as well. To meet this need, we propose to use the MultiAssayExperiment (MAE), an R package that provides data structures and methods for manipulating and integrating multi-assay experiments, as a suitable tool to manage radiogenomic experiment data. To this aim, we first examine the role of radiogenomics in cancer phenotype definition, then the current state of radiogenomics data integration in public repositories and, finally, the challenges and limitations of including radiomics in MAE, designing an extended framework and showing its application on a case study from the TCGA-TCIA archives. Radiomic and genomic data from 91 patients have been successfully integrated in a single MAE object, demonstrating the suitability of the MAE data structure as a container of radiogenomic data.
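MAE itself is an R/Bioconductor package. The Python sketch below is only a hypothetical analogue of its central idea: a sample map, a role played in MAE by its `sampleMap`, that links each assay's samples to patients so that radiomic and genomic measurements can be aligned. The class and method names are invented for illustration and do not mirror the MAE API.

```python
from dataclasses import dataclass, field


@dataclass
class MultiAssay:
    """Hypothetical, simplified analogue of a multi-assay container.

    assays:     assay name -> {sample ID -> {feature name -> value}}
    sample_map: (assay name, sample ID) -> patient ID
    """
    assays: dict = field(default_factory=dict)
    sample_map: dict = field(default_factory=dict)

    def add(self, assay: str, sample: str, patient: str, features: dict) -> None:
        """Store one sample's features and record which patient it belongs to."""
        self.assays.setdefault(assay, {})[sample] = features
        self.sample_map[(assay, sample)] = patient

    def patients_in(self, assay: str) -> set:
        """Patients with at least one sample in the given assay."""
        return {p for (a, _), p in self.sample_map.items() if a == assay}

    def complete_patients(self) -> set:
        """Patients with at least one sample in every assay."""
        per_assay = [self.patients_in(a) for a in self.assays]
        return set.intersection(*per_assay) if per_assay else set()


mae = MultiAssay()
mae.add("radiomics", "img_01", "patient_1", {"tumor_volume": 12.5})
mae.add("radiomics", "img_02", "patient_2", {"tumor_volume": 8.1})
mae.add("expression", "rna_01", "patient_1", {"GENE_X": 5.2})
mae.add("expression", "rna_03", "patient_3", {"GENE_X": 3.9})
print(mae.complete_patients())  # {'patient_1'}
```

Keeping the patient link in a separate map, rather than renaming samples, lets each assay keep its own identifiers while still supporting patient-level joins.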
17
Lowy-Gallego E, Fairley S, Zheng-Bradley X, Ruffier M, Clarke L, Flicek P. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project. Wellcome Open Res 2019; 4:50. [DOI: 10.12688/wellcomeopenres.15126.1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 02/27/2019] [Indexed: 01/07/2023]
Abstract
We present biallelic SNVs called from 2,548 samples across 26 populations from the 1000 Genomes Project, called directly on GRCh38. We believe this will be a useful reference resource for those using GRCh38, representing an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date and providing a resource necessary for the full adoption of GRCh38 by the community. Here, we describe how the call set was created and provide benchmarking data describing how our call set compares to that produced by the final phase of the 1000 Genomes Project on GRCh37.
18
Wang X, Liu A, Lu Y, Hu Q. Novel compound heterozygous mutations in the SPTA1 gene, causing hereditary spherocytosis in a neonate with Coombs‑negative hemolytic jaundice. Mol Med Rep 2019; 19:2801-2807. [PMID: 30816434 PMCID: PMC6423610 DOI: 10.3892/mmr.2019.9947] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/29/2018] [Accepted: 02/06/2019] [Indexed: 12/31/2022]
Abstract
Hereditary spherocytosis (HS) is a common heterogeneous type of inherited hemolytic anemia characterized by jaundice and splenomegaly. Diagnosis of HS in neonates is considered unreliable, and is generally based on positive family history, spherocytes in peripheral smears, increased osmotic fragility, and jaundice. In the present study, routine laboratory tests, next‑generation sequencing, and Sanger sequencing were applied to diagnose a neonatal patient with Coombs‑negative hemolytic jaundice. The neonate had no family history of HS; however, spherocytes were observed in peripheral smears, and the patient exhibited Coombs‑negative and severe hemolytic jaundice, normal mean corpuscular hemoglobin concentration (MCHC) and mean corpuscular volume (MCV), normal glucose‑6‑phosphate dehydrogenase activity, negative thalassemia genetic mutation screening results, and negative autoimmune antibody tests. Novel compound heterozygous mutations in the spectrin‑α, erythrocytic 1 (SPTA1) gene (c.3897‑1G>C and c.5029G>A) were identified. The SPTA1 c.3897‑1G>C mutation in intron 27‑1, which disrupted the consensus splice site, was inherited from his asymptomatic mother, and the SPTA1 c.5029G>A (p.Gly1677Arg) mutation in trans with the SPTA1 c.3897‑1G>C mutation was inherited from his asymptomatic father. Sanger sequencing of mRNA reverse transcribed into cDNA identified a deletion of the first 10 nucleotides of exon 28, confirming the splicing mutation. In conclusion, the present study reports a rare case of autosomal‑recessive HS with a severe clinical phenotype, but normal MCHC and MCV.
Affiliation(s)
- Xiong Wang
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Aiguo Liu
- Department of Pediatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Yanjun Lu
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
- Qun Hu
- Department of Pediatrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, P.R. China
19
Rivandi M, Martens JWM, Hollestelle A. Elucidating the Underlying Functional Mechanisms of Breast Cancer Susceptibility Through Post-GWAS Analyses. Front Genet 2018; 9:280. [PMID: 30116257 PMCID: PMC6082943 DOI: 10.3389/fgene.2018.00280] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/16/2018] [Accepted: 07/09/2018] [Indexed: 12/12/2022]
Abstract
Genome-wide association studies (GWAS) have identified more than 170 single nucleotide polymorphisms (SNPs) associated with susceptibility to breast cancer. Together, these SNPs explain 18% of the familial relative risk, which is estimated to be nearly half of the total familial breast cancer risk that is collectively explained by low-risk susceptibility alleles. An important aspect of this success has been access to large sample sizes through collaborative efforts within the Breast Cancer Association Consortium (BCAC), as well as collaborations between cancer association consortia. Despite these achievements, however, understanding of each variant's underlying mechanism and of how these SNPs predispose women to breast cancer remains limited and represents a major challenge in the field, particularly since the vast majority of the GWAS-identified SNPs are located in non-coding regions of the genome and are merely tags for the causal variants. In recent years, fine-scale mapping studies followed by functional evaluation of putative causal variants have begun to elucidate the biological function of several GWAS-identified variants. In this review, we discuss the findings and lessons learned from these post-GWAS analyses of 22 risk loci. Identifying the true causal variants underlying breast cancer susceptibility and their function not only provides better estimates of the explained familial relative risk, thereby improving polygenic risk scores (PRSs), but also increases our understanding of the biological mechanisms responsible for susceptibility to breast cancer. This will facilitate the identification of further breast cancer risk alleles and the development of preventive medicine for women at increased risk of developing the disease.
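A PRS is commonly computed as a weighted sum of a person's risk-allele dosages, with per-allele log odds ratios from GWAS as the weights. The sketch below shows that standard additive form under stated assumptions: the SNP IDs and odds ratios are hypothetical, and untyped SNPs are simply skipped. It is not the scoring method of any particular study discussed in the review.

```python
import math


def polygenic_risk_score(dosages: dict, odds_ratios: dict) -> float:
    """Additive PRS: sum over SNPs of dosage * ln(OR).

    dosages:     SNP ID -> effect-allele dosage (0, 1 or 2; imputed values allowed)
    odds_ratios: SNP ID -> per-allele odds ratio from a GWAS
    SNPs missing from `dosages` are skipped (an assumption; some pipelines
    instead fill them with the expected dosage from the allele frequency).
    """
    return sum(dosages[snp] * math.log(or_)
               for snp, or_ in odds_ratios.items() if snp in dosages)


weights = {"snp_a": 1.20, "snp_b": 0.90, "snp_c": 1.05}  # hypothetical ORs
person = {"snp_a": 2, "snp_b": 1}                         # snp_c not genotyped
print(round(polygenic_risk_score(person, weights), 4))    # 0.2593
```

Using log odds ratios makes the score additive on the log-odds scale, so a protective allele (OR < 1) lowers the score.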
Affiliation(s)
- Mahdi Rivandi
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, Netherlands; Department of Modern Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- John W M Martens
- Department of Medical Oncology, Erasmus MC Cancer Institute, Rotterdam, Netherlands; Cancer Genomics Centre, Utrecht, Netherlands
20
Zhou L, Zhao F. Prioritization and functional assessment of noncoding variants associated with complex diseases. Genome Med 2018; 10:53. [PMID: 29996888 PMCID: PMC6042373 DOI: 10.1186/s13073-018-0565-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 11/25/2017] [Accepted: 06/29/2018] [Indexed: 12/11/2022]
Abstract
Unraveling functional noncoding variants associated with complex diseases is still a great challenge. We present a novel algorithm, Prioritization And Functional Assessment (PAFA), that prioritizes and assesses the functionality of genetic variants by introducing population differentiation measures and recalibrating training variants. Comprehensive evaluations demonstrate that PAFA exhibits much higher sensitivity and specificity in prioritizing noncoding risk variants than existing methods. PAFA achieves improved performance in distinguishing both common and rare recurrent variants from non-recurrent variants by integrating multiple annotations and metrics. An integrated platform was developed, providing comprehensive functional annotations for noncoding variants by integrating functional genomic data, which can be accessed at http://159.226.67.237:8080/pafa .
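The abstract does not say which population differentiation measures PAFA uses. As one commonly used example, the sketch below computes Hudson's Fst for biallelic SNPs in a sample-size-corrected form, combined across SNPs as a ratio of averages. Treat it as an illustration of the kind of measure involved, not as PAFA's implementation.

```python
def hudson_fst(p1: list, p2: list, n1: int, n2: int) -> float:
    """Hudson's Fst across biallelic SNPs, combined as a ratio of averages.

    p1, p2: frequency of the same allele in population 1 and population 2
    n1, n2: number of sampled haplotypes (chromosomes) in each population
    Per SNP:
      numerator   = (p1 - p2)^2 - p1(1 - p1)/(n1 - 1) - p2(1 - p2)/(n2 - 1)
      denominator = p1(1 - p2) + p2(1 - p1)
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


print(hudson_fst([1.0], [0.0], 100, 100))              # fixed difference: 1.0
print(round(hudson_fst([0.9], [0.1], 100, 100), 4))    # strongly differentiated
print(round(hudson_fst([0.5, 0.3], [0.5, 0.3], 100, 100), 4))  # about 0 (slightly negative)
```

Summing numerators and denominators before dividing keeps SNPs with low diversity from dominating the estimate, which can happen when per-SNP ratios are averaged.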
Affiliation(s)
- Lin Zhou
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Fangqing Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- University of Chinese Academy of Sciences, Beijing, 100049, China.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China.
21
Vergara C, Parker MM, Franco L, Cho MH, Valencia-Duarte AV, Beaty TH, Duggal P. Genotype imputation performance of three reference panels using African ancestry individuals. Hum Genet 2018; 137:281-292. [PMID: 29637265 PMCID: PMC6209094 DOI: 10.1007/s00439-018-1881-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/10/2018] [Accepted: 03/31/2018] [Indexed: 12/22/2022]
Abstract
Genotype imputation estimates unobserved genotypes from genome-wide markers to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations, for which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 individuals (HRC) were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5-1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R2 > 0.5 and for rare, low-frequency, and common variants. When merging imputed variants of the three panels, the total number was 62-63 M, with 20 M overlapping variants imputed by all three panels and 5-15 M variants imputed exclusively with one of them. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
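Imputation quality filters such as "R2 > 0.5" usually refer to a dosage-based estimate of the squared correlation between imputed and true genotypes. One common form, roughly the one used by MaCH/minimac-style tools, compares the observed variance of imputed dosages with the variance expected under Hardy-Weinberg equilibrium, 2p(1 - p). The sketch below is a simplified genotype-dosage version of that idea. The abstract does not state the exact estimator each tool used, and the variant names are hypothetical.

```python
from statistics import fmean, pvariance


def dosage_r2(dosages: list) -> float:
    """Observed dosage variance divided by the expected binomial variance 2p(1 - p).

    dosages: imputed alternate-allele dosages (0.0 to 2.0), one per individual.
    Values near 1 suggest the dosages carry nearly full genotype information;
    values near 0 suggest they mostly reflect the allele frequency.
    """
    p = fmean(dosages) / 2           # estimated alternate-allele frequency
    expected = 2 * p * (1 - p)
    if expected == 0:
        return 0.0                   # monomorphic: treat as uninformative
    return pvariance(dosages) / expected


variants = {
    "var_1": [0, 1, 2, 1],           # confident, well-spread dosages
    "var_2": [1.0, 1.0, 1.0],        # every call sits at the expectation
}
kept = [name for name, d in variants.items() if dosage_r2(d) > 0.5]
print(kept)  # ['var_1']
```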
Affiliation(s)
- Margaret M Parker
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Liliana Franco
- National School of Public Health, Universidad de Antioquia, Medellín, Colombia
- School of Medicine, Universidad Pontificia Bolivariana, Medellín, Colombia
- Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Terri H Beaty
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA
- Priya Duggal
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA.
22
Kel I, Chang Z, Galluccio N, Romeo M, Beretta S, Diomede L, Mezzelani A, Milanesi L, Dieterich C, Merelli I. SPIRE, a modular pipeline for eQTL analysis of RNA-Seq data, reveals a regulatory hotspot controlling miRNA expression in C. elegans. Mol Biosyst 2017; 12:3447-3458. [PMID: 27722582 DOI: 10.1039/c6mb00453a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/27/2022]
Abstract
The interpretation of genome-wide association studies is difficult, as it is hard to understand how polymorphisms can affect gene regulation, in particular for trans-regulatory elements located far from the genes they control. Using RNA or protein expression data as phenotypes, it is possible to correlate their variations with specific genotypes. This technique is usually referred to as expression quantitative trait loci (eQTL) analysis, and only a few packages exist for the integration of genotype patterns and expression profiles. In particular, tools are needed for the analysis of next-generation sequencing (NGS) data on a genome-wide scale, which is essential to identify eQTLs able to control a large number of genes (hotspots). Here we present SPIRE (Software for Polymorphism Identification Regulating Expression), a generic, modular and functionally highly flexible pipeline for eQTL processing. SPIRE integrates different univariate and multivariate approaches for eQTL analysis, paying particular attention to the scalability of the procedure in order to support cis- as well as trans-mapping, thus allowing the identification of hotspots in NGS data. In particular, we demonstrated how SPIRE can handle big association study datasets, reproducing published results and improving the identification of trans-eQTLs. Furthermore, we employed the pipeline to analyse novel data concerning the genotypes of two different C. elegans strains (N2 and Hawaii) and related miRNA expression data, obtained using RNA-Seq. A miRNA regulatory hotspot was identified on chromosome 1, overlapping the transcription factor grh-1, known to be involved in the early phases of embryonic development of C. elegans. In a follow-up qPCR experiment we were able to verify most of the predicted eQTLs, as well as to show, for a novel miRNA, a significant difference in the sequences of the two analysed strains of C. elegans.
SPIRE is publicly available as open source software at , together with some example data, a readme file, supplementary material and a short tutorial.
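SPIRE's own univariate and multivariate methods are not described in detail here. As a minimal univariate illustration of eQTL scanning and hotspot detection, the sketch below correlates each SNP's genotypes (coded 0/1/2) with each gene's expression and approximates p-values with Fisher's z transform. It then flags SNPs associated with at least a minimum number of genes. The thresholds, function names and toy data are assumptions. A real analysis would also correct for multiple testing and separate cis from trans pairs by genomic distance.

```python
import math
from statistics import fmean


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def approx_p(r: float, n: int) -> float:
    """Two-sided p-value from Fisher's z transform (normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite for |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def find_hotspots(genotypes: dict, expression: dict,
                  alpha: float = 1e-3, min_targets: int = 2) -> dict:
    """Map each SNP to its associated genes, keeping SNPs with >= min_targets genes."""
    hotspots = {}
    for snp, g in genotypes.items():
        targets = [gene for gene, e in expression.items()
                   if approx_p(pearson_r(g, e), len(g)) < alpha]
        if len(targets) >= min_targets:
            hotspots[snp] = targets
    return hotspots


s1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
genotypes = {"snp1": s1, "snp2": [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]}
expression = {
    "geneA": [2 * g + 1 for g in s1],     # rises with snp1 dosage
    "geneB": [5 - g for g in s1],         # falls with snp1 dosage
    "geneC": [3, 3, 9, 1, 3, 3, 5, 7, 5, 2],
}
print(find_hotspots(genotypes, expression))  # {'snp1': ['geneA', 'geneB']}
```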
Affiliation(s)
- Ivan Kel
- Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
- Zisong Chang
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Robert-Rössle-Straße 10, 13125, Berlin, Germany.
- Nadia Galluccio
- Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
- Margherita Romeo
- Dipartimento di Biochimica e Farmacologia Molecolare, IRCCS - Istituto di Ricerche Farmacologiche "Mario Negri", Via Giuseppe La Masa 19, Milan, Italy.
- Stefano Beretta
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli studi di Milano-Bicocca, Viale Sarca 336, 20125 Milano, Italy.
- Luisa Diomede
- Dipartimento di Biochimica e Farmacologia Molecolare, IRCCS - Istituto di Ricerche Farmacologiche "Mario Negri", Via Giuseppe La Masa 19, Milan, Italy.
- Alessandra Mezzelani
- Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
- Luciano Milanesi
- Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
| | - Christoph Dieterich
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology and Department of Internal Medicine III, University of Heidelberg, Grabengasse 1, 69117 Heidelberg, Germany.
| | - Ivan Merelli
- Instituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
23
Zheng-Bradley X, Streeter I, Fairley S, Richardson D, Clarke L, Flicek P. Alignment of 1000 Genomes Project reads to reference assembly GRCh38. Gigascience 2017; 6:1-8. [PMID: 28531267 PMCID: PMC5522380 DOI: 10.1093/gigascience/gix038] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Received: 02/07/2017] [Revised: 03/29/2017] [Accepted: 05/19/2017] [Indexed: 12/30/2022]
Abstract
The 1000 Genomes Project produced more than 100 trillion base pairs of short-read sequence from more than 2600 samples in 26 populations over a period of five years. In its final phase, the project released over 85 million genotyped and phased variants on human reference genome assembly GRCh37. An updated reference assembly, GRCh38, was released in late 2013, but there was insufficient time for the final phase of the project analysis to change to the new assembly. Although it is possible to lift the coordinates of the 1000 Genomes Project variants over to the new assembly, this is a potentially error-prone process, as coordinate remapping is appropriate only for non-repetitive regions of the genome that did not change significantly between the two assemblies. It will also miss variants in any region newly added to GRCh38. Thus, to produce the highest-quality variants and genotypes on GRCh38, the best strategy is to realign the reads and recall the variants based on the new alignment. As the first step of variant calling for the 1000 Genomes Project data, we have finished remapping all of the 1000 Genomes sequence reads to GRCh38 with alternative scaffold-aware BWA-MEM. The resulting alignments are available as CRAM, a reference-based sequence compression format. The data have been released on our FTP site and are also available from the European Nucleotide Archive to help researchers discover variants on the primary sequences and alternative contigs of GRCh38.
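The abstract's point about coordinate remapping can be made concrete with a toy, hypothetical sketch of block-based liftover, loosely modeled on the aligned blocks of a chain file: an old position maps only if it falls inside an aligned block, and sequence that exists only in the new assembly is never the target of any mapping. The block coordinates and function names below are invented for illustration; real liftover tools also handle strand, multiple chains and alternate contigs, and this is not the realignment approach the authors actually used.

```python
from bisect import bisect_right

# Hypothetical aligned blocks (old_start, new_start, length), 0-based and
# half-open, sorted by old_start and non-overlapping on the old assembly.
BLOCKS = [(0, 0, 100), (100, 150, 200)]
# old [0,100)   -> new [0,100)
# new [100,150) has no old counterpart (sequence added in the new assembly)
# old [100,300) -> new [150,350); old positions >= 300 are not aligned

def lift(pos, blocks):
    """Map an old-assembly position to the new assembly, or None if it lies
    outside every aligned block."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    old_start, new_start, length = blocks[i]
    if pos < old_start + length:
        return new_start + (pos - old_start)
    return None

def unreachable_new_regions(blocks, new_length):
    """New-assembly intervals that no old position maps into; variants there
    can only be found by realigning reads and calling variants again."""
    gaps, cursor = [], 0
    for _old, new_start, length in sorted(blocks, key=lambda b: b[1]):
        if new_start > cursor:
            gaps.append((cursor, new_start))
        cursor = max(cursor, new_start + length)
    if cursor < new_length:
        gaps.append((cursor, new_length))
    return gaps

print(lift(50, BLOCKS))    # 50
print(lift(120, BLOCKS))   # 170
print(lift(310, BLOCKS))   # None: old position not aligned
print(unreachable_new_regions(BLOCKS, 350))  # [(100, 150)]
```

However accurate the mapping inside blocks, the interval returned by `unreachable_new_regions` stays empty of lifted variants, which is the gap that realignment and recalling is meant to close.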
Affiliation(s)
- Xiangqun Zheng-Bradley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian Streeter
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- David Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
24
Clarke L, Fairley S, Zheng-Bradley X, Streeter I, Perry E, Lowy E, Tassé AM, Flicek P. The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data. Nucleic Acids Res 2016; 45:D854-D859. [PMID: 27638885 PMCID: PMC5210610 DOI: 10.1093/nar/gkw829] [Citation(s) in RCA: 154] [Impact Index Per Article: 19.3] [Received: 08/16/2016] [Accepted: 09/08/2016] [Indexed: 01/09/2023]
Abstract
The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands the resources from the 1000 Genomes Project in both data type and population diversity. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which have been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38) and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long-read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases the discoverability of our data, previously browsable only through our FTP site, by focusing on particular samples, populations or data sets of interest.
Affiliation(s)
- Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Xiangqun Zheng-Bradley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ian Streeter
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emily Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Ernesto Lowy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anne-Marie Tassé
- Public Population Project in Genomics and Society, McGill University and Genome Quebec Innovation Centre, Montreal, Quebec, Canada
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK