1
French JN, Pua VB, Laboulaye R, Leal TP, Olivas MC, Lima-Costa MF, Horta BL, Barreto ML, Tarazona-Santos E, Mata I, O’Connor TD. Comparing the effect of imputation reference panel composition in four distinct Latin American cohorts. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.11.589057. [PMID: 38659746 PMCID: PMC11042191 DOI: 10.1101/2024.04.11.589057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Genome-wide association studies have been useful in identifying genetic risk factors for various phenotypes. These studies rely on imputation, and many existing reference panels are composed largely of individuals of European ancestry, resulting in lower imputation quality in underrepresented populations. We aim to analyze how the composition of imputation reference panels affects imputation quality in four target Latin American cohorts. We compared imputation quality for chromosomes 7 and X when altering the imputation reference panel by: 1) increasing the number of Latin American individuals; 2) excluding either Latin American, African, or European individuals; or 3) increasing the Indigenous American (IA) admixture proportions of included Latin Americans. We found that increasing the number of Latin Americans in the reference panel improved imputation quality in all four populations; however, there were differences between chromosomes 7 and X in some cohorts. Excluding Latin Americans from the reference panel resulted in worse imputation quality in every cohort, whereas excluding Europeans or Africans had differential effects between and within cohorts and between chromosomes 7 and X. Finally, increasing IA-like admixture proportions in the reference panel increased imputation quality to different degrees in different populations. The differences in results between populations and chromosomes suggest that existing and future reference panels containing Latin American individuals are likely to perform differently in different Latin American populations.
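The abstract does not state which imputation-quality metric was used. One widely used measure, assumed here purely for illustration, is the squared Pearson correlation (dosage r²) between imputed dosages and true genotypes at variants masked before imputation. A minimal sketch, with a hypothetical function name `dosage_r2`; it assumes diploid 0/1/2 genotype coding and therefore does not handle hemizygous male chromosome X calls:

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    true_genotypes: alternate-allele counts (0, 1 or 2) from direct genotyping
                    or sequencing of variants masked before imputation.
    imputed_dosages: expected alternate-allele counts (0.0-2.0) from imputation.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # Correlation is undefined when either vector is constant
        # (e.g. a variant monomorphic in the target sample).
        return float("nan")
    return cov * cov / (var_t * var_d)
```

Comparing reference panels under this metric would mean imputing the same target genotypes with each panel and comparing the per-variant r² values.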
Affiliation(s)
- Jennifer N French
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Victor Borda Pua
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- University of Maryland Institute for Health Computing, Rockville, MD
- Roland Laboulaye
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Thiago Peixoto Leal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Mario Cornejo Olivas
- Neurogenetics Working Group, Universidad Cientifica del Sur, Lima, Peru
- Neurogenetics Research Center, Instituto Nacional de Ciencias Neurologicas, Lima, Peru
- Bernardo L Horta
- Postgraduate Program in Epidemiology, Federal University of Pelotas, Pelotas, Brazil
- Mauricio L Barreto
- Center for Data and Knowledge Integration for Health (CIDACS), Gonçalo Moniz Institute (IGM), Oswaldo Cruz Foundation (FIOCRUZ-BA), Salvador, Bahia, Brazil
- Collective Health Institute, Federal University of Bahia (UFBA), Salvador, Bahia, Brazil
- Eduardo Tarazona-Santos
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Brazil
- Ignacio Mata
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Timothy D. O’Connor
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD
- Program in Health Equity and Population Health, University of Maryland School of Medicine
2
Levi H, Elkon R, Shamir R. The predictive capacity of polygenic risk scores for disease risk is only moderately influenced by imputation panels tailored to the target population. Bioinformatics 2024; 40:btae036. [PMID: 38265251 PMCID: PMC10868313 DOI: 10.1093/bioinformatics/btae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 12/20/2023] [Accepted: 01/20/2024] [Indexed: 01/25/2024]
Abstract
MOTIVATION Polygenic risk scores (PRSs) predict individuals' genetic risk of developing complex diseases. They summarize the effects of many variants discovered in genome-wide association studies (GWASs). However, to date, large GWASs exist primarily for the European population, and the quality of PRS prediction declines when applied to other ethnicities. Genetic profiling of individuals in the discovery set (on which the GWAS was performed) and the target set (on which the PRS is applied) is typically done with SNP arrays that genotype a fraction of common SNPs. Therefore, a key step in GWAS analysis and PRS calculation is imputing untyped SNPs using a panel of fully sequenced individuals. The imputation results depend on the ethnic composition of the imputation panel. Imputing genotypes with a panel of individuals of the same ethnicity as the genotyped individuals typically improves imputation accuracy. However, there has been no systematic investigation into the influence of the ethnic composition of imputation panels on the accuracy of PRS predictions when applied to ethnic groups that differ from the population used in the GWAS. RESULTS We estimated the effect of imputation of the target set on the prediction accuracy of PRS when the discovery and target sets come from different ethnic groups. We analyzed binary phenotypes on ethnically distinct sets from the UK Biobank and other resources. We generated ethnically homogeneous panels, imputed the target sets, and generated PRSs. Then, we assessed the prediction accuracy obtained from each imputation panel. Our analysis indicates that using an imputation panel matched to the ethnicity of the target population yields only a marginal improvement, and only under specific conditions. AVAILABILITY AND IMPLEMENTATION The source code used to run the analyses in this paper is available at https://github.com/Shamir-Lab/PRS-imputation-panels.
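The abstract describes a PRS as a summary of the effects of many GWAS variants. A common additive form, assumed here for illustration rather than taken from the authors' repository, multiplies each variant's effect-allele dosage (possibly imputed, hence fractional) by its GWAS effect size and sums the products. The function name and variant IDs below are hypothetical:

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         effect_sizes: Mapping[str, float]) -> float:
    """Additive PRS: sum over variants of (GWAS effect size x effect-allele dosage).

    dosages: variant ID -> expected count of the effect allele (0.0-2.0);
             imputed dosages may be fractional.
    effect_sizes: variant ID -> per-allele effect (e.g. log odds ratio).
    """
    missing = sorted(set(effect_sizes) - set(dosages))
    if missing:
        # Fail loudly rather than silently scoring only a subset of variants.
        raise KeyError(f"no dosage for variants: {missing}")
    return sum(beta * dosages[v] for v, beta in effect_sizes.items())


# Hypothetical two-variant score for one individual.
betas = {"rsA": 0.2, "rsB": -0.1}
person = {"rsA": 2.0, "rsB": 1.0}
score = polygenic_risk_score(person, betas)
```

In this form, the imputation panel influences the score only through the dosage values it produces for untyped variants.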
Affiliation(s)
- Hagai Levi
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ran Elkon
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69978, Israel
- Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
3
Phulpagar P, Holla VV, Tomar D, Kamble N, Yadav R, Pal PK, Muthusamy B. Novel CWF19L1 mutations in patients with spinocerebellar ataxia, autosomal recessive 17. J Hum Genet 2023; 68:859-866. [PMID: 37752213 DOI: 10.1038/s10038-023-01195-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 08/09/2023] [Accepted: 09/06/2023] [Indexed: 09/28/2023]
Abstract
Spinocerebellar ataxia, autosomal recessive 17 (SCAR17) is a rare hereditary ataxia characterized by ataxic gait and cerebellar signs, occasionally accompanied by intellectual disability and seizures. Pathogenic mutations in the CWF19L1 gene, which encodes CWF19 like cell cycle control factor 1, cause SCAR17. We report here two unrelated families with the clinical characteristics of global developmental delay, cerebellar ataxia, pyramidal signs, and seizures. Brain imaging showed cerebellar atrophy and T2/FLAIR hypointense transverse pontine stripes. Exome sequencing identified novel homozygous mutations in the two unrelated families: a splice acceptor site variant, c.1375-2A>G in intron 12, in a male patient, and a single nucleotide variant, c.452T>G in exon 5, resulting in the missense variant p.Ile151Ser, in a female patient. Sanger sequencing confirmed segregation of these variants in the family members, consistent with autosomal recessive inheritance. Transcript analysis of the splice site variant revealed activation of a novel cryptic splice acceptor site in exon 13, producing an alternative transcript that lacks nine nucleotides of exon 13. Translation of this transcript predicted an in-frame deletion of three amino acids, p.(459_461del). We also observed novel skipping of exon 13, which results in premature termination of the protein product. Our study expands the phenotypes, radiological features, and genotypes known in SCAR17.
Affiliation(s)
- Prashant Phulpagar
- Institute of Bioinformatics, International Technology Park, Bangalore, 560066, India
- Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India
- Vikram V Holla
- Department of Neurology, NIMHANS, Hosur Road, Bangalore, 560029, India
- Deepti Tomar
- Institute of Bioinformatics, International Technology Park, Bangalore, 560066, India
- Nitish Kamble
- Department of Neurology, NIMHANS, Hosur Road, Bangalore, 560029, India
- Ravi Yadav
- Department of Neurology, NIMHANS, Hosur Road, Bangalore, 560029, India
- Pramod Kumar Pal
- Department of Neurology, NIMHANS, Hosur Road, Bangalore, 560029, India
- Babylakshmi Muthusamy
- Institute of Bioinformatics, International Technology Park, Bangalore, 560066, India
- Manipal Academy of Higher Education, Manipal, 576104, Karnataka, India
4
Jiménez-Kaufmann A, Chong AY, Cortés A, Quinto-Cortés CD, Fernandez-Valverde SL, Ferreyra-Reyes L, Cruz-Hervert LP, Medina-Muñoz SG, Sohail M, Palma-Martinez MJ, Delgado-Sánchez G, Mongua-Rodríguez N, Mentzer AJ, Hill AVS, Moreno-Macías H, Huerta-Chagoya A, Aguilar-Salinas CA, Torres M, Kim HL, Kalsi N, Schuster SC, Tusié-Luna T, Del-Vecchyo DO, García-García L, Moreno-Estrada A. Imputation Performance in Latin American Populations: Improving Rare Variants Representation With the Inclusion of Native American Genomes. Front Genet 2022; 12:719791. [PMID: 35046991 PMCID: PMC8762266 DOI: 10.3389/fgene.2021.719791] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/03/2021] [Accepted: 11/01/2021] [Indexed: 11/13/2022]
Abstract
Current genome-wide association studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Because of the complex demographic history of Latin America and the unbalanced representation of Native American genomes in current imputation panels, locally relevant disease variants are likely to be missed, limiting the scope and impact of biomedical research in these populations. Better representation of diversity in genomic databases is therefore a scientific imperative. Here, we expand the 1000 Genomes Project reference panel (1KGP) with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly for low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a larger number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to match the imputation performance of variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current references and highlights the contribution of our work to reducing it while complementing efforts to improve global equity in genomic research.
Affiliation(s)
- Andrés Jiménez-Kaufmann
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
- Amanda Y Chong
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Adrián Cortés
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Consuelo D Quinto-Cortés
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
- Selene L Fernandez-Valverde
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
- Santiago G Medina-Muñoz
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
- Mashaal Sohail
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- María J Palma-Martinez
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
- Alexander J Mentzer
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Adrian V S Hill
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Nuffield Department of Medicine, The Jenner Institute, University of Oxford, Oxford, United Kingdom
- Hortensia Moreno-Macías
- Unidad de Biología Molecular y Medicina Genómica, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City, Mexico
- Departamento de Economía, Universidad Autónoma Metropolitana, Mexico City, Mexico
- Alicia Huerta-Chagoya
- Unidad de Biología Molecular y Medicina Genómica, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City, Mexico
- Carlos A Aguilar-Salinas
- Departamento de Endocrinología y Metabolismo, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Unidad de Investigación de Enfermedades Metabólicas, Mexico City, Mexico
- Tecnológico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, Mexico
- Michael Torres
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
- Hie Lim Kim
- Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, Singapore
- GenomeAsia 100K (GA100K) Consortium, Singapore
- School of Biological Science, Nanyang Technological University, Singapore
- Namrata Kalsi
- Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, Singapore
- GenomeAsia 100K (GA100K) Consortium, Singapore
- Stephan C Schuster
- Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, Singapore
- GenomeAsia 100K (GA100K) Consortium, Singapore
- School of Biological Science, Nanyang Technological University, Singapore
- Teresa Tusié-Luna
- Unidad de Biología Molecular y Medicina Genómica, Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán (INCMNSZ), Mexico City, Mexico
- Instituto de Investigaciones Biomédicas de la UNAM, Mexico City, Mexico
- Diego Ortega Del-Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), UNAM, Juriquilla, Mexico
- Andrés Moreno-Estrada
- Laboratorio Nacional de Genómica para la Biodiversidad (UGA-LANGEBIO), Unidad de Genómica Avanzada, Irapuato, Mexico
5
Yu Y, Werdyani S, Carey M, Parfrey P, Yilmaz YE, Savas S. A comprehensive analysis of SNPs and CNVs identifies novel markers associated with disease outcomes in colorectal cancer. Mol Oncol 2021; 15:3329-3347. [PMID: 34309201 PMCID: PMC8637572 DOI: 10.1002/1878-0261.13067] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/25/2021] [Revised: 05/29/2021] [Accepted: 07/24/2021] [Indexed: 12/15/2022]
Abstract
We aimed to examine the associations of a genome-wide set of single nucleotide polymorphisms (SNPs) and 254 copy number variations (CNVs) and/or insertions/deletions (INDELs) with clinical outcomes in patients with colorectal cancer (n = 505). We also aimed to investigate whether these associations changed over time (e.g., appeared or diminished). Multivariable Cox proportional hazards and piecewise Cox regression models were used to examine the associations. The Cancer Genome Atlas (TCGA) datasets were used for replication and to examine gene expression differences between tumor and nontumor tissue samples. A common SNP (WBP11-rs7314075) was associated with disease-specific survival (P = 3.2 × 10⁻⁸). Association of this region with disease-specific survival was also detected in the TCGA patient cohort. Two expression quantitative trait loci (eQTLs) implicated in the regulation of ERP27 expression were identified in this locus. Interestingly, expression levels of ERP27 and WBP11 were significantly different between colorectal tumors and nontumor tissues. Three SNPs predicted the risk of recurrent disease only after 5 years postdiagnosis. Overall, our study identified novel variants, but no CNVs/INDELs, associated with outcomes in colorectal cancer, one of which also showed an association in the TCGA dataset. Three SNPs were candidate predictors of long-term recurrence/metastasis risk.
Affiliation(s)
- Yajun Yu
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John's, NL, Canada
- Salem Werdyani
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John's, NL, Canada
- Megan Carey
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John's, NL, Canada
- Patrick Parfrey
- Discipline of Medicine, Faculty of Medicine, Memorial University, St. John's, NL, Canada
- Yildiz E Yilmaz
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John's, NL, Canada
- Discipline of Medicine, Faculty of Medicine, Memorial University, St. John's, NL, Canada
- Department of Mathematics and Statistics, Faculty of Science, Memorial University, St. John's, NL, Canada
- Sevtap Savas
- Discipline of Genetics, Faculty of Medicine, Memorial University, St. John's, NL, Canada
- Discipline of Oncology, Faculty of Medicine, Memorial University, St. John's, NL, Canada
6
Wei CY, Yang JH, Yeh EC, Tsai MF, Kao HJ, Lo CZ, Chang LP, Lin WJ, Hsieh FJ, Belsare S, Bhaskar A, Su MW, Lee TC, Lin YL, Liu FT, Shen CY, Li LH, Chen CH, Wall JD, Wu JY, Kwok PY. Genetic profiles of 103,106 individuals in the Taiwan Biobank provide insights into the health and history of Han Chinese. NPJ Genom Med 2021; 6:10. [PMID: 33574314 PMCID: PMC7878858 DOI: 10.1038/s41525-021-00178-9] [Citation(s) in RCA: 92] [Impact Index Per Article: 30.7] [Received: 09/22/2020] [Accepted: 01/06/2021] [Indexed: 02/06/2023]
Abstract
Personalized medical care focuses on prediction of disease risk and response to medications. To build the risk models, access to both large-scale genomic resources and human genetic studies is required. The Taiwan Biobank (TWB) has generated high-coverage, whole-genome sequencing data from 1492 individuals and genome-wide SNP data from 103,106 individuals of Han Chinese ancestry using custom SNP arrays. Principal components analysis of the genotyping data showed that the full range of Han Chinese genetic variation was found in the cohort. The arrays also include thousands of known functional variants, allowing for simultaneous ascertainment of Mendelian disease-causing mutations and variants that affect drug metabolism. We found that 21.2% of the population are mutation carriers of autosomal recessive diseases, 3.1% have mutations in cancer-predisposing genes, and 87.3% carry variants that affect drug response. We highlight how TWB data provide insight into both population history and disease burden, while showing how widespread genetic testing can be used to improve clinical care.
Affiliation(s)
- Chun-Yu Wei
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Jenn-Hwai Yang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Erh-Chan Yeh
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ming-Fang Tsai
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Hsiao-Jung Kao
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Chen-Zen Lo
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Lung-Pao Chang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Wan-Jia Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Feng-Jen Hsieh
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Saurabh Belsare
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Anand Bhaskar
- Department of Genetics, Stanford University, Stanford, CA, USA
- Ming-Wei Su
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Te-Chang Lee
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Yi-Ling Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Fu-Tong Liu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Chen-Yang Shen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ling-Hui Li
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Chien-Hsiun Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Jeffrey D Wall
- Institute for Human Genetics, University of California, San Francisco, CA, USA
- Jer-Yuarn Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Pui-Yan Kwok
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Institute for Human Genetics, University of California, San Francisco, CA, USA
7
Jain A, Bhoyar RC, Pandhare K, Mishra A, Sharma D, Imran M, Senthivel V, Divakar MK, Rophina M, Jolly B, Batra A, Sharma S, Siwach S, Jadhao AG, Palande N, Jha GN, Ashrafi N, Mishra PK, A. K. V, Jain S, Dash D, Kumar NS, Vanlallawma A, Sarma R, Chhakchhuak L, Kalyanaraman S, Mahadevan R, Kandasamy S, B. M. P, Rajagopal RE, J. ER, P. ND, Bajaj A, Gupta V, Mathew S, Goswami S, Mangla M, Prakash S, Joshi K, S. S, Gajjar D, Soraisham R, Yadav R, Devi YS, Gupta A, Mukerji M, Ramalingam S, B. K. B, Scaria V, Sivasubbu S. IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes. Nucleic Acids Res 2021; 49:D1225-D1232. [PMID: 33095885 PMCID: PMC7778947 DOI: 10.1093/nar/gkaa923] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/10/2020] [Revised: 10/01/2020] [Accepted: 10/22/2020] [Indexed: 12/15/2022]
Abstract
With the advent of next-generation sequencing, large-scale initiatives for mining whole genomes and exomes have been employed to better understand global or population-level genetic architecture. India encompasses more than 17% of the world population and has extensive genetic diversity, but is under-represented in global sequencing datasets. This motivated us to perform and analyze whole-genome sequencing of 1029 healthy Indian individuals under the pilot phase of the 'IndiGen' program. We generated a compendium of 55,898,122 single allelic genetic variants from geographically distinct Indian genomes and calculated the allele frequency, allele count, and allele number, along with the number of heterozygous or homozygous individuals. In the present study, these variants were systematically annotated using publicly available population databases and can be accessed through a browsable online database named 'IndiGenomes' (http://clingen.igib.res.in/indigen/). The IndiGenomes database will help clinicians and researchers explore the genetic component underlying medical conditions. To date, this is the most comprehensive genetic variant resource for the Indian population, and it is freely available for academic use. The resource has also been accessed extensively by the worldwide community since its launch.
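The per-variant statistics listed in the abstract follow standard definitions: allele count (AC) is the number of alternate alleles observed, allele number (AN) is the number of called alleles, and allele frequency is AF = AC/AN. A minimal sketch for a single biallelic site, assuming diploid calls coded as allele tuples such as (0, 1) and `None` for a missing call; the function name `allele_summary` is hypothetical, and this is not the IndiGen pipeline:

```python
from typing import Dict, Optional, Sequence, Tuple

Call = Optional[Tuple[int, int]]  # (allele1, allele2): 0 = reference, 1 = alternate


def allele_summary(genotypes: Sequence[Call]) -> Dict[str, float]:
    """Summarize one biallelic site from diploid genotype calls."""
    ac = an = het = hom_alt = 0
    for call in genotypes:
        if call is None:
            continue  # missing calls contribute nothing to AC or AN
        alt = sum(call)  # number of alternate alleles in this individual
        ac += alt
        an += 2  # diploid: two called alleles per individual
        if alt == 1:
            het += 1
        elif alt == 2:
            hom_alt += 1
    af = ac / an if an else float("nan")  # AF undefined with no calls
    return {"AC": ac, "AN": an, "AF": af, "n_het": het, "n_hom_alt": hom_alt}


site = allele_summary([(0, 0), (0, 1), (1, 1), None])
```

Here the three called individuals contribute AN = 6 alleles, of which 3 are alternate, so AF = 0.5.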
Affiliation(s)
- Abhinav Jain
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Rahul C Bhoyar
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Kavita Pandhare
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Anushree Mishra
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Disha Sharma
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Mohamed Imran
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Vigneshwar Senthivel
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Mohit Kumar Divakar
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Mercy Rophina
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Bani Jolly
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Arushi Batra
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sumit Sharma
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Sanjay Siwach
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Arun G Jadhao
- Department of Zoology, RTM Nagpur University, Nagpur, Maharashtra 440033, India
- Nikhil V Palande
- Department of Zoology, Shri Mathuradas Mohota College of Science, Nagpur, Maharashtra 440009, India
- Ganga Nath Jha
- Department of Anthropology, Vinoba Bhave University, Hazaribag, Jharkhand 825301, India
- Nishat Ashrafi
- Department of Anthropology, Vinoba Bhave University, Hazaribag, Jharkhand 825301, India
- Prashant Kumar Mishra
- Department of Biotechnology, Vinoba Bhave University, Hazaribag, Jharkhand 825301, India
- Vidhya A. K.
- Department of Biochemistry, Dr. Kongu Science and Art College, Erode, Tamil Nadu 638107, India
- Suman Jain
- Thalassemia and Sickle cell Society, Hyderabad, Telangana 500052, India
- Debasis Dash
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Andrew Vanlallawma
- Department of Biotechnology, Mizoram University, Aizawl, Mizoram 796004, India
- Ranjan Jyoti Sarma
- Department of Biotechnology, Mizoram University, Aizawl, Mizoram 796004, India
- Radha Mahadevan
- TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Sunitha Kandasamy
- TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Pabitha B. M.
- TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Ezhil Ramya J.
- TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Nirmala Devi P.
- TVMC, Tirunelveli Medical College, Tirunelveli, Tamil Nadu 627011, India
- Anjali Bajaj
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Vishu Gupta
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Samatha Mathew
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sangam Goswami
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Mohit Mangla
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Savinitha Prakash
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Kandarp Joshi
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Sreedevi S.
- Department of Microbiology, St. Pious X Degree & PG College for Women, Hyderabad, Telangana 500076, India
- Devarshi Gajjar
- Department of Microbiology, The Maharaja Sayajirao University of Baroda, Vadodara, Gujarat 390002, India
- Ronibala Soraisham
- Department of Dermatology, Venereology and Leprology, Regional Institute of Medical Sciences, Imphal, Manipur 795004, India
- Rohit Yadav
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Yumnam Silla Devi
- CSIR-North East Institute of Science and Technology, Jorhat, Assam 785006, India
- Aayush Gupta
- Department of Dermatology, Dr. D.Y. Patil Medical College, Pune, Maharashtra 411018, India
- Mitali Mukerji
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sivaprakash Ramalingam
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Binukumar B. K.
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Vinod Scaria
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Sridhar Sivasubbu
- CSIR-Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
8
Quick C, Anugu P, Musani S, Weiss ST, Burchard EG, White MJ, Keys KL, Cucca F, Sidore C, Boehnke M, Fuchsberger C. Sequencing and imputation in GWAS: Cost-effective strategies to increase power and genomic coverage across diverse populations. Genet Epidemiol 2020; 44:537-549. [PMID: 32519380 PMCID: PMC7449570 DOI: 10.1002/gepi.22326] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/07/2019] [Revised: 04/02/2020] [Accepted: 05/22/2020] [Indexed: 01/03/2023]
Abstract
A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to fully capture genetic variation, but remains prohibitively expensive for large sample sizes. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture a wider set of variants. However, imputation quality depends crucially on reference panel size and genetic distance from the target population. Here, we consider sequencing a subset of GWAS participants and imputing the rest using a reference panel that includes both sequenced GWAS participants and an external reference panel. We investigate how imputation quality and GWAS power are affected by the number of participants sequenced for admixed populations (African and Latino Americans) and European population isolates (Sardinians and Finns), and identify powerful, cost-effective GWAS designs given current sequencing and array costs. For populations that are well-represented in existing reference panels, we find that array genotyping alone is cost-effective and well-powered to detect common- and rare-variant associations. For poorly represented populations, sequencing a subset of participants is often most cost-effective, and can substantially increase imputation quality and GWAS power.
Affiliation(s)
- Corbin Quick
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Pramod Anugu
- University of Mississippi Medical Center, Jackson, Mississippi
- Solomon Musani
- University of Mississippi Medical Center, Jackson, Mississippi
- Scott T. Weiss
- Harvard Medical School, Boston, Massachusetts
- Channing Department of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts
- Partners HealthCare Personalized Medicine, Boston, Massachusetts
- Esteban G. Burchard
- Department of Medicine, University of California San Francisco, San Francisco, California
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California
- Marquitta J. White
- Department of Medicine, University of California San Francisco, San Francisco, California
- Kevin L. Keys
- Department of Medicine, University of California San Francisco, San Francisco, California
- Francesco Cucca
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Carlo Sidore
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Christian Fuchsberger
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan
- Department of Genetics and Pharmacology, Institute of Genetic Epidemiology, Medical University of Innsbruck, Innsbruck, Austria
- Institute for Biomedicine, Eurac Research, Affiliated Institute of the University of Lübeck, Bolzano, Italy
9
Peterson RE, Kuchenbaecker K, Walters RK, Chen CY, Popejoy AB, Periyasamy S, Lam M, Iyegbe C, Strawbridge RJ, Brick L, Carey CE, Martin AR, Meyers JL, Su J, Chen J, Edwards AC, Kalungi A, Koen N, Majara L, Schwarz E, Smoller JW, Stahl EA, Sullivan PF, Vassos E, Mowry B, Prieto ML, Cuellar-Barboza A, Bigdeli TB, Edenberg HJ, Huang H, Duncan LE. Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations. Cell 2019; 179:589-603. [PMID: 31607513 PMCID: PMC6939869 DOI: 10.1016/j.cell.2019.08.051] [Citation(s) in RCA: 362] [Impact Index Per Article: 72.4] [Received: 03/26/2019] [Revised: 07/10/2019] [Accepted: 08/26/2019] [Indexed: 12/19/2022]
Abstract
Genome-wide association studies (GWASs) have focused primarily on populations of European descent, but it is essential that diverse populations become better represented. Increasing diversity among study participants will advance our understanding of genetic architecture in all populations and ensure that genetic research is broadly applicable. To facilitate and promote research in multi-ancestry and admixed cohorts, we outline key methodological considerations and highlight opportunities, challenges, solutions, and areas in need of development. Despite the perception that analyzing genetic data from diverse populations is difficult, it is scientifically and ethically imperative, and there is an expanding analytical toolbox to do it well.
Affiliation(s)
- Roseann E Peterson
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA.
- Karoline Kuchenbaecker
- Division of Psychiatry and UCL Genetics Institute, University College London, London W1T 7NF, UK
- Raymond K Walters
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Chia-Yen Chen
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA; Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Alice B Popejoy
- Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA 94305, USA
- Sathish Periyasamy
- Queensland Brain Institute and Queensland Centre for Mental Health Research, The University of Queensland, Brisbane, QLD 4072, Australia
- Max Lam
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Conrad Iyegbe
- Department of Psychosis Studies, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London SE5 8AF, UK
- Rona J Strawbridge
- Institute of Health and Wellbeing, University of Glasgow, Glasgow G12 8RZ, UK; Department of Medicine Solna, Karolinska Institute, Stockholm, SE 17176, Sweden
- Leslie Brick
- Department of Psychiatry and Human Behavior, Warren Alpert Medical School, Brown University, Providence, RI 02906, USA
- Caitlin E Carey
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Alicia R Martin
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Jacquelyn L Meyers
- Department of Psychiatry, State University of New York Downstate Medical Center, Brooklyn, NY 11203, USA
- Jinni Su
- Department of Psychology, Arizona State University, Tempe, AZ 85281, USA
- Junfang Chen
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
- Alexis C Edwards
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Psychiatry, Virginia Commonwealth University, Richmond, VA 23298, USA
- Allan Kalungi
- Mental Health Section of MRC/UVRI and LSHTM Uganda Research Unit, P.O. Box 49, Entebbe, Uganda; Department of Psychiatry, Faculty of Medicine & Health Sciences, University of Stellenbosch, Cape Town, South Africa; Department of Medical Microbiology, College of Health Sciences, Makerere University, Kampala, Uganda; Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA
- Nastassja Koen
- Department of Psychiatry, Faculty of Medicine & Health Sciences, University of Stellenbosch, Cape Town, South Africa; Department of Medical Microbiology, College of Health Sciences, Makerere University, Kampala, Uganda; Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA
- Lerato Majara
- Global Initiative for Neuropsychiatric Genetics Education in Research, Harvard T.H. Chan School of Public Health and Broad Institute, Boston, MA 02115, USA; MRC Human Genetics Research Unit, Division of Human Genetics, Department of Pathology, Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, 7925, South Africa
- Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, 68159 Mannheim, Germany
- Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Eli A Stahl
- Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Patrick F Sullivan
- Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, SE 17176, Sweden; Genetics and Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Evangelos Vassos
- Social, Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, SE5 8AF, UK
- Bryan Mowry
- Queensland Brain Institute and Queensland Centre for Mental Health Research, The University of Queensland, Brisbane, QLD 4072, Australia
- Miguel L Prieto
- Department of Psychiatry, Faculty of Medicine, Universidad de los Andes, Santiago 7620001, Chile; Mental Health Service, Clínica Universidad de los Andes, Santiago 7620001, Chile; Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- Alfredo Cuellar-Barboza
- Department of Psychiatry, University Hospital and School of Medicine, Universidad Autonoma de Nuevo Leon, Monterrey, Mexico; Department of Psychiatry and Psychology, Mayo Clinic, Rochester, MN, USA
- Tim B Bigdeli
- Department of Psychiatry, State University of New York Downstate Medical Center, Brooklyn, NY 11203, USA
- Howard J Edenberg
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Hailiang Huang
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
- Laramie E Duncan
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA 94305, USA
10
Chen J, Lippold D, Frank J, Rayner W, Meyer-Lindenberg A, Schwarz E. Gimpute: an efficient genetic data imputation pipeline. Bioinformatics 2018; 35:1433-1435. [DOI: 10.1093/bioinformatics/bty814] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/20/2018] [Revised: 08/09/2018] [Accepted: 09/18/2018] [Indexed: 11/14/2022]
Affiliation(s)
- Junfang Chen
- Department of Psychiatry and Psychotherapy, Heidelberg University, Mannheim, Germany
- Dietmar Lippold
- Department of Psychiatry and Psychotherapy, Heidelberg University, Mannheim, Germany
- Josef Frank
- Department of Genetic Epidemiology in Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- William Rayner
- Radcliffe Department of Medicine, Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Headington, Oxford, UK
- Nuffield Department of Medicine, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Heidelberg University, Mannheim, Germany
11
Magalhães WCS, Araujo NM, Leal TP, Araujo GS, Viriato PJS, Kehdy FS, Costa GN, Barreto ML, Horta BL, Lima-Costa MF, Pereira AC, Tarazona-Santos E, Rodrigues MR. EPIGEN-Brazil Initiative resources: a Latin American imputation panel and the Scientific Workflow. Genome Res 2018; 28:1090-1095. [PMID: 29903722 PMCID: PMC6028131 DOI: 10.1101/gr.225458.117] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/01/2017] [Accepted: 05/24/2018] [Indexed: 12/24/2022]
Abstract
EPIGEN-Brazil is one of the largest Latin American initiatives at the interface of human genomics, public health, and computational biology. Here, we present two resources to address two challenges to the global dissemination of precision medicine and the development of the bioinformatics know-how to support it. To address the underrepresentation of non-European individuals in human genome diversity studies, we present the EPIGEN-5M+1KGP imputation panel: the fusion of the public 1000 Genomes Project (1KGP) Phase 3 imputation panel with haplotypes derived from the EPIGEN-5M data set (a product of the genotyping of 4.3 million SNPs in 265 admixed individuals from the EPIGEN-Brazil Initiative). When we imputed a target SNP data set (6487 admixed individuals genotyped for 2.2 million SNPs from the EPIGEN-Brazil project) with the EPIGEN-5M+1KGP panel, we gained 140,452 more SNPs in total than when using the 1KGP Phase 3 panel alone and 788,873 additional high-confidence SNPs (info score ≥ 0.8). Thus, including the EPIGEN-5M data set in this new imputation panel not only adds SNPs but also improves imputation quality. To address the lack of transparency and reproducibility of bioinformatics protocols, we present a conceptual Scientific Workflow in the form of a website that models the scientific process (by including publications, flowcharts, masterscripts, documents, and bioinformatics protocols), making it accessible and interactive. Its applicability is shown in the context of the development of our EPIGEN-5M+1KGP imputation panel. The Scientific Workflow also serves as a repository of bioinformatics resources.
Affiliation(s)
- Wagner C S Magalhães
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil; Instituto Mario Penna, Núcleo de Ensino e Pesquisa, Belo Horizonte, Minas Gerais, 30380-472, Brazil
- Nathalia M Araujo
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Thiago P Leal
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Gilderlanio S Araujo
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Paula J S Viriato
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Fernanda S Kehdy
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil; Laboratório de Hanseníase, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Rio de Janeiro, 21040-900, Brazil
- Gustavo N Costa
- Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador, Bahia, 40110-040, Brazil
- Mauricio L Barreto
- Instituto de Saúde Coletiva, Universidade Federal da Bahia, Salvador, Bahia, 40110-040, Brazil; Center for Data and Knowledge Integration for Health, Institute Gonçalo Muniz, Fundação Oswaldo Cruz, Salvador, Bahia, 40296-710, Brazil
- Bernardo L Horta
- Programa de Pós-Graduação em Epidemiologia, Universidade Federal de Pelotas, Pelotas, Rio Grande do Sul, 96020-220, Brazil
- Maria Fernanda Lima-Costa
- Instituto de Pesquisa Rene Rachou, Fundação Oswaldo Cruz, Belo Horizonte, Minas Gerais, 30190-009, Brazil
- Alexandre C Pereira
- Instituto do Coração, Universidade de São Paulo, São Paulo, São Paulo, 05403-900, Brazil
- Eduardo Tarazona-Santos
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil
- Maíra R Rodrigues
- Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-901, Brazil; Faculdade de Ciências Médicas e Instituto de Matemática, Estatística e Ciência da Computação, Universidade de Campinas, São Paulo, 13083-894, Brazil
12
Construction of full-length Japanese reference panel of class I HLA genes with single-molecule, real-time sequencing. THE PHARMACOGENOMICS JOURNAL 2018; 19:136-146. [PMID: 29352165 PMCID: PMC6462828 DOI: 10.1038/s41397-017-0010-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/15/2017] [Revised: 10/11/2017] [Accepted: 11/06/2017] [Indexed: 12/30/2022]
Abstract
Human leukocyte antigen (HLA) is a gene complex known for its exceptional diversity across populations, importance in organ and blood stem cell transplantation, and associations of specific alleles with various diseases. We constructed a Japanese reference panel of class I HLA genes (ToMMo HLA panel), comprising a distinct set of HLA-A, HLA-B, HLA-C, and HLA-H alleles, by single-molecule, real-time (SMRT) sequencing of 208 individuals included in the 1070 whole-genome Japanese reference panel (1KJPN). For high-quality allele reconstruction, we developed a novel pipeline, Primer-Separation Assembly and Refinement Pipeline (PSARP), which uses the SMRT sequencing data together with additional short-read data. The panel consisted of 139 alleles, all extended from known IPD-IMGT/HLA sequences; 40 of them contained novel variants, and together they captured more than 96.5% of allelic diversity in 1KJPN. These newly available sequences should be important resources for research and clinical applications, including high-resolution HLA typing, genetic association studies, and analyses of cis-regulatory elements.