1
Spisak S, Tisza V, Nuzzo PV, Seo JH, Pataki B, Ribli D, Sztupinszki Z, Bell C, Rohanizadegan M, Stillman DR, Alaiwi SA, Bartels AH, Papp M, Shetty A, Abbasi F, Lin X, Lawrenson K, Gayther SA, Pomerantz M, Baca S, Solymosi N, Csabai I, Szallasi Z, Gusev A, Freedman ML. A biallelic multiple nucleotide length polymorphism explains functional causality at 5p15.33 prostate cancer risk locus. Nat Commun 2023; 14:5118. [PMID: 37612286 PMCID: PMC10447552 DOI: 10.1038/s41467-023-40616-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2021] [Accepted: 08/03/2023] [Indexed: 08/25/2023]
Abstract
To date, single-nucleotide polymorphisms (SNPs) have been the most intensively investigated class of polymorphisms in genome-wide association studies (GWAS); however, other classes, such as insertion-deletion polymorphisms or multiple nucleotide length polymorphisms (MNLPs), may also confer disease risk. Multiple reports have shown that the 5p15.33 prostate cancer risk region is a particularly strong expression quantitative trait locus (eQTL) for Iroquois Homeobox 4 (IRX4) transcripts. Here, we demonstrate using epigenome and genome editing that a biallelic (21 and 47 base pairs (bp)) MNLP is the causal variant regulating IRX4 transcript levels. In LNCaP prostate cancer cells (homozygous for the 21 bp short allele), a single copy knock-in of the 47 bp long allele potently alters the chromatin state, enabling de novo functional binding of the androgen receptor (AR) associated with increased chromatin accessibility, histone 3 lysine 27 acetylation (H3K27ac), and ~3-fold upregulation of IRX4 expression. We further show that an MNLP is amongst the strongest candidate susceptibility variants at two additional prostate cancer risk loci. We estimated that at least 5% of prostate cancer risk loci could be explained by functional non-SNP causal variants, which may have broader implications for GWAS of other cancers. More generally, our results underscore the importance of investigating other classes of inherited variation as causal mediators of human traits.
Affiliation(s)
- Sandor Spisak
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Viktoria Tisza
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Computational Health Informatics Program (CHIP) Boston Children's Hospital Harvard Medical School, Boston, MA, 02215, USA
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
- Pier Vitale Nuzzo
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Internal Medicine, School of Medicine, University of Genoa, Genoa, Lgo R. Benzi 10, 16132, Italy
- Ji-Heui Seo
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Balint Pataki
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Pázmány P. s. 1A, Budapest, 1117, Hungary
- Dezso Ribli
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Pázmány P. s. 1A, Budapest, 1117, Hungary
- Zsofia Sztupinszki
- Computational Health Informatics Program (CHIP) Boston Children's Hospital Harvard Medical School, Boston, MA, 02215, USA
- Connor Bell
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Mersedeh Rohanizadegan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- David R Stillman
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Sarah Abou Alaiwi
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Alan H Bartels
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Marton Papp
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
- Centre for Bioinformatics, University of Veterinary Medicine, Istvan str. 2, Budapest, 1078, Hungary
- Anamay Shetty
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Division of Genetics, Brigham & Women's Hospital, Boston, MA, USA
- Forough Abbasi
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Xianzhi Lin
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Kate Lawrenson
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Simon A Gayther
- Division of Gynecologic Oncology, Department of Obstetrics and Gynecology, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Mark Pomerantz
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Sylvan Baca
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- The Eli and Edythe L. Broad Institute, Cambridge, MA, 02142, USA
- Norbert Solymosi
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Pázmány P. s. 1A, Budapest, 1117, Hungary
- Istvan Csabai
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Pázmány P. s. 1A, Budapest, 1117, Hungary
- Zoltan Szallasi
- Computational Health Informatics Program (CHIP) Boston Children's Hospital Harvard Medical School, Boston, MA, 02215, USA
- Department of Bioinformatics, Forensic and Insurance Medicine Semmelweis University, Budapest, Hungary
- Danish Cancer Society Research Center, Strandboulevarden 49, 2100, Copenhagen, Denmark
- National Korányi Institute of Pulmonology, Budapest, 1112, Hungary
- Alexander Gusev
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Division of Genetics, Brigham & Women's Hospital, Boston, MA, USA
- The Eli and Edythe L. Broad Institute, Cambridge, MA, 02142, USA
- Matthew L Freedman
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- The Eli and Edythe L. Broad Institute, Cambridge, MA, 02142, USA.
2
Yao Y, Sun K, Yang Q, Zhou Z, Shao C, Qian X, Tang Q, Xie J. Assessing Autosomal InDel Loci With Multiple Insertions or Deletions of Random DNA Sequences in Human Genome. Front Genet 2022; 12:809815. [PMID: 35178073 PMCID: PMC8844376 DOI: 10.3389/fgene.2021.809815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/05/2021] [Accepted: 12/27/2021] [Indexed: 11/13/2022]
Abstract
Multiple mutational events of insertion/deletion occurring at or around InDel sites could form multi-allelic InDels and multi-InDels (abbreviated as MM-InDels), while InDels with random DNA sequences could imply a unique mutation event at these loci. In this study, preliminary investigation of MM-InDels with random sequences was conducted using high-throughput phased data from the 1000 Genomes Project. A total of 3,599 multi-allelic InDels and 6,375 multi-InDels were filtered with multiple alleles. A vast majority of the obtained MM-InDels (85.59%) presented 3 alleles, which implies that only one secondary insertion or deletion mutation event occurred at these loci. The more frequent presence of two adjacent InDel loci was observed within 20 bp. MM-InDels with random sequences presented an uneven distribution across the genome and showed a correlation with InDels, SNPs, recombination rate, and GC content. The average allelic frequencies and prevalence of multi-allelic InDels and multi-InDels presented similar distribution patterns in different populations. Altogether, MM-InDels with random sequences can provide useful information for population resolution.
Affiliation(s)
- Yining Yao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Kuan Sun
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Qinrui Yang
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Zhihan Zhou
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Chengchen Shao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Xiaoqin Qian
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Qiqun Tang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Jianhui Xie
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
3
Fan H, He Y, Li S, Xie Q, Wang F, Du Z, Fang Y, Qiu P, Zhu B. Systematic Evaluation of a Novel 6-dye Direct and Multiplex PCR-CE-Based InDel Typing System for Forensic Purposes. Front Genet 2022; 12:744645. [PMID: 35082827 PMCID: PMC8784372 DOI: 10.3389/fgene.2021.744645] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/20/2021] [Accepted: 10/29/2021] [Indexed: 12/16/2022]
Abstract
Insertion/deletion (InDel) polymorphisms, which combine desirable characteristics of both short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs), have considerable potential in forensic practice and population genetics. However, most commercial InDel kits were designed on the basis of non-Asian populations, which limits their forensic application in East Asian (EAS) populations. Recently, a novel 6-dye direct and multiplex PCR-CE-based typing system was designed on the basis of genome-wide EAS population data; it amplifies 60 molecular genetic markers, consisting of 57 autosomal InDels (A-InDels), 2 Y-chromosomal InDels (Y-InDels), and Amelogenin, in a single PCR reaction and detects them by capillary electrophoresis. In the present study, the DNA profiles of 279 unrelated individuals from the Hainan Li group were generated by the novel typing system. In addition, we collected two A-InDel sets to evaluate the forensic performance of the novel system in the 1000 Genomes Project (1KG) populations and the Hainan Li group. For the Universal A-InDel set (UAIS, containing 44 A-InDels), the cumulative power of discrimination (CPD) ranged from 1 − 1.03 × 10⁻¹⁴ to 1 − 1.27 × 10⁻¹⁸, and the cumulative power of exclusion (CPE) varied from 0.993634 to 0.999908 in the 1KG populations. For the East Asia-based A-InDel set (EAIS, containing 57 A-InDels), the CPD spanned from 1 − 1.32 × 10⁻²³ to 1 − 9.42 × 10⁻²⁴, and the CPE ranged from 0.999965 to 0.999997. In the Hainan Li group, the average heterozygosity (He) was 0.4666 (0.2366-0.5448), and the polymorphism information content (PIC) spanned from 0.2116 to 0.3750 (mean PIC: 0.3563 ± 0.0291). In total, the CPD and CPE of 57 A-InDels were 1 − 1.32 × 10⁻²³ and 0.999965, respectively. Consequently, the novel 6-dye direct and multiplex PCR-CE-based typing system could be considered a reliable and robust tool for human identification and intercontinental population differentiation, and it supplied additional information for kinship analysis in the 1KG populations and the Hainan Li group.
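The forensic parameters quoted in this abstract (heterozygosity, PIC, power of discrimination) follow from allele frequencies by standard population-genetics formulas. The sketch below is illustrative only and is not the authors' pipeline: it assumes a biallelic locus in Hardy-Weinberg equilibrium, and the function names `biallelic_stats` and `cumulative_match_probability` are hypothetical.

```python
from math import prod

def biallelic_stats(p):
    """Per-locus parameters for a biallelic marker with allele frequency p.

    Assumes Hardy-Weinberg equilibrium (an assumption of this sketch).
    Returns (expected heterozygosity, PIC, power of discrimination).
    """
    q = 1.0 - p
    he = 1.0 - (p**2 + q**2)                  # expected heterozygosity, equals 2pq
    pic = he - 2.0 * (p**2) * (q**2)          # polymorphism information content
    genotypes = (p**2, 2.0 * p * q, q**2)     # HWE genotype frequencies
    pd = 1.0 - sum(g**2 for g in genotypes)   # power of discrimination
    return he, pic, pd

def cumulative_match_probability(pd_values):
    """Product of per-locus match probabilities (1 - PD); CPD is 1 minus this."""
    return prod(1.0 - pd for pd in pd_values)

he, pic, pd = biallelic_stats(0.5)
print(he, pic, pd)
print(cumulative_match_probability([pd] * 2))
```

The second function returns the match probability rather than the CPD itself because values such as 1 − 1.32 × 10⁻²³ round to exactly 1.0 in double-precision floating point, which is presumably why the abstract writes them in the "1 − x" form. For a biallelic locus the PIC formula peaks at 0.375 when both alleles have frequency 0.5.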
Affiliation(s)
- Haoliang Fan
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- School of Basic Medicine and Life Science, Hainan Medical University, Haikou, China
- Yitong He
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Shuanglin Li
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Qiqian Xie
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Fenfen Wang
- First Clinical Medical College, Hainan Medical University, Haikou, China
- Zhengming Du
- First Clinical Medical College, Hainan Medical University, Haikou, China
- Yating Fang
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Pingming Qiu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Bofeng Zhu
- Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China
4
Chen J, Guo JT. Structural and functional analysis of somatic coding and UTR indels in breast and lung cancer genomes. Sci Rep 2021; 11:21178. [PMID: 34707120 PMCID: PMC8551294 DOI: 10.1038/s41598-021-00583-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 10/14/2021] [Indexed: 11/24/2022]
Abstract
Insertions and deletions (indels) represent one of the major variation types in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated the indels from two large samples of cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from the cancer whole exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), the key elements for regulation of gene expression that have been found in both coding and non-coding sequences. Compared with germline indels in healthy genomes, somatic indels in cancer genomes include more coding indels, with more frame-shift (FS) indels than expected. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, these somatic indels in cancer genomes tend to be located in sequences with important functions: they can affect the core secondary structures of proteins and overlap more with predicted TFBSs in coding regions than germline indels do. The somatic CDS indels are also enriched in highly conserved nucleotides when compared with germline CDS indels.
Affiliation(s)
- Jing Chen
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA.
5
Roberts R, Fair J. Genetics, its role in preventing the pandemic of coronary artery disease. Clin Cardiol 2021; 44:771-779. [PMID: 34080689 PMCID: PMC8207986 DOI: 10.1002/clc.23627] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/03/2021] [Revised: 04/23/2021] [Accepted: 04/30/2021] [Indexed: 01/14/2023]
Abstract
Epidemiologists have claimed for decades that about 50% of predisposition for coronary artery disease (CAD) is genetic. Advances in technology made possible the discovery of hundreds of genetic risk variants predisposing to CAD. Multiple clinical trials have shown that cardiac events can be prevented by drugs to lower plasma low-density lipoprotein cholesterol (LDL-C). A major barrier to primary prevention is the lack of markers to identify those individuals at risk prior to the development of symptoms of the disease. Conventional risk factors are age-dependent, occurring mostly in the sixth or seventh decade, which is less than desirable for early primary prevention. A polygenic risk score, also called a genetic risk score (GRS), derived from the number of genetic risk variants predisposing to CAD inherited by an individual, has been evaluated in over 1 million individuals. The risk for CAD is stratified into high, intermediate, and low. Polygenic risk scores derived from retrospective genotyping of several clinical trials evaluating the effect of statin therapy or PCSK9 inhibitors show that the genetic risk is reduced by 40%-50% by decreasing plasma LDL-C. Prospective randomized placebo-controlled clinical trials document a 40%-50% reduction in cardiac events in individuals at high genetic risk associated with favorable lifestyle changes and increased physical activity. The polygenic risk score is not age-dependent and remains the same throughout life. Thus, the GRS is superior to conventional risk factors in identifying asymptomatic individuals at risk for CAD early in life for primary prevention. These results indicate that clinical adoption of the GRS in primary prevention would be a paradigm shift in the treatment of the number one killer, CAD.
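A score "derived from the number of genetic risk variants" inherited by an individual can be sketched as a sum over risk-allele counts. The example below is a generic illustration and not the scoring method of any study cited here: the function name `genetic_risk_score`, the optional per-variant weights (for example log odds ratios), and the sample values are all assumptions.

```python
def genetic_risk_score(dosages, weights=None):
    """Sum of risk-allele dosages, optionally weighted per variant.

    dosages: risk-allele count for each variant (0, 1, or 2).
    weights: hypothetical per-variant effect sizes; defaults to 1.0 each,
             which reduces the score to a plain count of risk alleles.
    """
    if weights is None:
        weights = [1.0] * len(dosages)
    if len(weights) != len(dosages):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

# Made-up genotype at three variants, unweighted and weighted.
print(genetic_risk_score([0, 1, 2]))
print(genetic_risk_score([0, 1, 2], [0.1, 0.2, 0.3]))
```

Because the inputs are inherited genotypes, a score of this form does not change with age, which is the property the abstract contrasts with conventional risk factors.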
Affiliation(s)
- Robert Roberts
- College of Medicine, Phoenix, St. Joseph's Hospital and Medical Center, The University of Arizona, Phoenix, Arizona, USA
- Jacques Fair
- College of Medicine, Phoenix, St. Joseph's Hospital and Medical Center, The University of Arizona, Phoenix, Arizona, USA
6
Genetic variation in the Mauritian cynomolgus macaque population reflects variation in the human population. Gene 2021; 787:145648. [PMID: 33848572 DOI: 10.1016/j.gene.2021.145648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2020] [Revised: 03/23/2021] [Accepted: 04/07/2021] [Indexed: 11/21/2022]
Abstract
The cynomolgus macaque is an important species for preclinical research; however, the extent of genetic variation in this population and its similarity to the human population are not well understood. Exome sequencing was conducted for 101 cynomolgus macaques to characterize genetic variation. The variant distribution frequency was 7.81 variants per kilobase across the sequenced regions, with a total of 2,770,009 single nucleotide variants identified from 2,996,041 loci. A large portion (85.6%) had minor allele frequencies greater than 5%. Enriched pathways for genes with high genetic diversity (≥10 variants per kilobase) were those involving signaling peptides and immune response. Compared with human, the variant distribution frequency and nucleotide diversity in the macaque exome were approximately 4 times greater; however, the ratio of non-synonymous to synonymous variants was similar (0.735 and 0.831, respectively). Understanding genetic variability in cynomolgus macaques will enable better interpretation and human translation of phenotypic variability in this species.
7
Roberts R, Chang CC. A Journey through Genetic Architecture and Predisposition of Coronary Artery Disease. Curr Genomics 2020; 21:382-398. [PMID: 33093801 PMCID: PMC7536803 DOI: 10.2174/1389202921999200630145241] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/08/2020] [Revised: 05/18/2020] [Accepted: 05/26/2020] [Indexed: 01/14/2023]
Abstract
Introduction: Halting the spread of coronary artery disease (CAD), the number one killer in the world, requires primary prevention. Fifty percent of all Americans are expected to experience a cardiac event; the challenge is identifying those at risk. Forty to 60% of predisposition to CAD is genetic. The first genetic risk variant, 9p21, was discovered in 2007. Genome-wide association studies have since discovered hundreds of genetic risk variants. The genetic burden for CAD can be expressed as a single number, the Genetic Risk Score (GRS). Assessment of the GRS to risk stratify for CAD was superior to conventional risk factors in several large clinical trials assessing statin therapy, and more recently in a population of nearly 500,000 (UK Biobank). Studies were performed based on prospective genetic risk stratification for CAD. These studies showed that a favorable lifestyle was associated with a 46% reduction in cardiac events, and programmed exercise with a 50% reduction in cardiac events. The genetic risk score is superior to conventional risk factors, and genetic risk is markedly attenuated by lifestyle changes and drug therapy. Genetic risk can be determined at birth or any time thereafter. Conclusion: Utilizing the GRS to risk stratify young, asymptomatic individuals could provide a paradigm shift in the primary prevention of CAD and significantly halt its spread.
Affiliation(s)
- Robert Roberts
- Cardiovascular Genomics & Genetics, University of Arizona, College of Medicine, Phoenix, AZ, USA; Cardiovascular Genomics & Genetics, Phoenix, AZ, USA
- Chih Chao Chang
- Cardiovascular Genomics & Genetics, University of Arizona, College of Medicine, Phoenix, AZ, USA; Cardiovascular Genomics & Genetics, Phoenix, AZ, USA
8
Liu Y, Jin X, Lan Q, Zhao C, Xu H, Xie T, Lan J, Tai Y, Zhu B. Forensic characteristic and population structure dissection of Shaanxi Han population in the light of diallelic deletion/insertion polymorphism data. Genomics 2020; 112:3837-3845. [PMID: 32574833 DOI: 10.1016/j.ygeno.2020.06.028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/11/2020] [Revised: 06/15/2020] [Accepted: 06/17/2020] [Indexed: 12/08/2022]
Abstract
The genetic polymorphisms of diallelic deletion/insertion polymorphic (DIP) loci in the Shaanxi Han population are still not clearly characterized. Herein, allele frequencies and forensic application efficiencies for 30 diallelic DIP loci were investigated in 506 unrelated healthy Han individuals from Shaanxi province, China. Based on population data for the same 30 diallelic DIP loci, the genetic differentiation, hierarchical clustering relationships and population architectures among Shaanxi Han and 50 other populations were further dissected through genetic and bioinformatics analyses. Results indicated that most of the 30 diallelic DIP loci were relatively highly polymorphic in the Shaanxi Han population, and that Shaanxi Han was genetically close to the East Asian populations. In summary, this study provided significant insights into the genetic background of the Shaanxi Han population, and the multiplex amplification of these 30 diallelic DIP loci was appropriate for forensic individual identification and population genetic research in the Shaanxi Han population.
Affiliation(s)
- Yanfang Liu
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Xiaoye Jin
- Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, 710004 Xi'an, China; Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, 710004, Xi'an, China; College of Forensic Medicine, Xi'an Jiaotong University Health Science Center, Xi'an, 710061, China
- Qiong Lan
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Congying Zhao
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Hui Xu
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Tong Xie
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Jiangwei Lan
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Yunchun Tai
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
- Bofeng Zhu
- Multi-Omics Innovative Research Center of Forensic Identification; Department of Forensic Genetics, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China; Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi'an Jiaotong University, 710004 Xi'an, China; Clinical Research Center of Shaanxi Province for Dental and Maxillofacial Diseases, College of Stomatology, Xi'an Jiaotong University, 710004, Xi'an, China.
9
Whole genome detection of sequence and structural polymorphism in six diverse horses. PLoS One 2020; 15:e0230899. [PMID: 32271776 PMCID: PMC7144971 DOI: 10.1371/journal.pone.0230899] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/18/2019] [Accepted: 03/12/2020] [Indexed: 12/30/2022]
Abstract
The domesticated horse has played a unique role in human history, serving not just as a source of animal protein, but also as a catalyst for long-distance migration and military conquest. As a result, the horse developed unique physiological adaptations to meet the demands of both its climatic environment and its relationship with man. Completed in 2009, the first domesticated horse reference genome assembly (EquCab 2.0) produced most of the publicly available genetic variation annotations in this species. Yet, there are around 400 geographically and physiologically diverse breeds of horse. To enrich the current collection of genetic variants in the horse, we sequenced whole genomes from six horses of six different breeds: an American Miniature, a Percheron, an Arabian, a Mangalarga Marchador, a Native Mongolian Chakouyi, and a Tennessee Walking Horse, and mapped them to the EquCab3.0 genome. Aside from extreme contrasts in body size, these breeds originate from diverse global locations and each possesses unique adaptive physiology. A total of 1.3 billion reads were generated for the six horses, with coverage between 15x and 24x per horse. After applying rigorous filtration, we identified and functionally annotated 17,514,723 Single Nucleotide Polymorphisms (SNPs), and 1,923,693 Insertions/Deletions (INDELs), as well as an average of 1,540 Copy Number Variations (CNVs) and 3,321 Structural Variations (SVs) per horse. Our results revealed putative functional variants, including variants in genes associated with size variation such as LCORL (found in all horses), ZFAT in the Arabian, American Miniature and Percheron horses, and ANKRD1 in the Native Mongolian Chakouyi horse. We detected a copy number variation in the Latherin gene that may be the result of evolutionary selection impacting thermoregulation by sweating, an important component of athleticism and heat tolerance. The newly discovered variants were formatted into user-friendly browser tracks and will provide a foundational database for future studies of the genetic underpinnings of diverse phenotypes within the horse.
The domesticated horse played a unique role in human history, serving not just as a source of dietary animal protein, but also as a catalyst for long-distance migration and military conquest. As a result, the horse developed unique physiological adaptations to meet the demands of both its climatic environment and its relationship with man. Although the completion of the horse reference genome allowed for the discovery of many genetic variants, the remarkable diversity across breeds of horse calls for additional effort to quantify the complete span of genetic polymorphism within this unique species. In this work, we present genome re-sequencing and variant detection analysis for six horses belonging to six different breeds that differ in morphology, origin, and physiological demands and responses. We identified and annotated not just single nucleotide polymorphisms (SNPs), but also insertions and deletions (INDELs), copy number variations (CNVs) and structural variations (SVs). Our results illustrate novel sources of polymorphism and highlight potentially impactful variations for phenotypes of body size and conformation. We also detected a copy number loss in the Latherin gene that could be the result of an evolutionary selection affecting thermoregulation through sweating. Our newly discovered variants were formatted into easy-to-use tracks that can be easily accessed by researchers around the globe.
10
Wang S, Yi X, Wu M, Zhao H, Liu S, Pan Y, Li Q, Tang X, Zhu Y, Sun X. Detection of key gene InDels in TGF-β pathway and its relationship with growth traits in four sheep breeds. Anim Biotechnol 2019; 32:194-204. [PMID: 31625451 DOI: 10.1080/10495398.2019.1675682] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/25/2022]
Abstract
The TGF-β signaling pathway plays an important role in regulating cell proliferation and differentiation, embryonic development, bone formation, etc. LTBP1, THBS1, SMAD4 and other genes are important members of the TGF-β signaling pathway. LTBP1 binds to TGF-β, while THBS1 binds to LTBP1, which is an important signal transduction molecule in the TGF-β pathway. To explore the effects of insertion/deletion variation in three genes (LTBP1, THBS1, SMAD4) of the TGF-β signaling pathway on sheep growth traits such as body length and body weight, a total of 625 healthy individuals from 4 breeds (Tong sheep, Hu sheep, small-tail Han sheep and Lanzhou fat-tail sheep) were analyzed. In this study, we identified 4 InDel loci: one locus in LTBP1, two loci in THBS1, and one locus in SMAD4, named InDel-1 (13 bp deletion), InDel-2 (16 bp deletion), InDel-3 (22 bp deletion) and InDel-4 (7 bp deletion), respectively. Among the 4 analyzed breeds, association analysis showed that all new InDel polymorphisms were significantly associated with 10 different growth traits (p < 0.05), which may provide a theoretical basis for accelerating marker-assisted selection in sheep breeding.
Affiliation(s)
- Shuhui Wang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Xiaohua Yi
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Mingli Wu
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Haidong Zhao
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Shirong Liu
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Yun Pan
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Qi Li
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Xiaoqin Tang
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Yanjiao Zhu
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China
- Xiuzhu Sun
  - College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi, P. R. China; College of Grassland Agriculture, Northwest A&F University, Yangling, Shaanxi, P. R. China
11
Prediction and management of CAD risk based on genetic stratification. Trends Cardiovasc Med 2019; 30:328-334. [PMID: 31543237 DOI: 10.1016/j.tcm.2019.08.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/24/2019] [Revised: 08/01/2019] [Accepted: 08/20/2019] [Indexed: 12/24/2022]
Abstract
Discovery of genetic risk variants for CAD and their assembly on a computerized microarray enables a genetic risk score (GRS) to be expressed as a single number. Utilizing this array, genetic risk stratification has been performed in over 1 million cases and controls. The genetic score based on one's DNA can be determined at any time from birth onward and is independent of age and conventional risk factors. Utilizing the GRS, one can select those at highest risk who would benefit most from primary prevention. Clinical trials have shown that modifying lifestyle or using statin therapy reduces the risk for CAD by approximately 50%. The use of the GRS for primary prevention will have a transformative effect on preventing the spread of CAD.
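As a rough illustration of how a GRS can collapse many variants into a single number, the sketch below computes a weighted sum of risk-allele counts. The additive weighting scheme, the variant names, and the odds ratios are hypothetical placeholders assumed for illustration; the abstract does not state how the score is actually computed.

```python
import math


def genetic_risk_score(allele_counts, log_odds_weights):
    """Return the weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Assumption: an additive model in which each risk allele contributes the
    log of its per-allele odds ratio. This is one common convention, not
    necessarily the one used in the article.
    """
    score = 0.0
    for variant, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1, or 2")
        score += log_odds_weights[variant] * count
    return score


# Hypothetical variants and per-allele odds ratios (illustrative only).
odds_ratios = {"variant_A": 1.20, "variant_B": 1.10, "variant_C": 1.05}
weights = {name: math.log(odds) for name, odds in odds_ratios.items()}

# Hypothetical genotype: number of risk alleles carried at each variant.
genotype = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

grs = genetic_risk_score(genotype, weights)
print(f"GRS = {grs:.4f}")
```

Because the score depends only on genotype, it yields the same value at any age, which is the property the abstract highlights.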
12
Hasan MS, Wu X, Zhang L. Uncovering missed indels by leveraging unmapped reads. Sci Rep 2019; 9:11093. [PMID: 31366961 PMCID: PMC6668410 DOI: 10.1038/s41598-019-47405-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/22/2019] [Accepted: 07/12/2019] [Indexed: 02/08/2023] Open
Abstract
In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic variants. Although existing alignment tools have shown great accuracy in mapping short reads to the reference genome, a significant number of short reads still remain unmapped and are often excluded from downstream analyses thereby causing nonnegligible information loss in the subsequent variant calling procedure. This paper describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that are initially missed in the original procedure. Genesis-indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-indel identifies 72,997 novel high-quality indels previously not found, among which 16,141 have not been annotated in the widely used mutation database. Statistical analysis of these indels shows significant enrichment of indels residing in oncogenes and tumour suppressor genes. Functional annotation further reveals that these indels are strongly correlated with pathways of cancer and can have high to moderate impact on protein functions. Additionally, some of the indels overlap with the genes that do not have any indel mutations called from the originally mapped reads but have been shown to contribute to the tumorigenesis in multiple carcinomas, further emphasizing the importance of rescuing indels hidden in the unmapped reads in cancer and disease studies.
Affiliation(s)
- Xiaowei Wu
  - Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA
- Liqing Zhang
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA
13
Fuertes MA, Rodrigo JR, Alonso C. Conserved Critical Evolutionary Gene Structures in Orthologs. J Mol Evol 2019; 87:93-105. [PMID: 30815710 DOI: 10.1007/s00239-019-09889-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2018] [Accepted: 02/13/2019] [Indexed: 12/18/2022]
Abstract
Unravelling gene structure requires the identification and understanding of the constraints that are often associated with the evolutionary history and functional domains of genes. In this manuscript, we speculate on the possible existence in orthologs of an emergent, highly conserved gene structure that might explain their coordinated evolution during speciation events and their parental function. Here, we address the following issues: (1) Is there any conserved hypothetical structure along ortholog gene sequences? (2) If so, are such conserved structures maintained during speciation events? The data presented provide evidence supporting this hypothesis. We have found that (1) most orthologs studied share highly conserved compositional structures not observed previously; (2) while the percent identity of nucleotide sequences of orthologs correlates with the percent identity of composon sequences, the number of emergent compositional structures conserved during speciation does not correlate with the percent identity; and (3) a broad range of species conserves the emergent compositional stretches. We also discuss the concept of critical gene structure.
Affiliation(s)
- Miguel A Fuertes
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain
- Carlos Alonso
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain
14
Lin M, Whitmire S, Chen J, Farrel A, Shi X, Guo JT. Effects of short indels on protein structure and function in human genomes. Sci Rep 2017; 7:9313. [PMID: 28839204 PMCID: PMC5570956 DOI: 10.1038/s41598-017-09287-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 05/18/2017] [Accepted: 07/24/2017] [Indexed: 01/20/2023] Open
Abstract
Insertions and deletions (indels) represent the second most common type of genetic variations in human genomes. Indels can be deleterious and contribute to disease susceptibility as recent genome sequencing projects revealed a large number of indels in various cancer types. In this study, we investigated the possible effects of small coding indels on protein structure and function, and the baseline characteristics of indels in 2504 individuals of 26 populations from the 1000 Genomes Project. We found that each population has a distinct pattern in genes with small indels. Frameshift (FS) indels are enriched in olfactory receptor activity while non-frameshift (NFS) indels are enriched in transcription-related proteins. Structural analysis of NFS indels revealed that they predominantly adopt coil or disordered conformations, especially in proteins with transcription-related NFS indels. These results suggest that the annotated coding indels from the 1000 Genomes Project, while contributing to genetic variations and phenotypic diversity, generally do not affect the core protein structures and have no deleterious effect on essential biological processes. In addition, we found that a number of reference genome annotations might need to be updated due to the high prevalence of annotated homozygous indels in the general population.
Affiliation(s)
- Maoxuan Lin
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Sarah Whitmire
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jing Chen
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Alvin Farrel
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Xinghua Shi
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
15
Genetics: Implications for Prevention and Management of Coronary Artery Disease. J Am Coll Cardiol 2017; 68:2797-2818. [PMID: 28007143 DOI: 10.1016/j.jacc.2016.10.039] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 08/15/2016] [Revised: 10/12/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]
Abstract
An exciting new era has dawned for the prevention and management of coronary artery disease (CAD) utilizing genetic risk variants. The recent identification of over 60 susceptibility loci for CAD confirms not only the importance of established risk factors, but also the existence of many novel causal pathways that are expected to improve our understanding of the genetic basis of CAD and facilitate the development of new therapeutic agents over time. Concurrently, Mendelian randomization studies have provided intriguing insights on the causal relationship between CAD-related traits, and highlight the potential benefits of long-term modifications of risk factors. Last, genetic risk scores of CAD may serve not only as prognostic, but also as predictive markers, and carry the potential to considerably improve the delivery of established prevention strategies. This review will summarize the evolution and discovery of genetic risk variants for CAD and their current and future clinical applications.
16
Abstract
Deciphering the genetic basis of human disease requires a comprehensive knowledge of genetic variants irrespective of their class or frequency. Although an impressive number of human genetic variants have been catalogued, a large fraction of the genetic difference that distinguishes two human genomes is still not understood at the base-pair level. This is because the emphasis has been on single-nucleotide variation as opposed to less tractable and more complex genetic variants, including indels and structural variants. The latter, we propose, will have a large impact on human phenotypes but require a more systematic assessment of genomes at deeper coverage and alternate sequencing and mapping technologies.
17
Huddleston J, Chaisson MJP, Steinberg KM, Warren W, Hoekzema K, Gordon D, Graves-Lindsay TA, Munson KM, Kronenberg ZN, Vives L, Peluso P, Boitano M, Chin CS, Korlach J, Wilson RK, Eichler EE. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res 2016; 27:677-685. [PMID: 27895111 PMCID: PMC5411763 DOI: 10.1101/gr.214007.116] [Citation(s) in RCA: 227] [Impact Index Per Article: 28.4] [Received: 08/01/2016] [Accepted: 11/15/2016] [Indexed: 01/07/2023]
Abstract
In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
Affiliation(s)
- John Huddleston
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Mark J P Chaisson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Karyn Meltz Steinberg
  - McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Wes Warren
  - McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- David Gordon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Tina A Graves-Lindsay
  - McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Zev N Kronenberg
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Laura Vives
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Paul Peluso
  - Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Matthew Boitano
  - Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Chen-Shin Chin
  - Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Jonas Korlach
  - Pacific Biosciences of California, Incorporated, Menlo Park, California 94025, USA
- Richard K Wilson
  - Department of Pathology, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
18
Spinks PQ, Thomson RC, McCartney-Melstad E, Shaffer HB. Phylogeny and temporal diversification of the New World pond turtles (Emydidae). Mol Phylogenet Evol 2016; 103:85-97. [DOI: 10.1016/j.ympev.2016.07.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/18/2015] [Revised: 06/03/2016] [Accepted: 07/07/2016] [Indexed: 11/16/2022]
19
Wajnberg G, Passetti F. Using high-throughput sequencing transcriptome data for INDEL detection: challenges for cancer drug discovery. Expert Opin Drug Discov 2016; 11:257-68. [PMID: 26787005 DOI: 10.1517/17460441.2016.1143813] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
Abstract
INTRODUCTION: A cancer cell is a mosaic of genomic and epigenomic alterations. Distinct cancer molecular signatures can be observed depending on tumor type or patient genetic background. One type of genomic alteration is the insertion and/or deletion (INDEL) of nucleotides in the DNA sequence, which may vary in length, and may change the encoded protein or modify protein domains. INDELs are associated with a large number of diseases, and their detection is based on low-throughput techniques. However, high-throughput sequencing has also started to be used for detection of novel disease-causing INDELs. This search may identify novel drug targets. AREAS COVERED: This review presents examples of using high-throughput sequencing (DNA-Seq and RNA-Seq) to investigate the incidence of INDELs in coding regions of human genes. Some of these examples successfully utilized RNA-Seq to identify INDELs associated with diseases. In addition, other studies have described small INDELs related to chemo-resistance or poor patient outcome, while structural variants were associated with a better clinical outcome. EXPERT OPINION: On average, the most used repositories hold twice as much RNA-Seq data as DNA-Seq data. Therefore, using RNA-Seq data is a promising strategy for studying cancer samples with unknown mechanisms of drug resistance, aiming at the discovery of proteins with potential as novel drug targets.
Affiliation(s)
- Gabriel Wajnberg
  - Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
- Fabio Passetti
  - Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
20
Bobilev AM, McDougal ME, Taylor WL, Geisert EE, Netland PA, Lauderdale JD. Assessment of PAX6 alleles in 66 families with aniridia. Clin Genet 2016; 89:669-77. [PMID: 26661695 DOI: 10.1111/cge.12708] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 10/02/2015] [Revised: 12/03/2015] [Accepted: 12/04/2015] [Indexed: 12/18/2022]
Abstract
We report on PAX6 alleles associated with a clinical diagnosis of classical aniridia in 81 affected individuals representing 66 families. Allelic variants expected to affect PAX6 function were identified in 61 families (76 individuals). Ten cases of sporadic aniridia (10 families) had complete (8 cases) or partial (2 cases) deletion of the PAX6 gene. Sequence changes that introduced a premature termination codon into the open reading frame of PAX6 occurred in 47 families (62 individuals). Three individuals with sporadic aniridia (three families) had sequence changes (one deletion, two run-on mutations) expected to result in a C-terminal extension. An intronic deletion of unknown functional significance was detected in one case of sporadic aniridia (one family), but not in unaffected relatives. Within these 61 families, single nucleotide substitutions accounted for 30/61 (49%), indels for 23/61 (38%), and complete deletion of the PAX6 locus for 8/61 (13%). In five cases of sporadic aniridia (five families), no disease-causing mutation in the coding region was detected. In total, 23 unique variants were identified that have not been reported in the Leiden Open Variation Database (LOVD). Within the group assessed, 92% had sequence changes expected to reduce PAX6 function, confirming the primacy of PAX6 haploinsufficiency as causal for aniridia.
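The percentages quoted above can be re-derived from the family counts given in the abstract. The short check below does so; treating the 92% figure as 61 of 66 families is an assumption on our part, since the abstract does not state its denominator.

```python
# Counts taken from the abstract (families, not individuals).
families_total = 66
families_with_variant = 61
variant_classes = {
    "single nucleotide substitutions": 30,
    "indels": 23,
    "complete PAX6 deletion": 8,
}

# The three classes should account for every family with a variant.
assert sum(variant_classes.values()) == families_with_variant


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


breakdown = {
    name: percent(count, families_with_variant)
    for name, count in variant_classes.items()
}
# Assumption: the reported 92% refers to families with a variant out of all families.
detection_rate = percent(families_with_variant, families_total)

for name, value in breakdown.items():
    print(f"{name}: {value}%")
print(f"families with a PAX6 variant: {detection_rate}%")
```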
Affiliation(s)
- A M Bobilev
  - Neuroscience Division of the Biomedical and Health Sciences Institute, The University of Georgia, Athens, GA, USA
- M E McDougal
  - Department of Cellular Biology, The University of Georgia, Athens, GA, USA
- W L Taylor
  - Molecular Resource Center, The University of Tennessee Health Science Center, Memphis, TN, USA
- E E Geisert
  - Department of Ophthalmology in the Hamilton Eye Institute, The University of Tennessee Health Science Center, Memphis, TN, USA
- P A Netland
  - Molecular Resource Center, The University of Tennessee Health Science Center, Memphis, TN, USA
- J D Lauderdale
  - Neuroscience Division of the Biomedical and Health Sciences Institute, The University of Georgia, Athens, GA, USA; Department of Cellular Biology, The University of Georgia, Athens, GA, USA
21
PExFInS: An Integrative Post-GWAS Explorer for Functional Indels and SNPs. Sci Rep 2015; 5:17302. [PMID: 26612672 PMCID: PMC4661514 DOI: 10.1038/srep17302] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/16/2015] [Accepted: 10/28/2015] [Indexed: 12/22/2022] Open
Abstract
Expression quantitative trait loci (eQTL) mapping and linkage disequilibrium (LD) analysis have been widely employed to interpret findings of genome-wide association studies (GWAS). Using deep sequencing data from 423 lymphoblastoid cell lines (LCLs) from six global populations together with microarray expression data, we performed eQTL analysis, identified more than 228 K SNP cis-eQTLs and 21 K indel cis-eQTLs, and generated an LCL cis-eQTL database. We demonstrate that the percentages of population-shared and population-specific cis-eQTLs are comparable, while indel cis-eQTLs in the population-specific subsection contribute more to gene expression variation than those in the population-shared subsection. We found that cis-eQTLs, especially the population-shared cis-eQTLs, are significantly enriched toward the transcription start site. Moreover, GWAS SNPs cataloged by the National Human Genome Research Institute are enriched for LCL cis-eQTLs. Specifically, 32.8% of GWAS SNPs are LCL cis-eQTLs, among which 12.5% can be tagged by indel cis-eQTLs, suggesting a fundamental contribution of indel cis-eQTLs to GWAS association signals. To search for functional indels and SNPs tagging GWAS SNPs, we developed a pipeline, the Post-GWAS Explorer for Functional Indels and SNPs (PExFInS), which integrates LD analysis, functional annotation from public databases, and cis-eQTL mapping with our LCL cis-eQTL database and other published cis-eQTL datasets.
22
Hasan MS, Wu X, Zhang L. Performance evaluation of indel calling tools using real short-read data. Hum Genomics 2015; 9:20. [PMID: 26286629 PMCID: PMC4545535 DOI: 10.1186/s40246-015-0042-2] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 04/09/2015] [Accepted: 07/20/2015] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: Insertion and deletion (indel), a common form of genetic variation, has been shown to cause or contribute to human genetic diseases and cancer. With the advance of next-generation sequencing technology, many indel calling tools have been developed; however, evaluation and comparison of these tools using large-scale real data are still scant. Here we evaluated seven popular and publicly available indel calling tools, GATK Unified Genotyper, VarScan, Pindel, SAMtools, Dindel, GATK HaplotypeCaller, and Platypus, using low-coverage data from 78 human genomes from the 1000 Genomes Project. RESULTS: Comparing indels called by these tools with a known set of indels, we found that Platypus outperforms the other tools. In addition, a high percentage of known indels remain undetected, and the number of common indels called by all seven tools is very low. CONCLUSION: All these findings indicate the necessity of improving the existing tools or developing new algorithms to achieve reliable and consistent indel calling results.
Affiliation(s)
- Xiaowei Wu
  - Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA
- Liqing Zhang
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA
23
Zhang G, Wang J, Yang J, Li W, Deng Y, Li J, Huang J, Hu S, Zhang B. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling. BMC Genomics 2015; 16:581. [PMID: 26242175 PMCID: PMC4524363 DOI: 10.1186/s12864-015-1796-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/17/2014] [Accepted: 07/23/2015] [Indexed: 12/30/2022] Open
Abstract
Background: To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq™ Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Results: Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3 % in the four samples, whereas the concordance of co-detected variant loci reached 99 %. Sanger sequencing validation revealed that the validation rate of concordant single nucleotide polymorphisms (SNPs) (91.5 %) was higher than that of SNPs specific to TargetSeq-Proton (60.0 %) or specific to SureSelect-HiSeq (88.3 %). With regard to 1-bp small insertions and deletions (InDels), the Sanger-validated rates of concordant variants (100.0 %) and SureSelect-HiSeq-specific variants (89.6 %) were higher than that of TargetSeq-Proton-specific variants (15.8 %).
Conclusions: In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, specifically for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, specifically, InDels in Ion Proton exome sequencing. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Guoqiang Zhang
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Jianfeng Wang
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Jin Yang
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Wenjie Li
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Yutian Deng
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Jing Li
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Jun Huang
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Songnian Hu
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
- Bing Zhang
  - Core Genomic Facility and CAS Key Laboratory of Genome Sciences & Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
24
Lim JQ, Tennakoon C, Guan P, Sung WK. BatAlign: an incremental method for accurate alignment of sequencing reads. Nucleic Acids Res 2015; 43:e107. [PMID: 26170239 PMCID: PMC4652746 DOI: 10.1093/nar/gkv533] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/09/2015] [Accepted: 05/09/2015] [Indexed: 11/12/2022] Open
Abstract
Structural variations (SVs) play a crucial role in genetic diversity. However, the alignments of reads near or across SVs are made inaccurate by the presence of polymorphisms. BatAlign is an algorithm that integrates two strategies, called 'Reverse-Alignment' and 'Deep-Scan', to improve the accuracy of read alignment. In our experiments, BatAlign obtained the highest F-measures in read alignments on mismatch-aberrant, indel-aberrant, concordantly/discordantly paired and SV-spanning data sets. On real data, the alignments of BatAlign recovered 4.3% more PCR-validated SVs with 73.3% fewer calls. These results suggest that BatAlign is effective in accurately detecting SVs and other polymorphic variants using high-throughput data. BatAlign is publicly available at https://goo.gl/a6phxB.
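For readers unfamiliar with the F-measure used to rank aligners above, the sketch below computes the balanced F-measure (F1) from counts of correct and incorrect alignments. The counts are hypothetical and the definitions of a true or false positive alignment are assumptions; the abstract does not describe how BatAlign's benchmark scored reads.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical benchmark counts (illustrative only):
# 90 reads aligned correctly, 10 aligned to the wrong place, 30 left unaligned.
f1 = f_measure(true_pos=90, false_pos=10, false_neg=30)
print(f"F1 = {f1:.3f}")
```

An aligner that reports fewer but more accurate alignments trades recall for precision; the F-measure rewards methods that keep both high.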
Affiliation(s)
- Jing-Quan Lim
- Department of Computer Science, National University of Singapore, Singapore 117417; Laboratory of Cancer Epigenome, Division of Medical Sciences, National Cancer Centre Singapore, Singapore 169610
- Chandana Tennakoon
- Department of Computer Science, National University of Singapore, Singapore 117417; NUS Graduate School for Integrative Sciences and Engineering, (CeLS), #05-01, 28 Medical Drive, Singapore 117456; Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672; UAE University, PO Box 17551, Al Ain, UAE
- Peiyong Guan
- Department of Computer Science, National University of Singapore, Singapore 117417
- Wing-Kin Sung
- Department of Computer Science, National University of Singapore, Singapore 117417; Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672
25
Kloosterman WP, Francioli LC, Hormozdiari F, Marschall T, Hehir-Kwa JY, Abdellaoui A, Lameijer EW, Moed MH, Koval V, Renkens I, van Roosmalen MJ, Arp P, Karssen LC, Coe BP, Handsaker RE, Suchiman ED, Cuppen E, Thung DT, McVey M, Wendl MC, Uitterlinden A, van Duijn CM, Swertz MA, Wijmenga C, van Ommen GB, Slagboom PE, Boomsma DI, Schönhuth A, Eichler EE, de Bakker PIW, Ye K, Guryev V. Characteristics of de novo structural changes in the human genome. Genome Res 2015; 25:792-801. [PMID: 25883321 PMCID: PMC4448676 DOI: 10.1101/gr.185041.114] [Citation(s) in RCA: 94] [Impact Index Per Article: 10.4] [Received: 10/01/2014] [Accepted: 04/01/2015] [Indexed: 11/29/2022]
Abstract
Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.
Affiliation(s)
- Wigard P Kloosterman
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht 3584CG, The Netherlands
- Laurent C Francioli
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht 3584CG, The Netherlands
- Fereydoun Hormozdiari
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, USA
- Tobias Marschall
- Life Sciences Group, Centrum voor Wiskunde en Informatica, Amsterdam 1098XG, The Netherlands
- Jayne Y Hehir-Kwa
- Department of Human Genetics, Radboud University Medical Center, Nijmegen 6525GA, The Netherlands
- Abdel Abdellaoui
- Department of Biological Psychology, VU University Amsterdam, Amsterdam 1081BT, The Netherlands
- Eric-Wubbo Lameijer
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden 2300RC, The Netherlands
- Matthijs H Moed
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden 2300RC, The Netherlands
- Vyacheslav Koval
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam 3000CA, The Netherlands
- Ivo Renkens
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht 3584CG, The Netherlands
- Markus J van Roosmalen
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht 3584CG, The Netherlands
- Pascal Arp
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam 3000CA, The Netherlands
- Lennart C Karssen
- Department of Epidemiology, Erasmus Medical Center, Rotterdam 3000CA, The Netherlands
- Bradley P Coe
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, USA
- Robert E Handsaker
- Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
- Eka D Suchiman
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden 2300RC, The Netherlands
- Edwin Cuppen
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht 3584CG, The Netherlands
- Djie Tjwan Thung
- Department of Human Genetics, Radboud University Medical Center, Nijmegen 6525GA, The Netherlands
- Mitch McVey
- Department of Biology, Tufts University, Medford, Massachusetts 02115, USA
- Michael C Wendl
- The Genome Institute, Washington University, St. Louis, Missouri 63108, USA; Department of Mathematics, Washington University, St. Louis, Missouri 63108, USA
- André Uitterlinden
- Department of Internal Medicine, Erasmus Medical Center, Rotterdam 3000CA, The Netherlands; Department of Epidemiology, Erasmus Medical Center, Rotterdam 3000CA, The Netherlands
- Cornelia M van Duijn
- Department of Epidemiology, Erasmus Medical Center, Rotterdam 3000CA, The Netherlands
- Morris A Swertz
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen 9700RB, The Netherlands; Genomics Coordination Center, University of Groningen, University Medical Center Groningen, Groningen 9700RB, The Netherlands
- Cisca Wijmenga
- Department of Genetics, University of Groningen, University Medical Center Groningen, Groningen 9700RB, The Netherlands; Genomics Coordination Center, University of Groningen, University Medical Center Groningen, Groningen 9700RB, The Netherlands
- GertJan B van Ommen
- Department of Human Genetics, Leiden University Medical Center, Leiden 2300RC, The Netherlands
- P Eline Slagboom
- Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden 2300RC, The Netherlands
- Dorret I Boomsma
- Department of Biological Psychology, VU University Amsterdam, Amsterdam 1081BT, The Netherlands
- Alexander Schönhuth
- Life Sciences Group, Centrum voor Wiskunde en Informatica, Amsterdam 1098XG, The Netherlands
- Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, USA
- Paul I W de Bakker
- Department of Medical Genetics, Center for Molecular Medicine, University Medical Center Utrecht, Utrecht 3584CG, The Netherlands; Department of Epidemiology, University Medical Center Utrecht, Utrecht 3584CG, The Netherlands
- Kai Ye
- The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Victor Guryev
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, Groningen 9713AD, The Netherlands
26
Roberts R. A genetic basis for coronary artery disease. Trends Cardiovasc Med 2014; 25:171-8. [PMID: 25453988 DOI: 10.1016/j.tcm.2014.10.008] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 09/09/2014] [Revised: 10/10/2014] [Accepted: 10/10/2014] [Indexed: 01/29/2023]
Abstract
CAD and cancer account for over one-half of all deaths in the world. It is claimed that the 21st century is the last century for CAD. This is, in part, because CAD is preventable based on randomized, placebo-controlled clinical trials, which show modifying known risk factors such as cholesterol is associated consistently with 40-60% reduction in morbidity and mortality from CAD. Comprehensive prevention will require modifying genetic risk factors that are claimed to account for 40-60% of predisposition to CAD. The 21st century is meeting this challenge with over 50 genetic risk variants discovered and replicated in large genome-wide association studies involving over 200,000 cases and controls. Similarly, 157 genetic variants have been discovered that regulate plasma lipids including, LDL-C, HDL-C, triglycerides, and total cholesterol. A major finding from these studies is that only 15 of the 50 genetic variants for CAD act through known risk factors. Hence, the pathogenesis of CAD in addition to cholesterol and other known risk factors is due to various other factors, many of which remain unknown. Secondly, genes regulating the plasma triglyceride levels are strongly associated with the pathogenesis of CAD. Thirdly, Mendelian randomization studies show no protection from genes that increase plasma HDL cholesterol. This is contrary to current opinion. These genetic risk variants have provided new targets for the development of novel therapies to prevent CAD. Already a new and potent drug has been developed targeting PCSK9, which is in phase 3 clinical trials and shows great efficacy and safety for prevention of CAD. The 21st century is looking very bright for the prevention of CAD.
Affiliation(s)
- Robert Roberts
- University of Ottawa Heart Institute, Ottawa, Ontario, Canada; Ruddy Canadian Cardiovascular Genetics Centre, Ottawa, Ontario, Canada.
27
Xu H, Deng W, Huang F, Xiao S, Liu G, Liang H. Enhanced DNA toehold exchange reaction on a chip surface to discriminate single-base changes. Chem Commun (Camb) 2014; 50:14171-4. [DOI: 10.1039/c4cc07272c] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/31/2023]
28
Booker CS, Grattan DR. Identification of a truncated splice variant of IL-18 receptor alpha in the human and rat, with evidence of wider evolutionary conservation. PeerJ 2014; 2:e560. [PMID: 25250214 PMCID: PMC4168765 DOI: 10.7717/peerj.560] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/31/2014] [Accepted: 08/15/2014] [Indexed: 01/14/2023] Open
Abstract
Interleukin-18 (IL-18) is a pro-inflammatory cytokine which stimulates activation of the nuclear factor kappa beta (NF-κB) pathway via interaction with the IL-18 receptor. The receptor itself is formed from a dimer of two subunits, with the ligand-binding IL-18Rα subunit being encoded by the IL18R1 gene. A splice variant of murine IL18r1, which has been previously described, is formed by transcription of an unspliced intron (forming a ‘type II’ IL18r1 transcript) and is predicted to encode a receptor with a truncated intracellular domain lacking the capacity to generate downstream signalling. In order to examine the relevance of this finding to human IL-18 function, we assessed the presence of a homologous transcript by reverse transcription-polymerase chain reaction (RT-PCR) in the human and rat as another common laboratory animal. We present evidence for type II IL18R1 transcripts in both species. While the mouse and rat transcripts are predicted to encode a truncated receptor with a novel 5 amino acid C-terminal domain, the human sequence is predicted to encode a truncated protein with a novel 22 amino acid sequence bearing resemblance to the ‘Box 1’ motif of the Toll/interleukin-1 receptor (TIR) domain, in a similar fashion to the inhibitory interleukin-1 receptor 2. Given that transcripts from these three species are all formed by inclusion of homologous unspliced intronic regions, an analysis of homologous introns across a wider array of 33 species with available IL18R1 gene records was performed, which suggests similar transcripts may encode truncated type II IL-18Rα subunits in other species. This splice variant may represent a conserved evolutionary mechanism for regulating IL-18 activity.
Affiliation(s)
- Chris S Booker
- Centre for Neuroendocrinology, Department of Anatomy, University of Otago, Dunedin, New Zealand
- David R Grattan
- Centre for Neuroendocrinology, Department of Anatomy, University of Otago, Dunedin, New Zealand
29
Yan Y, Yi G, Sun C, Qu L, Yang N. Genome-wide characterization of insertion and deletion variation in chicken using next generation sequencing. PLoS One 2014; 9:e104652. [PMID: 25133774 PMCID: PMC4136736 DOI: 10.1371/journal.pone.0104652] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 04/21/2014] [Accepted: 07/10/2014] [Indexed: 12/30/2022] Open
Abstract
Insertion and deletion (INDEL) is one of the main events contributing to genetic and phenotypic diversity, which receives less attention than SNP and large structural variation. To gain a better knowledge of INDEL variation in chicken genome, we applied next generation sequencing on 12 diverse chicken breeds at an average effective depth of 8.6. Over 1.3 million non-redundant short INDELs (1-49 bp) were obtained, the vast majority (92.48%) of which were novel. Follow-up validation assays confirmed that most (88.00%) of the randomly selected INDELs represent true variations. The majority (95.76%) of INDELs were less than 10 bp. Both the detected number and affected bases were larger for deletions than insertions. In total, INDELs covered 3.8 Mbp, corresponding to 0.36% of the chicken genome. The average genomic INDEL density was estimated as 0.49 per kb. INDELs were ubiquitous and distributed in a non-uniform fashion across chromosomes, with lower INDEL density in micro-chromosomes than in others, and some functional regions like exons and UTRs were prone to less INDELs than introns and intergenic regions. Nearly 620,253 INDELs fell in genic regions, 1,765 (0.28%) of which located in exons, spanning 1,358 (7.56%) unique Ensembl genes. Many of them are associated with economically important traits and some are the homologues of human disease-related genes. We demonstrate that sequencing multiple individuals at a medium depth offers a promising way for reliable identification of INDELs. The coding INDELs are valuable candidates for further elucidation of the association between genotypes and phenotypes. The chicken INDELs revealed by our study can be useful for future studies, including development of INDEL markers, construction of high density linkage map, INDEL arrays design, and hopefully, molecular breeding programs in chicken.
Affiliation(s)
- Yiyuan Yan
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Guoqiang Yi
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Congjiao Sun
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Lujiang Qu
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Ning Yang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
30
Affiliation(s)
- Robert Roberts
- From the Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
31
Haberstick BC, Smolen A, Stetler GL, Tabor JW, Roy T, Rick Casey H, Pardo A, Roy F, Ryals LA, Hewitt C, Whitsel EA, Halpern CT, Killeya-Jones LA, Lessem JM, Hewitt JK, Harris KM. Simple sequence repeats in the national longitudinal study of adolescent health: an ethnically diverse resource for genetic analysis of health and behavior. Behav Genet 2014; 44:487-97. [PMID: 24890516 DOI: 10.1007/s10519-014-9662-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 10/09/2013] [Accepted: 05/08/2014] [Indexed: 12/16/2022]
Abstract
Simple sequence repeats (SSRs) are one of the earliest available forms of genetic variation available for analysis and have been utilized in studies of neurological, behavioral, and health phenotypes. Although findings from these studies have been suggestive, their interpretation has been complicated by a variety of factors including, among others, limited power due to small sample sizes. The current report details the availability, diversity, and allele and genotype frequencies of six commonly examined SSRs in the ethnically diverse, population-based National Longitudinal Study of Adolescent Health. A total of 106,743 genotypes were generated across 15,140 participants that included four microsatellites and two di-nucleotide repeats in three dopamine genes (DAT1, DRD4, DRD5), the serotonin transporter, and monoamine oxidase A. Allele and genotype frequencies showed a complex pattern and differed significantly between populations. For both di-nucleotide repeats we observed a greater allelic diversity than previously reported. The availability of these six SSRs in a large, ethnically diverse sample with extensive environmental measures assessed longitudinally offers a unique resource for researchers interested in health and behavior.
Affiliation(s)
- Brett C Haberstick
- Institute for Behavioral Genetics, University of Colorado Boulder, Campus Box 447, Boulder, CO, 80309-0447, USA,
32
Zhang X, Lin H, Zhao H, Hao Y, Mort M, Cooper DN, Zhou Y, Liu Y. Impact of human pathogenic micro-insertions and micro-deletions on post-transcriptional regulation. Hum Mol Genet 2014; 23:3024-34. [PMID: 24436305 DOI: 10.1093/hmg/ddu019] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 01/01/2023] Open
Abstract
Small insertions/deletions (INDELs) of ≤21 bp comprise 18% of all recorded mutations causing human inherited disease and are evident in 24% of documented Mendelian diseases. INDELs affect gene function in multiple ways: for example, by introducing premature stop codons that either lead to the production of truncated proteins or affect transcriptional efficiency. However, the means by which they impact post-transcriptional regulation, including alternative splicing, have not been fully evaluated. In this study, we collate disease-causing INDELs from the Human Gene Mutation Database (HGMD) and neutral INDELs from the 1000 Genomes Project. The potential of these two types of INDELs to affect binding-site affinity of RNA-binding proteins (RBPs) was then evaluated. We identified several sequence features that can distinguish disease-causing INDELs from neutral INDELs. Moreover, we built a machine-learning predictor called PinPor (predicting pathogenic small insertions and deletions affecting post-transcriptional regulation, http://watson.compbio.iupui.edu/pinpor/) to ascertain which newly observed INDELs are likely to be pathogenic. Our results show that disease-causing INDELs are more likely to ablate RBP-binding sites and tend to affect more RBP-binding sites than neutral INDELs. Additionally, disease-causing INDELs give rise to greater deviations in binding affinity than neutral INDELs. We also demonstrated that disease-causing INDELs may be distinguished from neutral INDELs by several sequence features, such as their proximity to splice sites and their potential effects on RNA secondary structure. This predictor showed satisfactory performance in identifying numerous pathogenic INDELs, with a Matthews correlation coefficient (MCC) value of 0.51 and an accuracy of 0.75.
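For readers unfamiliar with the two metrics quoted at the end of this abstract, the following is a minimal sketch of how a Matthews correlation coefficient and an accuracy are computed from a binary confusion matrix. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not PinPor's actual evaluation data.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (assumption, not taken from the paper):
# 100 pathogenic and 100 neutral INDELs, 75 of each classified correctly.
counts = dict(tp=75, tn=75, fp=25, fn=25)
print(f"MCC = {mcc(**counts):.2f}, accuracy = {accuracy(**counts):.2f}")
```

On a balanced set such as this hypothetical one, an accuracy of 0.75 corresponds to an MCC of 0.50, which is in the same range as the values reported in the abstract; on imbalanced sets the two metrics can diverge substantially.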
Affiliation(s)
- Xinjun Zhang
- School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA
33
Rockah-Shmuel L, Tóth-Petróczy Á, Sela A, Wurtzel O, Sorek R, Tawfik DS. Correlated occurrence and bypass of frame-shifting insertion-deletions (InDels) to give functional proteins. PLoS Genet 2013; 9:e1003882. [PMID: 24204297 PMCID: PMC3812077 DOI: 10.1371/journal.pgen.1003882] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/12/2013] [Accepted: 09/02/2013] [Indexed: 11/19/2022] Open
Abstract
Short insertions and deletions (InDels) comprise an important part of the natural mutational repertoire. InDels are, however, highly deleterious, primarily because two-thirds result in frame-shifts. Bypass through slippage over homonucleotide repeats by transcriptional and/or translational infidelity is known to occur sporadically. However, the overall frequency of bypass and its relation to sequence composition remain unclear. Intriguingly, the occurrence of InDels and the bypass of frame-shifts are mechanistically related - occurring through slippage over repeats by DNA or RNA polymerases, or by the ribosome, respectively. Here, we show that the frequency of frame-shifting InDels, and the frequency by which they are bypassed to give full-length, functional proteins, are indeed highly correlated. Using a laboratory genetic drift, we have exhaustively mapped all InDels that occurred within a single gene. We thus compared the naive InDel repertoire that results from DNA polymerase slippage to the frame-shifting InDels tolerated following selection to maintain protein function. We found that InDels repeatedly occurred, and were bypassed, within homonucleotide repeats of 3–8 bases. The longer the repeat, the higher was the frequency of InDels formation, and the more frequent was their bypass. Besides an expected 8A repeat, other types of repeats, including short ones, and G and C repeats, were bypassed. Although obtained in vitro, our results indicate a direct link between the genetic occurrence of InDels and their phenotypic rescue, thus suggesting a potential role for frame-shifting InDels as bridging evolutionary intermediates. Homonucleotide repeats are exceptionally prone to insertions and/or deletions of bases (InDels). However, unless they occur in a multiplicity of 3 bases, InDels disrupt the reading frame and are thus expected to be purged from coding regions. 
Homonucleotide repeats, however, are also vulnerable to slippage by RNA polymerases and the ribosome. Using laboratory evolution techniques, we systematically mapped the occurrence of InDels within a given gene, before and after selection. Our data indicate that frame-shifting InDels were bypassed to give functional proteins at surprisingly high frequencies. Further, we found a strict correlation between the repeat length, the frequency of occurrence of InDels at the DNA level, and the likelihood of bypass by transcriptional/translational slippage. Our results suggest that frame-shifting InDels might comprise functional evolutionary intermediates, and an effective means of sequence divergence (e.g. when an adjacent InDel restores the frame, resulting in altered sequence and, potentially, in an altered protein structure).
Affiliation(s)
- Liat Rockah-Shmuel
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Ágnes Tóth-Petróczy
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Asaf Sela
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Omri Wurtzel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Rotem Sorek
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Dan S. Tawfik
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
34
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023] Open
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France
35
Zeng F, Jiang R, Chen T. PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data. Bioinformatics 2013; 29:2859-68. [PMID: 23995392 DOI: 10.1093/bioinformatics/btt512] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The identification of short insertions and deletions (indels) and single nucleotide polymorphisms (SNPs) from Ion Torrent and 454 reads is a challenging problem, essentially because these techniques are prone to sequencing errors at homopolymers and can, therefore, introduce spurious indels in reads. Most of the existing mapping programs do not model homopolymer errors when aligning reads against the reference. The resulting alignments will then contain various kinds of mismatches and indels that confound the accurate determination of variant loci and alleles. RESULTS: To address these challenges, we realign reads against the reference using our previously proposed hidden Markov model that models homopolymer errors and then merge these pairwise alignments into a weighted alignment graph. Based on our weighted alignment graph and hidden Markov model, we develop a method called PyroHMMvar, which can simultaneously detect short indels and SNPs, as demonstrated in human resequencing data. Specifically, by applying our methods to simulated diploid datasets, we demonstrate that PyroHMMvar produces more accurate results than state-of-the-art methods, such as Samtools and GATK, and is less sensitive to mapping parameter settings than the other methods. We also apply PyroHMMvar to analyze one human whole genome resequencing dataset, and the results confirm that PyroHMMvar predicts SNPs and indels accurately. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://code.google.com/p/pyrohmmvar/, implemented in C++ and supported on Linux.
Affiliation(s)
- Feng Zeng
- Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China and Computational Biology and Bioinformatics Program, University of Southern California, Los Angeles, CA 90089, USA
36
Spinks PQ, Thomson RC, Pauly GB, Newman CE, Mount G, Shaffer HB. Misleading phylogenetic inferences based on single-exemplar sampling in the turtle genus Pseudemys. Mol Phylogenet Evol 2013; 68:269-81. [PMID: 23583419 DOI: 10.1016/j.ympev.2013.03.031] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 10/24/2012] [Revised: 03/05/2013] [Accepted: 03/25/2013] [Indexed: 11/16/2022]
Abstract
Reconstructing species trees for clades containing weakly delimited or incorrectly identified taxa is one of the most serious challenges facing systematists because building phylogenetic trees is generally predicated on correctly identifying species membership for the terminals in an analysis. A common practice, particularly in large-scale phylogenetic analyses, is to use single-exemplar sampling under the implicit assumption that the resulting phylogenetic trees will be poorly supported if the sampled taxa are not good species. We examine this fundamental assumption in the North American turtle genus Pseudemys, a group of common, widely distributed freshwater turtles whose species boundaries and phylogenetic relationships have challenged systematists for over half a century. We sequenced 10 nuclear and three mitochondrial genes from the nine currently recognized species and subspecies of Pseudemys using geographically-widespread sampling of each taxon, and analyzed the resulting 86-individual data set using population-genetic and phylogenetic methods. We found little or no evidence supporting the division of Pseudemys into its currently recognized species/subspecies. Rather, our data strongly suggest that the group has been oversplit and contains fewer species than currently recognized. Even so, when we conducted 100 replicated, single-exemplar phylogenetic analyses of these same nine taxa, most Bayesian trees were well resolved, had high posterior probabilities, and yet returned completely conflicting topologies. These analyses suggest that phylogenetic analyses based on single-exemplar sampling may recover trees that depend on the individuals that are sampled, rather than the underlying species tree that systematists assume they are estimating. 
Our results clearly indicate that final resolution of Pseudemys will require an integrated analysis of morphology and historical biogeographic data coupled with extensive geographic sampling and large amounts of molecular data, and we do not recommend taxonomic changes based on our analyses. If our 100-tree resampling experiments generalize to other taxa, they suggest that single-exemplar phylogenies should be interpreted with caution, particularly for groups where species are shallowly diverged or inadequately delimited.
Affiliation(s)
- Phillip Q Spinks
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA.
|
37
|
Genomics in cardiovascular disease. J Am Coll Cardiol 2013; 61:2029-37. [PMID: 23524054 DOI: 10.1016/j.jacc.2012.12.054] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 11/12/2012] [Revised: 01/29/2013] [Accepted: 02/19/2013] [Indexed: 01/29/2023]
Abstract
A paradigm shift toward biology occurred in the 1990s and was subsequently catalyzed by the sequencing of the human genome in 2000. The cost of deoxyribonucleic acid (DNA) sequencing has fallen from millions to thousands of dollars, with sequencing of one's entire genome costing only $1,000. Rapid DNA sequencing is being embraced for single-gene disorders, particularly for sporadic cases and those from small families. Transmission of lethal genes, such as the one associated with Huntington's disease, to one's offspring can be avoided through in vitro fertilization. DNA sequencing will meet the challenge of elucidating the genetic predisposition for common polygenic diseases, especially in determining the function of the novel common genetic risk variants and identifying the rare variants, which may also partially ascertain the source of the missing heritability. The challenge for DNA sequencing remains great: although human genome sequences are 99.5% identical, the 3 million single nucleotide polymorphisms responsible for most of the unique features add up to 40 to 60 new mutations per person, which, for 7 billion people, is 300 to 400 billion mutations. It is claimed that DNA sequencing has increased 10,000-fold while information storage and retrieval has increased only 16-fold. The physician and health user will be challenged by the convergence of two major trends: whole-genome sequencing and the storage, retrieval, and integration of the data.
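The population-scale figure quoted in this abstract is a simple product, and a minimal Python sketch makes the rounding visible. The inputs (40 to 60 new mutations per person, 7 billion people) are taken from the abstract; the function and variable names are illustrative only and do not come from the paper.

```python
# Inputs as quoted in the abstract; names are illustrative, not from the paper.
NEW_MUTATIONS_PER_PERSON = (40, 60)   # low and high estimates per person
WORLD_POPULATION = 7_000_000_000      # 7 billion people


def total_new_mutations(per_person_range, population):
    """Return (low, high) totals of new mutations across a population."""
    low, high = per_person_range
    return low * population, high * population


low_total, high_total = total_new_mutations(NEW_MUTATIONS_PER_PERSON, WORLD_POPULATION)
print(f"{low_total / 1e9:.0f} to {high_total / 1e9:.0f} billion new mutations")
```

The exact product is 280 to 420 billion, which the abstract rounds to 300 to 400 billion.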
|
38
|
Montgomery SB, Goode DL, Kvikstad E, Albers CA, Zhang ZD, Mu XJ, Ananda G, Howie B, Karczewski KJ, Smith KS, Anaya V, Richardson R, Davis J, MacArthur DG, Sidow A, Duret L, Gerstein M, Makova KD, Marchini J, McVean G, Lunter G. The origin, evolution, and functional impact of short insertion-deletion variants identified in 179 human genomes. Genome Res 2013; 23:749-61. [PMID: 23478400 PMCID: PMC3638132 DOI: 10.1101/gr.148718.112] [Citation(s) in RCA: 163] [Impact Index Per Article: 14.8] [Indexed: 12/18/2022]
Abstract
Short insertions and deletions (indels) are the second most abundant form of human genetic variation, but our understanding of their origins and functional effects lags behind that of other types of variants. Using population-scale sequencing, we have identified a high-quality set of 1.6 million indels from 179 individuals representing three diverse human populations. We show that rates of indel mutagenesis are highly heterogeneous, with 43%–48% of indels occurring in 4.03% of the genome, whereas in the remaining 96% their prevalence is 16 times lower than SNPs. Polymerase slippage can explain upwards of three-fourths of all indels, with the remainder being mostly simple deletions in complex sequence. However, insertions do occur and are significantly associated with pseudo-palindromic sequence features compatible with the fork stalling and template switching (FoSTeS) mechanism more commonly associated with large structural variations. We introduce a quantitative model of polymerase slippage, which enables us to identify indel-hypermutagenic protein-coding genes, some of which are associated with recurrent mutations leading to disease. Accounting for mutational rate heterogeneity due to sequence context, we find that indels across functional sequence are generally subject to stronger purifying selection than SNPs. We find that indel length modulates selection strength, and that indels affecting multiple functionally constrained nucleotides undergo stronger purifying selection. We further find that indels are enriched in associations with gene expression and find evidence for a contribution of nonsense-mediated decay. Finally, we show that indels can be integrated in existing genome-wide association studies (GWAS); although we do not find direct evidence that potentially causal protein-coding indels are enriched with associations to known disease-associated SNPs, our findings suggest that the causal variant underlying some of these associations may be indels.
Affiliation(s)
- Stephen B Montgomery
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211, Switzerland.
|
39
|
Lettre G. The search for genetic modifiers of disease severity in the β-hemoglobinopathies. Cold Spring Harb Perspect Med 2012; 2:2/10/a015032. [PMID: 23028136 DOI: 10.1101/cshperspect.a015032] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Indexed: 12/26/2022]
Abstract
Sickle cell disease (SCD) and β-thalassemia, two monogenic diseases caused by mutations in the β-globin gene, affect millions of individuals worldwide. These hemoglobin disorders are characterized by extreme clinical heterogeneity, complicating patient management and treatment. A better understanding of this patient-to-patient clinical variability would dramatically improve care and might also guide the development of novel therapies. Studies of the natural history of these β-hemoglobinopathies have identified fetal hemoglobin levels and concomitant α-thalassemia as important modifiers of disease severity. Several small-scale studies have attempted to identify additional genetic modifiers of SCD and β-thalassemia, without much success. Fortunately, improved knowledge of the human genome and the development of new genomic tools, such as genome-wide genotyping arrays and next-generation DNA sequencers, offer new opportunities to use genetics to better understand the causes of the many complications observed in β-hemoglobinopathy patients. Here I discuss the most important factors to consider when planning an experiment to find associations between β-hemoglobinopathy-related complications and DNA sequence variants, with a focus on how to successfully perform a genome-wide association study. I also review the literature and explain why most published findings in the field of SCD modifier genetics are likely to be false-positive reports, with the goal to draw lessons allowing investigators to design better genetic experiments.
Affiliation(s)
- Guillaume Lettre
- Montreal Heart Institute and Université Montréal, Montréal, Québec H1T 1C8, Canada.
|
40
|
Spinks PQ, Thomson RC, Zhang Y, Che J, Wu Y, Shaffer HB. Species boundaries and phylogenetic relationships in the critically endangered Asian box turtle genus Cuora. Mol Phylogenet Evol 2012; 63:656-67. [PMID: 22649793 DOI: 10.1016/j.ympev.2012.02.014] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
Abstract
Turtles are currently the most endangered major clade of vertebrates on earth, and Asian box turtles (Cuora) are in catastrophic decline. Effective management of this diverse turtle clade has been hampered by human-mediated, and perhaps natural hybridization, resulting in discordance between mitochondrial and nuclear markers and confusion regarding species boundaries and phylogenetic relationships among hypothesized species of Cuora. Here, we present analyses of mitochondrial and nuclear DNA data for all 12 currently hypothesized species to resolve both species boundaries and phylogenetic relationships. Our 15-gene, 40-individual nuclear data set was frequently in conflict with our mitochondrial data set; based on its general concordance with published morphological analyses and the strength of 15 independent estimates of evolutionary history, we interpret the nuclear data as representing the most reliable estimate of species boundaries and phylogeny of Cuora. Our results strongly reiterate the necessity of using multiple nuclear markers for phylogeny and species delimitation in these animals, including any form of DNA "barcoding", and point to Cuora as an important case study where reliance on mitochondrial DNA can lead to incorrect species identification.
Affiliation(s)
- Phillip Q Spinks
- Department of Evolution and Ecology and Center for Population Biology, University of California, Davis, CA 95616, USA.
|
41
|
Yuan Q, Zhou Z, Lindell SG, Higley JD, Ferguson B, Thompson RC, Lopez JF, Suomi SJ, Baghal B, Baker M, Mash DC, Barr CS, Goldman D. The rhesus macaque is three times as diverse but more closely equivalent in damaging coding variation as compared to the human. BMC Genet 2012; 13:52. [PMID: 22747632 PMCID: PMC3426462 DOI: 10.1186/1471-2156-13-52] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/18/2011] [Accepted: 05/18/2012] [Indexed: 11/23/2022] Open
Abstract
Background As a model organism in biomedicine, the rhesus macaque (Macaca mulatta) is the most widely used nonhuman primate. Although a draft genome sequence was completed in 2007, there has been no systematic genome-wide comparison of genetic variation of this species to humans. Comparative analysis of functional and nonfunctional diversity in this highly abundant and adaptable non-human primate could inform its use as a model for human biology, and could reveal how variation in population history and size alters patterns and levels of sequence variation in primates. Results We sequenced the mRNA transcriptome and H3K4me3-marked DNA regions in hippocampus from 14 humans and 14 rhesus macaques. Using equivalent methodology and sampling spaces, we identified 462,802 macaque SNPs, most of which were novel and disproportionately located in the functionally important genomic regions we had targeted in the sequencing. At least one SNP was identified in each of 16,797 annotated macaque genes. Accuracy of macaque SNP identification was conservatively estimated to be >90%. Comparative analyses using SNPs equivalently identified in the two species revealed that rhesus macaque has approximately three times higher SNP density and average nucleotide diversity as compared to the human. Based on this level of diversity, the effective population size of the rhesus macaque is approximately 80,000 which contrasts with an effective population size of less than 10,000 for humans. Across five categories of genomic regions, intergenic regions had the highest SNP density and average nucleotide diversity and CDS (coding sequences) the lowest, in both humans and macaques. Although there are more coding SNPs (cSNPs) per individual in macaques than in humans, the ratio of dN/dS is significantly lower in the macaque. 
Furthermore, the number of damaging nonsynonymous cSNPs (those predicted by PolyPhen-2 to have damaging effects on protein function) in the macaque is more closely equivalent to that of the human. Conclusions This large panel of newly identified macaque SNPs enriched for functionally significant regions considerably expands our knowledge of genetic variation in the rhesus macaque. Comparative analysis reveals that this widespread, highly adaptable species is approximately three times as diverse as the human but more closely equivalent in damaging variation.
Affiliation(s)
- Qiaoping Yuan
- Laboratory of Neurogenetics, National Institute on Alcohol Abuse and Alcoholism, NIH, Bethesda, MD 20892, USA
|
42
|
Huang S, Yu T, Chen Z, Yuan S, Chen S, Xu A. More single-nucleotide mutations surround small insertions than small deletions in primates. Hum Mutat 2012; 33:1099-106. [PMID: 22461281 DOI: 10.1002/humu.22085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/01/2011] [Accepted: 03/06/2012] [Indexed: 01/26/2023]
Abstract
Early studies have shown that single-nucleotide mutation rates increase close to insertions and deletions, but it is not fully understood how natural selection shapes genome-wide patterns of indels and their nearby single-nucleotide mutations. In this study, we find that, in primates, more single-nucleotide mutations surround small insertions than small deletions. This pattern affects <150 base pair (bp) sequences close to indels and persists under different genomic properties, such as exon/intron/intergenic contexts, repeated/nonrepeated sequences, replication timing, recombination rates, indel density, and guanine-cytosine (GC) content. We propose two different, but not mutually exclusive, hypothetical mechanisms to explain the pattern. One mechanism is that the sequence context preferring insertion formation may also favor nucleotide substitutions. Another mechanism is related to a hypothesis in which indel heterozygosity tends to increase nearby nucleotide substitution rates. It means that if insertions spend more time in heterozygotes, insertions may accumulate more surrounding single-nucleotide changes. In conclusion, we characterize a special genome-wide evolutionary pattern for indels and nearby single-nucleotide changes. This pattern may be driven by natural selection and bias primates' genome evolution and phenotypic variations.
Affiliation(s)
- Shengfeng Huang
- Guangdong Key Laboratory of Pharmaceutical Functional Genes, College of Life Sciences, Sun Yat-Sen University, 135 XinGangXi Road, Guangzhou, People's Republic of China
|
43
|
Spinks PQ, Thomson RC, Hughes B, Moxley B, Brown R, Diesmos A, Shaffer HB. Cryptic variation and the tragedy of unrecognized taxa: the case of international trade in the spiny turtle Heosemys spinosa (Testudines: Geoemydidae). Zool J Linn Soc 2012. [DOI: 10.1111/j.1096-3642.2011.00788.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
|
44
|
Lu JT, Wang Y, Gibbs RA, Yu F. Characterizing linkage disequilibrium and evaluating imputation power of human genomic insertion-deletion polymorphisms. Genome Biol 2012; 13:R15. [PMID: 22377349 PMCID: PMC3334570 DOI: 10.1186/gb-2012-13-2-r15] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 09/20/2011] [Revised: 02/14/2012] [Accepted: 02/29/2012] [Indexed: 02/07/2023] Open
Abstract
Background Indels are an important cause of human variation and central to the study of human disease. The 1000 Genomes Project Low-Coverage Pilot identified over 1.3 million indels shorter than 50 bp, of which over 890 were identified as potentially disruptive variants. Yet, despite their ubiquity, the local genomic characteristics of indels remain unexplored. Results Herein we describe population- and minor allele frequency-based differences in linkage disequilibrium and imputation characteristics for indels included in the 1000 Genomes Project Low-Coverage Pilot for the CEU, YRI and CHB+JPT populations. Common indels were well tagged by nearby SNPs in all studied populations, and were also tagged at a similar rate to common SNPs. Both neutral and functionally deleterious common indels were imputed with greater than 95% concordance from HapMap Phase 3 and OMNI SNP sites. Further, 38 to 56% of low frequency indels were tagged by low frequency SNPs. We were able to impute heterozygous low frequency indels with over 50% concordance. Lastly, our analysis also revealed evidence of ascertainment bias. This bias prevents us from extending the applicability of our results to highly polymorphic indels that could not be identified in the Low-Coverage Pilot. Conclusions Although further scope exists to improve the imputation of low frequency indels, our study demonstrates that there are already ample opportunities to retrospectively impute indels for prior genome-wide association studies and to incorporate indel imputation into future case/control studies.
Affiliation(s)
- James T Lu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
|
45
|
Lemos RR, Souza MBR, Oliveira JRM. Exploring the implications of INDELs in neuropsychiatric genetics: challenges and perspectives. J Mol Neurosci 2012; 47:419-24. [PMID: 22350990 DOI: 10.1007/s12031-012-9714-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/05/2011] [Accepted: 01/24/2012] [Indexed: 02/04/2023]
Abstract
In the decade since the first draft of the Human Genome was published, understanding of genomic variation among different subjects, populations, and groups of patients has grown enormously. Single nucleotide polymorphisms (SNPs) and insertions or deletions (INDELs) have been increasingly recognized as major types of genetic variation, with potential impact on the protein activity and gene expression changes observed in complex genetic traits such as neuropsychiatric diseases. INDELs represent the second most common class of variation after SNPs, but there is still an important gap between the number of INDELs reported and actual knowledge of their functional implications. Approximately 10 million SNPs have already been reported, and human populations are expected to collectively harbor at least 1.6-2.5 million INDELs. One of the major challenges is to find better platforms to screen for INDELs in a high-throughput manner. The discordance between data from different studies might be explained by the diverse approaches and variable platforms used to sequence the genomes. Short INDEL variations have increased the scope of genetic markers in human genetic diseases, and various studies have shown that common microdeletions and smaller INDELs might be highly associated with neuropsychiatric diseases such as schizophrenia, autism, mental retardation, and Alzheimer disease. The rapidly increasing amount of resequencing, genotyping, and personal genome data generated by large-scale human genetic projects requires the development of integrated bioinformatics tools able to efficiently manage and analyze these data. Our group is currently working with different approaches that might optimize sequencing and bioinformatics analyses of short INDELs to broaden our capability to identify these intriguing genetic variations. Hopefully, INDELs might become a new trend in association studies in neuropsychiatric genetics, since the significant positive associations reported so far with standard SNPs have limited predictive application.
Affiliation(s)
- R R Lemos
- Keizo Asami Laboratory (LIKA), Federal University of Pernambuco, 50670901, Recife, Pernambuco, Brazil
|
46
|
Bansal V, Libiger O. A probabilistic method for the detection and genotyping of small indels from population-scale sequence data. Bioinformatics 2011; 27:2047-53. [PMID: 21653520 DOI: 10.1093/bioinformatics/btr344] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION High-throughput sequencing technologies have made population-scale studies of human genetic variation possible. Accurate and comprehensive detection of DNA sequence variants is crucial for the success of these studies. Small insertions and deletions represent the second most frequent class of variation in the human genome after single nucleotide polymorphisms (SNPs). Although several alignment tools for the gapped alignment of sequence reads to a reference genome are available, computational methods for discriminating indels from sequencing errors and genotyping indels directly from sequence reads are needed. RESULTS We describe a probabilistic method for the accurate detection and genotyping of short indels from population-scale sequence data. In this approach, aligned sequence reads from a population of individuals are used to automatically account for context-specific sequencing errors associated with indels. We applied this approach to population sequence datasets from the 1000 Genomes exon pilot project generated using the Roche 454 and Illumina sequencing platforms, and were able to detect a significantly greater number of indels than reported previously. Comparison to indels identified in the 1000 Genomes pilot project demonstrated the sensitivity of our method. The consistency in the number of indels and the fraction of indels whose length is a multiple of three across different human populations and two different sequencing platforms indicated that our method has a low false discovery rate. Finally, the method represents a general approach for the detection and genotyping of small-scale DNA sequence variants for population-scale sequencing projects. AVAILABILITY A program implementing this method is available at http://polymorphism.scripps.edu/~vbansal/software/piCALL/
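The consistency check mentioned in this abstract, the fraction of indels whose length is a multiple of three, can be sketched in a few lines of Python. This is not the piCALL implementation; the function name, the sign convention (positive for insertions, negative for deletions), and the two example call sets are hypothetical and serve only to illustrate the statistic.

```python
def fraction_multiple_of_three(indel_lengths):
    """Fraction of indels whose absolute length (in bp) is divisible by 3."""
    lengths = [abs(n) for n in indel_lengths if n != 0]
    if not lengths:
        raise ValueError("no indels supplied")
    return sum(1 for n in lengths if n % 3 == 0) / len(lengths)


# Hypothetical indel calls in bp: positive = insertion, negative = deletion.
population_a = [3, -6, 1, -2, 9, -3, 4, 12]
population_b = [-3, 6, 2, -1, 3, -9, 5, -12]

frac_a = fraction_multiple_of_three(population_a)
frac_b = fraction_multiple_of_three(population_b)
print(f"population A: {frac_a:.3f}, population B: {frac_b:.3f}")
```

Similar fractions across populations and platforms are the kind of agreement the abstract cites as evidence of a low false discovery rate; the toy numbers here carry no biological meaning.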
Affiliation(s)
- Vikas Bansal
- Scripps Genomic Medicine, Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA.
|
47
|
Mills RE, Pittard WS, Mullaney JM, Farooq U, Creasy TH, Mahurkar AA, Kemeza DM, Strassler DS, Ponting CP, Webber C, Devine SE. Natural genetic variation caused by small insertions and deletions in the human genome. Genome Res 2011; 21:830-9. [PMID: 21460062 DOI: 10.1101/gr.115907.110] [Citation(s) in RCA: 168] [Impact Index Per Article: 12.9] [Indexed: 12/12/2022]
Abstract
Human genetic variation is expected to play a central role in personalized medicine. Yet only a fraction of the natural genetic variation that is harbored by humans has been discovered to date. Here we report almost 2 million small insertions and deletions (INDELs) that range from 1 bp to 10,000 bp in length in the genomes of 79 diverse humans. These variants include 819,363 small INDELs that map to human genes. Small INDELs frequently were found in the coding exons of these genes, and several lines of evidence indicate that such variation is a major determinant of human biological diversity. Microarray-based genotyping experiments revealed several interesting observations regarding the population genetics of small INDEL variation. For example, we found that many of our INDELs had high levels of linkage disequilibrium (LD) with both HapMap SNPs and with high-scoring SNPs from genome-wide association studies. Overall, our study indicates that small INDEL variation is likely to be a key factor underlying inherited traits and diseases in humans.
Affiliation(s)
- Ryan E Mills
- Department of Biochemistry, Emory University School of Medicine, Atlanta, Georgia 30322, USA
|
48
|
The rhox homeobox gene cluster is imprinted and selectively targeted for regulation by histone h1 and DNA methylation. Mol Cell Biol 2011; 31:1275-87. [PMID: 21245380 DOI: 10.1128/mcb.00734-10] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022] Open
Abstract
Histone H1 is an abundant and essential component of chromatin whose precise role in regulating gene expression is poorly understood. Here, we report that a major target of H1-mediated regulation in embryonic stem (ES) cells is the X-linked Rhox homeobox gene cluster. To address the underlying mechanism, we examined the founding member of the Rhox gene cluster-Rhox5-and found that its distal promoter (Pd) loses H1, undergoes demethylation, and is transcriptionally activated in response to loss of H1 genes in ES cells. Demethylation of the Pd is required for its transcriptional induction and we identified a single cytosine in the Pd that, when methylated, is sufficient to inhibit Pd transcription. Methylation of this single cytosine prevents the Pd from binding GA-binding protein (GABP), a transcription factor essential for Pd transcription. Thus, H1 silences Rhox5 transcription by promoting methylation of one of its promoters, a mechanism likely to extend to other H1-regulated Rhox genes, based on analysis of ES cells lacking DNA methyltransferases. The Rhox cluster genes targeted for H1-mediated transcriptional repression are also subject to another DNA methylation-regulated process: Xp imprinting. Remarkably, we found that only H1-regulated Rhox genes are imprinted, not those immune to H1-mediated repression. Together, our results indicate that the Rhox gene cluster is a major target of H1-mediated transcriptional repression in ES cells and that H1 is a candidate to have a role in Xp imprinting.
|
49
|
Mullaney JM, Mills RE, Pittard WS, Devine SE. Small insertions and deletions (INDELs) in human genomes. Hum Mol Genet 2010; 19:R131-6. [PMID: 20858594 DOI: 10.1093/hmg/ddq400] [Citation(s) in RCA: 215] [Impact Index Per Article: 15.4] [Indexed: 02/02/2023] Open
Abstract
In this review, we focus on progress that has been made with detecting small insertions and deletions (INDELs) in human genomes. Over the past decade, several million small INDELs have been discovered in human populations and personal genomes. The amount of genetic variation that is caused by these small INDELs is substantial. The number of INDELs in human genomes is second only to the number of single nucleotide polymorphisms (SNPs), and, in terms of base pairs of variation, INDELs cause similar levels of variation as SNPs. Many of these INDELs map to functionally important sites within human genes, and thus, are likely to influence human traits and diseases. Therefore, small INDEL variation will play a prominent role in personalized medicine.
Affiliation(s)
- Julienne M Mullaney
- Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore Street, 615 BioPark II, Baltimore, MD 21201, USA
|
50
|
Mullikin JC, Hansen NF, Shen L, Ebling H, Donahue WF, Tao W, Saranga DJ, Brand A, Rubenfield MJ, Young AC, Cruz P, Driscoll C, David V, Al-Murrani SWK, Locniskar MF, Abrahamsen MS, O'Brien SJ, Smith DR, Brockman JA. Light whole genome sequence for SNP discovery across domestic cat breeds. BMC Genomics 2010; 11:406. [PMID: 20576142 PMCID: PMC2996934 DOI: 10.1186/1471-2164-11-406] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 09/08/2009] [Accepted: 06/24/2010] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND The domestic cat has offered enormous genomic potential in the veterinary description of over 250 hereditary disease models as well as the occurrence of several deadly feline viruses (feline leukemia virus--FeLV, feline coronavirus--FECV, feline immunodeficiency virus--FIV) that are homologues of human scourges (cancer, SARS, and AIDS respectively). However, to realize this bio-medical potential, a high-density single nucleotide polymorphism (SNP) map is required in order to accomplish disease and phenotype association discovery. DESCRIPTION To remedy this, we generated 3,178,297 paired fosmid-end Sanger sequence reads from seven cats, and combined these data with the publicly available 2X cat whole genome sequence. All sequence reads were assembled together to form a 3X whole genome assembly allowing the discovery of over three million SNPs. To reduce potential false positive SNPs due to the low coverage assembly, a low upper limit was placed on sequence coverage and a high lower limit on the quality of the discrepant bases at a potential variant site. Domestic cats of different breeds (female Abyssinian, female American shorthair, male Cornish Rex, female European Burmese, female Persian, female Siamese, and male Ragdoll) and a female African wildcat were sequenced lightly. We report a total of 964 k common SNPs suitable for a domestic cat SNP genotyping array and an additional 900 k SNPs detected between African wildcat and domestic cat breeds. An empirical sample of 94 discovered SNPs was tested in the sequenced cats, resulting in a SNP validation rate of 99%. CONCLUSIONS These data provide a large collection of mapped feline SNPs across the cat genome that will allow for the development of SNP genotyping platforms for mapping feline diseases.
Affiliation(s)
- James C Mullikin
- Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Nancy F Hansen
- Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Lei Shen
- Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA
- Heather Ebling
- Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA
- Wei Tao
- Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA
- David J Saranga
- Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA
- Adrianne Brand
- Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA
- Alice C Young
- Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Pedro Cruz
- Genome Technology Branch and NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Carlos Driscoll
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland 21702, USA
- Victor David
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland 21702, USA
- Stephen J O'Brien
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland 21702, USA
- Douglas R Smith
- Agencourt Bioscience Corporation, Beverly, Massachusetts 01915, USA
|