26
Zago VHS, Scherrer DZ, Parra ES, Vieira IC, Marson FAL, de Faria EC. Effects of SNVs in ABCA1, ABCG1, ABCG5, ABCG8, and SCARB1 Genes on Plasma Lipids, Lipoproteins, and Adiposity Markers in a Brazilian Population. Biochem Genet 2021; 60:822-841. [PMID: 34505223 DOI: 10.1007/s10528-021-10131-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Accepted: 08/25/2021] [Indexed: 10/20/2022]
Abstract
Several proteins are involved in cholesterol homeostasis, such as scavenger receptor class B type I and the ATP-binding cassette (ABC) transporters ABCA1, ABCG1, ABCG5, and ABCG8. This study aimed to determine the effects of the single nucleotide variants (SNVs) rs2275543 (ABCA1), rs1893590 (ABCG1), rs6720173 (ABCG5), rs6544718 (ABCG8), and rs5888 (SCARB1) on plasma lipids, lipoproteins, and adiposity markers in an asymptomatic population, and to assess their sex-specific effects. Volunteers (n = 590) were selected, and plasma lipids, lipoproteins, and adiposity markers (waist-to-hip and waist-to-height ratios, lipid accumulation product, and body adiposity index) were measured. Genomic DNA was isolated from peripheral blood cells using a method adapted from Gross-Bellard. SNVs were detected on the TaqMan® OpenArray® Real-Time polymerase chain reaction platform, and data were analyzed using the TaqMan® Genotyper Software. The rs2275543*C allele pointed to an increase in high-density lipoprotein (HDL) size in females, while in males very-low-density lipoprotein, cholesterol, and triglycerides were statistically lower (P value < 0.05). The rs1893590*C allele was statistically associated with lower apolipoprotein A-I levels and higher paraoxonase-1 and cholesteryl ester transfer protein activities (P value < 0.05). The rs6720173 variant was statistically associated with increased cholesterol and low-density lipoprotein cholesterol in males; moreover, rs6544718*T reduced adiposity markers in females (P value < 0.05). The rs5888 variant was associated with a decreased adiposity marker in the total population and in females (P value < 0.05). Multivariate analysis of variance showed that the SNVs could influence components of high-density lipoprotein metabolism, mainly through ABCG1 (P value < 0.05). The ABCA1 and ABCG5 variants showed sex-specific effects on lipids and lipoproteins, while the SCARB1 and ABCG8 variants might influence adiposity markers in females.
Our data indicate a possible role of ABCG1 on HDL metabolism.
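The four adiposity markers listed in this abstract are simple arithmetic on anthropometric and lipid measurements. The Python sketch below uses the commonly cited published definitions: lipid accumulation product (LAP) with waist offsets of 65 cm for men and 58 cm for women and fasting triglycerides in mmol/L, and body adiposity index (BAI) as hip circumference in cm divided by height in m raised to the power 1.5, minus 18. The abstract does not state which formula variants or units the study used, so the function names, units, and example values are assumptions for illustration only.

```python
def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm


def waist_to_height_ratio(waist_cm: float, height_cm: float) -> float:
    """Waist circumference divided by height (same units)."""
    return waist_cm / height_cm


def lipid_accumulation_product(waist_cm: float, tg_mmol_l: float, sex: str) -> float:
    """LAP = (waist - sex-specific offset) x fasting triglycerides.

    Assumes the commonly cited offsets of 65 cm (men, "M") and 58 cm (women, "F")
    and triglycerides in mmol/L.
    """
    offsets = {"M": 65.0, "F": 58.0}
    if sex not in offsets:
        raise ValueError("sex must be 'M' or 'F'")
    return (waist_cm - offsets[sex]) * tg_mmol_l


def body_adiposity_index(hip_cm: float, height_m: float) -> float:
    """BAI = hip circumference (cm) / height (m) ** 1.5 - 18."""
    return hip_cm / height_m ** 1.5 - 18.0


if __name__ == "__main__":
    # Hypothetical female volunteer; values chosen only for illustration.
    print(round(waist_to_hip_ratio(80, 100), 2))               # 0.8
    print(round(waist_to_height_ratio(80, 160), 2))            # 0.5
    print(round(lipid_accumulation_product(88, 1.5, "F"), 2))  # 45.0
    print(round(body_adiposity_index(100, 1.6), 2))            # 31.41
```

Keeping each marker as a separate function makes the unit assumptions explicit at the call site, which matters because LAP and BAI are not unit-free like the two ratios.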
27
Li X, Kumar S, Harmanci A, Li S, Kitchen RR, Zhang Y, Wali VB, Reddy SM, Woodward WA, Reuben JM, Rozowsky J, Hatzis C, Ueno NT, Krishnamurthy S, Pusztai L, Gerstein M. Whole-genome sequencing of phenotypically distinct inflammatory breast cancers reveals similar genomic alterations to non-inflammatory breast cancers. Genome Med 2021; 13:70. [PMID: 33902690 PMCID: PMC8077918 DOI: 10.1186/s13073-021-00879-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2020] [Accepted: 03/25/2021] [Indexed: 02/08/2023]
Abstract
BACKGROUND Inflammatory breast cancer (IBC) has a highly invasive and metastatic phenotype. However, little is known about its genetic drivers. To address this, we report the largest whole-genome sequencing (WGS) cohort of IBC cases. METHODS We performed WGS of 20 IBC samples and paired normal blood DNA to identify genomic alterations. For comparison, we used 23 matched non-IBC samples from the Cancer Genome Atlas Program (TCGA). We also validated our findings using WGS data from the International Cancer Genome Consortium (ICGC) and the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. We examined a wide selection of genomic features to search for differences between IBC and conventional breast cancer. These included (i) somatic and germline single-nucleotide variants (SNVs), in both coding and non-coding regions; (ii) the mutational signature and the clonal architecture derived from these SNVs; (iii) copy number and structural variants (CNVs and SVs); and (iv) non-human sequence in the tumors (i.e., exogenous sequences of bacterial origin). RESULTS Overall, IBC has genomic characteristics similar to those of non-IBC, including specific alterations, overall mutational load and signature, and tumor heterogeneity. In particular, we observed similar mutation frequencies in IBC and non-IBC for each gene and for most cancer-related pathways. Moreover, we found no exogenous sequences of infectious agents specific to IBC samples. Although we could not find any strongly statistically distinguishing genomic features between the two groups, we did find some suggestive differences in IBC: (i) The MAST2 gene was more frequently mutated (20% IBC vs. 0% non-IBC). (ii) The TGFβ pathway was more frequently disrupted by germline SNVs (50% vs. 13%). (iii) Different copy number profiles were observed in several genomic regions harboring cancer genes. (iv) Complex SVs were more frequent.
(v) The clonal architecture was simpler, suggesting more homogeneous tumor-evolutionary lineages. CONCLUSIONS Whole-genome sequencing shows that IBC has a genomic architecture similar to that of non-IBC. We found no unique genomic alterations shared only among IBCs; however, subtle genomic differences were observed, including germline alterations in TGFβ pathway genes and somatic mutations in the MAST2 kinase, which could represent potential therapeutic targets.
28
Stranneheim H, Lagerstedt-Robinson K, Magnusson M, Kvarnung M, Nilsson D, Lesko N, Engvall M, Anderlid BM, Arnell H, Johansson CB, Barbaro M, Björck E, Bruhn H, Eisfeldt J, Freyer C, Grigelioniene G, Gustavsson P, Hammarsjö A, Hellström-Pigg M, Iwarsson E, Jemt A, Laaksonen M, Enoksson SL, Malmgren H, Naess K, Nordenskjöld M, Oscarson M, Pettersson M, Rasi C, Rosenbaum A, Sahlin E, Sardh E, Stödberg T, Tesi B, Tham E, Thonberg H, Töhönen V, von Döbeln U, Vassiliou D, Vonlanthen S, Wikström AC, Wincent J, Winqvist O, Wredenberg A, Ygberg S, Zetterström RH, Marits P, Soller MJ, Nordgren A, Wirta V, Lindstrand A, Wedell A. Integration of whole genome sequencing into a healthcare setting: high diagnostic rates across multiple clinical entities in 3219 rare disease patients. Genome Med 2021; 13:40. [PMID: 33726816 PMCID: PMC7968334 DOI: 10.1186/s13073-021-00855-5] [Citation(s) in RCA: 101] [Impact Index Per Article: 33.7] [Received: 06/29/2020] [Accepted: 02/11/2021] [Indexed: 01/08/2023]
Abstract
BACKGROUND We report the findings from 4437 individuals (3219 patients and 1218 relatives) who have been analyzed by whole genome sequencing (WGS) at the Genomic Medicine Center Karolinska-Rare Diseases (GMCK-RD) since mid-2015. GMCK-RD is a long-term collaborative initiative between Karolinska University Hospital and Science for Life Laboratory to establish advanced, genomics-based diagnostics in the Stockholm healthcare setting. METHODS Our analysis covers the detection and interpretation of SNVs, INDELs, uniparental disomy, CNVs, balanced structural variants, and short tandem repeat expansions. Results are visualized for clinical interpretation in Scout, a custom-developed decision support system. Results from both singleton (84%) and trio/family (16%) analyses are reported. Variant interpretation is performed by 15 expert teams at the hospital, involving staff from three clinics. For patients with complex phenotypes, data are shared between the teams. RESULTS Overall, 40% of the patients received a molecular diagnosis, with diagnostic rates ranging from 19 to 54% for specific disease groups. Causative genes were heterogeneous (n = 754), with some of the most common being COL2A1 (n = 12; skeletal dysplasia), SCN1A (n = 8; epilepsy), and TNFRSF13B (n = 4; inborn errors of immunity). Some causative variants were recurrent, including previously known founder mutations, some novel mutations, and recurrent de novo mutations. Overall, GMCK-RD has resulted in a large number of patients receiving specific molecular diagnoses. Furthermore, negative cases have been included in research studies that have led to the discovery of 17 published, novel disease-causing genes. To facilitate the discovery of new disease genes, GMCK-RD has joined international data sharing initiatives, including ClinVar, UDNI, Beacon, and MatchMaker Exchange.
CONCLUSIONS Clinical WGS at GMCK-RD has provided molecular diagnoses to over 1200 individuals with a broad range of rare diseases. Consolidation and spread of this clinical-academic partnership will enable large-scale national collaboration.
29
Higuchi T, Oka S, Furukawa H, Tohma S, Yatsuhashi H, Migita K. Genetic risk factors for autoimmune hepatitis: implications for phenotypic heterogeneity and biomarkers for drug response. Hum Genomics 2021; 15:6. [PMID: 33509297 PMCID: PMC7841991 DOI: 10.1186/s40246-020-00301-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/21/2020] [Accepted: 12/15/2020] [Indexed: 01/10/2023]
Abstract
Autoimmune hepatitis (AIH) is a rare chronic progressive liver disease with autoimmune features. It mainly affects middle-aged women. AIH is occasionally complicated by liver cirrhosis, which worsens the prognosis. Genetic and environmental factors are involved in the pathogenesis of AIH. Genetic studies of other diseases have shed light on their pathogenesis and on drug efficacy. In this review, we summarize the genetic risk factors for AIH, including human leukocyte antigen (HLA) and non-HLA genes. A genome-wide association study (GWAS) of European AIH revealed the strongest associations to be with single nucleotide variants (SNVs) in HLA. Predisposing alleles for AIH were DRB1*03:01 and DRB1*04:01 in Europeans; DRB1*04:04, DRB1*04:05, and DRB1*13:01 in Latin Americans; and DRB1*04:01 and DRB1*04:05 in Japanese. Other risk SNVs in non-HLA genes for AIH were found by a candidate gene approach, but several SNVs were confirmed in replication studies. Some genetic factors of AIH overlapped with those of other autoimmune diseases. Larger-scale GWASs in other ethnic groups are required. The results of genetic studies might explain the phenotypic heterogeneity of AIH and provide biomarkers for drug responses.
30
Qaiser F, Yin Y, Mervis CB, Morris CA, Klein-Tasman BP, Tam E, Osborne LR, Yuen RKC. Rare and low frequency genomic variants impacting neuronal functions modify the Dup7q11.23 phenotype. Orphanet J Rare Dis 2021; 16:6. [PMID: 33407644 PMCID: PMC7788915 DOI: 10.1186/s13023-020-01648-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2020] [Accepted: 12/14/2020] [Indexed: 01/03/2023]
Abstract
BACKGROUND 7q11.23 duplication (Dup7) is one of the most frequent recurrent copy number variants (CNVs) in individuals with autism spectrum disorder (ASD), but based on gold-standard assessments, only 19% of Dup7 carriers have ASD, suggesting that additional genetic factors are necessary to manifest the ASD phenotype. To assess the contribution of additional genetic variants to the Dup7 phenotype, we conducted whole-genome sequencing analysis of 20 Dup7 carriers: nine with ASD (Dup7-ASD) and 11 without ASD (Dup7-non-ASD). RESULTS We identified three rare variants of potential clinical relevance for ASD: a 1q21.1 microdeletion (Dup7-non-ASD) and two deletions which disrupted IMMP2L (one Dup7-ASD, one Dup7-non-ASD). There were no significant differences in gene-set or pathway variant burden between the Dup7-ASD and Dup7-non-ASD groups. However, overall intellectual ability negatively correlated with the number of rare loss-of-function variants present in nervous system development and membrane component pathways, and adaptive behaviour standard scores negatively correlated with the number of low-frequency likely-damaging missense variants found in genes expressed in the prenatal human brain. ASD severity positively correlated with the number of low frequency loss-of-function variants impacting genes expressed at low levels in the brain, and genes with a low level of intolerance. CONCLUSIONS Our study suggests that in the presence of the same pathogenic Dup7 variant, rare and low frequency genetic variants act additively to contribute to components of the overall Dup7 phenotype.
31
Kawasaki A, Namba N, Sada KE, Hirano F, Kobayashi S, Nagasaka K, Sugihara T, Ono N, Fujimoto T, Kusaoi M, Tamura N, Yamagata K, Sumida T, Hashimoto H, Ozaki S, Makino H, Arimura Y, Harigai M, Tsuchiya N. Association of TERT and DSP variants with microscopic polyangiitis and myeloperoxidase-ANCA positive vasculitis in a Japanese population: a genetic association study. Arthritis Res Ther 2020; 22:246. [PMID: 33076992 PMCID: PMC7574242 DOI: 10.1186/s13075-020-02347-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/23/2020] [Accepted: 10/06/2020] [Indexed: 12/01/2022]
Abstract
Background Interstitial lung disease (ILD) is a severe complication with poor prognosis in anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis (AAV). The prevalence of AAV-associated ILD (AAV-ILD) in Japan is considerably higher than that in Europe. Recently, we reported that the MUC5B variant rs35705950, the strongest susceptibility variant for idiopathic pulmonary fibrosis (IPF), was strikingly increased in AAV-ILD patients but not in AAV patients without ILD; however, due to its low allele frequency in the Japanese population, the MUC5B variant alone cannot account for the high prevalence of AAV-ILD in Japan. In this study, we examined whether other IPF susceptibility alleles in the TERT and DSP genes are associated with susceptibility to AAV subsets and AAV-ILD. Methods Five hundred and forty-four Japanese patients with AAV and 5558 controls were analyzed. Among the AAV patients, 432 were positive for myeloperoxidase (MPO)-ANCA (MPO-AAV). A total of 176 MPO-AAV patients were positive and 216 were negative for ILD based on CT or high-resolution CT. Genotypes of the TERT and DSP variants were determined with the TaqMan SNP Genotyping Assay, and their associations were tested by the chi-square test. Results When the frequencies of the IPF risk alleles TERT rs2736100A and DSP rs2076295G were compared between AAV subsets and healthy controls, both alleles were significantly increased in microscopic polyangiitis (MPA) (TERT P = 2.3 × 10⁻⁴, Pc = 0.0023, odds ratio [OR] 1.38; DSP P = 6.9 × 10⁻⁴, Pc = 0.0069, OR 1.32) and MPO-AAV (TERT P = 1.5 × 10⁻⁴, Pc = 0.0015, OR 1.33; DSP P = 0.0011, Pc = 0.011, OR 1.26). On the other hand, no significant association was detected when the allele frequencies were compared between MPO-AAV patients with and without ILD. Conclusions Unexpectedly, the TERT and DSP IPF risk alleles were found to be associated with MPA and MPO-AAV, regardless of the presence of ILD.
These findings suggest that TERT and DSP may be novel susceptibility genes to MPA/MPO-AAV and also that some susceptibility genes may be shared between IPF and MPA/MPO-AAV.
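The Methods and Results of this abstract describe an allelic case-control comparison: risk-allele counts in patients versus controls, a chi-square test, an odds ratio, and a corrected P value (Pc). A minimal Python sketch of those three calculations follows, assuming a 2 × 2 table of allele counts, a Pearson chi-square test with 1 degree of freedom and no continuity correction, and Bonferroni correction. The abstract does not name its correction method, although each reported Pc equals the corresponding P multiplied by 10. The allele counts in the example are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int) -> float:
    """Odds ratio of the risk allele in cases versus controls: (a*d) / (b*c)."""
    return (case_risk * ctrl_other) / (case_other * ctrl_risk)


def allelic_chi_square(case_risk: int, case_other: int,
                       ctrl_risk: int, ctrl_other: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on a 2 x 2 allele table.

    Returns (statistic, p_value); for 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Hypothetical allele counts (two alleles per person).
    chi2, p = allelic_chi_square(30, 70, 20, 80)
    print(round(allelic_odds_ratio(30, 70, 20, 80), 3))  # 1.714
    print(round(chi2, 3), round(p, 3))
    print(round(bonferroni(2.3e-4, 10), 6))              # 0.0023
```

Counting alleles rather than individuals doubles the sample size and implicitly assumes an additive allelic effect, which is the usual design for this kind of comparison.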
32
Magnusson M, Eisfeldt J, Nilsson D, Rosenbaum A, Wirta V, Lindstrand A, Wedell A, Stranneheim H. Loqusdb: added value of an observations database of local genomic variation. BMC Bioinformatics 2020; 21:273. [PMID: 32611382 PMCID: PMC7329469 DOI: 10.1186/s12859-020-03609-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/17/2020] [Accepted: 06/17/2020] [Indexed: 01/23/2023]
Abstract
BACKGROUND Exome and genome sequencing are becoming the methods of choice for rare disease diagnostics. One of the key remaining challenges is distinguishing disease-causing variants from benign background variation. After analysis and annotation of the sequencing data, there are typically thousands of candidate variants requiring further investigation. One of the most effective and least biased ways to reduce this number is to assess the rarity of a variant in any population. Currently, there are a number of reliable sources of major population frequencies for single nucleotide variants (SNVs) and small insertions and deletions (INDELs), with gnomAD being the most prominent public resource available. However, local variation or frequencies in sub-populations may be underrepresented in these public resources. For structural variation (SV), in contrast, the background frequency in the general population is largely unknown, mostly due to challenges in calling SVs consistently. Keeping track of local variation is one way to overcome these problems and significantly reduce the number of potential disease-causing variants retained for manual inspection, for both SNVs and SVs. RESULTS Here, we present loqusdb, a tool that addresses the challenge of keeping track of any type of variant observation from genome sequencing data. Loqusdb was designed to handle a large flow of samples and, unlike other solutions, allows samples to be added continuously to the database without rebuilding it, facilitating improvements and additions. We assessed the added value of a local observations database using 98 samples annotated with information from a background of 888 unrelated individuals.
CONCLUSIONS We show both how powerful SV analysis can be when filtering on population frequencies and how the number of apparently rare SNVs/INDELs can be reduced by adding local population information, even after annotating the data with other large frequency databases such as gnomAD. In conclusion, we show that a local frequency database is an attractive and necessary addition to the publicly available databases, facilitating the analysis of exome and genome data in a clinical setting.
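The core idea of a local observations database, as described in this abstract, is a running count of how many cases each variant has been seen in, which can be updated one case at a time and turned into a local frequency for filtering. The Python sketch below illustrates that idea in memory; it is not loqusdb's actual schema, storage backend, or API, and the class, method names, and variant keys (chromosome, position, ref, alt) are hypothetical. It matches SNVs/INDELs by exact key only; SVs would need interval-based matching, which is not shown.

```python
from typing import Hashable, Iterable, List


class LocalObservationDB:
    """Hypothetical in-memory store of local variant observations.

    Each variant key is counted at most once per case, and new cases can be
    added incrementally without rebuilding the store.
    """

    def __init__(self) -> None:
        self.case_ids: set[str] = set()
        self.observations: dict[Hashable, int] = {}

    def add_case(self, case_id: str, variants: Iterable[Hashable]) -> None:
        if case_id in self.case_ids:
            raise ValueError(f"case {case_id!r} already loaded")
        self.case_ids.add(case_id)
        for key in set(variants):  # de-duplicate calls within one case
            self.observations[key] = self.observations.get(key, 0) + 1

    def frequency(self, key: Hashable) -> float:
        """Fraction of loaded cases in which the variant was observed."""
        if not self.case_ids:
            return 0.0
        return self.observations.get(key, 0) / len(self.case_ids)

    def filter_rare(self, variants: Iterable[Hashable], max_freq: float) -> List[Hashable]:
        """Keep only variants whose local frequency is at most max_freq."""
        return [v for v in variants if self.frequency(v) <= max_freq]


if __name__ == "__main__":
    db = LocalObservationDB()
    db.add_case("case1", [("1", 100, "A", "G"), ("2", 5, "C", "T")])
    db.add_case("case2", [("1", 100, "A", "G")])
    print(db.filter_rare([("1", 100, "A", "G"), ("2", 5, "C", "T")], 0.5))
    # [('2', 5, 'C', 'T')]
```

Because each new case only increments counters, adding samples never requires recomputing earlier entries, which mirrors the "add without rebuilding" property the abstract highlights.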
33
Reeb J, Wirth T, Rost B. Variant effect predictions capture some aspects of deep mutational scanning experiments. BMC Bioinformatics 2020; 21:107. [PMID: 32183714 PMCID: PMC7077003 DOI: 10.1186/s12859-020-3439-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/12/2019] [Accepted: 03/03/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Deep mutational scanning (DMS) studies exploit the mutational landscape of sequence variation by systematically and comprehensively assaying the effect of single amino acid variants (SAVs; also referred to as missense mutations, or non-synonymous single nucleotide variants: missense SNVs or nsSNVs) for particular proteins. We assembled SAV annotations from 22 different DMS experiments and normalized the effect scores to evaluate variant effect prediction methods: three methods trained on traditional variant effect data (PolyPhen-2, SIFT, SNAP2), a regression method optimized on DMS data (Envision), and a naïve prediction using conservation information from homologs. RESULTS On a set of 32,981 SAVs, all methods captured some aspects of the experimental effect scores, albeit not the same ones. Traditional methods such as SNAP2 correlated slightly better with the measurements and better classified binary states (effect or neutral). Envision appeared to better estimate the precise degree of effect. Most surprisingly, the simple naïve conservation approach using PSI-BLAST outperformed the other methods in many cases. All methods captured beneficial effects (gain-of-function) significantly worse than deleterious ones (loss-of-function). For the few proteins with multiple independent experimental measurements, the experiments differed substantially, but agreed more with each other than with the predictions. CONCLUSIONS DMS provides a powerful new experimental means of understanding the dynamics of protein sequence space. As always, promising new beginnings have to overcome challenges. While our results demonstrated that DMS will be crucial for improving variant effect prediction methods, data diversity hindered simplification and generalization.
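Comparing a predictor with DMS measurements, as described in this abstract, comes down to correlating two score vectors over the same set of SAVs. The Python sketch below computes a Spearman rank correlation, using average ranks for ties. Spearman is chosen here only as a common rank-based measure: the abstract does not say which correlation coefficient the authors report, and the scores and their sign convention (higher predicted score means more damaging, lower measured score means lower fitness) are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    predicted_damage = [0.9, 0.1, 0.5, 0.7]   # hypothetical predictor output
    measured_fitness = [-1.2, 0.1, -0.4, -0.9]  # hypothetical DMS scores
    print(round(spearman(predicted_damage, measured_fitness), 3))  # -1.0
```

A rank-based measure is a reasonable default here because DMS effect scores from different experiments live on different scales, and ranks sidestep part of that normalization problem.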
34
Venkatasubramani JP, Subramanyam P, Pal R, Reddy BK, Srinivasan DJ, Chattarji S, Iossifov I, Klann E, Bhattacharya A. N-terminal variant Asp14Asn of the human p70 S6 Kinase 1 enhances translational signaling causing different effects in developing and mature neuronal cells. Neurobiol Learn Mem 2020; 171:107203. [PMID: 32147585 DOI: 10.1016/j.nlm.2020.107203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/18/2019] [Revised: 01/23/2020] [Accepted: 02/29/2020] [Indexed: 01/02/2023]
Abstract
The ribosomal p70 S6 Kinase 1 (S6K1) has been implicated in the etiology of complex neurological diseases including autism, depression and dementia. Although no major gene disruption in RPS6KB1 has been reported in humans, single nucleotide variants (SNVs) causing missense mutations have been identified, but their impact on protein function has not been assessed. These S6K1 mutations have the potential to influence disease progression and treatment response. We mined the Simons Simplex Collection (SSC) and SPARK autism databases to find inherited SNVs in S6K1 and characterized the effect of two missense SNVs, Asp14Asn (allele frequency = 0.03282%) and Glu44Gln (allele frequency = 0.0008244%), on S6K1 function in HEK293 cells, human ES cells and primary neurons. Expressing Asp14Asn in HEK293 cells resulted in increased basal phosphorylation of downstream targets of S6K1 and increased de novo translation. This variant also showed a blunted response to the specific S6K1 inhibitor FS-115. In the human embryonic stem cell line Shef4, Asp14Asn enhanced spontaneous neural fate specification in the absence of differentiating growth factors. In addition to enhanced translation, neurons expressing Asp14Asn exhibited impaired dendritic arborization and increased levels of phosphorylated ERK 1/2. Finally, in the SSC families tracked, Asp14Asn segregated with lower IQ scores when found in the autistic individual rather than the unaffected sibling. The Glu44Gln mutation showed a milder but opposite phenotype in HEK cells compared with Asp14Asn. Although the Glu44Gln mutation increased neuronal translation, it had no impact on neuronal morphology. Our results provide the first characterization of the effects of naturally occurring human S6K1 variants on cognitive phenotype, neuronal morphology and maturation, underscoring again the importance of translation control in neural development and plasticity.
35
Domínguez-Cruz MG, Muñoz MDL, Totomoch-Serra A, García-Escalante MG, Burgueño J, Valadez-González N, Pinto-Escalante D, Díaz-Badillo A. Maya gene variants related to the risk of type 2 diabetes in a family-based association study. Gene 2020; 730:144259. [PMID: 31759989 DOI: 10.1016/j.gene.2019.144259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/31/2019] [Revised: 11/06/2019] [Accepted: 11/07/2019] [Indexed: 12/01/2022]
Abstract
Mexican Maya populations have a notably high prevalence of type 2 diabetes (T2D) as a consequence of the interaction between environmental factors and a genetic component. To assess the impact of 24 single nucleotide variants (SNVs) located in 18 T2D risk genes, we conducted a family-based association evaluation in samples from Maya communities with a high incidence of the disease. A total of 400 individuals were recruited from three Maya communities with a high T2D incidence. One hundred family pedigrees and 49 nuclear families were included. Genotyping was performed by allelic discrimination with TaqMan probes. Genetic associations with T2D were assessed with the family-based association test (FBAT) statistic U, together with multivariate statistical and haplotype analyses. A positive association with T2D risk was found for WFS1 rs6446482 (p = 0.046, Z = 1.994) under an additive model and for SIRT1 rs7896005 (p = 0.038, Z = 2.073) under a dominant model. Multivariate model analysis, including T2D status, age, and body mass index (BMI), showed significant covariance (P < 0.05) for PPARGC-1α rs8192678; SIRT1 rs7896005; TCF7L2 rs7903146 and rs122243326; UCP3 rs3781907; and HHEX rs1111875. This study revealed an association of SIRT1 and WFS1 with T2D risk.
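This abstract reports FBAT Z scores under additive and dominant genetic models. As a rough illustration of what such a statistic compares, the Python sketch below implements a simplified FBAT-style test for complete trios with one affected child each: the child's genotype is coded under the chosen model, and the observed sum S is compared with its expectation E(S) and variance V under Mendelian transmission given the parental genotypes, giving U = S - E(S) and Z = U / sqrt(V). This is a teaching simplification (no missing parents, sibships, or extended pedigrees), not the FBAT software, and the trio data are hypothetical. The reported p values are consistent with two-sided normal p values of the reported Z scores (Z = 1.994 gives p ≈ 0.046; Z = 2.073 gives p ≈ 0.038).

```python
import math
from typing import Iterable, Tuple


def code_genotype(n_risk: int, model: str) -> float:
    """Code a genotype (number of risk alleles, 0-2) under a genetic model."""
    if model == "additive":
        return float(n_risk)
    if model == "dominant":
        return 1.0 if n_risk >= 1 else 0.0
    if model == "recessive":
        return 1.0 if n_risk == 2 else 0.0
    raise ValueError(f"unknown model {model!r}")


def expected_code(father: int, mother: int, model: str) -> Tuple[float, float]:
    """Mean and variance of the child's coded genotype under Mendelian transmission."""
    mean = second = 0.0
    for from_f in (0, 1):  # 1 = father transmits the risk allele
        p_f = father / 2 if from_f else 1 - father / 2
        for from_m in (0, 1):  # 1 = mother transmits the risk allele
            p_m = mother / 2 if from_m else 1 - mother / 2
            x = code_genotype(from_f + from_m, model)
            mean += p_f * p_m * x
            second += p_f * p_m * x * x
    return mean, second - mean * mean


def fbat_trio_z(trios: Iterable[Tuple[int, int, int]], model: str) -> float:
    """Z = (S - E[S]) / sqrt(V) over (father, mother, affected child) risk-allele counts."""
    s = e = v = 0.0
    for father, mother, child in trios:
        mean, var = expected_code(father, mother, model)
        s += code_genotype(child, model)
        e += mean
        v += var
    if v == 0:
        raise ValueError("no informative trios")
    return (s - e) / math.sqrt(v)


def two_sided_p(z: float) -> float:
    """Two-sided normal p value for a Z score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Trios in which both parents are homozygous contribute zero variance and no information, which is why the sketch refuses to return a Z score when every trio is uninformative.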
36
Amiri Ghanatsaman Z, Wang GD, Asadollahpour Nanaei H, Asadi Fozi M, Peng MS, Esmailizadeh A, Zhang YP. Whole genome resequencing of the Iranian native dogs and wolves to unravel variome during dog domestication. BMC Genomics 2020; 21:207. [PMID: 32131720 PMCID: PMC7057629 DOI: 10.1186/s12864-020-6619-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/25/2019] [Accepted: 02/25/2020] [Indexed: 11/21/2022]
Abstract
BACKGROUND Advances in genome technology have enabled a new understanding of the genetic and historical processes crucial to rapid phenotypic evolution under domestication. To gain new insight into the genetic basis of the dog domestication process, we conducted whole-genome sequence analysis of three wolves and three dogs from Iran, which covers the eastern part of the Fertile Crescent in Southwest Asia, where the independent domestication of most plants and animals has been documented and where high haplotype sharing between wolves and dog breeds has been reported. RESULTS Higher diversity was found within the wolf genome than within the dog genome. A total of 12.45 million SNPs were detected across all individuals (10.45 and 7.82 million SNPs in the studied wolves and dogs, respectively), along with a total of 3.49 million small indels (3.11 and 2.24 million in the studied wolves and dogs, respectively). A total of 10,571 copy number variation regions (CNVRs) were detected across the 6 individual genomes, covering 154.65 Mb, or 6.41%, of the reference genome (canFam3.1). Further analysis showed that deleterious variants are more abundant in the dog genome than in the wolf genome. Also, genomic annotation showed that the proportion of variants in intronic and intergenic regions is higher in the wolf genome than in the dog genome, while the proportion in coding sequences and 3'-UTRs is higher in the dog genome. Genes related to the olfactory and immune systems were enriched in the set of structural variants (SVs) identified in this work. CONCLUSIONS Our results showed more deleterious mutations and coding sequence variants in the domestic dog genome than in the wolf genome.
By providing the first Iranian dog and wolf variome map, our findings contribute to understanding the genetic architecture of dog domestication.
37
Lindstrand A, Eisfeldt J, Pettersson M, Carvalho CMB, Kvarnung M, Grigelioniene G, Anderlid BM, Bjerin O, Gustavsson P, Hammarsjö A, Georgii-Hemming P, Iwarsson E, Johansson-Soller M, Lagerstedt-Robinson K, Lieden A, Magnusson M, Martin M, Malmgren H, Nordenskjöld M, Norling A, Sahlin E, Stranneheim H, Tham E, Wincent J, Ygberg S, Wedell A, Wirta V, Nordgren A, Lundin J, Nilsson D. From cytogenetics to cytogenomics: whole-genome sequencing as a first-line test comprehensively captures the diverse spectrum of disease-causing genetic variation underlying intellectual disability. Genome Med 2019; 11:68. [PMID: 31694722 PMCID: PMC6836550 DOI: 10.1186/s13073-019-0675-1] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 05/07/2019] [Accepted: 10/09/2019] [Indexed: 12/30/2022]
Abstract
Background Since different types of genetic variants, from single nucleotide variants (SNVs) to large chromosomal rearrangements, underlie intellectual disability, we evaluated the use of whole-genome sequencing (WGS) rather than chromosomal microarray analysis (CMA) as a first-line genetic diagnostic test. Methods We analyzed three cohorts with short-read WGS: (i) a retrospective cohort with validated copy number variants (CNVs) (cohort 1, n = 68), (ii) individuals referred for monogenic multi-gene panels (cohort 2, n = 156), and (iii) 100 prospective, consecutive cases referred to our center for CMA (cohort 3). Bioinformatic tools developed include FindSV, SVDB, Rhocall, Rhoviz, and vcf2cytosure. Results First, we validated our structural variant (SV)-calling pipeline on cohort 1, consisting of three trisomies and 79 deletions and duplications with a median size of 850 kb (min 500 bp, max 155 Mb). All variants were detected. Second, we applied the same pipeline to cohort 2 and analyzed the data with monogenic WGS panels, increasing the diagnostic yield to 8%. Next, cohort 3 was analyzed by both CMA and WGS. The WGS data were processed for large (> 10 kb) SVs genome-wide and for exonic SVs and SNVs in a panel of 887 genes linked to intellectual disability, as well as genes matched to patient-specific Human Phenotype Ontology (HPO) phenotypes. This yielded a total of 25 pathogenic variants (SNVs or SVs), 12 of which were also detected by CMA. We also applied short tandem repeat (STR) expansion detection and discovered one pathologic expansion in ATXN7. Finally, a case of Prader-Willi syndrome with uniparental disomy (UPD) was validated in the WGS data. Important positional information was obtained in all cohorts. Remarkably, 7% of the analyzed cases harbored complex structural variants, as exemplified by a ring chromosome and two duplications found to be an insertional translocation and part of a cryptic unbalanced translocation, respectively.
Conclusion The overall diagnostic rate of 27% was more than double that of clinical microarray (12%). Using WGS, we detected a wide range of SVs with high accuracy. Since the WGS data also allowed for analysis of SNVs, UPD, and STRs, WGS represents a powerful comprehensive genetic test in a clinical diagnostic laboratory setting.
38
Field MA, Burgio G, Chuah A, Al Shekaili J, Hassan B, Al Sukaiti N, Foote SJ, Cook MC, Andrews TD. Recurrent miscalling of missense variation from short-read genome sequence data. BMC Genomics 2019; 20:546. [PMID: 31307400 PMCID: PMC6631443 DOI: 10.1186/s12864-019-5863-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/30/2022]
Abstract
Background Short-read resequencing of genomes produces abundant information of the genetic variation of individuals. Due to their numerous nature, these variants are rarely exhaustively validated. Furthermore, low levels of undetected variant miscalling will have a systematic and disproportionate impact on the interpretation of individual genome sequence information, especially should these also be carried through into in reference databases of genomic variation. Results We find that sequence variation from short-read sequence data is subject to recurrent-yet-intermittent miscalling that occurs in a sequence intrinsic manner and is very sensitive to sequence read length. The miscalls arise from difficulties aligning short reads to redundant genomic regions, where the rate of sequencing error approaches the sequence diversity between redundant regions. We find the resultant miscalled variants to be sensitive to small sequence variations between genomes, and thereby are often intrinsic to an individual, pedigree, strain or human ethnic group. In human exome sequences, we identify 2–300 recurrent false positive variants per individual, almost all of which are present in public databases of human genomic variation. From the exomes of non-reference strains of inbred mice, we identify 3–5000 recurrent false positive variants per mouse – the number of which increasing with greater distance between an individual mouse strain and the reference C57BL6 mouse genome. We show that recurrently miscalled variants may be reproduced for a given genome from repeated simulation rounds of read resampling, realignment and recalling. As such, it is possible to identify more than two-thirds of false positive variation from only ten rounds of simulation. Conclusion Identification and removal of recurrent false positive variants from specific individual variant sets will improve overall data quality. 
The resulting variant miscalls are highly sequence-intrinsic and are often specific to an individual, pedigree or ethnicity. Further, read length is a strong determinant of whether a given false variant will be called for any given genome, which has profound significance for cohort studies that pool datasets collected and sequenced at different points in time. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5863-2) contains supplementary material, which is available to authorized users.
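The resampling idea described above can be illustrated with a minimal sketch. It assumes a hypothetical input format in which each simulation round yields a set of variant keys `(chrom, pos, ref, alt)`. It also assumes the true variant set of the simulated genome is known, so any call outside that set is a false positive. Variants miscalled in at least `min_rounds` rounds are treated as recurrent. The function name and threshold are illustrative and do not describe the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def recurrent_false_positives(
    rounds: Iterable[Set[Variant]], truth: Set[Variant], min_rounds: int
) -> Set[Variant]:
    """Return false-positive calls that recur in at least `min_rounds` rounds.

    Each element of `rounds` holds the calls from one round of read
    resampling, realignment and recalling; `truth` holds the variants that
    were actually present in the simulated genome.
    """
    counts: Counter = Counter()
    for calls in rounds:
        # Anything called but not in the truth set is a false positive;
        # using sets means each variant is counted at most once per round.
        counts.update(calls - truth)
    return {v for v, n in counts.items() if n >= min_rounds}


if __name__ == "__main__":
    truth = {("chr1", 100, "A", "G")}
    rounds = [
        {("chr1", 100, "A", "G"), ("chr2", 5, "C", "T")},
        {("chr2", 5, "C", "T")},
        {("chr2", 5, "C", "T"), ("chr3", 7, "G", "A")},
    ]
    print(recurrent_false_positives(rounds, truth, min_rounds=2))
```

Recurrent false positives found this way could then be subtracted from the individual's real call set, for example with `real_calls - recurrent`.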
39
Amano Y, Akazawa Y, Yasuda J, Yoshino K, Kojima K, Kobayashi N, Matsuzaki S, Nagasaki M, Kawai Y, Minegishi N, Ishida N, Motoki N, Hachiya A, Nakazawa Y, Yamamoto M, Koike K, Takeshita T. A low-frequency IL4R locus variant in Japanese patients with intravenous immunoglobulin therapy-unresponsive Kawasaki disease. Pediatr Rheumatol Online J 2019; 17:34. [PMID: 31269967 PMCID: PMC6610867 DOI: 10.1186/s12969-019-0337-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/14/2019] [Accepted: 06/07/2019] [Indexed: 12/18/2022]
Abstract
BACKGROUND Kawasaki disease (KD) is a systemic vasculitis that may be associated with coronary artery aneurysms. A notable risk factor for the development of coronary artery aneurysms is resistance to intravenous immunoglobulin (IVIG) therapy, which is the standard treatment for the acute phase of KD. The cause of IVIG resistance in KD is largely unknown; however, a contribution of genetic factors, especially variants in immune-related genes, has been suspected. METHODS To explore genetic variants related to IVIG-unresponsiveness, we designated KD patients who responded to neither the first nor the second course of IVIG therapy as IVIG-unresponsive patients. Using genomic DNA from 30 IVIG-unresponsive KD patients, we performed pooled genome sequencing targeting 39 immune-related cytokine receptor genes. RESULTS The single nucleotide variant (SNV) rs563535954, located in the IL4R locus, was concentrated in IVIG-unresponsive KD patients. Individual genotyping showed that the minor allele of rs563535954 was present in 4/33 patients with IVIG-unresponsive KD, compared with 20/1063 individuals in the Japanese genome variation database (odds ratio = 7.19, 95% confidence interval 2.43-21.47). Furthermore, the minor allele of rs563535954 was absent in 42 KD patients who responded to IVIG treatment (P = 0.0337), indicating that a low-frequency variant, rs563535954, is associated with IVIG-unresponsiveness in KD patients. Although rs563535954 is located in the 3'-untranslated region of IL4R, no alteration in IL4R expression was associated with the minor allele of rs563535954. However, IVIG-unresponsive patients carrying the minor allele of rs563535954 tended to be classified into the low-risk group (based on previously reported risk scores) for prediction of IVIG resistance.
Therefore, IVIG-unresponsiveness associated with the minor allele of rs563535954 might differ from IVIG-unresponsiveness associated with previous risk factors used to evaluate IVIG-unresponsiveness in KD. CONCLUSION These findings suggest that the SNV rs563535954 could serve as a predictive indicator of IVIG-unresponsiveness, thereby improving the sensitivity of risk scoring systems, and may aid in prevention of coronary artery lesions in KD patients.
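The reported odds ratio and P value can be checked with a few lines of arithmetic. The sketch below assumes that the counts are carriers of the minor allele: 4 of 33 unresponsive patients, 20 of 1063 database individuals, and 0 of 42 responders. It also assumes that P = 0.0337 comes from a two-sided Fisher's exact test. The abstract does not name the test, so that is an assumption. Under these assumptions the sketch gives an odds ratio of about 7.19 and P ≈ 0.0337. The confidence-interval method is not stated, so it is not reproduced here.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are (carriers, non-carriers).
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # P(top-left cell = x) given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Unresponsive patients (4/33 carriers) vs database (20/1063 carriers).
    print(round(odds_ratio(4, 33 - 4, 20, 1063 - 20), 2))
    # Unresponsive patients (4/33) vs IVIG responders (0/42).
    print(round(fisher_exact_two_sided(4, 29, 0, 42), 4))
```

An odds ratio cannot be computed in the same way for the responder comparison, because the zero cell makes it infinite. This is why only the P value is quoted there.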
40
Tessier L, Côté O, Bienzle D. Sequence variant analysis of RNA sequences in severe equine asthma. PeerJ 2018; 6:e5759. [PMID: 30324028 PMCID: PMC6186407 DOI: 10.7717/peerj.5759] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/23/2017] [Accepted: 09/15/2018] [Indexed: 12/13/2022]
Abstract
Background Severe equine asthma is a chronic inflammatory disease of the lung in horses, similar to low-Th2 late-onset asthma in humans. This study aimed to determine the utility of RNA-Seq for calling gene sequence variants, and to identify sequence variants of potential relevance to the pathogenesis of asthma. Methods RNA-Seq data were generated from endobronchial biopsies collected from six asthmatic and seven non-asthmatic horses before and after challenge (26 samples in total). Sequences were aligned to the equine genome with Spliced Transcripts Alignment to a Reference (STAR) software. Reads were prepared for sequence variant calling with Picard tools and the Genome Analysis Toolkit (GATK). Sequence variants were called and filtered using GATK and Ensembl Variant Effect Predictor (VEP) tools, and two RNA-Seq-predicted sequence variants were investigated with both PCR and Sanger sequencing. In a supplementary analysis, novel sequence variants were selected with VEP based on a Sorting Intolerant From Tolerant (SIFT) score of <0.01, missense nature, location within the protein-coding sequence and presence in all asthmatic individuals. For selected variants, the effect on protein function was assessed with Polymorphism Phenotyping v2 (PolyPhen-2) and Screening for Non-Acceptable Polymorphisms 2 (SNAP2) software. Sequences were aligned and 3D protein structures predicted with Geneious software. Differences in allele frequency between the groups were assessed using Pearson’s chi-squared test with Yates’ continuity correction, and differences in genotype frequency using Fisher’s exact test for count data. Results RNA-Seq variant calling and filtering correctly identified substitution variants in PACRG and RTTN. Sanger sequencing confirmed that the PACRG substitution was correctly identified in all 26 samples, while the RTTN substitution was correctly identified in 24 of 26 samples.
These variants of uncertain significance had substitutions predicted to result in loss of function and to be non-neutral. The amino acid substitutions were predicted to cause no change in hydrophobicity or isoelectric point in PACRG, and a change in both in RTTN. For PACRG, no difference in allele frequency between the two groups was detected, but a higher proportion of asthmatic than non-asthmatic horses carried the altered RTTN allele. Discussion RNA-Seq was sensitive and specific for calling gene sequence variants in this disease model. Even moderate coverage (<10–20 counts per million) yielded correct identification in 92% of samples, suggesting that RNA-Seq may be suitable for detecting sequence variants in low-coverage samples. The impact of the amino acid alterations in the PACRG and RTTN proteins, and the possible association of the sequence variants with asthma, are of uncertain significance, but their role in ciliary function may be of future interest.
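The allele-frequency comparison described above can be sketched for a 2×2 table of group × allele counts. The abstract names the test as Pearson's chi-squared test with Yates' continuity correction. The counts in the example are made up for illustration and are not the study's data. The p-value relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable, so its upper tail is `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt
from typing import Tuple


def yates_chi2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Yates-corrected chi-squared test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. asthmatic, non-asthmatic); columns are allele
    counts (e.g. reference, alternate). Returns (statistic, p-value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    # Continuity correction: shrink |ad - bc| by n/2, but not below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * diff ** 2 / denom
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Hypothetical allele counts, not taken from the study.
    stat, p = yates_chi2(8, 4, 2, 10)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

The 92% figure in the Discussion is consistent with 24 of 26 samples (24 / 26 ≈ 0.923).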
41
Totomoch-Serra A, Muñoz MDL, Burgueño J, Revilla-Monsalve MC, Perez-Muñoz A, Diaz-Badillo Á. The ADRA2A rs553668 variant is associated with type 2 diabetes and five variants were associated at nominal significance levels in a population-based case-control study from Mexico City. Gene 2018; 669:28-34. [PMID: 29800730 DOI: 10.1016/j.gene.2018.05.078] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/24/2018] [Revised: 05/11/2018] [Accepted: 05/21/2018] [Indexed: 02/07/2023]
Abstract
Type 2 diabetes (T2D) has a prevalence of 9.4% in Mexicans. Its etiology is complex, involving environmental and genetic factors. The aim of this study was to analyse the association of the PPARG rs1801282, PPARGC1A rs8192678, VEGFA rs2010963, ADRA2A rs553668, KCNQ1 rs2237892, SIRT1 rs7896005, IGF2BP2 rs4402960, and UCP3 rs3781907 single nucleotide variants (SNVs) with T2D and metabolic traits in a case-control study of a population from Mexico City. A total of 831 blood samples, from non-diabetic healthy control participants (416) and individuals with T2D (415), were collected over a five-year period. After DNA extraction, genotyping was performed with TaqMan probes using real-time PCR. The genotypes were analysed for association with T2D in linear and logistic regressions adjusting for age, sex, and body mass index under the dominant, recessive, and additive models. A Bonferroni-corrected significance threshold of p < 0.001 was used for association with T2D, and p < 2.3 × 10⁻⁴ for association with related T2D traits. The univariate analysis gave significant results (p < 1 × 10⁻⁴) for sex, triglycerides, and HOMA-IR. Significant association with T2D was found for ADRA2A rs553668 under the recessive model (OR = 3.640, 95% CI 2.330-5.690, p < 1 × 10⁻⁴; statistical power 0.999) and under the additive model (OR = 1.640, 95% CI 1.340-2.000, p < 1 × 10⁻⁴; statistical power 0.997). Variants PPARG rs1801282, PPARGC1A rs8192678, SIRT1 rs7896005, IGF2BP2 rs4402960 and UCP3 rs3781907 were nominally associated (0.001 < p < 0.050). These results describe an association of ADRA2A rs553668 with T2D in a Mexican population. Variants with nominal association with T2D require replication in additional Mexican populations.
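The three genetic models named above are commonly implemented by recoding each genotype before regression. The sketch below assumes genotypes are given as the number of copies (0, 1 or 2) of the allele being tested. The abstract does not state which allele was coded. The Bonferroni threshold is the nominal α divided by the number of tests. The abstract does not give the number of tests behind p < 0.001 and p < 2.3 × 10⁻⁴, so the usage below uses an arbitrary count.

```python
def encode_genotype(allele_count: int, model: str) -> int:
    """Recode a genotype (copies of the tested allele: 0, 1 or 2)."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    if model == "additive":      # 0 / 1 / 2: one step per allele copy
        return allele_count
    if model == "dominant":      # carriers (1 or 2 copies) vs non-carriers
        return int(allele_count >= 1)
    if model == "recessive":     # homozygotes (2 copies) vs everyone else
        return int(allele_count == 2)
    raise ValueError(f"unknown model: {model!r}")


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error <= alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive"):
        print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
    print(bonferroni_threshold(0.05, 50))  # arbitrary number of tests
```

The recoded value is what enters the regression as the genotype term. This is why the recessive and additive models can yield different odds ratios for the same SNV, as with ADRA2A rs553668.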
42
Ainslie-Garcia MH, Farzan A, Jafarikia M, Lillie BN. Single nucleotide variants in innate immune genes associated with Salmonella shedding and colonization in swine on commercial farms. Vet Microbiol 2018; 219:171-177. [PMID: 29778193 DOI: 10.1016/j.vetmic.2018.04.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/10/2018] [Revised: 04/10/2018] [Accepted: 04/11/2018] [Indexed: 01/05/2023]
Abstract
Foodborne human salmonellosis is an important food safety concern worldwide. Food-producing animals are one of the major sources of human salmonellosis, and thus control of Salmonella at the farm level could reduce Salmonella spread in the food supply system. Genetic selection of pigs with resistance to Salmonella infection may be one way to control Salmonella on swine farms. The objective of this study was to investigate the association between genetic variants in the porcine innate immune system and on-farm Salmonella shedding and Salmonella colonization tested at slaughter. Fourteen groups of pigs (809 in total) were followed from birth to slaughter. Fecal samples collected five times at different stages of production, and tissue samples obtained from tonsil and lymph nodes at slaughter, were cultured for Salmonella. Genomic DNA was extracted and analyzed for 40 single nucleotide variants and two indels within porcine innate immune genes that were previously associated with Salmonella infection or other infectious diseases. A survey was used to collect information on farm management practices. Multilevel mixed-effects logistic regression was used to identify SNVs associated with Salmonella shedding and/or Salmonella colonization. One single nucleotide variant in the C-type lectin MBL1 and one in the cytosolic pattern recognition receptor NOD1 were associated with increased risk of on-farm shedding (p = 0.010) and internal colonization tested at slaughter (p = 0.018), respectively. These findings indicate the potential of these variants for genetic selection programs aimed at controlling Salmonella shedding and colonization in pigs.
43
Sendorek DH, Caloian C, Ellrott K, Bare JC, Yamaguchi TN, Ewing AD, Houlahan KE, Norman TC, Margolin AA, Stuart JM, Boutros PC. Germline contamination and leakage in whole genome somatic single nucleotide variant detection. BMC Bioinformatics 2018; 19:28. [PMID: 29385983 PMCID: PMC5793408 DOI: 10.1186/s12859-018-2046-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/16/2017] [Accepted: 01/24/2018] [Indexed: 11/18/2022]
Abstract
BACKGROUND The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data raise questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well understood, and it is uncertain whether somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNV) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. RESULTS The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. CONCLUSIONS The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the value of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
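The abstract does not describe how GermlineFilter works internally. The following is only a minimal sketch of the general idea. Leakage is quantified by intersecting a somatic call set with a set of known germline variants for the same patient, keyed by a hypothetical `(chrom, pos, ref, alt)` tuple. The leaked calls are then removed by set difference. This is not GermlineFilter's implementation, and the function name is illustrative.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def split_leakage(
    somatic: Set[Variant], germline: Set[Variant]
) -> Tuple[Set[Variant], Set[Variant], float]:
    """Return (leaked germline calls, filtered somatic calls, leakage rate)."""
    leaked = somatic & germline          # "somatic" calls that are really germline
    filtered = somatic - leaked          # calls that remain after filtering
    rate = len(leaked) / len(somatic) if somatic else 0.0
    return leaked, filtered, rate


if __name__ == "__main__":
    somatic = {("1", 10, "C", "T"), ("2", 20, "G", "A"), ("3", 30, "A", "C")}
    germline = {("2", 20, "G", "A"), ("5", 50, "T", "G")}
    leaked, filtered, rate = split_leakage(somatic, germline)
    print(leaked, len(filtered), round(rate, 3))
```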
44
Strouhal M, Oppelt J, Mikalová L, Arora N, Nieselt K, González-Candelas F, Šmajs D. Reanalysis of Chinese Treponema pallidum samples: all Chinese samples cluster with SS14-like group of syphilis-causing treponemes. BMC Res Notes 2018; 11:16. [PMID: 29325576 PMCID: PMC5765698 DOI: 10.1186/s13104-017-3106-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/22/2017] [Accepted: 12/19/2017] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis. Genetic analyses of TPA reference strains and human clinical isolates have revealed two genetically distinct groups of syphilis-causing treponemes, called the Nichols-like and SS14-like groups. So far, no genetic intermediates, i.e. strains containing a mixed pattern of Nichols-like and SS14-like genomic sequences, have been identified. Recently, Sun et al. (Oncotarget 2016, https://doi.org/10.18632/oncotarget.10154) described a new "phylogenetic group" (called Lineage 2) among Chinese TPA strains. This lineage exhibited a "mosaic genomic structure" of Nichols-like and SS14-like lineages. RESULTS We reanalyzed the primary sequencing data (Project Number PRJNA305961) from the Sun et al. publication with respect to the molecular basis of Lineage 2. Sun et al. based their analysis on several selected genomic single nucleotide variants (SNVs) and a subset of highly variable but phylogenetically poorly informative genes, which may confound the phylogenetic analysis. In contrast, our reanalysis focused primarily on the complete set of whole-genome SNVs. Based on our reanalysis, only two separate TPA clusters were identified: one consisted of Nichols-like TPA strains, and the other was formed by the SS14-like TPA strains, including all Chinese strains.
45
Guo Y, Zhao S, Sheng Q, Samuels DC, Shyr Y. The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data. BMC Genomics 2017; 18:690. [PMID: 28984205 PMCID: PMC5629567 DOI: 10.1186/s12864-017-4022-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022]
Abstract
BACKGROUND High throughput sequencing technology enables both the human genome and transcriptome to be screened at single-nucleotide resolution. Tools have been developed to infer single nucleotide variants (SNVs) from both DNA and RNA sequencing data. To evaluate how much difference can be expected between DNA and RNA sequencing data, and among tissue sources, we designed a study to examine single nucleotide differences among five sources of high throughput sequencing data generated from the same individual. These were exome sequencing from blood, tumor and adjacent normal tissue, and RNAseq from tumor and adjacent normal tissue. RESULTS Through careful quality control and analysis of the SNVs, we found little difference between DNA-DNA pairs (1%-2%). However, between DNA-RNA pairs, SNV differences ranged from 10% to 20%. CONCLUSIONS Only a small portion of these differences can be explained by RNA editing. Instead, the majority of the DNA-RNA differences should be attributed to technical errors from sequencing and post-processing of RNAseq data. Our results suggest that SNV detection using RNAseq is subject to high false positive rates.
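The abstract reports percentage differences between call sets without defining them. One plausible definition is the share of the union of two call sets that appears in only one of them, restricted to positions adequately covered in both data types. It is used here purely as an assumption. Restricting to jointly covered positions matters for DNA versus RNA comparisons, because unexpressed genes yield no RNA calls at all. The variant and position keys are hypothetical.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]   # hypothetical (chrom, pos, ref, alt) key
Position = Tuple[str, int]            # (chrom, pos)


def discordance(
    calls_a: Set[Variant], calls_b: Set[Variant], covered: Set[Position]
) -> float:
    """Fraction of jointly covered SNVs that are called in only one of two sets."""
    a = {v for v in calls_a if (v[0], v[1]) in covered}
    b = {v for v in calls_b if (v[0], v[1]) in covered}
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)   # symmetric difference over union


if __name__ == "__main__":
    dna = {("1", 1, "A", "G"), ("1", 2, "C", "T")}
    rna = {("1", 1, "A", "G"), ("1", 3, "G", "A")}
    print(discordance(dna, rna, {("1", 1), ("1", 2), ("1", 3)}))
```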
46
Rustagi N, Zhou A, Watkins WS, Gedvilaite E, Wang S, Ramesh N, Muzny D, Gibbs RA, Jorde LB, Yu F, Xing J. Extremely low-coverage whole genome sequencing in South Asians captures population genomics information. BMC Genomics 2017; 18:396. [PMID: 28532386 PMCID: PMC5440948 DOI: 10.1186/s12864-017-3767-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/14/2016] [Accepted: 05/07/2017] [Indexed: 11/18/2022]
Abstract
BACKGROUND The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. RESULTS South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history, and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high coverage of the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. CONCLUSIONS Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery at a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.
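Sensitivity for common variants, as reported above, is the fraction of true variants at or above the frequency cutoff that the call set recovers. The sketch assumes a hypothetical truth set mapping each variant to its minor allele frequency. The abstract does not say which truth set the authors used. The 10% default mirrors the cutoff quoted in the abstract.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]   # hypothetical (chrom, pos, ref, alt) key


def sensitivity_common(
    truth_maf: Dict[Variant, float], called: Set[Variant], min_maf: float = 0.10
) -> float:
    """Share of true variants with MAF >= min_maf that appear in `called`."""
    common = {v for v, maf in truth_maf.items() if maf >= min_maf}
    if not common:
        raise ValueError("no true variants at or above the MAF cutoff")
    return len(common & called) / len(common)


if __name__ == "__main__":
    truth = {
        ("1", 1, "A", "G"): 0.20,
        ("1", 2, "C", "T"): 0.05,   # rare: excluded from the denominator
        ("1", 3, "G", "A"): 0.50,
        ("1", 4, "T", "C"): 0.10,
    }
    called = {("1", 1, "A", "G"), ("1", 2, "C", "T")}
    print(sensitivity_common(truth, called))
```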
47
Jian X, Liu X. In Silico Prediction of Deleteriousness for Nonsynonymous and Splice-Altering Single Nucleotide Variants in the Human Genome. Methods Mol Biol 2017; 1498:191-197. [PMID: 27709577 DOI: 10.1007/978-1-4939-6472-7_13] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 06/06/2023]
Abstract
In silico prediction methods have become increasingly valuable and popular in molecular biology, especially in human genetics. They are used to predict deleteriousness in order to filter and prioritize the huge amounts of DNA variation identified by sequencing human genomes. A rich collection of methods is available, built on different levels and aspects of knowledge about how DNA variations affect gene expression. Their predictions are not always consistent, and are sometimes even the opposite of what was expected, so a consensus prediction or majority vote among these methods is preferable to trusting any single one. Querying different databases for different methods is tedious and time-consuming for such large data sets, so a single database integrating predictions from multiple databases can facilitate the process. In this chapter, we describe the general steps for obtaining comprehensive predictions and annotations for large numbers of variants from dbNSFP, the first and probably the most widely used database of its kind.
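The consensus strategy recommended above can be sketched as a simple majority vote. The sketch assumes each tool's output has already been normalized to "D" (deleterious) or "T" (tolerated), with None for a missing score. dbNSFP's actual columns and category letters vary by tool, so that normalization is a hypothetical preprocessing step not shown here. The tool names in the example are real predictors, but the predictions are made up.

```python
from typing import Dict, Optional


def majority_vote(predictions: Dict[str, Optional[str]]) -> Optional[str]:
    """Consensus of per-tool calls: "D" (deleterious), "T" (tolerated) or None.

    Tools with no prediction (None) abstain; ties also return None.
    """
    votes = [p for p in predictions.values() if p is not None]
    if any(p not in ("D", "T") for p in votes):
        raise ValueError("predictions must be 'D', 'T' or None")
    n_del = sum(1 for p in votes if p == "D")
    n_tol = len(votes) - n_del
    if n_del > n_tol:
        return "D"
    if n_tol > n_del:
        return "T"
    return None   # tie, or no tool made a prediction


if __name__ == "__main__":
    # Hypothetical, already-normalized predictions for one variant.
    print(majority_vote({"SIFT": "D", "PolyPhen-2": "D", "MutationTaster": "T", "FATHMM": None}))
```

Returning None on ties, rather than defaulting to either label, leaves an undecided variant visible for manual review.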
48
Fleming DS, Koltes JE, Fritz-Waters ER, Rothschild MF, Schmidt CJ, Ashwell CM, Persia ME, Reecy JM, Lamont SJ. Single nucleotide variant discovery of highly inbred Leghorn and Fayoumi chicken breeds using pooled whole genome resequencing data reveals insights into phenotype differences. BMC Genomics 2016; 17:812. [PMID: 27760519 PMCID: PMC5070165 DOI: 10.1186/s12864-016-3147-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 12/31/2015] [Accepted: 10/05/2016] [Indexed: 11/22/2022]
Abstract
Background Analyses of sequence variants of two distinct and highly inbred chicken lines allowed characterization of genomic variation that may be associated with phenotypic differences between breeds. These lines were the Leghorn, the major contributing breed to commercial white-egg production lines, and the Fayoumi, representative of an outbred indigenous and robust breed. Unique within- and between-line genetic diversity was used to define the genetic differences between the two breeds through variant discovery and functional annotation. Results Downstream fixation index (FST) analysis and subsequent gene ontology (GO) enrichment analysis elucidated major differences between the two lines. The genes with high FST values for both breeds were used to identify enriched GO terms. Over-represented GO annotations were uncovered for functions indicative of the breed-related traits of pathogen resistance and reproductive ability for Fayoumi and Leghorn, respectively. Conclusions Variant analysis elucidated GO functions indicative of breed-predominant phenotypes related to genomic variation in the lines, showing a possible link between the genetic variants and breed traits. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3147-7) contains supplementary material, which is available to authorized users.
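F_ST summarizes how much allele frequencies differ between populations, from 0 (identical frequencies) to 1 (fixed for different alleles). The abstract does not state which estimator was used. The sketch below uses the simple textbook form based on expected heterozygosity, F_ST = (H_T − H_S) / H_T. It applies to a single biallelic site in two equally weighted lines, which is an assumption. Estimators designed for pooled sequencing data, for example those with sample-size corrections, differ in detail.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic site in two populations.

    p1, p2 are the frequencies of the same allele in each population
    (e.g. estimated from pooled read counts); populations are weighted equally.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity if both populations were pooled.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0   # monomorphic site: no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_two_pops(0.0, 1.0))   # fixed for different alleles
    print(fst_two_pops(0.2, 0.6))   # partial differentiation
```

Genes harbouring sites with high values from a calculation like this would be the "high FST" genes that feed the GO enrichment step.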
49
González-Peñas J, Amigo J, Santomé L, Sobrino B, Brenlla J, Agra S, Paz E, Páramo M, Carracedo Á, Arrojo M, Costas J. Targeted resequencing of regulatory regions at schizophrenia risk loci: Role of rare functional variants at chromatin repressive states. Schizophr Res 2016; 174:10-16. [PMID: 27066855 DOI: 10.1016/j.schres.2016.03.029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/24/2016] [Accepted: 03/24/2016] [Indexed: 12/30/2022]
Abstract
There is mounting evidence that regulatory variation plays an important role in genetic risk for schizophrenia. Here, we specifically searched for regulatory risk variants by sequencing the promoter regions of twenty-three genes implicated in schizophrenia by copy number variant or genome-wide association studies. After strict quality control, a total of 55,206 bp per sample were analyzed in 526 schizophrenia cases and 516 controls from Galicia, NW Spain, using the Applied Biosystems SOLiD System. Variants were filtered on three criteria. The first was frequency in public databases. The second was chromatin state, from the Roadmap Epigenomics Consortium, in tissues relevant to schizophrenia such as fetal brain, mid-frontal lobe, and angular gyrus. The third was prediction of functionality from RegulomeDB. The proportion of rare variants at polycomb repressive chromatin states in relevant tissues was higher in cases than in controls. The proportion of rare variants with a predicted regulatory role was significantly higher in cases than in controls (P=0.0028, OR=1.93, 95% C.I.=1.23-3.04). Combining information from both sources identified an excess of carriers of rare variants with a predicted regulatory role located at polycomb repressive chromatin states in relevant tissues in cases versus controls (P=0.0016, OR=19.34, 95% C.I.=2.45-2495.26). The variants are located at two genes affected by the 17q12 copy number variant, LHX1 and HNF1B. These data strongly suggest that a specific epigenetic mechanism, chromatin remodeling by histone modification during early development, may be impaired in a subset of schizophrenia patients, in agreement with previous data.
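The three-way variant filter described above can be expressed as a single predicate. Every field name and cutoff here is hypothetical. The abstract does not give the frequency threshold or say which Roadmap chromatin-state labels were treated as polycomb repressive. This sketch uses the 15-state model labels "ReprPC" and "ReprPCWk". The abstract also does not say which RegulomeDB categories counted as functional. This sketch keeps categories 1 to 3, where a lower number indicates stronger regulatory evidence.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumed polycomb-repressive labels from the Roadmap 15-state chromatin model.
POLYCOMB_STATES: FrozenSet[str] = frozenset({"ReprPC", "ReprPCWk"})


@dataclass
class Candidate:
    """Hypothetical per-variant annotation record."""
    variant_id: str
    population_af: Optional[float]   # None = absent from public databases
    chromatin_state: str             # state label in a relevant tissue
    regulomedb_category: str         # e.g. "1f", "2b", "5"


def passes_filters(
    v: Candidate,
    max_af: float = 0.01,            # hypothetical rarity cutoff
    states: FrozenSet[str] = POLYCOMB_STATES,
    max_regulome_rank: int = 3,      # hypothetical RegulomeDB cutoff
) -> bool:
    """True if the variant is rare, in a repressive state and likely regulatory."""
    rare = v.population_af is None or v.population_af < max_af
    repressed = v.chromatin_state in states
    # The leading digit of a RegulomeDB category is its evidence rank (1 = strongest).
    regulatory = int(v.regulomedb_category[0]) <= max_regulome_rank
    return rare and repressed and regulatory


if __name__ == "__main__":
    print(passes_filters(Candidate("v1", None, "ReprPC", "2b")))
```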
50
Yu Y, Wu T, Johnson-Buck A, Li L, Su X. A two-layer assay for single-nucleotide variants utilizing strand displacement and selective digestion. Biosens Bioelectron 2016; 82:248-54. [PMID: 27100949 DOI: 10.1016/j.bios.2016.03.070] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 02/15/2016] [Revised: 03/21/2016] [Accepted: 03/28/2016] [Indexed: 11/17/2022]
Abstract
Point mutations have emerged as prominent biomarkers for disease diagnosis, particularly in the case of cancer. Discovering single-nucleotide variants (SNVs) is also important for identifying single-nucleotide polymorphisms within the population. The competing requirements of thermodynamic stability and specificity in conventional nucleic acid hybridization probes make it challenging to achieve highly precise detection of point mutants. Here, we present a fluorescence-based assay for low-abundance mutation detection, based on toehold-mediated strand displacement and nuclease-mediated strand digestion, that enables highly precise detection of point mutations. We demonstrate that this combined assay provides 50-1000-fold discrimination (mean value: 255) between all possible single-nucleotide mutations and their corresponding wild-type sequence for a model DNA target. Using experiments and kinetic modeling, we investigate the probe properties that yield additive benefits from both strand displacement and nucleolytic digestion, thus providing guidance for the design of enzyme-mediated nucleic acid assays in the future.