Reference Citation Analysis

For: Habegger L, Balasubramanian S, Chen DZ, Khurana E, Sboner A, Harmanci A, Rozowsky J, Clarke D, Snyder M, Gerstein M. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment. Bioinformatics 2012;28:2267-9. [PMID: 22743228 DOI: 10.1093/bioinformatics/bts368] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.1] [Indexed: 01/08/2023]

Cited by Other Article(s)

Huang S, Wu Z, Wang T, Yu R, Song Z, Wang H. MmisAT and MmisP: an efficient and accurate suite of variant analysis toolkit for primary mitochondrial diseases. Hum Genomics 2023;17:108. [PMID: 38012712 PMCID: PMC10683248 DOI: 10.1186/s40246-023-00557-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 11/22/2023] [Indexed: 11/29/2023]

Dall'Alba G, Casa PL, Abreu FPD, Notari DL, de Avila E Silva S. A Survey of Biological Data in a Big Data Perspective. BIG DATA 2022;10:279-297. [PMID: 35394342 DOI: 10.1089/big.2020.0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]

Koppad S, B A, Gkoutos GV, Acharjee A. Cloud Computing Enabled Big Multi-Omics Data Analytics. Bioinform Biol Insights 2021;15:11779322211035921. [PMID: 34376975 PMCID: PMC8323418 DOI: 10.1177/11779322211035921] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/01/2021] [Accepted: 07/12/2021] [Indexed: 12/27/2022]

Hu T, Chitnis N, Monos D, Dinh A. Next-generation sequencing technologies: An overview. Hum Immunol 2021;82:801-811. [PMID: 33745759 DOI: 10.1016/j.humimm.2021.02.012] [Citation(s) in RCA: 223] [Impact Index Per Article: 74.3] [Received: 12/14/2020] [Revised: 02/18/2021] [Accepted: 02/23/2021] [Indexed: 12/14/2022]

Verma A, Halder A, Marathe S, Purwar R, Srivastava S. A proteogenomic approach to target neoantigens in solid tumors. Expert Rev Proteomics 2021;17:797-812. [PMID: 33491499 DOI: 10.1080/14789450.2020.1881889] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]

Tafazoli A, Wawrusiewicz-Kurylonek N, Posmyk R, Miltyk W. Pharmacogenomics, How to Deal with Different Types of Variants in Next Generation Sequencing Data in the Personalized Medicine Area. J Clin Med 2020;10:jcm10010034. [PMID: 33374421 PMCID: PMC7796098 DOI: 10.3390/jcm10010034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/30/2020] [Revised: 12/21/2020] [Accepted: 12/22/2020] [Indexed: 12/15/2022]

Pal LR, Kundu K, Yin Y, Moult J. Matching whole genomes to rare genetic disorders: Identification of potential causative variants using phenotype-weighted knowledge in the CAGI SickKids5 clinical genomes challenge. Hum Mutat 2020;41:347-362. [PMID: 31680375 PMCID: PMC7182498 DOI: 10.1002/humu.23933] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/09/2019] [Revised: 09/26/2019] [Accepted: 10/13/2019] [Indexed: 02/06/2023]

Jiang Y, Wu C, Zhang Y, Zhang S, Yu S, Lei P, Lu Q, Xi Y, Wang H, Song Z. GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining. BMC Med Genomics 2019;12:193. [PMID: 31856831 PMCID: PMC6923899 DOI: 10.1186/s12920-019-0637-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/19/2019] [Accepted: 11/26/2019] [Indexed: 02/07/2023]

Leveraging protein dynamics to identify cancer mutational hotspots using 3D structures. Proc Natl Acad Sci U S A 2019;116:18962-18970. [PMID: 31462496 PMCID: PMC6754584 DOI: 10.1073/pnas.1901156116] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/12/2022]

Rao AR, Nelson SF. Calculating the statistical significance of rare variants causal for Mendelian and complex disorders. BMC Med Genomics 2018;11:53. [PMID: 29898714 PMCID: PMC6001062 DOI: 10.1186/s12920-018-0371-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/16/2017] [Accepted: 05/25/2018] [Indexed: 12/20/2022]

Abstract

BACKGROUND

With the expanding use of next-gen sequencing (NGS) to diagnose the thousands of rare Mendelian genetic diseases, it is critical to be able to interpret individual DNA variation. To calculate the significance of finding a rare protein-altering variant in a given gene, one must know the frequency of seeing a variant in the general population that is at least as damaging as the variant in question.

METHODS

We developed a general method to better interpret the likelihood that a rare variant is disease causing if observed in a given gene or genic region mapping to a described protein domain, using genome-wide information from a large control sample. Based on data from 2504 individuals in the 1000 Genomes Project dataset, we calculated the number of individuals who have a rare variant in a given gene for numerous filtering threshold scenarios, which may be used for calculating the significance of an observed rare variant being causal for disease. Additionally, we calculated mutational burden data on the number of individuals with rare variants in genic regions mapping to protein domains.

RESULTS

We describe methods to use the mutational burden data for calculating the significance of observing rare variants in a given proportion of sequenced individuals. We present SORVA, an implementation of these methods as a web tool, and we demonstrate application to 20 relevant but diverse next-gen sequencing studies. Specifically, we calculate the statistical significance of findings involving multi-family studies with rare Mendelian disease and a large-scale study of a complex disorder, autism spectrum disorder. If we use the frequency counts to rank genes based on intolerance for variation, the ranking correlates well with pLI scores derived from the Exome Aggregation Consortium (ExAC) dataset (ρ = 0.515), with the benefit that the scores are directly interpretable.

CONCLUSIONS

We have presented a strategy that is useful for vetting candidate genes from NGS studies and allows researchers to calculate the significance of seeing a variant in a given gene or protein domain. This approach is an important step towards developing a quantitative, statistics-based approach for presenting clinical findings.
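
The significance calculation described in the Results can be sketched with a simple binomial model. This is not SORVA's implementation: it assumes that each of n unrelated affected individuals independently carries a qualifying rare variant in the gene with probability equal to the carrier fraction observed in a control panel (for example, the 2504 individuals of the 1000 Genomes Project), and asks how likely k or more carriers would be by chance. The function names, record layout and example counts are hypothetical.

```python
from math import comb


def carrier_count(calls, gene, max_af):
    """Count distinct individuals carrying at least one variant in `gene`
    whose population allele frequency is below `max_af`.

    `calls` is an iterable of (individual_id, gene, allele_frequency) tuples;
    this record layout is a hypothetical stand-in for real genotype data.
    """
    return len({ind for ind, g, af in calls if g == gene and af < max_af})


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rare_variant_pvalue(control_carriers, n_controls, case_carriers, n_cases):
    """Probability of seeing at least `case_carriers` carriers among `n_cases`
    independent affected individuals, using the control carrier fraction as
    the background rate (binomial model assumed)."""
    background = control_carriers / n_controls
    return binomial_tail(case_carriers, n_cases, background)


# Toy control calls (hypothetical IDs, genes and frequencies).
controls = [
    ("HG001", "GENE1", 0.001),
    ("HG001", "GENE1", 0.0005),  # second variant, same person: counted once
    ("HG002", "GENE1", 0.02),    # too common for a 1% threshold
    ("HG003", "GENE2", 0.001),   # different gene
]
print(carrier_count(controls, "GENE1", 0.01))  # 1

# Hypothetical scenario: 25 of 2504 controls are carriers, and 3 of 4
# unrelated affected families carry a qualifying variant in the same gene.
print(rare_variant_pvalue(25, 2504, 3, 4))
```

Each individual either does or does not carry a qualifying variant, which is why a binomial tail is a natural first approximation; real analyses also need to handle relatedness, ancestry matching and multiple testing across genes, none of which this sketch attempts.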

Gosalia N, Economides AN, Dewey FE, Balasubramanian S. MAPPIN: a method for annotating, predicting pathogenicity and mode of inheritance for nonsynonymous variants. Nucleic Acids Res 2017;45:10393-10402. [PMID: 28977528 PMCID: PMC5737764 DOI: 10.1093/nar/gkx730] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 03/02/2017] [Accepted: 08/21/2017] [Indexed: 01/24/2023]

Balasubramanian S, Fu Y, Pawashe M, McGillivray P, Jin M, Liu J, Karczewski KJ, MacArthur DG, Gerstein M. Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nat Commun 2017;8:382. [PMID: 28851873 PMCID: PMC5575292 DOI: 10.1038/s41467-017-00443-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 05/20/2016] [Accepted: 06/29/2017] [Indexed: 11/09/2022]

Dhingra P, Fu Y, Gerstein M, Khurana E. Using FunSeq2 for Coding and Non-Coding Variant Annotation and Prioritization. Curr Protoc Bioinformatics 2017;57:15.11.1-15.11.17. [DOI: 10.1002/cpbi.23] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/17/2023]

Lee CR, Svardal H, Farlow A, Exposito-Alonso M, Ding W, Novikova P, Alonso-Blanco C, Weigel D, Nordborg M. On the post-glacial spread of human commensal Arabidopsis thaliana. Nat Commun 2017;8:14458. [PMID: 28181519 PMCID: PMC5309843 DOI: 10.1038/ncomms14458] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 07/04/2016] [Accepted: 01/03/2017] [Indexed: 02/03/2023]

A Survey of Computational Tools to Analyze and Interpret Whole Exome Sequencing Data. Int J Genomics 2016;2016:7983236. [PMID: 28070503 PMCID: PMC5192301 DOI: 10.1155/2016/7983236] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 05/25/2016] [Accepted: 10/26/2016] [Indexed: 12/31/2022]

Kumar S, Clarke D, Gerstein M. Localized structural frustration for evaluating the impact of sequence variants. Nucleic Acids Res 2016;44:10062-10073. [PMID: 27915290 PMCID: PMC5137452 DOI: 10.1093/nar/gkw927] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 05/16/2016] [Revised: 09/30/2016] [Accepted: 10/14/2016] [Indexed: 12/13/2022]

eMERGE Phenome-Wide Association Study (PheWAS) identifies clinical associations and pleiotropy for stop-gain variants. BMC Med Genomics 2016;9 Suppl 1:32. [PMID: 27535653 PMCID: PMC4989894 DOI: 10.1186/s12920-016-0191-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]

Abstract

BACKGROUND

We explored premature stop-gain variants to test the hypothesis that variants likely to affect protein structure and function will reveal important insights into the phenotypes associated with them. We performed a phenome-wide association study (PheWAS) exploring the association between a selected list of functional stop-gain genetic variants (variants resulting in truncated proteins or in nonsense-mediated decay) and an extensive group of diagnoses, to identify novel associations and uncover potential pleiotropy.

RESULTS

In this study, we selected 25 stop-gain variants: 5 with previously reported phenotypic associations and 20 putative stop-gain variants identified using dbSNP. For the PheWAS, we used data from the electronic MEdical Records and GEnomics (eMERGE) Network across 9 sites, totaling 41,057 unrelated patients. We divided these samples into two datasets with equal proportions of eMERGE site, sex, race, and genotyping platform. We calculated single-effect associations between the 25 stop-gain variants and ICD-9-defined case-control diagnoses. We also performed stratified analyses for samples of European and African ancestry. Associations were adjusted for sex, site, genotyping platform, and the first three principal components to account for global ancestry. We identified previously known associations, such as variants in LPL associated with hyperglyceridemia, indicating that our approach was robust. We also found a total of three significant associations with p < 0.01 in both datasets, the most significant replicating result being LPL SNP rs328 and ICD-9 code 272.1 "Disorder of Lipoid metabolism" (p_discovery = 2.59 × 10^-6, p_replicating = 2.7 × 10^-4). The other two significant replicated associations identified by this study are variant rs1137617 in the KCNH2 gene, associated with ICD-9 code category 244 "Acquired Hypothyroidism" (p_discovery = 5.31 × 10^-3, p_replicating = 1.15 × 10^-3), and variant rs12060879 in the DPT gene, associated with ICD-9 code category 996 "Complications peculiar to certain specified procedures" (p_discovery = 8.65 × 10^-3, p_replicating = 4.16 × 10^-3).

CONCLUSION

This PheWAS revealed novel associations of stop-gain variants with interesting phenotypes (ICD-9 codes), along with pleiotropic effects.
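
The replication criterion in the Results (p < 0.01 in both the discovery and the replicating dataset) can be sketched as a filter over per-pair p-values. This assumes the covariate-adjusted association tests have already been run; the regression step itself is not shown, and the dictionary layout and function name are hypothetical.

```python
def replicated_associations(discovery, replication, alpha=0.01):
    """Return (variant, icd9_code, p_discovery, p_replicating) tuples for
    pairs with p < alpha in both datasets, most significant discovery first."""
    hits = []
    for key, p_disc in discovery.items():
        p_rep = replication.get(key)
        if p_rep is not None and p_disc < alpha and p_rep < alpha:
            hits.append((*key, p_disc, p_rep))
    return sorted(hits, key=lambda hit: hit[2])


# p-values for the three replicated pairs are those reported above;
# the last pair is a hypothetical non-replicating result.
discovery = {
    ("rs328", "272.1"): 2.59e-6,
    ("rs1137617", "244"): 5.31e-3,
    ("rs12060879", "996"): 8.65e-3,
    ("rs_hypothetical", "401"): 1.0e-3,
}
replication = {
    ("rs328", "272.1"): 2.7e-4,
    ("rs1137617", "244"): 1.15e-3,
    ("rs12060879", "996"): 4.16e-3,
    ("rs_hypothetical", "401"): 0.2,
}

for variant, code, p_disc, p_rep in replicated_associations(discovery, replication):
    print(variant, code, p_disc, p_rep)
```

Requiring significance in two independently split datasets trades some power for protection against chance findings across the many variant-diagnosis pairs tested.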

Lelieveld SH, Veltman JA, Gilissen C. Novel bioinformatic developments for exome sequencing. Hum Genet 2016;135:603-14. [PMID: 27075447 PMCID: PMC4883269 DOI: 10.1007/s00439-016-1658-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 02/01/2016] [Accepted: 03/15/2016] [Indexed: 01/19/2023]

Lelieveld SH, Veltman JA, Gilissen C. Novel bioinformatic developments for exome sequencing. Hum Genet 2016. [PMID: 27075447 DOI: 10.1007/s00439-016-1658-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/07/2023]

Calabrese B, Cannataro M. Bioinformatics and Microarray Data Analysis on the Cloud. Methods Mol Biol 2016;1375:25-39. [PMID: 25863787 DOI: 10.1007/7651_2015_236] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/04/2023]

Fu Y, Liu Z, Lou S, Bedford J, Mu XJ, Yip KY, Khurana E, Gerstein M. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol 2015;15:480. [PMID: 25273974 PMCID: PMC4203974 DOI: 10.1186/s13059-014-0480-5] [Citation(s) in RCA: 226] [Impact Index Per Article: 25.1] [Received: 04/17/2014] [Indexed: 12/15/2022]

Yang H, Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc 2015;10:1556-66. [PMID: 26379229 DOI: 10.1038/nprot.2015.105] [Citation(s) in RCA: 599] [Impact Index Per Article: 66.6] [Indexed: 12/18/2022]

Yang H, Robinson PN, Wang K. Phenolyzer: phenotype-based prioritization of candidate genes for human diseases. Nat Methods 2015;12:841-3. [PMID: 26192085 PMCID: PMC4718403 DOI: 10.1038/nmeth.3484] [Citation(s) in RCA: 263] [Impact Index Per Article: 29.2] [Received: 02/20/2015] [Accepted: 05/18/2015] [Indexed: 12/21/2022]

Frankish A, Uszczynska B, Ritchie GRS, Gonzalez JM, Pervouchine D, Petryszak R, Mudge JM, Fonseca N, Brazma A, Guigo R, Harrow J. Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction. BMC Genomics 2015;16 Suppl 8:S2. [PMID: 26110515 PMCID: PMC4502323 DOI: 10.1186/1471-2164-16-s8-s2] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Indexed: 01/02/2023]

Abstract

Background

A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based.

Results

We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs and novel exons, and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNA-seq data, we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation when GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically relevant filter for further reducing the complexity of the transcriptome.

Conclusions

The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.

Podicheti R, Mockaitis K. FEATnotator: A tool for integrated annotation of sequence features and variation, facilitating interpretation in genomics experiments. Methods 2015;79-80:11-7. [PMID: 25934264 DOI: 10.1016/j.ymeth.2015.04.028] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 10/28/2014] [Revised: 03/25/2015] [Accepted: 04/22/2015] [Indexed: 11/16/2022]

Li MJ, Deng J, Wang P, Yang W, Ho SL, Sham PC, Wang J, Li M. wKGGSeq: A Comprehensive Strategy-Based and Disease-Targeted Online Framework to Facilitate Exome Sequencing Studies of Inherited Disorders. Hum Mutat 2015;36:496-503. [PMID: 25676918 DOI: 10.1002/humu.22766] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/22/2014] [Accepted: 02/03/2015] [Indexed: 12/19/2022]

Next-generation sequencing data analysis on cloud computing. Genes Genomics 2015. [DOI: 10.1007/s13258-015-0280-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]

Scheuch M, Höper D, Beer M. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. BMC Bioinformatics 2015;16:69. [PMID: 25886935 PMCID: PMC4351923 DOI: 10.1186/s12859-015-0503-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 07/03/2014] [Accepted: 02/20/2015] [Indexed: 01/28/2023]

Ritchie GRS, Flicek P. Computational approaches to interpreting genomic sequence variation. Genome Med 2014;6:87. [PMID: 25473426 PMCID: PMC4254438 DOI: 10.1186/s13073-014-0087-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 12/19/2022]

Li MJ, Wang J. Current trend of annotating single nucleotide variation in humans--A case study on SNVrap. Methods 2014;79-80:32-40. [PMID: 25308971 DOI: 10.1016/j.ymeth.2014.10.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/21/2014] [Revised: 09/25/2014] [Accepted: 10/02/2014] [Indexed: 12/16/2022]

Ho ED, Cao Q, Lee SD, Yip KY. VAS: a convenient web portal for efficient integration of genomic features with millions of genetic variants. BMC Genomics 2014;15:886. [PMID: 25306238 PMCID: PMC4210471 DOI: 10.1186/1471-2164-15-886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/15/2014] [Accepted: 10/03/2014] [Indexed: 12/29/2022]

Abstract

Background

High-throughput experimental methods have fostered the systematic detection of millions of genetic variants from any human genome. To help explore the potential biological implications of these genetic variants, software tools have been previously developed for integrating various types of information about these genomic regions from multiple data sources. Most of these tools were designed either for studying a small number of variants at a time, or for local execution on powerful machines.

Results

To make exploration of whole lists of genetic variants simple and accessible, we have developed a new Web-based system called VAS (Variant Annotation System, available at https://yiplab.cse.cuhk.edu.hk/vas/). It provides a large variety of information useful for studying both coding and non-coding variants, including whole-genome transcription factor binding, open chromatin and transcription data from the ENCODE consortium. By means of data compression, millions of variants can be uploaded from a client machine to the server in less than 50 megabytes of data. On the server side, our customized data integration algorithms can efficiently link millions of variants with tens of whole-genome datasets. These two enabling technologies make VAS a practical tool for annotating genetic variants from large genomic studies. We demonstrate the use of VAS in annotating genetic variants obtained from a migraine meta-analysis study and multiple data sets from the Personal Genomes Project. We also compare the running time of annotating 6.4 million SNPs of the CEU trio by VAS and another tool, showing that VAS is efficient in handling new variant lists without requiring any pre-computations.

Conclusions

VAS is specially designed to handle annotation tasks with long lists of genetic variants and large numbers of annotating features efficiently. It is complementary to other existing tools with more specific aims such as evaluating the potential impacts of genetic variants in terms of disease risk. We recommend using VAS for a quick first-pass identification of potentially interesting genetic variants, to minimize the time required for other more in-depth downstream analyses.
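
The abstract does not spell out VAS's integration algorithms. As a generic illustration only (not VAS's method), linking a long variant list to genomic feature intervals can be done by sorting variant positions per chromosome once and binary-searching each feature interval, so the work grows with the number of features times the logarithm of the number of variants, plus the number of overlaps, rather than with their product. The coordinate convention, data layout and feature names below are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict


def annotate_variants(variants, features):
    """Map each (chrom, pos) variant to the sorted names of features whose
    half-open interval [start, end) contains it.

    variants: list of (chrom, pos); features: list of (chrom, start, end, name).
    Positions and intervals are assumed to share one coordinate convention
    (0-based, half-open, as in BED files).
    """
    # Sort variant positions per chromosome once, so each feature can be
    # located by binary search instead of scanning every variant.
    positions = defaultdict(list)
    for chrom, pos in variants:
        positions[chrom].append(pos)
    for plist in positions.values():
        plist.sort()

    hits = defaultdict(set)
    for chrom, start, end, name in features:
        plist = positions.get(chrom, [])
        i = bisect_left(plist, start)  # first variant at or after `start`
        while i < len(plist) and plist[i] < end:
            hits[(chrom, plist[i])].add(name)
            i += 1
    return {v: sorted(hits.get(v, ())) for v in variants}


# Toy data; feature names are hypothetical placeholders.
variants = [("chr1", 100), ("chr1", 250), ("chr2", 5)]
features = [
    ("chr1", 90, 200, "TFBS_A"),
    ("chr1", 240, 260, "DHS_1"),
    ("chr1", 95, 105, "TFBS_B"),
]
print(annotate_variants(variants, features))
```

Variants that fall in no feature still appear in the result with an empty list, which keeps the output aligned with the uploaded variant list.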

Shanahan HP, Owen AM, Harrison AP. Bioinformatics on the cloud computing platform Azure. PLoS One 2014;9:e102642. [PMID: 25050811 PMCID: PMC4106841 DOI: 10.1371/journal.pone.0102642] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 04/23/2013] [Accepted: 06/20/2014] [Indexed: 12/27/2022]

Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 2014;46:818-25. [PMID: 24974849 DOI: 10.1038/ng.3021] [Citation(s) in RCA: 486] [Impact Index Per Article: 48.6] [Received: 10/10/2013] [Accepted: 06/06/2014] [Indexed: 12/16/2022]

Dall'Olio GM, Bertranpetit J, Wagner A, Laayouni H. Human genome variation and the concept of genotype networks. PLoS One 2014;9:e99424. [PMID: 24911413 PMCID: PMC4049842 DOI: 10.1371/journal.pone.0099424] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 02/11/2014] [Accepted: 05/14/2014] [Indexed: 12/29/2022]

Abstract

Genotype networks are a concept used in systems biology to study sets of genotypes that have the same phenotype, and the ability of these genotypes to bring forth novel phenotypes. In the past they have been applied to determine the genetic heterogeneity, and stability to mutations, of systems such as metabolic networks and RNA folds. Recently, they have been the basis for reconciling the neutralist and selectionist views on evolution. Here, we adapted this concept to the study of population genetics data. Specifically, we applied genotype networks to the human 1000 Genomes dataset and analyzed networks composed of short haplotypes of single nucleotide variants (SNVs). The result is a scan of how properties related to genetic heterogeneity and stability to mutations are distributed along the human genome. We found that genes involved in acquired immunity, such as some HLA and MHC genes, tend to have the most heterogeneous and connected networks, and that coding regions tend to be more heterogeneous and stable to mutations than non-coding regions. We also found, using coalescent simulations, that regions under selection have more extended and connected networks. Applying the concept of genotype networks can provide a new opportunity to understand the evolutionary processes that shaped our genome. Learning how the genotype space of each region of our genome has been explored during the evolutionary history of the human species can lead to a better understanding of how selective pressures and neutral factors have shaped genetic diversity within populations and among individuals. Combined with the availability of larger sequencing datasets, genotype networks represent a new approach to the study of human genetic diversity that considers the whole genome and goes beyond the classical division between selection and neutrality methods.
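
To make the construction concrete, here is a minimal sketch assuming the usual definition of a genotype network: each distinct short haplotype observed in a genomic window is a node, and two haplotypes are connected when they differ at exactly one SNV. Node count, edge count and the number of connected components are simple proxies for the heterogeneity and connectivity discussed above; they are not necessarily the exact statistics used in the study, and the haplotypes are toy data.

```python
from itertools import combinations


def hamming(a, b):
    """Number of SNV positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def genotype_network(haplotypes):
    """Nodes are distinct haplotypes; edges join haplotypes that differ at
    exactly one SNV (i.e. are one point mutation apart)."""
    nodes = sorted(set(haplotypes))
    if nodes and any(len(h) != len(nodes[0]) for h in nodes):
        raise ValueError("haplotypes must cover the same SNVs")
    edges = [(a, b) for a, b in combinations(nodes, 2) if hamming(a, b) == 1]
    return nodes, edges


def count_components(nodes, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


# Toy haplotypes over 4 SNVs (0 = reference allele, 1 = alternate allele).
haps = ["0010", "0011", "0111", "1000", "0010"]
nodes, edges = genotype_network(haps)
print(len(nodes), len(edges), count_components(nodes, edges))  # 4 2 2
```

The all-pairs comparison is quadratic in the number of distinct haplotypes, which stays manageable because windows are short and the number of distinct haplotypes per window is small relative to the number of sampled chromosomes.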

Jäger M, Wang K, Bauer S, Smedley D, Krawitz P, Robinson PN. Jannovar: a java library for exome annotation. Hum Mutat 2014;35:548-55. [PMID: 24677618 DOI: 10.1002/humu.22531] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 08/16/2013] [Accepted: 02/11/2014] [Indexed: 01/03/2023]

McCarthy DJ, Humburg P, Kanapin A, Rivas MA, Gaulton K, Cazier JB, Donnelly P. Choice of transcripts and software has a large effect on variant annotation. Genome Med 2014;6:26. [PMID: 24944579 PMCID: PMC4062061 DOI: 10.1186/gm543] [Citation(s) in RCA: 131] [Impact Index Per Article: 13.1] [Received: 10/07/2013] [Accepted: 03/20/2014] [Indexed: 12/19/2022]

Abstract

Background

Variant annotation is a crucial step in the analysis of genome sequencing data. Functional annotation results can have a strong influence on the ultimate conclusions of disease studies. Incorrect or incomplete annotations can cause researchers both to overlook potentially disease-relevant DNA variants and to dilute interesting variants in a pool of false positives. Researchers are aware of these issues in general, but the extent of the dependency of final results on the choice of transcripts and software used for annotation has not been quantified in detail.

Methods

This paper quantifies the extent of differences in annotation of 80 million variants from a whole-genome sequencing study. We compare results using the RefSeq and Ensembl transcript sets as the basis for variant annotation with the software Annovar, and also compare the results from two annotation software packages, Annovar and VEP (Ensembl’s Variant Effect Predictor), when using Ensembl transcripts.

Results

We found only 44% agreement in annotations for putative loss-of-function variants when using the RefSeq and Ensembl transcript sets as the basis for annotation with Annovar. The rate of matching annotations for loss-of-function and nonsynonymous variants combined was 79% and for all exonic variants it was 83%. When comparing results from Annovar and VEP using Ensembl transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy. Using these comparisons, we characterised the types of apparent errors made by Annovar and VEP and discuss their impact on the analysis of DNA variants in genome sequencing studies.

Conclusions

Variant annotation is not yet a solved problem. Choice of transcript set can have a large effect on the ultimate variant annotations obtained in a whole-genome sequencing study. Choice of annotation software can also have a substantial effect. The annotation step in the analysis of a genome sequencing study must therefore be considered carefully, and a conscious choice made as to which transcript set and software are used for annotation.
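
The agreement percentages reported above compare, variant by variant, the consequence assigned under two transcript sets or two tools. A minimal sketch of such a comparison follows, assuming each annotation has already been reduced to one consequence label per variant (real tools can report several per variant, and the paper's exact matching rules are not given here). The Sequence Ontology-style labels, the category set and the data are hypothetical.

```python
def agreement_rate(ann_a, ann_b, category=None):
    """Fraction of variants annotated in both sets whose consequence labels
    match exactly.

    If `category` is given, only variants that at least one annotation places
    in that category are compared (an assumed choice of denominator)."""
    shared = ann_a.keys() & ann_b.keys()
    if category is not None:
        shared = {v for v in shared if ann_a[v] in category or ann_b[v] in category}
    if not shared:
        return float("nan")
    return sum(ann_a[v] == ann_b[v] for v in shared) / len(shared)


# Hypothetical loss-of-function category and per-variant labels.
LOF = {"stop_gained", "frameshift_variant",
       "splice_donor_variant", "splice_acceptor_variant"}
refseq_based = {"v1": "stop_gained", "v2": "missense_variant",
                "v3": "frameshift_variant", "v4": "synonymous_variant"}
ensembl_based = {"v1": "stop_gained", "v2": "missense_variant",
                 "v3": "missense_variant", "v4": "synonymous_variant"}

print(agreement_rate(refseq_based, ensembl_based))       # 0.75
print(agreement_rate(refseq_based, ensembl_based, LOF))  # 0.5
```

Restricting the denominator to variants that either annotation calls loss-of-function is what makes LoF agreement look much lower than overall agreement: a single discordant call weighs heavily in a small category.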

Lee IH, Lee K, Hsing M, Choe Y, Park JH, Kim SH, Bohn JM, Neu MB, Hwang KB, Green RC, Kohane IS, Kong SW. Prioritizing disease-linked variants, genes, and pathways with an interactive whole-genome analysis pipeline. Hum Mutat 2014;35:537-47. [PMID: 24478219 DOI: 10.1002/humu.22520] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/04/2013] [Accepted: 01/23/2014] [Indexed: 01/02/2023]

Revolutionizing Prokaryotic Systematics Through Next-Generation Sequencing. J Microbiol Methods 2014. [DOI: 10.1016/bs.mim.2014.07.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]

Preeprem T, Gibson G. An association-adjusted consensus deleterious scheme to classify homozygous Mis-sense mutations for personal genome interpretation. BioData Min 2013;6:24. [PMID: 24365473 PMCID: PMC3892026 DOI: 10.1186/1756-0381-6-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/08/2013] [Accepted: 12/17/2013] [Indexed: 11/22/2022]

Abstract

BACKGROUND

Personal genome analysis is now being considered for evaluation of disease risk in healthy individuals, utilizing both rare and common variants. Multiple scores have been developed to predict the deleteriousness of amino acid substitutions, using information on the allele frequencies, level of evolutionary conservation, and averaged structural evidence. However, agreement among these scores is limited and they likely over-estimate the fraction of the genome that is deleterious.

METHOD

This study proposes an integrative approach to identify a subset of homozygous non-synonymous single nucleotide polymorphisms (nsSNPs). An 8-level classification scheme is constructed from the presence/absence of deleterious predictions combined with evidence of association with disease or complex traits. Detailed literature searches and structural validations are then performed for a subset of 826 homozygous mis-sense mutations in 575 proteins found in the genomes of 12 healthy adults.

RESULTS

Implementation of the Association-Adjusted Consensus Deleterious Scheme (AACDS) classifies 11% of all predicted highly deleterious homozygous variants as most likely to influence disease risk. The number of such variants per genome ranges from 0 to 8 with no significant difference between African and Caucasian Americans. Detailed analysis of mutations affecting the APOE, MTMR2, THSB1, CHIA, αMyHC, and AMY2A proteins shows how the protein structure is likely to be disrupted, even though the associated phenotypes have not been documented in the corresponding individuals.

CONCLUSIONS

The classification system for homozygous nsSNPs provides an opportunity to systematically rank nsSNPs based on suggestive evidence from annotations and sequence-based predictions. The ranking scheme, in-depth literature searches, and structural validations of highly prioritized mis-sense mutations complement traditional sequence-based approaches and should have particular utility for the development of individualized health profiles. An online tool reporting the AACDS score for any variant is provided at the authors' website.

Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 2013;10:723-9. [PMID: 23900255 DOI: 10.1038/nmeth.2562] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Received: 01/25/2013] [Accepted: 06/07/2013] [Indexed: 12/13/2022]

Lescai F, Marasco E, Bacchelli C, Stanier P, Mantovani V, Beales P. Identification and validation of loss of function variants in clinical contexts. Mol Genet Genomic Med 2013;2:58-63. [PMID: 24498629 PMCID: PMC3907911 DOI: 10.1002/mgg3.42] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 07/12/2013] [Accepted: 09/05/2013] [Indexed: 12/20/2022]

Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüş ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GRS, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C, Gerstein M. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 2013;342:1235587. [PMID: 24092746 PMCID: PMC3947637 DOI: 10.1126/science.1235587] [Citation(s) in RCA: 269] [Impact Index Per Article: 24.5] [Indexed: 12/15/2022]

Affiliation(s)

Ekta Khurana: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Yao Fu: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Vincenza Colonna: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK; Institute of Genetics and Biophysics, National Research Council (CNR), 80131 Naples, Italy
Xinmeng Jasmine Mu: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Hyun Min Kang: Center for Statistical Genetics, Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
Tuuli Lappalainen: Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland; Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland; Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
Andrea Sboner: Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
Lucas Lochovsky: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
Jieming Chen: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA
Arif Harmanci: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Jishnu Das: Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
Alexej Abyzov: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Suganthi Balasubramanian: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Kathryn Beal: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Dimple Chakravarty: Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
Daniel Challis: Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Yuan Chen: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
Declan Clarke: Department of Chemistry, Yale University, New Haven, CT 06520, USA
Laura Clarke: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Fiona Cunningham: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Uday S. Evani: Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Paul Flicek: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Robert Fragoza: Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
Erik Garrison: Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
Richard Gibbs: Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Zeynep H. Gümüş: The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA
Javier Herrero: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Naoki Kitabayashi: Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
Yong Kong: Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Keck Biotechnology Resource Laboratory, Yale University, New Haven, CT 06511, USA
Kasper Lage: Pediatric Surgical Research Laboratories, MassGeneral Hospital for Children, Massachusetts General Hospital, Boston, MA 02114, USA; Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA; Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark; Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
Vaja Liluashvili: The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA; Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY 10065, USA
Steven M. Lipkin: Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
Daniel G. MacArthur: Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA
Gabor Marth: Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
Donna Muzny: Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
Tune H. Pers: Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark; Division of Endocrinology and Center for Basic and Translational Obesity Research, Children’s Hospital, Boston, MA 02115, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Graham R. S. Ritchie: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Jeffrey A. Rosenfeld: Department of Medicine, Rutgers New Jersey Medical School, Newark, NJ 07101, USA; IST/High Performance and Research Computing, Rutgers University Newark, NJ 07101, USA; Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
Cristina Sisu: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
Xiaomu Wei: Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
Michael Wilson: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Child Study Center, Yale University, New Haven, CT 06520, USA
Yali Xue: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
Fuli Yu: Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
1000 Genomes Project Consortium
Emmanouil T. Dermitzakis: Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland; Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland; Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
Haiyuan Yu: Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
Mark A. Rubin: Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
Chris Tyler-Smith: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
Mark Gerstein: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA

Dorn C, Grunert M, Sperling SR. Application of high-throughput sequencing for studying genomic variations in congenital heart disease. Brief Funct Genomics 2013;13:51-65. [PMID: 24095982 DOI: 10.1093/bfgp/elt040] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]

Tool for rapid annotation of microbial SNPs (TRAMS): a simple program for rapid annotation of genomic variation in prokaryotes. Antonie Van Leeuwenhoek 2013;104:431-4. [PMID: 23828175 DOI: 10.1007/s10482-013-9953-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/05/2013] [Accepted: 06/12/2013] [Indexed: 10/26/2022]

Shen H, Li J, Zhang J, Xu C, Jiang Y, Wu Z, Zhao F, Liao L, Chen J, Lin Y, Tian Q, Papasian CJ, Deng HW. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians. PLoS One 2013;8:e59494. [PMID: 23577066 PMCID: PMC3618277 DOI: 10.1371/journal.pone.0059494] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 10/06/2012] [Accepted: 02/14/2013] [Indexed: 12/14/2022]

Abstract

Whole-genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variation. Here we report the results of a high-coverage whole-genome sequencing study of 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, and the majority of these (∼96%) have a minor allele frequency below 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all 44 genomes, a total of 182 genes were "knocked out" in at least one individual genome, among which 46 genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked out" in general populations. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and disease.
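The gene-level tally in this abstract (genes "knocked out" in at least one genome, and genes knocked out in over 30% of samples) can be illustrated with a small sketch. It assumes, hypothetically, that loss-of-function calling has already been done and that each sample is reduced to the set of genes in which it carries at least one homozygous loss-of-function variant; the sample IDs and gene names below are placeholders, and this is not presented as the authors' pipeline.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def knockout_summary(
    homozygous_lof: Dict[str, Set[str]],
    min_fraction: float = 0.30,
) -> Tuple[Set[str], Set[str]]:
    """Return (genes knocked out in at least one genome,
    genes knocked out in more than `min_fraction` of genomes).

    `homozygous_lof` maps a sample ID to the set of genes in which that
    sample carries at least one homozygous loss-of-function variant.
    """
    n_samples = len(homozygous_lof)
    # Each gene is counted at most once per sample because the values are sets
    counts = Counter(gene for genes in homozygous_lof.values() for gene in genes)
    any_knockout = set(counts)
    common_knockout = {
        gene for gene, n in counts.items() if n / n_samples > min_fraction
    }
    return any_knockout, common_knockout


# Made-up example with four genomes and placeholder gene names
samples = {
    "S1": {"GENE_A", "GENE_B"},
    "S2": {"GENE_A"},
    "S3": {"GENE_C"},
    "S4": set(),
}
any_ko, common_ko = knockout_summary(samples)
print(sorted(any_ko))     # ['GENE_A', 'GENE_B', 'GENE_C']
print(sorted(common_ko))  # ['GENE_A']
```

Storing genes as sets per sample means a gene hit by several homozygous loss-of-function variants in the same genome still counts once for that genome, which matches the per-individual wording of the abstract.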

Affiliation(s)

Hui Shen: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Jian Li: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Jigang Zhang: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Chao Xu: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America; Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
Yan Jiang: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America; Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
Zikai Wu: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America; Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
Fuping Zhao: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Li Liao: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Jun Chen: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America
Yong Lin: Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China
Qing Tian: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Christopher J. Papasian: School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America
Hong-Wen Deng: Center for Bioinformatics and Genomics, Department of Biostatistics and Bioinformatics, School of Public Health and Tropical Medicine, Tulane University, New Orleans, Louisiana, United States of America; School of Medicine, University of Missouri-Kansas City, Kansas City, Missouri, United States of America; Center of System Biomedical Sciences, University of Shanghai for Science and Technology, Shanghai, P. R. China

Computational and bioinformatics frameworks for next-generation whole exome and genome sequencing. ScientificWorldJournal 2013;2013:730210. [PMID: 23365548 PMCID: PMC3556895 DOI: 10.1155/2013/730210] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 10/28/2012] [Accepted: 11/22/2012] [Indexed: 12/28/2022]

Dai L, Gao X, Guo Y, Xiao J, Zhang Z. Bioinformatics clouds for big data manipulation. Biol Direct 2012. [PMID: 23190475 PMCID: PMC3533974 DOI: 10.1186/1745-6150-7-43] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.8] [Indexed: 12/26/2022]

Do R, Kathiresan S, Abecasis GR. Exome sequencing and complex disease: practical aspects of rare variant association studies. Hum Mol Genet 2012;21:R1-9. [PMID: 22983955 PMCID: PMC3459641 DOI: 10.1093/hmg/dds387] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.8] [Received: 08/21/2012] [Accepted: 09/07/2012] [Indexed: 11/13/2022]