1
Al-Shuhaib MBS, Hashim HO. Mastering DNA chromatogram analysis in Sanger sequencing for reliable clinical analysis. J Genet Eng Biotechnol 2023; 21:115. [PMID: 37955813 PMCID: PMC10643650 DOI: 10.1186/s43141-023-00587-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 11/06/2023] [Indexed: 11/14/2023]
Abstract
BACKGROUND Sanger dideoxy sequencing is vital in clinical analysis due to its accuracy, its ability to analyze genetic markers such as SNPs and STRs, its capability to generate reliable DNA profiles, and its role in resolving complex clinical cases. The precision and robustness of Sanger sequencing contribute significantly to the scientific basis of clinical investigations. Although reading chromatograms seems to be a routine step, errors introduced during PCR can limit the reliability of the A, G, C, and T peak readings. These errors are typically associated with improper DNA amplification and the subsequent interpretation of DNA sequencing files, and they include noisy peaks, artifacts, and confusion between technical double peaks, heterozygosity, and possible double infection. Thus, nucleic acid sequences cannot be read reliably without giving serious attention to these technical problems. To ensure the accuracy of DNA sequencing outcomes, it is also imperative to detect and rectify technical problems that may lead to misinterpretation of the DNA sequence and, in turn, to errors and incongruities in subsequent analyses. SHORT CONCLUSION This overview sheds light on prominent technical concerns that can emerge before and during the interpretation of DNA chromatograms in Sanger sequencing and offers strategies to address them effectively. The review underscores the significance of identifying and tackling these technical limitations during chromatogram analysis. Recognizing these concerns can enhance the quality of downstream analyses of Sanger sequencing results, improving their accuracy, reliability, and ability to provide crucial genetic information in clinical analysis.
Affiliation(s)
- Mohammed Baqur S Al-Shuhaib
  - Department of Animal Production, College of Agriculture, Al-Qasim Green University, Al-Qasim 8, Babil, 51001, Iraq
- Hayder O Hashim
  - Department of Clinical Laboratory Sciences, College of Pharmacy, University of Babylon, Babil, 51001, Iraq
2
Li Y, Patel H, Lin Y. Kmer2SNP: Reference-Free Heterozygous SNP Calling Using k-mer Frequency Distributions. Methods Mol Biol 2022; 2493:257-265. [PMID: 35751820 DOI: 10.1007/978-1-0716-2293-3_16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
DNA sequencing technologies enable the generation of genetic profiles from many individuals at a rapid rate. Identifying single-nucleotide polymorphism (SNP) between biological samples is fundamental in genetics with various applications, such as disease diagnosis and associations and ancestry and relationship inference. Most methods use a species-specific reference genome for aligning raw sequenced reads for accurate SNP calling. However, high-quality reference genomes may not be available for all species. Therefore, we developed a reference-free algorithm, Kmer2SNP, to identify heterozygous SNPs from raw sequenced reads to facilitate genetic studies in species without the reference genome. Kmer2SNP first calculates the k-mer frequency distribution from reads to determine k-mers containing heterozygous SNPs. Next, these k-mers are rapidly matched with each other to identify pairs of exact heterozygous k-mers that belong to one of the two possible haplotypes in a diploid organism. Finally, using overlapping neighboring k-mers, weights are assigned for SNP assignments; higher weights increase SNP discovery confidence.
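The first two steps described in this abstract (counting k-mers, then pairing candidate heterozygous k-mers) can be illustrated with a toy Python sketch. This is not the Kmer2SNP implementation: the function names, the coverage window `low`/`high`, and the rule that paired k-mers differ only at the middle base are assumptions made for illustration, and the final weighting step with overlapping neighboring k-mers is omitted.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def pair_heterozygous_kmers(counts, k, low, high):
    """Pair k-mers in the assumed heterozygous coverage window [low, high]
    that are identical except for the middle base (an illustrative rule)."""
    mid = k // 2
    groups = defaultdict(list)
    for kmer, c in counts.items():
        if low <= c <= high:
            # Key on everything except the middle base so that candidate
            # partners land in the same bucket without all-vs-all comparison.
            groups[(kmer[:mid], kmer[mid + 1:])].append(kmer)
    return sorted(tuple(sorted(g)) for g in groups.values() if len(g) == 2)


# Toy diploid sample: two haplotypes differing by one G/A SNP, 5x coverage each.
hap1 = "ACGTTCAGGCTAACG"
hap2 = "ACGTTCAAGCTAACG"
reads = [hap1] * 5 + [hap2] * 5

counts = count_kmers(reads, 7)
pairs = pair_heterozygous_kmers(counts, 7, low=3, high=7)
print(pairs)
```

In this toy data, k-mers shared by both haplotypes are seen 10 times and fall outside the window, while k-mers covering the SNP are seen 5 times and are kept as candidates.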
Affiliation(s)
- Yanbo Li
  - School of Computing, Australian National University, Canberra, ACT, Australia
- Hardip Patel
  - The John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Yu Lin
  - School of Computing, Australian National University, Canberra, ACT, Australia
3
Bennett EP, Petersen BL, Johansen IE, Niu Y, Yang Z, Chamberlain CA, Met Ö, Wandall HH, Frödin M. INDEL detection, the 'Achilles heel' of precise genome editing: a survey of methods for accurate profiling of gene editing induced indels. Nucleic Acids Res 2020; 48:11958-11981. [PMID: 33170255 PMCID: PMC7708060 DOI: 10.1093/nar/gkaa975] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 12/20/2019] [Revised: 10/05/2020] [Accepted: 10/15/2020] [Indexed: 12/11/2022]
Abstract
Advances in genome editing technologies have enabled manipulation of genomes at the single base level. These technologies are based on programmable nucleases (PNs) that include meganucleases, zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated 9 (Cas9) nucleases and have given researchers the ability to delete, insert or replace genomic DNA in cells, tissues and whole organisms. The great flexibility in re-designing the genomic target specificity of PNs has vastly expanded the scope of gene editing applications in life science, and shows great promise for development of the next generation gene therapies. PN technologies share the principle of inducing a DNA double-strand break (DSB) at a user-specified site in the genome, followed by cellular repair of the induced DSB. PN-elicited DSBs are mainly repaired by the non-homologous end joining (NHEJ) and the microhomology-mediated end joining (MMEJ) pathways, which can elicit a variety of small insertion or deletion (indel) mutations. If indels are elicited in a protein coding sequence and shift the reading frame, targeted gene knock out (KO) can readily be achieved using either of the available PNs. Despite the ease by which gene inactivation in principle can be achieved, in practice, successful KO is not only determined by the efficiency of NHEJ and MMEJ repair; it also depends on the design and properties of the PN utilized, delivery format chosen, the preferred indel repair outcomes at the targeted site, the chromatin state of the target site and the relative activities of the repair pathways in the edited cells. These variables preclude accurate prediction of the nature and frequency of PN induced indels. 
A key step of any gene KO experiment therefore becomes the detection, characterization and quantification of the indel(s) induced at the targeted genomic site in cells, tissues or whole organisms. In this survey, we briefly review naturally occurring indels and their detection. Next, we review the methods that have been developed for detection of PN-induced indels. We briefly outline the experimental steps and describe the pros and cons of the various methods to help users decide a suitable method for their editing application. We highlight recent advances that enable accurate and sensitive quantification of indel events in cells regardless of their genome complexity, turning a complex pool of different indel events into informative indel profiles. Finally, we review what has been learned about PN-elicited indel formation through the use of the new methods and how this insight is helping to further advance the genome editing field.
Affiliation(s)
- Eric Paul Bennett
  - Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Bent Larsen Petersen
  - Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
- Ida Elisabeth Johansen
  - Department of Plant and Environmental Sciences, University of Copenhagen, DK-1871 Frederiksberg C, Denmark
- Yiyuan Niu
  - Biotech Research and Innovation Centre (BRIC), Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
  - College of Animal Science and Technology, Northwest A&F University, Yangling Shaanxi, China
- Zhang Yang
  - Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Özcan Met
  - Center for Cancer Immune Therapy, Department of Oncology, Copenhagen University Hospital, Herlev, Denmark
  - Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Hans H Wandall
  - Copenhagen Center for Glycomics, Department of Odontology and Molecular and Cellular Medicine, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Morten Frödin
  - Biotech Research and Innovation Centre (BRIC), Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
4
Kim YG, Kim MJ, Lee JS, Lee JA, Song JY, Cho SI, Park SS, Seong MW. SnackVar: An Open-Source Software for Sanger Sequencing Analysis Optimized for Clinical Use. J Mol Diagn 2020; 23:140-148. [PMID: 33246077 DOI: 10.1016/j.jmoldx.2020.11.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2020] [Revised: 10/23/2020] [Accepted: 11/10/2020] [Indexed: 01/03/2023]
Abstract
Despite the wide application of next-generation sequencing, Sanger sequencing still plays a necessary role in clinical laboratories. However, recent developments in the field of bioinformatics have focused mostly on next-generation sequencing, while tools for Sanger sequencing have shown little progress. In this study, SnackVar (https://github.com/Young-gonKim/SnackVar, last accessed June 22, 2020), a novel graphical user interface-based software for Sanger sequencing, was developed. All types of variants, including heterozygous insertion/deletion variants, can be identified by SnackVar with minimal user effort. The featured reference sequences of all of the genes are prestored in SnackVar, allowing for detected variants to be precisely described based on coding DNA references according to the nomenclature of the Human Genome Variation Society. Among 88 previously reported variants from four insertion/deletion-rich genes (BRCA1, APC, CALR, and CEBPA), the result of SnackVar agreed with reported results in 87 variants [98.9% (93.0%; 99.9%)]. The cause of one incorrect variant calling was proven to be erroneous base callings from poor-quality trace files. Compared with commercial software, SnackVar required less than one-half of the time taken for the analysis of a selected set of test cases. We expect SnackVar to be a cost-effective option for clinical laboratories performing Sanger sequencing.
Affiliation(s)
- Young-Gon Kim
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Man Jin Kim
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Jee-Soo Lee
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Jung Ae Lee
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Ji Yun Song
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Sung Im Cho
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Sung-Sup Park
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
- Moon-Woo Seong
  - Department of Laboratory Medicine, Seoul National University Hospital, Seoul, Republic of Korea
5
Malalla ZH, Al-Serri AE, AlAskar HM, Al-Kandari WY, Al-Bustan SA. Sequence analysis and variant identification at the APOC3 gene locus indicates association of rs5218 with BMI in a sample of Kuwaiti's. Lipids Health Dis 2019; 18:224. [PMID: 31856839 PMCID: PMC6921598 DOI: 10.1186/s12944-019-1165-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2019] [Accepted: 12/03/2019] [Indexed: 11/10/2022]
Abstract
Background APOC3 is important in lipid transport and metabolism, but few studies have reported its genetic sequence variation in specific ethnic groups. The present study aimed to analyze the full APOC3 sequence among Kuwaiti Arabs and test the association of selected variants with lipid levels and BMI. Methods Variants were identified by Sanger sequencing the entire APOC3 gene in 100 Kuwaiti Arabs. Variants and their genotypes were fully characterized and used to construct haplotype blocks. Four variants (rs5128, rs2854117, rs2070668, KUAPOC3N3 g.5196 A > G) were selected for testing association with serum lipid levels and BMI in a cohort (n = 733). Results The APOC3 sequence (4.3 kb) of a Kuwaiti Arab was deposited in GenBank (accession number KJ437193). Forty-two variants, including 3 novel ones, were identified, among them an “A” insertion at genomic positions 116,700,599–116,700,600 (promoter region) and two substitutions in intron 1 at genomic positions 116,700,819 and 116,701,159. Only three variants (rs5128, rs2854117, and rs2070668) were analyzed for association; of these, rs5128 showed a trend for association with increased BMI, TG and VLDL levels, which was further investigated using multivariate analysis. A significant association of rs5128 with BMI (p < 0.05) was observed under a dominant genetic model, with increased risk (OR 4.022; CI: 1.13–14.30). Conclusion The present study is the first to report sequence analysis of APOC3 in an Arab ethnic group. This study supports the inclusion of rs5128 as a marker for assessing genetic risk of dyslipidemia and obesity and the inclusion of the novel variant g.5196 A > G for population stratification of Arabs.
Affiliation(s)
- Zainab H Malalla
  - Department of Biological Sciences, Faculty of Science, Kuwait University, Kuwait, Kuwait
- Ahmad E Al-Serri
  - Human Genetics Unit, Department of Pathology, Faculty of Medicine, Kuwait University, Hawally, Kuwait
- Huda M AlAskar
  - Department of Biological Sciences, Faculty of Science, Kuwait University, Kuwait, Kuwait
- Wafaa Y Al-Kandari
  - Department of Biological Sciences, Faculty of Science, Kuwait University, Kuwait, Kuwait
- Suzanne A Al-Bustan
  - Department of Biological Sciences, Faculty of Science, Kuwait University, Kuwait, Kuwait
6
Tang M, Hasan MS, Zhu H, Zhang L, Wu X. vi-HMM: a novel HMM-based method for sequence variant identification in short-read data. Hum Genomics 2019; 13:9. [PMID: 30795817 PMCID: PMC6387560 DOI: 10.1186/s40246-019-0194-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/02/2018] [Accepted: 01/29/2019] [Indexed: 12/30/2022]
Abstract
Background Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make simplified assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD). Results and conclusion We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between hidden states (defined as “SNP,” “Ins,” “Del,” and “Match”) of adjacent genomic bases and determines an optimal hidden state path by using the Viterbi algorithm. The inferred hidden state path provides a direct solution to the identification of SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F1 score. When applied to the real data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs.
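The decoding idea in this abstract, a Viterbi search over the four hidden states "Match", "SNP", "Ins", and "Del", can be sketched on toy data as follows. This is a generic textbook Viterbi decoder, not the vi-HMM model itself: the observation symbols (`agree`, `mismatch`, `extra`, `missing`) and every probability below are hypothetical values chosen only to make the example run, whereas the published method derives its emissions from mapped short reads.

```python
import math

STATES = ["Match", "SNP", "Ins", "Del"]
SYMBOLS = ["agree", "mismatch", "extra", "missing"]  # hypothetical observations


def row(favoured, p_fav, keys):
    """Distribution putting p_fav on one key and splitting the rest evenly."""
    rest = (1.0 - p_fav) / (len(keys) - 1)
    return {key: (p_fav if key == favoured else rest) for key in keys}


# All numbers are illustrative assumptions, not parameters from the paper.
START = row("Match", 0.97, STATES)
TRANS = {
    "Match": row("Match", 0.97, STATES),
    "SNP": row("Match", 0.97, STATES),
    "Ins": {"Match": 0.70, "SNP": 0.01, "Ins": 0.28, "Del": 0.01},
    "Del": {"Match": 0.70, "SNP": 0.01, "Ins": 0.01, "Del": 0.28},
}
EMIT = {
    "Match": row("agree", 0.997, SYMBOLS),
    "SNP": row("mismatch", 0.97, SYMBOLS),
    "Ins": row("extra", 0.97, SYMBOLS),
    "Del": row("missing", 0.97, SYMBOLS),
}


def viterbi(observations, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path (log-space Viterbi)."""
    prev = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
            for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = (prev[best] + math.log(trans_p[best][s])
                      + math.log(emit_p[s][obs]))
            ptr[s] = best
        backpointers.append(ptr)
        prev = cur
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: prev[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


observations = ["agree", "agree", "mismatch", "agree", "extra", "extra", "agree"]
path = viterbi(observations, START, TRANS, EMIT)
print(path)
```

Because the "Ins" and "Del" states are given a higher self-transition probability than the others, adjacent positions are not decoded independently, which is the dependence between neighboring bases that the abstract emphasizes.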
Affiliation(s)
- Man Tang
  - Department of Statistics, Virginia Tech, 250 Drillfield Drive, Blacksburg, 24061, VA, USA
- Mohammad Shabbir Hasan
  - Department of Computer Science, Virginia Tech, 225 Stanger Street, Blacksburg, 24060, VA, USA
- Hongxiao Zhu
  - Department of Statistics, Virginia Tech, 250 Drillfield Drive, Blacksburg, 24061, VA, USA
- Liqing Zhang
  - Department of Computer Science, Virginia Tech, 225 Stanger Street, Blacksburg, 24060, VA, USA
- Xiaowei Wu
  - Department of Statistics, Virginia Tech, 250 Drillfield Drive, Blacksburg, 24061, VA, USA
7
Lo E, Bonizzoni M, Hemming-Schroeder E, Ford A, Janies DA, James AA, Afrane Y, Etemesi H, Zhou G, Githeko A, Yan G. Selection and Utility of Single Nucleotide Polymorphism Markers to Reveal Fine-Scale Population Structure in Human Malaria Parasite Plasmodium falciparum. Front Ecol Evol 2018. [DOI: 10.3389/fevo.2018.00145] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/30/2023]
8
Guo F, Wang D, Wang L. Progressive approach for SNP calling and haplotype assembly using single molecular sequencing data. Bioinformatics 2018; 34:2012-2018. [DOI: 10.1093/bioinformatics/bty059] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 11/15/2017] [Accepted: 02/17/2018] [Indexed: 12/30/2022]
Affiliation(s)
- Fei Guo
  - School of Computer Science and Technology, Tianjin University, Tianjin Haihe Education Park, Tianjin, China
- Dan Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Lusheng Wang
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
  - University of Hong Kong Shenzhen Research Institute, Shenzhen Hi-Tech Industrial Park, Shenzhen, Guangdong, China
9
Worthey EA. Analysis and Annotation of Whole-Genome or Whole-Exome Sequencing Derived Variants for Clinical Diagnosis. Curr Protoc Hum Genet 2017; 95:9.24.1-9.24.28. [PMID: 29044471 DOI: 10.1002/cphg.49] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/17/2022]
Abstract
Over the last 10 years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing or analysis (given access to appropriate tools), but rather clinical interpretation. Interpretation of genetic findings in a complex and ever changing clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires application of appropriate interpretation tools, as well as development and application of appropriate methodologies and standard procedures. This unit provides an overview of these items. Specific challenges related to implementation of genome-wide sequencing in a clinical setting are discussed. © 2017 by John Wiley & Sons, Inc.
10
Kanchi KL, Johnson KJ, Lu C, McLellan MD, Leiserson MDM, Wendl MC, Zhang Q, Koboldt DC, Xie M, Kandoth C, McMichael JF, Wyczalkowski MA, Larson DE, Schmidt HK, Miller CA, Fulton RS, Spellman PT, Mardis ER, Druley TE, Graubert TA, Goodfellow PJ, Raphael BJ, Wilson RK, Ding L. Integrated analysis of germline and somatic variants in ovarian cancer. Nat Commun 2014; 5:3156. [PMID: 24448499 PMCID: PMC4025965 DOI: 10.1038/ncomms4156] [Citation(s) in RCA: 225] [Impact Index Per Article: 28.1] [Received: 09/20/2013] [Accepted: 12/19/2013] [Indexed: 01/05/2023]
Abstract
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyse germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2 and PALB2. In addition, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B and MLL3). Evidence for loss of heterozygosity was found in 100 and 76% of cases with germline BRCA1 and BRCA2 truncations, respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 222 candidate functional germline truncation and missense variants, including two pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK and MLL pathways.
Affiliation(s)
- Krishna L Kanchi
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Kimberly J Johnson
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Brown School, Washington University, St. Louis, Missouri 63130, USA
  - Oregon Health and Science University, Portland, Oregon 97239, USA
- Charles Lu
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Michael D McLellan
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Mark D M Leiserson
  - Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA
- Michael C Wendl
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
  - Department of Mathematics, Washington University, St. Louis, Missouri 63108, USA
- Qunyuan Zhang
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
- Daniel C Koboldt
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Mingchao Xie
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Cyriac Kandoth
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Joshua F McMichael
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- David E Larson
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
- Heather K Schmidt
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Robert S Fulton
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
- Paul T Spellman
  - Oregon Health and Science University, Portland, Oregon 97239, USA
- Elaine R Mardis
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
  - Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA
- Todd E Druley
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
  - Department of Pediatrics, Washington University, St. Louis, Missouri 63108, USA
- Timothy A Graubert
  - Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA
  - Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Paul J Goodfellow
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio 43210, USA
- Benjamin J Raphael
  - Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA
- Richard K Wilson
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
  - Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA
- Li Ding
  - The Genome Institute, Washington University, St. Louis, Missouri 63108, USA
  - Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
  - Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA
  - Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
11
Boschiero C, Gheyas AA, Ralph HK, Eory L, Paton B, Kuo R, Fulton J, Preisinger R, Kaiser P, Burt DW. Detection and characterization of small insertion and deletion genetic variants in modern layer chicken genomes. BMC Genomics 2015; 16:562. [PMID: 26227840 PMCID: PMC4563830 DOI: 10.1186/s12864-015-1711-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/17/2014] [Accepted: 06/22/2015] [Indexed: 01/17/2023]
Abstract
BACKGROUND Small insertions and deletions (InDels) constitute the second most abundant class of genetic variants and have been found to be associated with many traits and diseases. The present study reports on the detection and characterisation of about 883 K high quality InDels from the whole-genome analysis of several modern layer chicken lines from diverse breeds. RESULTS To reduce the error rates seen in InDel detection, this study used the consensus set from two InDel-calling packages: SAMtools and Dindel, as well as stringent post-filtering criteria. By analysing sequence data from 163 chickens from 11 commercial and 5 experimental layer lines, this study detected about 883 K high quality consensus InDels with 93% validation rate and an average density of 0.78 InDels/kb over the genome. Certain chromosomes, viz, GGAZ, 16, 22 and 25 showed very low densities of InDels whereas the highest rate was observed on GGA6. In spite of the higher recombination rates on microchromosomes, the InDel density on these chromosomes was generally lower relative to macrochromosomes possibly due to their higher gene density. About 43-87% of the InDels were found to be fixed within each line. The majority of detected InDels (86%) were 1-5 bases and about 63% were non-repetitive in nature while the rest were tandem repeats of various motif types. Functional annotation identified 613 frameshift, 465 non-frameshift and 10 stop-gain/loss InDels. Apart from the frameshift and stopgain/loss InDels that are expected to affect the translation of protein sequences and their biological activity, 33% of the non-frameshift were predicted as evolutionary intolerant with potential impact on protein functions. Moreover, about 2.5% of the InDels coincided with the most-conserved elements previously mapped on the chicken genome and are likely to define functional elements. InDels potentially affecting protein function were found to be enriched for certain gene-classes e.g. those associated with cell proliferation, chromosome and Golgi organization, spermatogenesis, and muscle contraction. CONCLUSIONS The large catalogue of InDels presented in this study along with their associated information such as functional annotation, estimated allele frequency, etc. are expected to serve as a rich resource for application in future research and breeding in the chicken.
Affiliation(s)
- Clarissa Boschiero
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
  - Current address: Departamento de Zootecnia, University of Sao Paulo/ESALQ, Piracicaba, SP, 13418-900, Brazil
- Almas A Gheyas
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Hannah K Ralph
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Lel Eory
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Bob Paton
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Richard Kuo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Pete Kaiser
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- David W Burt
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
12
Khoshfetrat SM, Ranjbari M, Shayan M, Mehrgardi MA, Kiani A. Wireless Electrochemiluminescence Bipolar Electrode Array for Visualized Genotyping of Single Nucleotide Polymorphism. Anal Chem 2015; 87:8123-31. [DOI: 10.1021/acs.analchem.5b02515] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Indexed: 12/15/2022]
Affiliation(s)
- Mitra Ranjbari
  - Department of Chemistry, University of Isfahan, Isfahan 81746-73441, Iran
- Mohsen Shayan
  - Department of Chemistry, University of Isfahan, Isfahan 81746-73441, Iran
- Abolfazl Kiani
  - Department of Chemistry, University of Isfahan, Isfahan 81746-73441, Iran
13
Crago AM, Chmielecki J, Rosenberg M, O'Connor R, Byrne C, Wilder FG, Thorn K, Agius P, Kuk D, Socci ND, Qin LX, Meyerson M, Hameed M, Singer S. Near universal detection of alterations in CTNNB1 and Wnt pathway regulators in desmoid-type fibromatosis by whole-exome sequencing and genomic analysis. Genes Chromosomes Cancer 2015; 54:606-15. [PMID: 26171757 DOI: 10.1002/gcc.22272] [Citation(s) in RCA: 111] [Impact Index Per Article: 12.3] [Received: 02/25/2015] [Revised: 05/15/2015] [Accepted: 05/18/2015] [Indexed: 12/17/2022]
Abstract
CTNNB1 mutations or APC abnormalities have been observed in ∼85% of desmoids examined by Sanger sequencing and are associated with Wnt/β-catenin activation. We sought to identify molecular aberrations in "wild-type" tumors (those without CTNNB1 or APC alteration) and to determine their prognostic relevance. CTNNB1 was examined by Sanger sequencing in 117 desmoids; a mutation was observed in 101 (86%) and 16 were wild type. Wild-type status did not associate with tumor recurrence. Moreover, in unsupervised clustering based on U133A-derived gene expression profiles, wild-type and mutated tumors clustered together. Whole-exome sequencing of eight of the wild-type desmoids revealed that three had a CTNNB1 mutation that had been undetected by Sanger sequencing. The mutation was found in a mean 16% of reads (vs. 37% for mutations identified by Sanger). Of the other five wild-type tumors sequenced, two had APC loss, two had chromosome 6 loss, and one had mutation of BMI1. The finding of low-frequency CTNNB1 mutation or APC loss in wild-type desmoids was validated in the remaining eight wild-type desmoids; directed MiSeq identified low-frequency CTNNB1 mutation in four and comparative genomic hybridization identified APC loss in one. These results demonstrate that mutations affecting CTNNB1 or APC occur more frequently in desmoids than previously recognized (111 of 117; 95%), and designation of wild-type genotype is largely determined by sensitivity of detection methods. Even true CTNNB1 wild-type tumors (determined by next-generation sequencing) may have genomic alterations associated with Wnt activation (chromosome 6 loss/BMI1 mutation), supporting Wnt/β-catenin activation as the common pathway governing desmoid initiation.
Affiliation(s)
- Aimee M Crago
  - Sarcoma Biology Laboratory and Sarcoma Disease Management Program, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY
  - Department of Surgery, Weill Cornell Medical College, New York, NY
- Juliann Chmielecki
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA
- Mara Rosenberg
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA
- Rachael O'Connor
  - Sarcoma Biology Laboratory and Sarcoma Disease Management Program, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY
- Caitlin Byrne
  - Bioinformatics Core, Memorial Sloan Kettering Cancer Center, New York, NY
- Fatima G Wilder
  - Sarcoma Biology Laboratory and Sarcoma Disease Management Program, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY
- Katherine Thorn
  - Sarcoma Biology Laboratory and Sarcoma Disease Management Program, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY
- Phaedra Agius
  - Bioinformatics Core, Memorial Sloan Kettering Cancer Center, New York, NY
- Deborah Kuk
  - Biostatistics and Epidemiology, Memorial Sloan Kettering Cancer Center, New York, NY
- Nicholas D Socci
  - Bioinformatics Core, Memorial Sloan Kettering Cancer Center, New York, NY
- Li-Xuan Qin
  - Biostatistics and Epidemiology, Memorial Sloan Kettering Cancer Center, New York, NY
- Matthew Meyerson
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
  - Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA
  - Department of Pathology, Harvard Medical School, Boston, MA
- Meera Hameed
  - Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY
- Samuel Singer
  - Sarcoma Biology Laboratory and Sarcoma Disease Management Program, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY
  - Department of Surgery, Weill Cornell Medical College, New York, NY
|
14
|
Ruperao P, Edwards D. Bioinformatics: identification of markers from next-generation sequence data. Methods Mol Biol 2015; 1245:29-47. [PMID: 25373747 DOI: 10.1007/978-1-4939-1966-6_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/03/2023]
Abstract
With the advent of sequencing technology, next-generation sequencing (NGS) technology has dramatically revolutionized plant genomics. NGS technology combined with new software tools enables the discovery, validation, and assessment of genetic markers on a large scale. Among the different marker systems, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are the markers of choice for genetics and plant breeding. SSR markers have been a choice for large-scale characterization of germplasm collections, construction of genetic maps, and QTL identification. Similarly, SNPs are the most abundant genetic variations, occurring at higher frequencies throughout the genomes of plant species. This chapter discusses various tools available for genome assembly and focuses broadly on SSR and SNP marker discovery.
Affiliation(s)
- Pradeep Ruperao
  - School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
|
15
|
Huang G, Stock C, Bommeljé CC, Weeda VB, Shah K, Bains S, Buss E, Shaha M, Rechler W, Ramanathan SY, Singh B. SCCRO3 (DCUN1D3) antagonizes the neddylation and oncogenic activity of SCCRO (DCUN1D1). J Biol Chem 2014; 289:34728-42. [PMID: 25349211 DOI: 10.1074/jbc.m114.585505] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/09/2023]
Abstract
The activity of cullin-RING type ubiquitination E3 ligases is regulated by neddylation, a process analogous to ubiquitination that culminates in covalent attachment of the ubiquitin-like protein Nedd8 to cullins. As a component of the E3 for neddylation, SCCRO/DCUN1D1 plays a key regulatory role in neddylation and, consequently, cullin-RING ligase activity. The essential contribution of SCCRO to neddylation is to promote nuclear translocation of the cullin-ROC1 complex. The presence of a myristoyl sequence in SCCRO3, one of four SCCRO paralogues present in humans that localizes to the membrane, raises questions about its function in neddylation. We found that although SCCRO3 binds to CAND1, cullins, and ROC1, it does not efficiently bind to Ubc12, promote cullin neddylation, or conform to the reaction processivity paradigms, suggesting that SCCRO3 does not have E3 activity. Expression of SCCRO3 inhibits SCCRO-promoted neddylation by sequestering cullins to the membrane, thereby blocking its nuclear translocation. Moreover, SCCRO3 inhibits SCCRO transforming activity. The inhibitory effects of SCCRO3 on SCCRO-promoted neddylation and transformation require both an intact myristoyl sequence and PONY domain, confirming that membrane localization and binding to cullins are required for in vivo functions. Taken together, our findings suggest that SCCRO3 functions as a tumor suppressor by antagonizing the neddylation activity of SCCRO.
Affiliation(s)
- Guochang Huang
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Cameron Stock
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Claire C Bommeljé
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Víola B Weeda
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Kushyup Shah
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Sarina Bains
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Elizabeth Buss
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Manish Shaha
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Willi Rechler
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Suresh Y Ramanathan
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
- Bhuvanesh Singh
  - From the Department of Surgery, Laboratory of Epithelial Cancer Biology, Memorial Sloan Kettering Cancer Center, New York, New York 10065
|
16
|
Hill JT, Demarest BL, Bisgrove BW, Su YC, Smith M, Yost HJ. Poly peak parser: Method and software for identification of unknown indels using sanger sequencing of polymerase chain reaction products. Dev Dyn 2014; 243:1632-6. [PMID: 25160973 DOI: 10.1002/dvdy.24183] [Citation(s) in RCA: 160] [Impact Index Per Article: 16.0] [Received: 06/10/2014] [Revised: 08/04/2014] [Accepted: 08/22/2014] [Indexed: 01/01/2023]
Abstract
BACKGROUND Genome editing techniques, including ZFN, TALEN, and CRISPR, have created a need to rapidly screen many F1 individuals to identify carriers of indels and determine the sequences of the mutations. Current techniques require multiple clones of the targeted region to be sequenced for each individual, which is inefficient when many individuals must be analyzed. Direct Sanger sequencing of a polymerase chain reaction (PCR) amplified region surrounding the target site is efficient, but Sanger sequencing genomes heterozygous for an indel results in a string of "double peaks" due to the mismatched region. RESULTS To facilitate indel identification, we developed an online tool called Poly Peak Parser (available at http://yost.genetics.utah.edu/software.php) that is able to separate chromatogram data containing ambiguous base calls into wild-type and mutant allele sequences. This tool allows the nature of the indel to be determined from a single sequencing run per individual performed directly on a PCR product spanning the targeted site, without cloning. CONCLUSIONS The method and algorithm described here facilitate rapid identification and sequence characterization of heterozygous mutant carriers generated by genome editing. Although designed for screening F1 individuals, this tool can also be used to identify heterozygous indels in many contexts.
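The idea of splitting ambiguous base calls into two allele sequences can be illustrated with a minimal Python sketch. It is not the Poly Peak Parser algorithm itself: it assumes the double peaks have already been reduced to two-base IUPAC ambiguity codes and that a wild-type reference is available, and the function names (`mix_alleles`, `subtract_reference`, `infer_deletion_size`) are hypothetical.

```python
# Two-base IUPAC ambiguity codes (three- and four-base codes are treated as unresolved).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}
PAIR_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def mix_alleles(allele_a, allele_b):
    """Simulate the base calls of a mixed trace from two allele sequences."""
    return "".join(PAIR_TO_CODE[frozenset((a, b))] for a, b in zip(allele_a, allele_b))


def subtract_reference(calls, reference):
    """Recover the second allele by removing the reference base from each call."""
    alt = []
    for code, ref_base in zip(calls, reference):
        bases = IUPAC.get(code, "")
        if ref_base not in bases:
            alt.append("N")            # call is inconsistent with the reference
        elif len(bases) == 1:
            alt.append(ref_base)       # single peak: both alleles carry the same base
        else:
            alt.append(bases.replace(ref_base, ""))
    return "".join(alt)


def infer_deletion_size(alt, reference, max_shift=20):
    """Return the smallest shift k at which alt matches reference[k:], or None."""
    for k in range(1, max_shift + 1):
        window = reference[k:k + len(alt)]
        if window and all(a in ("N", r) for a, r in zip(alt, window)):
            return k
    return None


if __name__ == "__main__":
    reference = "GATTACAGGCTTCAGT"
    calls = mix_alleles(reference, reference[3:])   # heterozygous 3-bp deletion
    alt = subtract_reference(calls, reference)
    print(calls, alt, infer_deletion_size(alt, reference))
```

The sketch handles only a deletion at the start of the read window; insertions, noisy calls, and the reverse strand would need additional logic.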
Affiliation(s)
- Jonathon T Hill
  - Molecular Medicine Program and Department of Neurobiology & Anatomy, University of Utah School of Medicine, Salt Lake City, Utah
|
17
|
Contrasting roles of histone 3 lysine 27 demethylases in acute lymphoblastic leukaemia. Nature 2014; 514:513-7. [PMID: 25132549 PMCID: PMC4209203 DOI: 10.1038/nature13605] [Citation(s) in RCA: 288] [Impact Index Per Article: 28.8] [Received: 08/25/2013] [Accepted: 06/18/2014] [Indexed: 01/20/2023]
Abstract
T cell acute lymphoblastic leukemia (T-ALL) is a hematological malignancy with dismal overall prognosis, exhibiting up to a 25% relapse rate, mainly due to the absence of non-cytotoxic targeted therapy options. Despite the fact that drugs targeting the function of key epigenetic factors have been approved in the context of hematopoietic disorders [1] and the recent identification of mutations affecting chromatin modulators in a variety of leukemias [2,3], “epigenetic” drugs are not currently used for T-ALL treatment. Recently, we described a tumor suppressor role of the polycomb repressive complex 2 (PRC2) in this tumor [4]. Here we sought to delineate the role of the histone 3 lysine 27 (H3K27) demethylases JMJD3 and UTX. We show that JMJD3 is essential for initiation and maintenance of disease, as it controls important oncogenic gene targets through the modulation of H3K27 methylation. In contrast, UTX acts as a tumor suppressor and is frequently genetically inactivated in T-ALL. Moreover, we demonstrate that the small molecule inhibitor GSKJ4 [5] affects T-ALL growth by targeting JMJD3 activity. These findings show that two proteins with similar enzymatic function can play opposing roles in the context of the same disease and pave the way for the use of a new category of epigenetic inhibitors in hematopoietic malignancies.
|
18
|
PrimeIndel: four-prime-number genetic code for indel decryption and sequence read alignment. Clin Chim Acta 2014; 436:1-4. [PMID: 24769229 DOI: 10.1016/j.cca.2014.04.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/20/2013] [Revised: 04/09/2014] [Accepted: 04/09/2014] [Indexed: 11/21/2022]
Abstract
BACKGROUND To decrypt a doubly heterozygous sequence (DHS) in order to define the indel mutation for mutation reporting, an algorithm recursively searching the overlapped nucleotide using an offset of nucleotide positions can decrypt the indel without using a reference sequence. However, as genetic code is letter-based, special computer programs are required to run the decryption algorithm. METHODS The previous text-based algorithm was converted to a number-based algorithm by expressing DNA sequence from a 4-letter genetic code to a 4-prime-number genetic code, i.e., converting A, C, G, T to 2, 3, 5, and 7. This algorithm based on prime-number genetic code is called PrimeIndel and is executable by spreadsheet. Using prime number coded DNA sequence, the overlapped nucleotide between any 2 positions of the DHS is represented by the greatest common divisor (GCD) of the multiplication product of 2 prime numbers. This algorithm can also be used for aligning multiple overlapping sequence reads by in-silico DHS formation. The indel size of the in-silico formed DHS indicates the positions in the paired sequences for correct alignment. RESULTS DHSs were successfully decrypted by the prime number-based algorithm and sequence reads were aligned correctly. CONCLUSIONS DNA sequence expressed in prime numbers can be used for the decryption of DHS and the alignment of sequence reads using a well-known mathematical function GCD of a spreadsheet program. PrimeIndel is a useful tool for mutation reporting in clinical laboratories. The software is downloadable from http://www.patho.hku.hk/staff/list/cwlam.htm.
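The prime-number coding described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the PrimeIndel software: the mapping A, C, G, T to 2, 3, 5, 7 and the use of the GCD come from the abstract, while the function names, the offset search, and the seeded decoding step are assumptions made for the example.

```python
from math import gcd

PRIME = {"A": 2, "C": 3, "G": 5, "T": 7}      # mapping stated in the abstract
BASE = {value: base for base, value in PRIME.items()}


def encode_dhs(allele1, allele2):
    """Encode each doubly heterozygous position as the product of two primes."""
    return [PRIME[a] * PRIME[b] for a, b in zip(allele1, allele2)]


def find_offset(dhs, max_offset=10):
    """Return the smallest offset k at which every pair of positions (i, i + k)
    shares a nucleotide, i.e. the GCD of their products exceeds 1."""
    for k in range(1, max_offset + 1):
        if len(dhs) > k and all(gcd(dhs[i], dhs[i + k]) > 1
                                for i in range(len(dhs) - k)):
            return k
    return None


def decode_allele(dhs, seed, k):
    """Rebuild the longer allele from its first k bases (assumed known here):
    dividing each product by the known base yields the base k positions ahead."""
    allele = list(seed)
    for i, product in enumerate(dhs):
        allele.append(BASE[product // PRIME[allele[i]]])
    return "".join(allele)


if __name__ == "__main__":
    wild = "GATTACAGGCTTCAGT"
    dhs = encode_dhs(wild, wild[3:])           # simulated 3-bp heterozygous deletion
    k = find_offset(dhs)
    print(k, decode_allele(dhs, wild[:k], k))
```

Because the products factor uniquely into the four primes, the shared base at two positions can be read from the GCD with integer arithmetic alone, which is what makes the approach workable in a spreadsheet.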
|
19
|
Mehdi khoshfetrat S, Mehrgardi MA. Electrochemical Genotyping of Single-Nucleotide Polymorphisms by using Monobase-Conjugated Modified Nanoparticles. ChemElectroChem 2014. [DOI: 10.1002/celc.201300221] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
|
20
|
Worthey EA. Analysis and annotation of whole-genome or whole-exome sequencing-derived variants for clinical diagnosis. Current Protocols in Human Genetics 2013; 79:9.24.1-9.24.24. [PMID: 24510652 DOI: 10.1002/0471142905.hg0924s79] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]
Abstract
Over the last several years, next-generation sequencing (NGS) has transformed genomic research through substantial advances in technology and reduction in the cost of sequencing, and also in the systems required for analysis of these large volumes of data. This technology is now being used as a standard molecular diagnostic test under particular circumstances in some clinical settings. The advances in sequencing have come so rapidly that the major bottleneck in identification of causal variants is no longer the sequencing but rather the analysis and interpretation. Interpretation of genetic findings in a clinical setting is scarcely a new challenge, but the task is increasingly complex in clinical genome-wide sequencing given the dramatic increase in dataset size and complexity. This increase requires the development of novel or repositioned analysis tools, methodologies, and processes. This unit provides an overview of these items. Specific challenges related to implementation in a clinical setting are discussed.
Affiliation(s)
- Elizabeth A Worthey
  - Department of Pediatrics, Medical College of Wisconsin, Milwaukee, Wisconsin
  - The Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, Wisconsin
  - Department of Computer Science, University of Wisconsin, Milwaukee, Wisconsin
|
21
|
Abstract
BACKGROUND SLX4 encodes a DNA repair protein that regulates three structure-specific endonucleases and is necessary for resistance to DNA crosslinking agents, topoisomerase I and poly (ADP-ribose) polymerase (PARP) inhibitors. Recent studies have reported mutations in SLX4 in a new subtype of Fanconi anemia (FA), FA-P. Monoallelic defects in several FA genes are known to confer susceptibility to breast and ovarian cancers. METHODS AND RESULTS To determine if SLX4 is involved in breast cancer susceptibility, we sequenced the entire SLX4 coding region in 738 (270 Jewish and 468 non-Jewish) breast cancer patients with 2 or more family members affected by breast cancer and no known BRCA1 or BRCA2 mutations. We found a novel nonsense (c.2469G>A, p.W823*) mutation in one patient. In addition, we also found 51 missense variants [13 novel, 23 rare (MAF<0.1%), and 15 common (MAF>1%)], of which 22 (5 novel and 17 rare) were predicted to be damaging by Polyphen2 (score = 0.65–1). We performed functional complementation studies using p.W823* and 5 SLX4 variants (4 novel and 1 rare) cDNAs in a human SLX4-null fibroblast cell line, RA3331. While wild type SLX4 and all the other variants fully rescued the sensitivity to mitomycin C (MMC), camptothecin (CPT), and PARP inhibitor (Olaparib), the p.W823* SLX4 mutant failed to do so. CONCLUSION Loss-of-function mutations in SLX4 may contribute to the development of breast cancer in very rare cases.
|
22
|
Patnala R, Clements J, Batra J. Candidate gene association studies: a comprehensive guide to useful in silico tools. BMC Genet 2013; 14:39. [PMID: 23656885 PMCID: PMC3655892 DOI: 10.1186/1471-2156-14-39] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 11/07/2012] [Accepted: 04/15/2013] [Indexed: 01/01/2023]
Abstract
The candidate gene approach has been a pioneer in the field of genetic epidemiology, identifying risk alleles and their association with clinical traits. With the advent of rapidly changing technology, there has been an explosion of in silico tools available to researchers, giving them fast, efficient resources and reliable strategies important for finding causal gene variants for candidate or genome-wide association studies (GWAS). In this review, following a description of candidate gene prioritisation, we summarise the approaches to single nucleotide polymorphism (SNP) prioritisation and discuss the tools available to assess functional relevance of the risk variant with consideration to its genomic location. The strategy and the tools discussed are applicable to any study investigating genetic risk factors associated with a particular disease. Some of the tools are also applicable for the functional validation of variants relevant to the era of GWAS and next generation sequencing (NGS).
Affiliation(s)
- Radhika Patnala
  - Australian Prostate Cancer Research Centre - Queensland, Institute of Health and Biomedical Innovation, Queensland University of Technology, Brisbane, QLD 4059, Australia
|
23
|
|
24
|
Draft genome sequence of the flocculating Zymomonas mobilis strain ZM401 (ATCC 31822). J Bacteriol 2013; 194:7008-9. [PMID: 23209250 DOI: 10.1128/jb.01947-12] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
Abstract
Zymomonas mobilis ZM401 is a flocculating strain which can be self-immobilized within fermentors for a high-cell-density culture to improve ethanol productivity, as well as high-gravity fermentation to increase ethanol titer, due to its improved ethanol tolerance associated with the morphological change. Here, we report its draft genome sequence.
|
25
|
Fantin YS, Neverov AD, Favorov AV, Alvarez-Figueroa MV, Braslavskaya SI, Gordukova MA, Karandashova IV, Kuleshov KV, Myznikova AI, Polishchuk MS, Reshetov DA, Voiciehovskaya YA, Mironov AA, Chulanov VP. Base-calling algorithm with vocabulary (BCV) method for analyzing population sequencing chromatograms. PLoS One 2013; 8:e54835. [PMID: 23382983 PMCID: PMC3557274 DOI: 10.1371/journal.pone.0054835] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/06/2012] [Accepted: 12/19/2012] [Indexed: 02/01/2023]
Abstract
Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3-14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing.
Affiliation(s)
- Yuri S. Fantin
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
- Alexey D. Neverov
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
  - Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Alexander V. Favorov
  - Department of Oncology, Division of Biostatistics and Bioinformatics, Johns Hopkins University School of Medicine, Baltimore, Maryland, United States of America
  - State Research Institute of Genetics and Selection of Industrial Microorganisms GosNIIGenetika, Moscow, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Maria A. Gordukova
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
- Inga V. Karandashova
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
- Konstantin V. Kuleshov
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
- Anna I. Myznikova
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
- Maya S. Polishchuk
  - Engelhardt Institute of Molecular Biology Russian Academy of Sciences, Moscow, Russia
  - Department of Statistics, University of California, Berkeley, California, United States of America
- Denis A. Reshetov
  - Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Yana A. Voiciehovskaya
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
- Andrei A. Mironov
  - Department of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
  - Institute for Information Transmission Problems (the Kharkevich Institute), Moscow, Russia
- Vladimir P. Chulanov
  - Federal State Institution of Science Central Research Institute of Epidemiology, Moscow, Russia
|
26
|
Recurrent somatic mutation of FAT1 in multiple human cancers leads to aberrant Wnt activation. Nat Genet 2013; 45:253-61. [PMID: 23354438 PMCID: PMC3729040 DOI: 10.1038/ng.2538] [Citation(s) in RCA: 257] [Impact Index Per Article: 23.4] [Received: 05/22/2012] [Accepted: 01/02/2013] [Indexed: 12/11/2022]
Abstract
Aberrant Wnt signaling can drive cancer development. In many cancer types, the genetic basis of Wnt pathway activation remains incompletely understood. Here, we report recurrent somatic mutations of the Drosophila melanogaster tumor suppressor-related gene FAT1 in glioblastoma (20.5%), colorectal cancer (7.7%), and head and neck cancer (6.7%). FAT1 encodes a cadherin-like protein, which we found is able to potently suppress cancer cell growth in vitro and in vivo by binding β-catenin and antagonizing its nuclear localization. Inactivation of FAT1 via mutation therefore promotes Wnt signaling and tumorigenesis and affects patient survival. Taken together, these data strongly point to FAT1 as a tumor suppressor gene driving loss of chromosome 4q35, a prevalent region of deletion in cancer. Loss of FAT1 function is a frequent event during oncogenesis. These findings address two outstanding issues in cancer biology: the basis of Wnt activation in non-colorectal tumors and the identity of a 4q35 tumor suppressor.
|
27
|
Holmfeldt L, Wei L, Diaz-Flores E, Walsh M, Zhang J, Ding L, Payne-Turner D, Churchman M, Andersson A, Chen SC, McCastlain K, Becksfort J, Ma J, Wu G, Patel SN, Heatley SL, Phillips LA, Song G, Easton J, Parker M, Chen X, Rusch M, Boggs K, Vadodaria B, Hedlund E, Drenberg C, Baker S, Pei D, Cheng C, Huether R, Lu C, Fulton RS, Fulton LL, Tabib Y, Dooling DJ, Ochoa K, Minden M, Lewis ID, To LB, Marlton P, Roberts AW, Raca G, Stock W, Neale G, Drexler HG, Dickins RA, Ellison DW, Shurtleff SA, Pui CH, Ribeiro RC, Devidas M, Carroll AJ, Heerema NA, Wood B, Borowitz MJ, Gastier-Foster JM, Raimondi SC, Mardis ER, Wilson RK, Downing JR, Hunger SP, Loh ML, Mullighan CG. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat Genet 2013; 45:242-52. [PMID: 23334668 DOI: 10.1038/ng.2532] [Citation(s) in RCA: 482] [Impact Index Per Article: 43.8] [Received: 10/16/2012] [Accepted: 12/21/2012] [Indexed: 12/17/2022]
Abstract
The genetic basis of hypodiploid acute lymphoblastic leukemia (ALL), a subtype of ALL characterized by aneuploidy and poor outcome, is unknown. Genomic profiling of 124 hypodiploid ALL cases, including whole-genome and exome sequencing of 40 cases, identified two subtypes that differ in the severity of aneuploidy, transcriptional profiles and submicroscopic genetic alterations. Near-haploid ALL with 24-31 chromosomes harbor alterations targeting receptor tyrosine kinase signaling and Ras signaling (71%) and the lymphoid transcription factor gene IKZF3 (encoding AIOLOS; 13%). In contrast, low-hypodiploid ALL with 32-39 chromosomes are characterized by alterations in TP53 (91.2%) that are commonly present in nontumor cells, IKZF2 (encoding HELIOS; 53%) and RB1 (41%). Both near-haploid and low-hypodiploid leukemic cells show activation of Ras-signaling and phosphoinositide 3-kinase (PI3K)-signaling pathways and are sensitive to PI3K inhibitors, indicating that these drugs should be explored as a new therapeutic strategy for this aggressive form of leukemia.
Affiliation(s)
- Linda Holmfeldt
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
|
28
|
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
|
29
|
Hucthagowder V, Meyer R, Mullins C, Nagarajan R, DiPersio JF, Vij R, Tomasson MH, Kulkarni S. Resequencing analysis of the candidate tyrosine kinase and RAS pathway gene families in multiple myeloma. Cancer Genet 2012; 205:474-8. [PMID: 22939401 DOI: 10.1016/j.cancergen.2012.06.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 09/21/2011] [Revised: 06/21/2012] [Accepted: 06/23/2012] [Indexed: 10/27/2022]
Abstract
Multiple myeloma (MM) is an incurable, B-cell malignancy characterized by the clonal proliferation and accumulation of malignant plasma cells in bone marrow. Despite recent advances in the understanding of genomic aberrations, a comprehensive catalogue of clinically actionable mutations in MM is just beginning to emerge. The tyrosine kinase (TK) and RAS oncogenes, which encode important regulators of various signaling pathways, are among the most frequently altered gene families in cancer. To clarify the role of TK and RAS genes in the pathogenesis of MM, we performed a systematic, targeted screening of mutations on prioritized RAS and TK genes, in CD138-sorted bone marrow specimens from 42 untreated patients. We identified a total of 24 mutations in the KRAS, PIK3CA, INSR, LTK, and MERTK genes. In particular, seven novel mutations in addition to known KRAS mutations were observed. Prediction analysis tools PolyPhen and Sorting Intolerant from Tolerant (SIFT) were used to assess the functional significance of these novel mutations. Our analysis predicted that these mutations may have a deleterious effect, resulting in the functional alteration of proteins involved in the pathogenesis of myeloma. While further investigation is needed to determine the functional consequences of these proteins, mutational testing of the RAS and TK genes in larger myeloma cohorts might also be useful to establish the recurrent nature of these mutations.
Affiliation(s)
- Vishwanathan Hucthagowder
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA
|
30
|
Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data. STATISTICS IN BIOSCIENCES 2012; 5:3-25. [PMID: 24489615 DOI: 10.1007/s12561-012-9067-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
Massively parallel sequencing (MPS), since its debut in 2005, has transformed the field of genomic studies. These new sequencing technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders. They have also begun to deliver on their promise to explain some of the missing heritability from genome-wide association studies (GWAS) of complex traits. We anticipate a rapidly growing number of MPS-based studies for a diverse range of applications in the near future. One crucial and nearly inevitable step is to detect SNPs and call genotypes at the detected polymorphic sites from the sequencing data. Here, we review statistical methods that have been proposed in the past five years for this purpose. In addition, we discuss emerging issues and future directions related to SNP detection and genotype calling from MPS data.
|
31
|
Chang CT, Tsai CN, Tang CY, Chen CH, Lian JH, Hu CY, Tsai CL, Chao A, Lai CH, Wang TH, Lee YS. Mixed sequence reader: a program for analyzing DNA sequences with heterozygous base calling. ScientificWorldJournal 2012; 2012:365104. [PMID: 22778697 PMCID: PMC3385616 DOI: 10.1100/2012/365104] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.9] [Received: 02/25/2012] [Accepted: 04/01/2012] [Indexed: 01/21/2023]
Abstract
The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3.
Affiliation(s)
- Chun-Tien Chang
- Department of Computer Science, National Tsing Hua University, Hsin-Chu, Taiwan
|
32
|
Lu JT, Wang Y, Gibbs RA, Yu F. Characterizing linkage disequilibrium and evaluating imputation power of human genomic insertion-deletion polymorphisms. Genome Biol 2012; 13:R15. [PMID: 22377349 PMCID: PMC3334570 DOI: 10.1186/gb-2012-13-2-r15] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 09/20/2011] [Revised: 02/14/2012] [Accepted: 02/29/2012] [Indexed: 02/07/2023]
Abstract
Background Indels are an important cause of human variation and central to the study of human disease. The 1000 Genomes Project Low-Coverage Pilot identified over 1.3 million indels shorter than 50 bp, of which over 890 were identified as potentially disruptive variants. Yet, despite their ubiquity, the local genomic characteristics of indels remain unexplored. Results Herein we describe population- and minor allele frequency-based differences in linkage disequilibrium and imputation characteristics for indels included in the 1000 Genomes Project Low-Coverage Pilot for the CEU, YRI and CHB+JPT populations. Common indels were well tagged by nearby SNPs in all studied populations, and were also tagged at a similar rate to common SNPs. Both neutral and functionally deleterious common indels were imputed with greater than 95% concordance from HapMap Phase 3 and OMNI SNP sites. Further, 38 to 56% of low frequency indels were tagged by low frequency SNPs. We were able to impute heterozygous low frequency indels with over 50% concordance. Lastly, our analysis also revealed evidence of ascertainment bias. This bias prevents us from extending the applicability of our results to highly polymorphic indels that could not be identified in the Low-Coverage Pilot. Conclusions Although further scope exists to improve the imputation of low frequency indels, our study demonstrates that there are already ample opportunities to retrospectively impute indels for prior genome-wide association studies and to incorporate indel imputation into future case/control studies.
Affiliation(s)
- James T Lu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
|
33
|
IDH1 mutation is sufficient to establish the glioma hypermethylator phenotype. Nature 2012; 483:479-83. [PMID: 22343889 DOI: 10.1038/nature10866] [Citation(s) in RCA: 1406] [Impact Index Per Article: 117.2] [Received: 06/03/2011] [Accepted: 01/17/2012] [Indexed: 02/07/2023]
Abstract
Both genome-wide genetic and epigenetic alterations are fundamentally important for the development of cancers, but the interdependence of these aberrations is poorly understood. Glioblastomas and other cancers with the CpG island methylator phenotype (CIMP) constitute a subset of tumours with extensive epigenomic aberrations and a distinct biology. Glioma CIMP (G-CIMP) is a powerful determinant of tumour pathogenicity, but the molecular basis of G-CIMP remains unresolved. Here we show that mutation of a single gene, isocitrate dehydrogenase 1 (IDH1), establishes G-CIMP by remodelling the methylome. This remodelling results in reorganization of the methylome and transcriptome. Examination of the epigenome of a large set of intermediate-grade gliomas demonstrates a distinct G-CIMP phenotype that is highly dependent on the presence of IDH mutation. Introduction of mutant IDH1 into primary human astrocytes alters specific histone marks, induces extensive DNA hypermethylation, and reshapes the methylome in a fashion that mirrors the changes observed in G-CIMP-positive lower-grade gliomas. Furthermore, the epigenomic alterations resulting from mutant IDH1 activate key gene expression programs, characterize G-CIMP-positive proneural glioblastomas but not other glioblastomas, and are predictive of improved survival. Our findings demonstrate that IDH mutation is the molecular basis of CIMP in gliomas, provide a framework for understanding oncogenesis in these gliomas, and highlight the interplay between genomic and epigenomic changes in human cancers.
|
34
|
Oricchio E, Nanjangud G, Wolfe AL, Schatz JH, Mavrakis KJ, Jiang M, Liu X, Bruno J, Heguy A, Olshen AB, Socci ND, Teruya-Feldstein J, Weis-Garcia F, Tam W, Shaknovich R, Melnick A, Himanen JP, Chaganti RSK, Wendel HG. The Eph-receptor A7 is a soluble tumor suppressor for follicular lymphoma. Cell 2011; 147:554-64. [PMID: 22036564 DOI: 10.1016/j.cell.2011.09.035] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Received: 02/28/2011] [Revised: 06/16/2011] [Accepted: 09/21/2011] [Indexed: 01/28/2023]
Abstract
Insights into cancer genetics can lead to therapeutic opportunities. By cross-referencing chromosomal changes with an unbiased genetic screen we identify the ephrin receptor A7 (EPHA7) as a tumor suppressor in follicular lymphoma (FL). EPHA7 is a target of 6q deletions and inactivated in 72% of FLs. Knockdown of EPHA7 drives lymphoma development in a murine FL model. In analogy to its physiological function in brain development, a soluble splice variant of EPHA7 (EPHA7(TR)) interferes with another Eph-receptor and blocks oncogenic signals in lymphoma cells. Consistent with this drug-like activity, administration of the purified EPHA7(TR) protein produces antitumor effects against xenografted human lymphomas. Further, by fusing EPHA7(TR) to the anti-CD20 antibody (rituximab) we can directly target this tumor suppressor to lymphomas in vivo. Our study attests to the power of combining descriptive tumor genomics with functional screens and reveals EPHA7(TR) as tumor suppressor with immediate therapeutic potential.
MESH Headings
- Animals
- Antibodies, Monoclonal, Murine-Derived/therapeutic use
- Cell Line, Tumor
- Chromosomes, Human, Pair 6
- Genes, Tumor Suppressor
- Genomics
- Humans
- Lymphoma, Follicular/drug therapy
- Lymphoma, Follicular/genetics
- Lymphoma, Follicular/metabolism
- Male
- Mice
- Neoplasm Transplantation
- RNA Interference
- Receptor, EphA7/metabolism
- Rituximab
- Transplantation, Heterologous
Affiliation(s)
- Elisa Oricchio
- Cancer Biology and Genetics Program, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
|
35
|
Paux E, Sourdille P, Mackay I, Feuillet C. Sequence-based marker development in wheat: advances and applications to breeding. Biotechnol Adv 2011; 30:1071-88. [PMID: 21989506 DOI: 10.1016/j.biotechadv.2011.09.015] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 08/16/2011] [Revised: 08/24/2011] [Accepted: 09/25/2011] [Indexed: 01/04/2023]
Abstract
In the past two decades, the wheat community has made remarkable progress in developing molecular resources for breeding. A wide variety of molecular tools has been established to accelerate genetic and physical mapping for facilitating the efficient identification of molecular markers linked to genes and QTL of agronomic interest. Already, wheat breeders are benefiting from a wide range of techniques to follow the introgression of the most favorable alleles in elite material and develop improved varieties. Breeders soon will be able to take advantage of new technological developments based on Next Generation Sequencing. In this paper, we review the molecular toolbox available to wheat scientists and breeders for performing fundamental genomic studies and breeding. Special emphasis is placed on the production and detection of single nucleotide polymorphisms (SNPs) that should enable a step change in saturating the wheat genome for more efficient genetic studies and for the development of new selection methods. The perspectives offered by access to an ordered full genome sequence for further marker development and enhanced precision breeding are also discussed. Finally, we discuss the advantages and limitations of marker-assisted selection for supporting wheat improvement.
Affiliation(s)
- Etienne Paux
- INRA-UBP 1095, Genetics Diversity and Ecophysiology of Cereals, 234 Avenue du Brézet, Clermont-Ferrand, France
|
37
|
Missirian V, Comai L, Filkov V. Statistical mutation calling from sequenced overlapping DNA pools in TILLING experiments. BMC Bioinformatics 2011; 12:287. [PMID: 21756356 PMCID: PMC3150297 DOI: 10.1186/1471-2105-12-287] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 11/01/2010] [Accepted: 07/14/2011] [Indexed: 11/17/2022]
Abstract
Background TILLING (Targeting induced local lesions IN genomes) is an efficient reverse genetics approach for detecting induced mutations in pools of individuals. Combined with the high-throughput of next-generation sequencing technologies, and the resolving power of overlapping pool design, TILLING provides an efficient and economical platform for functional genomics across thousands of organisms. Results We propose a probabilistic method for calling TILLING-induced mutations, and their carriers, from high throughput sequencing data of overlapping population pools, where each individual occurs in two pools. We assign a probability score to each sequence position by applying Bayes' Theorem to a simplified binomial model of sequencing error and expected mutations, taking into account the coverage level. We test the performance of our method on variable quality, high-throughput sequences from wheat and rice mutagenized populations. Conclusions We show that our method effectively discovers mutations in large populations with sensitivity of 92.5% and specificity of 99.8%. It also outperforms existing SNP detection methods in detecting real mutations, especially at higher levels of coverage variability across sequenced pools, and in lower quality short reads sequence data. The implementation of our method is available from: http://www.cs.ucdavis.edu/filkov/CAMBa/.
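The probabilistic scoring described in this abstract can be illustrated with a minimal sketch. It applies Bayes' theorem to a binomial model of non-reference read counts at one position in one pool. The function names, the error rate, the expected non-reference fraction under a mutation, and the prior are all hypothetical values chosen for illustration; this is not the authors' CAMBa implementation and it ignores the overlapping-pool design.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mutation_posterior(alt_reads, coverage,
                       error_rate=0.005,      # assumed per-read sequencing error rate
                       mutant_fraction=0.03,  # assumed non-reference fraction if a carrier is in the pool
                       prior=1e-4):           # assumed prior probability of a mutation at this position
    """Posterior probability that a position carries a real mutation.

    Two competing hypotheses explain the non-reference reads:
    sequencing error only, or a true mutation present in the pool.
    """
    like_mut = binom_pmf(alt_reads, coverage, mutant_fraction)
    like_err = binom_pmf(alt_reads, coverage, error_rate)
    numerator = prior * like_mut
    return numerator / (numerator + (1 - prior) * like_err)


if __name__ == "__main__":
    for alt in (0, 5, 15, 30):
        print(alt, mutation_posterior(alt, coverage=1000))
```

Because coverage enters the binomial likelihoods directly, the same count of non-reference reads yields a different posterior at different depths, which is the sense in which such a score takes the coverage level into account.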
Affiliation(s)
- Victor Missirian
- Department of Computer Science, UC Davis, 1 Shields Ave., Davis, CA 95616, USA
|
38
|
Dereeper A, Nicolas S, Le Cunff L, Bacilieri R, Doligez A, Peros JP, Ruiz M, This P. SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects. BMC Bioinformatics 2011; 12:134. [PMID: 21545712 PMCID: PMC3102043 DOI: 10.1186/1471-2105-12-134] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 11/19/2010] [Accepted: 05/05/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. RESULTS In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. CONCLUSIONS Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
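Two of the diversity indices named in this abstract, Pi (nucleotide diversity) and Watterson's Theta, can be sketched from their textbook definitions. The example below is an assumption-laden illustration, not SNiPlay's code: the function names and the toy alignment are hypothetical, the sequences are assumed to be pre-aligned, equal in length and free of gaps, and the values are reported per locus rather than per site.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which more than one base is observed."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences between sequences (per locus)."""
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / len(pairs)


def watterson_theta(seqs):
    """Watterson's Theta: S / a1, where a1 = sum of 1/i for i = 1..n-1."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


if __name__ == "__main__":
    # Hypothetical toy alignment of four sequences.
    alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
    print(segregating_sites(alignment))
    print(nucleotide_diversity(alignment))
    print(watterson_theta(alignment))
```

Tajima's D compares these two estimates after scaling by a variance term, which is omitted here to keep the sketch short.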
Affiliation(s)
- Alexis Dereeper
- Diversity, Genetics and Genomics of grapevine, UMR DIAPC, INRA, Montpellier, France.
|
39
|
Zhidkov I, Cohen R, Geifman N, Mishmar D, Rubin E. CHILD: a new tool for detecting low-abundance insertions and deletions in standard sequence traces. Nucleic Acids Res 2011; 39:e47. [PMID: 21278161 PMCID: PMC3074157 DOI: 10.1093/nar/gkq1354] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
Several methods have been proposed for detecting insertion/deletions (indels) from chromatograms generated by Sanger sequencing. However, most such methods are unsuitable when the mutated and normal variants occur at unequal ratios, such as is expected to be the case in cancer, with organellar DNA or with alternatively spliced RNAs. In addition, the current methods do not provide robust estimates of the statistical confidence of their results, and the sensitivity of this approach has not been rigorously evaluated. Here, we present CHILD, a tool specifically designed for indel detection in mixtures where one variant is rare. CHILD makes use of standard sequence alignment statistics to evaluate the significance of the results. The sensitivity of CHILD was tested by sequencing controlled mixtures of deleted and undeleted plasmids at various ratios. Our results indicate that CHILD can identify deleted molecules present as just 5% of the mixture. Notably, the results were plasmid/primer-specific; for some primers and/or plasmids, the deleted molecule was only detected when it comprised 10% or more of the mixture. The false positive rate was estimated to be lower than 0.4%. CHILD was implemented as a user-oriented web site, providing a sensitive and experimentally validated method for the detection of rare indel-carrying molecules in common Sanger sequence reads.
Affiliation(s)
- Ilia Zhidkov
- National Institute for Biotechnology in the Negev, Dept. of Life Sciences, Dept. of Computer Sciences and Shraga Segal Dept. of Microbiology and Immunology, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Raphael Cohen
- National Institute for Biotechnology in the Negev, Dept. of Life Sciences, Dept. of Computer Sciences and Shraga Segal Dept. of Microbiology and Immunology, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Nophar Geifman
- National Institute for Biotechnology in the Negev, Dept. of Life Sciences, Dept. of Computer Sciences and Shraga Segal Dept. of Microbiology and Immunology, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Dan Mishmar
- National Institute for Biotechnology in the Negev, Dept. of Life Sciences, Dept. of Computer Sciences and Shraga Segal Dept. of Microbiology and Immunology, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- Eitan Rubin
- National Institute for Biotechnology in the Negev, Dept. of Life Sciences, Dept. of Computer Sciences and Shraga Segal Dept. of Microbiology and Immunology, Ben Gurion University of the Negev, Beer Sheva 84105, Israel
- *To whom correspondence should be addressed. Tel: +972 8 6477180; Fax: +972 8 6479197;
|
40
|
Tongsima S, Assawamakin A, Piriyapongsa J, Shaw PJ. Comparative view of in silico DNA sequencing analysis tools. Methods Mol Biol 2011; 760:207-221. [PMID: 21779999 DOI: 10.1007/978-1-61779-176-5_13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
DNA sequencing is an important tool for discovery of genetic variants. The task of detecting single-nucleotide variants is complicated by noise and sequencing artifacts in sequencing data. Several in silico tools have been developed to assist this process. These tools interpret the raw chromatogram data and perform a specialized base-calling and quality-control assessment procedure to identify variants. The approach used to identify variants differs between the tools, with some specific to SNPs and others to indels. The choice of a tool is guided by the design of the sequencing project and the nature of the variant to be discovered. In this chapter, these tools are compared to facilitate the choice of a tool used for variant discovery.
Affiliation(s)
- Sissades Tongsima
- National Center for Genetic Engineering and Biotechnology (BIOTEC), 113 Thailand Science Park, Pahonyothin Road, Klong 1, Klong Luang, 12120, Pathum Thani, Thailand.
|
41
|
Sana ME, Iascone M, Marchetti D, Palatini J, Galasso M, Volinia S. GAMES identifies and annotates mutations in next-generation sequencing projects. Bioinformatics 2010; 27:9-13. [PMID: 20971986 DOI: 10.1093/bioinformatics/btq603] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
Abstract
MOTIVATION Next-generation sequencing (NGS) methods have the potential for changing the landscape of biomedical science, but at the same time pose several problems in analysis and interpretation. Currently, there are many commercial and public software packages that analyze NGS data. However, the limitations of these applications include output which is insufficiently annotated and of difficult functional comprehension to end users. RESULTS We developed GAMES (Genomic Analysis of Mutations Extracted by Sequencing), a pipeline aiming to serve as an efficient middleman between data deluge and investigators. GAMES attains multiple levels of filtering and annotation, such as aligning the reads to a reference genome, performing quality control and mutational analysis, integrating results with genome annotations and sorting each mismatch/deletion according to a range of parameters. Variations are matched to known polymorphisms. The prediction of functional mutations is achieved by using different approaches. Overall GAMES enables an effective complexity reduction in large-scale DNA-sequencing projects. AVAILABILITY GAMES is available free of charge to academic users and may be obtained from http://aqua.unife.it/GAMES.
Affiliation(s)
- Maria Elena Sana
- DAMA, Data Mining for Analysis of DNA, Department of Morphology and Embryology and TecnoPolo for Life Sciences, University of Ferrara, Ferrara, Italy
|
42
|
Mullaney JM, Mills RE, Pittard WS, Devine SE. Small insertions and deletions (INDELs) in human genomes. Hum Mol Genet 2010; 19:R131-6. [PMID: 20858594 DOI: 10.1093/hmg/ddq400] [Citation(s) in RCA: 214] [Impact Index Per Article: 15.3] [Indexed: 02/02/2023]
Abstract
In this review, we focus on progress that has been made with detecting small insertions and deletions (INDELs) in human genomes. Over the past decade, several million small INDELs have been discovered in human populations and personal genomes. The amount of genetic variation that is caused by these small INDELs is substantial. The number of INDELs in human genomes is second only to the number of single nucleotide polymorphisms (SNPs), and, in terms of base pairs of variation, INDELs cause similar levels of variation as SNPs. Many of these INDELs map to functionally important sites within human genes, and thus, are likely to influence human traits and diseases. Therefore, small INDEL variation will play a prominent role in personalized medicine.
Affiliation(s)
- Julienne M Mullaney
- Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore Street, 615 BioPark II, Baltimore, MD 21201, USA
|
43
|
Kelkar YD, Strubczewski N, Hile SE, Chiaromonte F, Eckert KA, Makova KD. What is a microsatellite: a computational and experimental definition based upon repeat mutational behavior at A/T and GT/AC repeats. Genome Biol Evol 2010; 2:620-35. [PMID: 20668018 PMCID: PMC2940325 DOI: 10.1093/gbe/evq046] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Indexed: 12/12/2022]
Abstract
Microsatellites are abundant in eukaryotic genomes and have high rates of strand slippage-induced repeat number alterations. They are popular genetic markers, and their mutations are associated with numerous neurological diseases. However, the minimal number of repeats required to constitute a microsatellite has been debated, and a definition of a microsatellite that considers its mutational behavior has been lacking. To define a microsatellite, we investigated slippage dynamics for a range of repeat sizes, utilizing two approaches. Computationally, we assessed length polymorphism at repeat loci in ten ENCODE regions resequenced in four human populations, assuming that the occurrence of polymorphism reflects strand slippage rates. Experimentally, we determined the in vitro DNA polymerase-mediated strand slippage error rates as a function of repeat number. In both approaches, we compared strand slippage rates at tandem repeats with the background slippage rates. We observed two distinct modes of mutational behavior. At small repeat numbers, slippage rates were low and indistinguishable from background measurements. A marked transition in mutability was observed as the repeat array lengthened, such that slippage rates at large repeat numbers were significantly higher than the background rates. For both mononucleotide and dinucleotide microsatellites studied, the transition length corresponded to a similar number of nucleotides (approximately 10). Thus, microsatellite threshold is determined not by the presence/absence of strand slippage at repeats but by an abrupt alteration in slippage rates relative to background. These findings have implications for understanding microsatellite mutagenesis, standardization of genome-wide microsatellite analyses, and predicting polymorphism levels of individual microsatellite loci.
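The length threshold reported in this abstract (a transition at approximately 10 nucleotides for both mononucleotide and dinucleotide repeats) suggests a simple way to flag candidate microsatellites in a sequence. The sketch below is an illustration only: the function name, the hard cut-off of 10 nt, and the regular-expression approach are assumptions, and the study itself derives the threshold from slippage rates rather than from a pattern scan.

```python
import re


def find_microsatellites(seq, min_nt=10):
    """Return (start, motif, repeat_count) for mono- and dinucleotide
    tracts spanning at least min_nt nucleotides (assumed threshold)."""
    hits = []
    for unit in (1, 2):
        min_repeats = -(-min_nt // unit)  # ceiling division
        # One captured motif followed by at least min_repeats - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if unit == 2 and motif[0] == motif[1]:
                continue  # homodimer such as "AA" is a mononucleotide run
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence: an A12 tract and a (GT)6 tract.
    example = "CCG" + "A" * 12 + "TCG" + "GT" * 6 + "CA"
    for start, motif, count in find_microsatellites(example):
        print(start, motif, count)
```

Tracts shorter than the threshold, such as a run of nine A bases, are not reported, which mirrors the abstract's point that short repeats behave like background sequence.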
|
44
|
Preliminary assessment of COSII gene diversity in lulo and a relative species: initial identification of genes potentially associated with domestication. Gene 2010; 458:27-36. [PMID: 20302924 DOI: 10.1016/j.gene.2010.03.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/01/2009] [Revised: 03/08/2010] [Accepted: 03/11/2010] [Indexed: 11/21/2022]
Abstract
Within the genus Solanum, Solanum quitoense Lam. (lulo) is a Neotropical species with the potential to become a premium crop in international markets. Wild relatives of S. quitoense are a source of desirable characteristics to be exploited for genetic improvement. To enhance the understanding of and access to the genetic diversity in landrace and wild relatives of lulo, we estimated the relative sequence diversity among them and their wild relative Solanum hirtum. With the use of COSII markers, we established that diversity of cultivated lulo (S. quitoense) is significantly lower than that of its wild relative S. hirtum. In the same way, we found that diversity of lulo is similar to that previously reported for tomato, while the diversity of S. hirtum is comparable to that of other wild relatives of cultivated plants. Our results suggest that high variability of some genes associated with abiotic stress response and pathogen resistance has been favored in wild and cultivated lulo plants.
|
45
|
A human-specific de novo protein-coding gene associated with human brain functions. PLoS Comput Biol 2010; 6:e1000734. [PMID: 20376170 PMCID: PMC2845654 DOI: 10.1371/journal.pcbi.1000734] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.4] [Received: 11/05/2009] [Accepted: 03/03/2010] [Indexed: 02/05/2023]
Abstract
To understand whether any human-specific new genes may be associated with human brain functions, we computationally screened the genetic vulnerable factors identified through Genome-Wide Association Studies and linkage analyses of nicotine addiction and found one human-specific de novo protein-coding gene, FLJ33706 (alternative gene symbol C20orf203). Cross-species analysis revealed interesting evolutionary paths of how this gene had originated from noncoding DNA sequences: insertion of repeat elements especially Alu contributed to the formation of the first coding exon and six standard splice junctions on the branch leading to humans and chimpanzees, and two subsequent substitutions in the human lineage escaped two stop codons and created an open reading frame of 194 amino acids. We experimentally verified FLJ33706's mRNA and protein expression in the brain. Real-Time PCR in multiple tissues demonstrated that FLJ33706 was most abundantly expressed in brain. Human polymorphism data suggested that FLJ33706 encodes a protein under purifying selection. A specifically designed antibody detected its protein expression across human cortex, cerebellum and midbrain. Immunohistochemistry study in normal human brain cortex revealed the localization of FLJ33706 protein in neurons. Elevated expressions of FLJ33706 were detected in Alzheimer's brain samples, suggesting the role of this novel gene in human-specific pathogenesis of Alzheimer's disease. FLJ33706 provided the strongest evidence so far that human-specific de novo genes can have protein-coding potential and differential protein expression, and be involved in human brain functions.
|
46
|
Dalca AV, Brudno M. Genome variation discovery with high-throughput sequencing data. Brief Bioinform 2010; 11:3-14. [PMID: 20053733 DOI: 10.1093/bib/bbp058] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022]
Abstract
The advent of high-throughput sequencing (HTS) technologies is enabling sequencing of human genomes at a significantly lower cost. The availability of these genomes is hoped to enable novel medical diagnostics and treatment, specific to the individual, thus launching the era of personalized medicine. The data currently generated by HTS machines require extensive computational analysis in order to identify genomic variants present in the sequenced individual. In this paper, we overview HTS technologies and discuss several of the plethora of algorithms and tools designed to analyze HTS data, including algorithms for read mapping, as well as methods for identification of single-nucleotide polymorphisms, insertions/deletions and large-scale structural variants and copy-number variants from these mappings.
|
47
|
Veeriah S, Taylor BS, Meng S, Fang F, Yilmaz E, Vivanco I, Janakiraman M, Schultz N, Hanrahan AJ, Pao W, Ladanyi M, Sander C, Heguy A, Holland EC, Paty PB, Mischel PS, Liau L, Cloughesy TF, Mellinghoff IK, Solit DB, Chan TA. Somatic mutations of the Parkinson's disease-associated gene PARK2 in glioblastoma and other human malignancies. Nat Genet 2009; 42:77-82. [PMID: 19946270 DOI: 10.1038/ng.491] [Citation(s) in RCA: 297] [Impact Index Per Article: 19.8] [Received: 07/21/2009] [Accepted: 10/23/2009] [Indexed: 11/09/2022]
Abstract
Mutation of the gene PARK2, which encodes an E3 ubiquitin ligase, is the most common cause of early-onset Parkinson's disease. In a search for multisite tumor suppressors, we identified PARK2 as a frequently targeted gene on chromosome 6q25.2-q27 in cancer. Here we describe inactivating somatic mutations and frequent intragenic deletions of PARK2 in human malignancies. The PARK2 mutations in cancer occur in the same domains, and sometimes at the same residues, as the germline mutations causing familial Parkinson's disease. Cancer-specific mutations abrogate the growth-suppressive effects of the PARK2 protein. PARK2 mutations in cancer decrease PARK2's E3 ligase activity, compromising its ability to ubiquitinate cyclin E and resulting in mitotic instability. These data strongly point to PARK2 as a tumor suppressor on 6q25.2-q27. Thus, PARK2, a gene that causes neuronal dysfunction when mutated in the germline, may instead contribute to oncogenesis when altered in non-neuronal somatic cells.
Affiliation(s)
- Selvaraju Veeriah
- Human Oncology and Pathogenesis Program, Memorial Sloan-Kettering Cancer Center, New York, New York, USA
|
48
|
Abstract
There is a growing gap between the generation of massively parallel sequencing output and the ability to process and analyze the resulting data. New users are left to navigate a bewildering maze of base calling, alignment, assembly and analysis tools with often incomplete documentation and no idea how to compare and validate their outputs. Bridging this gap is essential, or the coveted $1,000 genome will come with a $20,000 analysis price tag.
|
49
|
Wendl MC, Wilson RK. The theory of discovering rare variants via DNA sequencing. BMC Genomics 2009; 10:485. [PMID: 19843339 PMCID: PMC2778663 DOI: 10.1186/1471-2164-10-485] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/20/2009] [Accepted: 10/20/2009] [Indexed: 11/18/2022]
Abstract
BACKGROUND Rare population variants are known to have important biomedical implications, but their systematic discovery has only recently been enabled by advances in DNA sequencing. The design process of a discovery project remains formidable, being limited to ad hoc mixtures of extensive computer simulation and pilot sequencing. Here, the task is examined from a general mathematical perspective. RESULTS We pose and solve the population sequencing design problem and subsequently apply standard optimization techniques that maximize the discovery probability. Emphasis is placed on cases whose discovery thresholds place them within reach of current technologies. We find that parameter values characteristic of rare-variant projects lead to a general, yet remarkably simple set of optimization rules. Specifically, optimal processing occurs at constant values of the per-sample redundancy, refuting current notions that sample size should be selected outright. Optimal project-wide redundancy and sample size are then shown to be inversely proportional to the desired variant frequency. A second family of constants governs these relationships, permitting one to immediately establish the most efficient settings for a given set of discovery conditions. Our results largely concur with the empirical design of the Thousand Genomes Project, though they furnish some additional refinement. CONCLUSION The optimization principles reported here dramatically simplify the design process and should be broadly useful as rare-variant projects become both more important and routine in the future.
Affiliation(s)
- Michael C Wendl
- The Genome Center and Department of Genetics, Washington University, St. Louis MO 63108, USA
- Richard K Wilson
- The Genome Center and Department of Genetics, Washington University, St. Louis MO 63108, USA
|
50
|
Zhang Y, Lu S, Zhao S, Zheng X, Long M, Wei L. Positive selection for the male functionality of a co-retroposed gene in the hominoids. BMC Evol Biol 2009; 9:252. [PMID: 19832993 PMCID: PMC2773790 DOI: 10.1186/1471-2148-9-252] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/19/2009] [Accepted: 10/15/2009] [Indexed: 01/09/2023]
Abstract
BACKGROUND New genes generated by retroposition are widespread in humans and other mammalian species. Usually, this process copies a single parental gene and inserts it into a distant genomic location. However, retroposition of two adjacent parental genes, i.e. co-retroposition, had not been reported until the hominoid chimeric gene, PIPSL, was identified recently. It was shown how two genes linked in tandem (phosphatidylinositol-4-phosphate 5-kinase, type I, alpha, PIP5K1A and proteasome 26S subunit, non-ATPase, 4, PSMD4) could be co-retroposed from a single RNA molecule to form this novel chimeric gene. However, understanding of the origination and biological function of PIPSL requires determination of the coding potential of this gene as well as the evolutionary forces acting on its hominoid copies. RESULTS We tackled these problems by analyzing the evolutionary signature in both within-species variation and between species divergence in the sequence and structure of the gene. We revealed a significant evolutionary signature: the coding region has significantly lower sequence variation, especially insertions and deletions, suggesting that the human copy may encode a protein. Moreover, a survey across five different hominoid species revealed that all adaptive changes of PSMD4-derived regions occurred on branches leading to human and chimp rather than other hominoid lineages. Finally, computational analysis suggests testis-specific transcription of PIPSL is regulated by tissue-dependent methylation rather than some transcriptional leakage. CONCLUSION Therefore, this set of analyses showed that PIPSL is an extraordinary co-retroposed protein-coding gene that may participate in the male functions of humans and its close relatives.
Affiliation(s)
- Yong Zhang
- Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, 100871, PR China.
|