1. Dahn HA, Mountcastle J, Balacco J, Winkler S, Bista I, Schmitt AD, Pettersson OV, Formenti G, Oliver K, Smith M, Tan W, Kraus A, Mac S, Komoroske LM, Lama T, Crawford AJ, Murphy RW, Brown S, Scott AF, Morin PA, Jarvis ED, Fedrigo O. Benchmarking ultra-high molecular weight DNA preservation methods for long-read and long-range sequencing. Gigascience 2022; 11:6659719. [PMID: 35946988 PMCID: PMC9364683 DOI: 10.1093/gigascience/giac068] [Received: 09/06/2021] [Revised: 01/26/2022] [Accepted: 06/16/2022]
Abstract
BACKGROUND: Studies in vertebrate genomics require sampling from a broad range of tissue types, taxa, and localities. Recent advancements in long-read and long-range genome sequencing have made it possible to produce high-quality chromosome-level genome assemblies for almost any organism. However, adequate tissue preservation for the requisite ultra-high molecular weight DNA (uHMW DNA) remains a major challenge. Here we present a comparative study of preservation methods for field and laboratory tissue sampling, across vertebrate classes and different tissue types.
RESULTS: We found that storage temperature was the strongest predictor of uHMW fragment lengths. While immediate flash-freezing remains the gold standard for sample preservation, samples preserved in 95% EtOH or 20-25% DMSO-EDTA showed little fragment length degradation when stored at 4°C for 6 hours. Samples in 95% EtOH or 20-25% DMSO-EDTA kept at 4°C for 1 week after dissection still yielded adequate amounts of uHMW DNA for most applications. Tissue type was a significant predictor of total DNA yield but not of fragment length. Preservation solution had a smaller but significant influence on both fragment length and DNA yield.
CONCLUSION: We provide sample preservation guidelines that ensure the DNA integrity and quantity required for long-read and long-range sequencing technologies across vertebrates. Our best practices generated the uHMW DNA needed for the high-quality reference genomes of phase 1 of the Vertebrate Genomes Project, whose ultimate mission is to generate chromosome-level reference genome assemblies of all ∼70,000 extant vertebrate species.
2. Mohr DW, Gaughran SJ, Paschall J, Naguib A, Pang AWC, Dudchenko O, Aiden EL, Church DM, Scott AF. A Chromosome-Length Assembly of the Hawaiian Monk Seal (Neomonachus schauinslandi): A History of “Genetic Purging” and Genomic Stability. Genes (Basel) 2022; 13:genes13071270. [PMID: 35886053 PMCID: PMC9323584 DOI: 10.3390/genes13071270] [Received: 06/10/2022] [Revised: 06/29/2022] [Accepted: 07/07/2022]
Abstract
The Hawaiian monk seal (HMS) is the single extant species of tropical earless seals of the genus Neomonachus. The species survived a severe bottleneck in the late 19th century and experienced subsequent population declines until becoming the subject of a NOAA-led species recovery effort beginning in 1976, when the population numbered fewer than 1000 animals. Like other recovering species, the Hawaiian monk seal has been reported to have reduced genetic heterogeneity due to the bottleneck and subsequent inbreeding. Here, we report a chromosome-level reference assembly for a male animal, produced using a variety of methods. The final assembly consisted of 16 autosomes, an X, and portions of the Y chromosome. We compared variants in this animal to other HMS and to a frequently sequenced human sample, confirming about 12% of the variation seen in man. To confirm that the reference animal was representative of the HMS, we compared his sequence to those of 10 other individuals and noted similarly low variation in all. Variation in the major histocompatibility complex (MHC) genes was nearly absent compared to the orthologous human loci. Demographic analysis predicts that Hawaiian monk seals have had a long history of small populations preceding the bottleneck, and their current low levels of heterozygosity may indicate specialization to a stable environment. When we compared our reference assembly to those of other species, we observed significant conservation of chromosomal architecture with other pinnipeds, especially other phocids. This reference should be a useful tool for future evolutionary studies as well as the long-term management of this species.
3. Tamazian G, Dobrynin P, Zhuk A, Zhernakova DV, Perelman PL, Serdyukova NA, Graphodatsky AS, Komissarov A, Kliver S, Cherkasov N, Scott AF, Mohr DW, Koepfli KP, O'Brien SJ, Krasheninnikova K. Draft de novo Genome Assembly of the Elusive Jaguarundi, Puma yagouaroundi. J Hered 2021; 112:540-548. [PMID: 34146095 PMCID: PMC8558579 DOI: 10.1093/jhered/esab036] [Received: 04/12/2021] [Accepted: 06/17/2021]
Abstract
The Puma lineage within the family Felidae consists of 3 species that last shared a common ancestor around 4.9 million years ago. Whole-genome sequences of 2 species from the lineage were previously reported: the cheetah (Acinonyx jubatus) and the mountain lion (Puma concolor). The present report describes a whole-genome assembly of the remaining species, the jaguarundi (Puma yagouaroundi). We sequenced the genome of a male jaguarundi with 10X Genomics linked reads and assembled the whole-genome sequence. The assembled genome contains a series of scaffolds that reach the length of chromosome arms and is similar in scaffold contiguity to the genome assemblies of cheetah and puma, with a contig N50 of 100.2 kbp and a scaffold N50 of 49.27 Mbp. We assessed the assembled sequence of the jaguarundi genome using BUSCO; aligned reads of the sequenced individual and of another, previously published female jaguarundi to the assembled genome; annotated protein-coding genes, repeats, genomic variants, and the effects of those variants on the protein-coding genes; and analyzed differences between the 2 jaguarundis and the reference mitochondrial genome. The jaguarundi genome assembly and its annotation were compared in quality, variants, and features to the previously reported genome assemblies of puma and cheetah. Computational analyses used in the study were implemented in a transparent and reproducible way to allow their further reuse and modification.
4. Scott AF, Deery E, Lawrence AD, Warren MJ. Plasmodium falciparum hydroxymethylbilane synthase does not house any cosynthase activity within the haem biosynthetic pathway. MICROBIOLOGY-SGM 2021; 167. [PMID: 34661520 PMCID: PMC8698207 DOI: 10.1099/mic.0.001095]
Abstract
Uroporphyrinogen III, the universal progenitor of macrocyclic, modified tetrapyrroles, is produced from aminolaevulinic acid (ALA) by a conserved pathway involving three enzymes: porphobilinogen synthase (PBGS), hydroxymethylbilane synthase (HmbS) and uroporphyrinogen III synthase (UroS). The gene encoding UroS has not yet been identified in Plasmodium falciparum, but it has been suggested that this activity is housed inside a bifunctional HmbS. Additionally, an unknown protein encoded by PF3D7_1247600 has been predicted to possess UroS activity. In this study it is demonstrated that neither of these proteins possesses UroS activity and that the real UroS remains to be identified. This was demonstrated by the failure of codon-optimized genes to complement a defined Escherichia coli hemD− mutant (SASZ31) deficient in UroS activity. Furthermore, HPLC analysis of the oxidized reaction product from recombinant, purified P. falciparum HmbS showed that only uroporphyrin I could be detected (corresponding to hydroxymethylbilane production). No uroporphyrin III was detected, showing that P. falciparum HmbS does not have UroS activity and can only catalyze the formation of hydroxymethylbilane from porphobilinogen.
5. Scott AF, Amberger JS. The genes of OMIM: A legacy of Victor McKusick. Am J Med Genet A 2021; 185:3276-3283. [PMID: 34214258 DOI: 10.1002/ajmg.a.62415] [Received: 04/12/2021] [Revised: 06/02/2021] [Accepted: 06/05/2021]
6. Hamosh A, Amberger JS, Bocchini C, Scott AF, Rasmussen SA. Online Mendelian Inheritance in Man (OMIM®): Victor McKusick's magnum opus. Am J Med Genet A 2021; 185:3259-3265. [PMID: 34169650 PMCID: PMC8596664 DOI: 10.1002/ajmg.a.62407] [Received: 05/24/2021] [Revised: 06/05/2021] [Accepted: 06/11/2021]
Abstract
Victor McKusick's many contributions to medicine are legendary, but his magnum opus is Mendelian Inheritance in Man (MIM), his catalog of Mendelian phenotypes and their associated genes. The catalog, originally published in 1966 in book form, became available on the internet as Online Mendelian Inheritance in Man (OMIM®) in 1987. The first of 12 editions of MIM included 1486 entries; this number has increased to over 25,000 entries in OMIM as of April 2021, which demonstrates the growth of knowledge about Mendelian phenotypes and their genes through the years. OMIM now has over 20,000 unique users a day, including users from every country in the world. Many of the early decisions made by McKusick, such as to maintain MIM data in a computer‐readable format, to separate phenotype entries from those for genes, and to give phenotypes and genes MIM numbers, have proved essential to the long‐term utility and flexibility of his catalog. Based on his extensive knowledge of genetics and vision of its future in the field of medicine, he developed a framework for the capture and summary of information from the published literature on phenotypes and their associated genes; this catalog continues to serve as an indispensable resource to the genetics community.
7. Zhang W, Venkataraghavan S, Hetmanski JB, Leslie EJ, Marazita ML, Feingold E, Weinberg SM, Ruczinski I, Taub MA, Scott AF, Ray D, Beaty TH. Detecting Gene-Environment Interaction for Maternal Exposures Using Case-Parent Trios Ascertained Through a Case With Non-Syndromic Orofacial Cleft. Front Cell Dev Biol 2021; 9:621018. [PMID: 33937227 PMCID: PMC8085423 DOI: 10.3389/fcell.2021.621018] [Received: 10/24/2020] [Accepted: 03/15/2021]
Abstract
Two large studies of case-parent trios ascertained through a proband with a non-syndromic orofacial cleft (OFC, which includes cleft lip and palate, cleft lip alone, or cleft palate alone) were used to test for possible gene-environment (G × E) interaction between genome-wide markers (both observed and imputed) and self-reported maternal exposure to smoking, alcohol consumption, and multivitamin supplementation during pregnancy. The parent studies were GENEVA, which included 1,939 case-parent trios recruited largely through treatment centers in Europe, the United States, and Asia, and the Pittsburgh Orofacial Cleft Study (POFC), which contributed 1,443 case-parent trios, also ascertained through a proband with an OFC, from three major racial/ethnic groups (European, Asian, and Latin American). Exposure rates to these environmental risk factors (maternal smoking, alcohol consumption, and multivitamin supplementation) varied across studies and among racial/ethnic groups, creating substantial differences in power to detect G × E interaction, but the trio design should minimize spurious results due to population stratification. The GENEVA and POFC studies were analyzed separately, and a meta-analysis was conducted across both studies to test for G × E interaction using the 2 df test of gene and G × E interaction and the 1 df test for G × E interaction alone. The 2 df test confirmed effects for several recognized risk genes, suggesting modest G × E effects. This analysis did reveal suggestive evidence for G × Vitamin interaction for CASP9 on 1p36, located about 3 Mb from PAX7, a recognized risk gene. Several regions gave suggestive evidence of G × E interaction in the 1 df test. For example, for G × Smoking interaction, the 1 df test in the meta-analysis suggested markers in MUSK on 9q31.3. Markers near SLCO3A1 also showed suggestive evidence in the 1 df test for G × Alcohol interaction, and rs41117 near RETREG1 (a.k.a. FAM134B) also gave suggestive significance in the meta-analysis of the 1 df test for G × Vitamin interaction. While it remains quite difficult to obtain definitive evidence for G × E interaction in genome-wide studies, perhaps due to small effect sizes of individual genes combined with low exposure rates, this analysis of two large case-parent trio studies argues for considering possible G × E interaction in any comprehensive study of complex and heterogeneous disorders such as OFC.
8. Player RA, Forsyth ER, Verratti KJ, Mohr DW, Scott AF, Bradburne CE. A novel Canis lupus familiaris reference genome improves variant resolution for use in breed-specific GWAS. Life Sci Alliance 2021; 4:e202000902. [PMID: 33514656 PMCID: PMC7898556 DOI: 10.26508/lsa.202000902] [Received: 09/03/2020] [Revised: 01/07/2021] [Accepted: 01/13/2021]
Abstract
Reference genome fidelity is critically important for genome-wide association studies, yet most reference genomes vary widely from the study population. A typical whole-genome sequencing approach relies on short-read technologies, resulting in fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, as lower-resolution technologies such as genotyping arrays are commonly used. Here, we present a phased reference genome for Canis lupus familiaris generated with high molecular weight DNA sequencing technologies. We tested wet-laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4-gigabase genome of a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flowcells (∼23X depth) and a 10X Genomics library run on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (∼88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads than the current reference (CanFam3.1, P < 0.001) and a 15% reduction in variant calls, increasing the chance of identifying true, low-effect-size variants in genome-wide association studies. We believe that by incorporating the cost of producing a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.
9. Humble E, Dobrynin P, Senn H, Chuven J, Scott AF, Mohr DW, Dudchenko O, Omer AD, Colaric Z, Lieberman Aiden E, Al Dhaheri SS, Wildt D, Oliaji S, Tamazian G, Pukazhenthi B, Ogden R, Koepfli KP. Chromosomal-level genome assembly of the scimitar-horned oryx: Insights into diversity and demography of a species extinct in the wild. Mol Ecol Resour 2020; 20:1668-1681. [PMID: 32365406 DOI: 10.1111/1755-0998.13181] [Received: 12/10/2019] [Revised: 04/09/2020] [Accepted: 04/24/2020]
Abstract
Captive populations provide valuable insurance against extinctions in the wild. However, they are also vulnerable to the negative impacts of inbreeding, selection and drift. Genetic information is therefore considered a critical aspect of conservation management. Recent developments in sequencing technologies have the potential to improve the outcomes of management programmes; however, the transfer of these approaches to applied conservation has been slow. The scimitar-horned oryx (Oryx dammah) is a North African antelope that has been extinct in the wild since the early 1980s and is the focus of a large-scale and long-term reintroduction project. To enable the selection of suitable founder individuals, facilitate post-release monitoring and improve captive breeding management, comprehensive genomic resources are required. Here, we used 10X Chromium sequencing together with Hi-C contact mapping to develop a chromosomal-level genome assembly for the species. The resulting assembly contained 29 chromosomes with a scaffold N50 of 100.4 Mb, and displayed strong chromosomal synteny with the cattle genome. Using resequencing data from six additional individuals, we demonstrated relatively high genetic diversity in the scimitar-horned oryx compared to other mammals, despite it having experienced a strong founding event in captivity. Additionally, the level of diversity across populations varied according to management strategy. Finally, we uncovered a dynamic demographic history that coincided with periods of climate variation during the Pleistocene. Overall, our study provides a clear example of how genomic data can uncover valuable insights into captive populations and contributes important resources to guide future management decisions of an endangered species.
10. Scott AF, Luk LY, Tuñón I, Moliner V, Allemann RK. Heavy Enzymes and the Rational Redesign of Protein Catalysts. Chembiochem 2019; 20:2807-2812. [PMID: 31016852 PMCID: PMC6900096 DOI: 10.1002/cbic.201900134] [Received: 02/27/2019]
Abstract
An unsolved mystery in biology concerns the link between enzyme catalysis and protein motions. Comparison between isotopically labelled "heavy" dihydrofolate reductases and their natural-abundance counterparts has suggested that the coupling of protein motions to the chemistry of the catalysed reaction is minimised in the case of hydride transfer. In alcohol dehydrogenases, unnatural, bulky substrates that induce additional electrostatic rearrangements of the active site enhance coupled motions. This finding could provide a new route to engineering enzymes with altered substrate specificity, because the amino acid residues responsible for dynamic coupling with a given substrate present themselves as hotspots for mutagenesis. A detailed understanding of the biophysics of enzyme catalysis, based on insights gained from analysis of "heavy" enzymes, might eventually allow routine engineering of enzymes to catalyse reactions of choice.
11. Fu JM, Leslie EJ, Scott AF, Murray JC, Marazita ML, Beaty TH, Scharpf RB, Ruczinski I. Detection of de novo copy number deletions from targeted sequencing of trios. Bioinformatics 2019; 35:571-578. [PMID: 30084993 PMCID: PMC6378941 DOI: 10.1093/bioinformatics/bty677] [Received: 02/28/2018] [Revised: 07/25/2018] [Accepted: 08/01/2018]
Abstract
MOTIVATION: De novo copy number deletions have been implicated in many diseases, but there is no formal method to date that identifies de novo deletions in parent-offspring trios from capture-based sequencing platforms.
RESULTS: We developed Minimum Distance for Targeted Sequencing (MDTS) to fill this void. MDTS has similar sensitivity (recall), but a much lower false positive rate compared to less specific CNV callers, resulting in a much higher positive predictive value (precision). MDTS also exhibited much better scalability.
AVAILABILITY AND IMPLEMENTATION: MDTS is freely available as open source software from the Bioconductor repository.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
12. Bureau A, Begum F, Taub MA, Hetmanski J, Parker MM, Albacha-Hejazi H, Scott AF, Murray JC, Marazita ML, Bailey-Wilson JE, Beaty TH, Ruczinski I. Inferring disease risk genes from sequencing data in multiplex pedigrees through sharing of rare variants. Genet Epidemiol 2019; 43:37-49. [PMID: 30246882 PMCID: PMC6330140 DOI: 10.1002/gepi.22155] [Received: 04/03/2018] [Revised: 07/11/2018] [Accepted: 07/15/2018]
Abstract
We previously demonstrated how sharing of rare variants (RVs) in distant affected relatives can be used to identify variants causing a complex and heterogeneous disease. This approach tested whether single RVs were shared by all sequenced affected family members. However, as with other study designs, joint analysis of several RVs (e.g., within genes) is sometimes required to obtain sufficient statistical power. Further, phenocopies can lead to false negatives for some causal RVs if complete sharing among affected relatives is required. Here, we extend our methodology (Rare Variant Sharing, RVS) to address these issues. Specifically, we introduce gene-based analyses, a partial sharing test based on RV sharing probabilities for subsets of affected relatives, and a haplotype-based RV definition. RVS also has the desirable feature of not requiring external estimates of variant frequency or control samples, provides functionality to assess and address violations of key assumptions, and is available as open source software for genome-wide analysis. Simulations including phenocopies, based on the families of an oral cleft study, revealed that the partial and complete sharing versions of RVS achieved statistical power similar to that of alternative methods (RareIBD and the Gene-Based Segregation Test) and superior to that of the pedigree Variant Annotation, Analysis, and Search Tool (pVAAST) linkage statistic. In studies of multiplex cleft families, analysis of rare single nucleotide variants in the exomes of 151 affected relatives from 54 families revealed no significant excess sharing in any one gene, but highlighted different patterns of sharing revealed by the complete and partial sharing tests.
13. Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res 2019; 47:D1038-D1043. [PMID: 30445645 PMCID: PMC6323937 DOI: 10.1093/nar/gky1151] [Received: 10/03/2018] [Accepted: 11/07/2018]
Abstract
For over 50 years Mendelian Inheritance in Man has chronicled the collective knowledge of the field of medical genetics. It initially cataloged the known X-linked, autosomal recessive and autosomal dominant inherited disorders, but grew to be the primary repository of curated information on both genes and genetic phenotypes and the relationships between them. Each phenotype and gene is given a separate entry assigned a stable, unique identifier. The entries contain structured summaries of new and important information based on expert review of the biomedical literature. OMIM.org provides interactive access to the knowledge repository, including genomic coordinate searches of the gene map, views of genetic heterogeneity of phenotypes in Phenotypic Series, and side-by-side comparisons of clinical synopses. OMIM.org also supports computational queries via a robust API. All entries have extensive targeted links to other genomic resources and additional references. Updates to OMIM can be found on the update list or followed through the MIMmatch service. Updated user guides and tutorials are available on the website. As of September 2018, OMIM had over 24,600 entries, and the OMIM Morbid Map Scorecard had 6,259 molecularized phenotypes connected to 3,961 genes.
14. Holzinger ER, Li Q, Parker MM, Hetmanski JB, Marazita ML, Mangold E, Ludwig KU, Taub MA, Begum F, Murray JC, Albacha-Hejazi H, Alqosayer K, Al-Souki G, Albasha Hejazi A, Scott AF, Beaty TH, Bailey-Wilson JE. Analysis of sequence data to identify potential risk variants for oral clefts in multiplex families. Mol Genet Genomic Med 2017; 5:570-579. [PMID: 28944239 PMCID: PMC5606860 DOI: 10.1002/mgg3.320] [Received: 02/18/2017] [Revised: 06/12/2017] [Accepted: 06/14/2017]
Abstract
BACKGROUND: Nonsyndromic oral clefts are craniofacial malformations, which include cleft lip with or without cleft palate. The etiology of oral clefts is complex, with both genetic and environmental factors contributing to risk. Previous genome-wide association studies (GWAS) have identified multiple loci with small effects; however, many causal variants remain elusive.
METHODS: In this study, we address this gap by specifically looking for rare, potentially damaging variants in family-based data. We analyzed both whole exome sequence (WES) data and whole genome sequence (WGS) data in multiplex cleft families to identify variants shared by affected individuals.
RESULTS: Here we present the results from these analyses. Our most interesting finding was from a single Syrian family, which showed enrichment of nonsynonymous and potentially damaging rare variants in two genes: CASP9 and FAT4.
CONCLUSION: Neither of these candidate genes has previously been associated with oral clefts; if confirmed as contributing to disease risk, they may indicate novel biological pathways in the genetic etiology of oral clefts.
15. Strande NT, Riggs ER, Buchanan AH, Ceyhan-Birsoy O, DiStefano M, Dwight SS, Goldstein J, Ghosh R, Seifert BA, Sneddon TP, Wright MW, Milko LV, Cherry JM, Giovanni MA, Murray MF, O'Daniel JM, Ramos EM, Santani AB, Scott AF, Plon SE, Rehm HL, Martin CL, Berg JS. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet 2017; 100:895-906. [PMID: 28552198 PMCID: PMC5473734 DOI: 10.1016/j.ajhg.2017.04.015] [Received: 02/22/2017] [Accepted: 04/26/2017]
Abstract
With advances in genomic sequencing technology, the number of reported gene-disease relationships has rapidly expanded. However, the evidence supporting these claims varies widely, confounding accurate evaluation of genomic variation in a clinical setting. Despite the critical need to differentiate clinically valid relationships from less well-substantiated relationships, standard guidelines for such evaluation do not currently exist. The NIH-funded Clinical Genome Resource (ClinGen) has developed a framework to define and evaluate the clinical validity of gene-disease pairs across a variety of Mendelian disorders. In this manuscript we describe a proposed framework to evaluate relevant genetic and experimental evidence supporting or contradicting a gene-disease relationship and the subsequent validation of this framework using a set of representative gene-disease pairs. The framework provides a semiquantitative measurement for the strength of evidence of a gene-disease relationship that maps to a qualitative classification: "Definitive," "Strong," "Moderate," "Limited," "No Reported Evidence," or "Conflicting Evidence." Within the ClinGen structure, classifications derived with this framework are reviewed and confirmed or adjusted based on clinical expertise of appropriate disease experts. Detailed guidance for utilizing this framework and access to the curation interface is available on our website. This evidence-based, systematic method to assess the strength of gene-disease relationships will facilitate more knowledgeable utilization of genomic variants in clinical and research settings.
16. Fu J, Beaty TH, Scott AF, Hetmanski J, Parker MM, Wilson JEB, Marazita ML, Mangold E, Albacha-Hejazi H, Murray JC, Bureau A, Carey J, Cristiano S, Ruczinski I, Scharpf RB. Whole exome association of rare deletions in multiplex oral cleft families. Genet Epidemiol 2017; 41:61-69. [PMID: 27910131 PMCID: PMC5154821 DOI: 10.1002/gepi.22010] [Received: 01/22/2016] [Revised: 09/21/2016] [Accepted: 09/21/2016]
Abstract
By sequencing the exomes of distantly related individuals in multiplex families, rare mutational and structural changes to coding DNA can be characterized and their relationship to disease risk can be assessed. Recently, several rare single nucleotide variants (SNVs) were associated with an increased risk of nonsyndromic oral cleft, highlighting the importance of rare sequence variants in oral clefts and illustrating the strength of family-based study designs. However, the extent to which rare deletions in coding regions of the genome occur and contribute to risk of nonsyndromic clefts is not well understood. To identify putative structural variants underlying risk, we developed a pipeline for detecting rare hemizygous deletions in families from whole exome sequencing data, combined with statistical inference based on rare variant sharing. Among 56 multiplex families with 115 individuals, we identified 53 regions with one or more rare hemizygous deletions. Of these 53 regions, 45 contained rare deletions occurring in only one family member. Members of the same family shared a rare deletion in only eight regions. We also devised a scalable global test for enrichment of shared rare deletions.
17. Hunter JE, Irving SA, Biesecker LG, Buchanan A, Jensen B, Lee K, Martin CL, Milko L, Muessig K, Niehaus AD, O'Daniel J, Piper MA, Ramos EM, Schully SD, Scott AF, Slavotinek A, Sobreira N, Strande N, Weaver M, Webber EM, Williams MS, Berg JS, Evans JP, Goddard KA. A standardized, evidence-based protocol to assess clinical actionability of genetic disorders associated with genomic variation. Genet Med 2016; 18:1258-1268. [PMID: 27124788 PMCID: PMC5085884 DOI: 10.1038/gim.2016.40] [Received: 12/30/2015] [Accepted: 02/22/2016]
Abstract
PURPOSE Genome and exome sequencing can identify variants unrelated to the primary goal of sequencing. Detecting pathogenic variants associated with an increased risk of a medical disorder enables clinical interventions to improve future health outcomes in patients and their at-risk relatives. The Clinical Genome Resource, or ClinGen, aims to assess clinical actionability of genes and associated disorders as part of a larger effort to build a central resource of information regarding the clinical relevance of genomic variation for use in precision medicine and research. METHODS We developed a practical, standardized protocol to identify available evidence and generate qualitative summary reports of actionability for disorders and associated genes. We applied a semiquantitative metric to score actionability. RESULTS We generated summary reports and actionability scores for the 56 genes and associated disorders recommended by the American College of Medical Genetics and Genomics for return as secondary findings from clinical genome-scale sequencing. We also describe the challenges that arose during the development of the protocol that highlight important issues in characterizing actionability across a range of disorders. CONCLUSION The ClinGen framework for actionability assessment will assist research and clinical communities in making clear, efficient, and consistent determinations of actionability based on transparent criteria to guide analysis and reporting of findings from clinical genome-scale sequencing.
18
Mathias RA, Taub MA, Gignoux CR, Fu W, Musharoff S, O'Connor TD, Vergara C, Torgerson DG, Pino-Yanes M, Shringarpure SS, Huang L, Rafaels N, Boorgula MP, Johnston HR, Ortega VE, Levin AM, Song W, Torres R, Padhukasahasram B, Eng C, Mejia-Mejia DA, Ferguson T, Qin ZS, Scott AF, Yazdanbakhsh M, Wilson JG, Marrugo J, Lange LA, Kumar R, Avila PC, Williams LK, Watson H, Ware LB, Olopade C, Olopade O, Oliveira R, Ober C, Nicolae DL, Meyers D, Mayorga A, Knight-Madden J, Hartert T, Hansel NN, Foreman MG, Ford JG, Faruque MU, Dunston GM, Caraballo L, Burchard EG, Bleecker E, Araujo MI, Herrera-Paz EF, Gietzen K, Grus WE, Bamshad M, Bustamante CD, Kenny EE, Hernandez RD, Beaty TH, Ruczinski I, Akey J, Barnes KC. A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome. Nat Commun 2016; 7:12522. [PMID: 27725671 PMCID: PMC5062574 DOI: 10.1038/ncomms12522] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 03/29/2016] [Accepted: 07/12/2016] [Indexed: 01/20/2023]
Abstract
The African Diaspora in the Western Hemisphere represents one of the largest forced migrations in history and had a profound impact on genetic diversity in modern populations. To date, the fine-scale population structure of descendants of the African Diaspora remains largely uncharacterized. Here we present genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas. We summarize genetic variation in these populations, quantifying the postcolonial sex-biased European gene flow across multiple regions. Moreover, we refine estimates on the burden of deleterious variants carried across populations and how this varies with African ancestry. Our data are an important resource for empowering disease mapping studies in African-admixed individuals and will facilitate gene discovery for diseases disproportionately affecting individuals of African ancestry.
19
Qin HD, Liao XY, Chen YB, Huang SY, Xue WQ, Li FF, Ge XS, Liu DQ, Cai Q, Long J, Li XZ, Hu YZ, Zhang SD, Zhang LJ, Lehrman B, Scott AF, Lin D, Zeng YX, Shugart YY, Jia WH. Genomic Characterization of Esophageal Squamous Cell Carcinoma Reveals Critical Genes Underlying Tumorigenesis and Poor Prognosis. Am J Hum Genet 2016; 98:709-27. [PMID: 27058444 DOI: 10.1016/j.ajhg.2016.02.021] [Citation(s) in RCA: 108] [Impact Index Per Article: 13.5] [Received: 08/27/2015] [Accepted: 02/24/2016] [Indexed: 12/17/2022]
Abstract
The genetic mechanisms underlying the poor prognosis of esophageal squamous cell carcinoma (ESCC) are not well understood. Here, we report somatic mutations found in ESCC from sequencing 10 whole-genome and 57 whole-exome matched tumor-normal sample pairs. Among the identified genes, we characterized mutations in VANGL1 and showed that they accelerated cell growth in vitro. We also found that five other genes, including three coding genes (SHANK2, MYBL2, FADD) and two non-coding genes (miR-4707-5p, PCAT1), were involved in somatic copy-number alterations (SCNAs) or structural variants (SVs). A survival analysis based on the expression profiles of 321 individuals with ESCC indicated that these genes were significantly associated with poorer survival. Subsequently, we performed functional studies, which showed that miR-4707-5p and MYBL2 promoted proliferation and metastasis. Together, our results shed light on somatic mutations and genomic events that contribute to ESCC tumorigenesis and prognosis and might suggest therapeutic targets.
20
Mitchell CJ, Getnet D, Kim MS, Manda SS, Kumar P, Huang TC, Pinto SM, Nirujogi RS, Iwasaki M, Shaw PG, Wu X, Zhong J, Chaerkady R, Marimuthu A, Muthusamy B, Sahasrabuddhe NA, Raju R, Bowman C, Danilova L, Cutler J, Kelkar DS, Drake CG, Prasad TSK, Marchionni L, Murakami PN, Scott AF, Shi L, Thierry-Mieg J, Thierry-Mieg D, Irizarry R, Cope L, Ishihama Y, Wang C, Gowda H, Pandey A. A multi-omic analysis of human naïve CD4+ T cells. BMC Syst Biol 2015; 9:75. [PMID: 26542228 PMCID: PMC4636073 DOI: 10.1186/s12918-015-0225-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 01/28/2015] [Accepted: 10/28/2015] [Indexed: 12/21/2022]
Abstract
Background Cellular function and diversity are orchestrated by complex interactions of fundamental biomolecules including DNA, RNA and proteins. Technological advances in genomics, epigenomics, transcriptomics and proteomics have enabled massively parallel and unbiased measurements. Such high-throughput technologies have been extensively used to carry out broad, unbiased studies, particularly in the context of human diseases. Nevertheless, a unified analysis of the genome, epigenome, transcriptome and proteome of a single human cell type to obtain a coherent view of the complex interplay between various biomolecules has not yet been undertaken. Here, we report the first multi-omic analysis of human primary naïve CD4+ T cells isolated from a single individual. Results Integrating multi-omics datasets allowed us to investigate genome-wide methylation and its effect on mRNA/protein expression patterns, extent of RNA editing under normal physiological conditions and allele specific expression in naïve CD4+ T cells. In addition, we carried out a multi-omic comparative analysis of naïve with primary resting memory CD4+ T cells to identify molecular changes underlying T cell differentiation. This analysis provided mechanistic insights into how several molecules involved in T cell receptor signaling are regulated at the DNA, RNA and protein levels. Phosphoproteomics revealed downstream signaling events that regulate these two cellular states. Availability of multi-omics data from an identical genetic background also allowed us to employ novel proteogenomics approaches to identify individual-specific variants and putative novel protein coding regions in the human genome. Conclusions We utilized multiple high-throughput technologies to derive a comprehensive profile of two primary human cell types, naïve CD4+ T cells and memory CD4+ T cells, from a single donor. 
Through vertical as well as horizontal integration of whole genome sequencing, methylation arrays, RNA-Seq, miRNA-Seq, proteomics, and phosphoproteomics, we derived an integrated and comparative map of these two closely related immune cells and identified potential molecular effectors of immune cell differentiation following antigen encounter.
21
Bu L, Chen Q, Wang H, Zhang T, Hetmanski JB, Schwender H, Parker M, Chou YHW, Yeow V, Chong SS, Zhang B, Jabs EW, Scott AF, Beaty TH. Novel evidence of association with nonsyndromic cleft lip with or without cleft palate was shown for single nucleotide polymorphisms in FOXF2 gene in an Asian population. Birth Defects Res A Clin Mol Teratol 2015; 103:857-62. [PMID: 26278207 PMCID: PMC5180447 DOI: 10.1002/bdra.23413] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/13/2015] [Revised: 06/15/2015] [Accepted: 06/28/2015] [Indexed: 11/07/2022]
Abstract
BACKGROUND The forkhead box F2 gene (FOXF2), located on chromosome 6p25.3, has been shown to play a crucial role in palatal development in mouse and rat models. To date, no evidence of linkage or association has been reported for this gene in humans with oral clefts. METHODS Allelic transmission disequilibrium tests were used to robustly assess evidence of linkage and association with nonsyndromic cleft lip with or without cleft palate for nine single nucleotide polymorphisms (SNPs) in and around FOXF2 in both Asian and European trios using PLINK. RESULTS Statistically significant evidence of linkage and association was shown for two SNPs (rs1711968 and rs732835) in 216 Asian trios, where the empiric P values with permutation tests were 0.0016 and 0.005, respectively. The corresponding estimated odds ratios for carrying the minor allele at these SNPs were 2.05 (95% confidence interval = 1.41, 2.98) and 1.77 (95% confidence interval = 1.26, 2.49), respectively. CONCLUSION Our results provided statistical evidence of linkage and association between FOXF2 and nonsyndromic cleft lip with or without cleft palate.
22
Younkin SG, Scharpf RB, Schwender H, Parker MM, Scott AF, Marazita ML, Beaty TH, Ruczinski I. A genome-wide study of inherited deletions identified two regions associated with nonsyndromic isolated oral clefts. Birth Defects Res A Clin Mol Teratol 2015; 103:276-83. [PMID: 25776870 DOI: 10.1002/bdra.23362] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/06/2022]
Abstract
BACKGROUND DNA copy number variants play an important part in the development of common birth defects such as oral clefts. Individual patients with multiple birth defects (including oral clefts) have been shown to carry small and large chromosomal deletions. METHODS We investigated the role of polymorphic copy number deletions by comparing transmission rates of deletions from parents to offspring in case-parent trios of European ancestry ascertained through a cleft proband with trios ascertained through a normal offspring. DNA copy numbers in trios were called using the joint hidden Markov model in the freely available PennCNV software. All statistical analyses were performed using Bioconductor tools in the open source environment R. RESULTS We identified a 67 kb region in the gene MGAM on chromosome 7q34, and a 206 kb region overlapping genes ADAM3A and ADAM5 on chromosome 8p11, where deletions are more frequently transmitted to cleft offspring than control offspring. CONCLUSIONS These genes or nearby regulatory elements may be involved in the etiology of oral clefts.
23
Leslie EJ, Taub MA, Liu H, Steinberg KM, Koboldt DC, Zhang Q, Carlson JC, Hetmanski JB, Wang H, Larson DE, Fulton RS, Kousa YA, Fakhouri WD, Naji A, Ruczinski I, Begum F, Parker MM, Busch T, Standley J, Rigdon J, Hecht JT, Scott AF, Wehby GL, Christensen K, Czeizel AE, Deleyiannis FWB, Schutte BC, Wilson RK, Cornell RA, Lidral AC, Weinstock GM, Beaty TH, Marazita ML, Murray JC. Identification of functional variants for cleft lip with or without cleft palate in or near PAX7, FGFR2, and NOG by targeted sequencing of GWAS loci. Am J Hum Genet 2015; 96:397-411. [PMID: 25704602 PMCID: PMC4375420 DOI: 10.1016/j.ajhg.2015.01.004] [Citation(s) in RCA: 126] [Impact Index Per Article: 14.0] [Received: 11/03/2014] [Accepted: 01/09/2015] [Indexed: 11/21/2022]
Abstract
Although genome-wide association studies (GWASs) for nonsyndromic orofacial clefts have identified multiple strongly associated regions, the causal variants are unknown. To address this, we selected 13 regions from GWASs and other studies, performed targeted sequencing in 1,409 Asian and European trios, and carried out a series of statistical and functional analyses. Within a cluster of strongly associated common variants near NOG, we found that one, rs227727, disrupts enhancer activity. We furthermore identified significant clusters of non-coding rare variants near NTN1 and NOG and found several rare coding variants likely to affect protein function, including four nonsense variants in ARHGAP29. We confirmed 48 de novo mutations and, based on best biological evidence available, chose two of these for functional assays. One mutation in PAX7 disrupted the DNA binding of the encoded transcription factor in an in vitro assay. The second, a non-coding mutation, disrupted the activity of a neural crest enhancer downstream of FGFR2 both in vitro and in vivo. This targeted sequencing study provides strong functional evidence implicating several specific variants as primary contributory risk alleles for nonsyndromic clefting in humans.
24
Scott AF, Mohr DW, Kasch LM, Barton JA, Pittiglio R, Ingersoll R, Craig B, Marosy BA, Doheny KF, Bromley WC, Roderick TH, Chassaing N, Calvas P, Prabhu SS, Jabs EW. Identification of an HMGB3 frameshift mutation in a family with an X-linked colobomatous microphthalmia syndrome using whole-genome and X-exome sequencing. JAMA Ophthalmol 2015; 132:1215-20. [PMID: 24993872 DOI: 10.1001/jamaophthalmol.2014.1731] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
IMPORTANCE Microphthalmias are rare disorders whose genetic bases are not fully understood. HMGB3 is a new candidate gene for X-linked forms of this disease. OBJECTIVE To identify the causative gene in a pedigree with an X-linked colobomatous microphthalmos phenotype. DESIGN, SETTING, AND PARTICIPANTS Whole-genome sequencing and chromosome X-exome-targeted sequencing were performed at the High Throughput Sequencing Laboratory of the Genetic Resources Core Facility at the Johns Hopkins University School of Medicine on the DNA of the male proband, and the data were informatically filtered to identify rare variants. Polymerase chain reaction and Sanger sequencing were used to confirm the variant in the proband and the carrier status of his mother. Thirteen unrelated male patients with a similar phenotype were also screened. MAIN OUTCOMES AND MEASURES Whole-genome and X-exome sequencing to identify a frameshift variant in HMGB3. RESULTS A 2-base pair frameshift insertion (c.477_478insTA, coding for p.Lys161Ilefs*54) in the HMGB3 gene was found in the proband and his carrier mother but not in the unrelated patients. The mutation, confirmed by 3 orthogonal methods, alters an evolutionarily conserved region of the HMGB3 protein from a negatively charged polyglutamic acid tract to a positively charged arginine-rich motif that is likely to interfere with normal protein function. CONCLUSIONS AND RELEVANCE In this family, microphthalmia, microcephaly, intellectual disability, and short stature are associated with a mutation on the X chromosome in the HMGB3 gene. HMGB3 should be considered when performing genetic studies of patients with similar phenotypes.
25
Amberger JS, Bocchini CA, Schiettecatte F, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res 2014; 43:D789-98. [PMID: 25428349 PMCID: PMC4383985 DOI: 10.1093/nar/gku1205] [Citation(s) in RCA: 1399] [Impact Index Per Article: 139.9] [Indexed: 12/15/2022]
Abstract
Online Mendelian Inheritance in Man, OMIM®, is a comprehensive, authoritative and timely research resource of curated descriptions of human genes and phenotypes and the relationships between them. The new official website for OMIM, OMIM.org (http://omim.org), was launched in January 2011. OMIM is based on the published peer-reviewed biomedical literature and is used by overlapping and diverse communities of clinicians, molecular biologists and genome scientists, as well as by students and teachers of these disciplines. Genes and phenotypes are described in separate entries and are given unique, stable six-digit identifiers (MIM numbers). OMIM entries have a structured free-text format that provides the flexibility necessary to describe the complex and nuanced relationships between genes and genetic phenotypes in an efficient manner. OMIM also has a derivative table of genes and genetic phenotypes, the Morbid Map. OMIM.org has enhanced search capabilities such as genome coordinate searching and thesaurus-enhanced search term options. Phenotypic series have been created to facilitate viewing genetic heterogeneity of phenotypes. Clinical synopsis features are enhanced with UMLS, Human Phenotype Ontology and Elements of Morphology terms and image links. All OMIM data are available for FTP download and through an API. MIMmatch is a novel outreach feature to disseminate updates and encourage collaboration.