26
Hildebrand MS, DeLuca AP, Taylor KR, Hoskinson DP, Hur IA, Tack D, McMordie SJ, Huygen PLM, Casavant TL, Smith RJH. A contemporary review of AudioGene audioprofiling: a machine-based candidate gene prediction tool for autosomal dominant nonsyndromic hearing loss. Laryngoscope 2009; 119:2211-5. [PMID: 19780026 DOI: 10.1002/lary.20664] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9]
Review
27
Shrout JD, Scheetz TE, Casavant TL, Parkin GF. Isolation and characterization of autotrophic, hydrogen-utilizing, perchlorate-reducing bacteria. Appl Microbiol Biotechnol 2004; 67:261-8. [PMID: 15834721 DOI: 10.1007/s00253-004-1725-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.5] [Received: 03/26/2004] [Revised: 06/25/2004] [Accepted: 07/09/2004]
Abstract
Recent studies have shown that perchlorate (ClO₄⁻) can be degraded by some pure-culture and mixed-culture bacteria with the addition of hydrogen. This paper describes the isolation of two hydrogen-utilizing perchlorate-degrading bacteria capable of using inorganic carbon for growth. These autotrophic bacteria are within the genus Dechloromonas and are the first Dechloromonas species that are microaerophilic and incapable of growth at atmospheric oxygen concentrations. Dechloromonas sp. JDS5 and Dechloromonas sp. JDS6 are the first perchlorate-degrading autotrophs isolated from a perchlorate-contaminated site. Measured hydrogen thresholds were higher than for other environmentally significant, hydrogen-utilizing, anaerobic bacteria (e.g., halorespirers). The chlorite dismutase activity of these bacteria was greater for autotrophically grown cells than for cells grown heterotrophically on lactate. These bacteria used fumarate as an alternate electron acceptor, which is the first report of growth on an organic electron acceptor by perchlorate-reducing bacteria.
28
Scheetz TE, Fingert JH, Wang K, Kuehn MH, Knudtson KL, Alward WLM, Boldt HC, Russell SR, Folk JC, Casavant TL, Braun TA, Clark AF, Stone EM, Sheffield VC. A genome-wide association study for primary open angle glaucoma and macular degeneration reveals novel loci. PLoS One 2013; 8:e58657. [PMID: 23536807 PMCID: PMC3594156 DOI: 10.1371/journal.pone.0058657] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 09/21/2012] [Accepted: 02/07/2013]
Abstract
Glaucoma and age-related macular degeneration (AMD) are the two leading causes of visual loss in the United States. We used a novel study design to perform a genome-wide association study of both primary open angle glaucoma (POAG) and AMD. The design used a two-stage process for hypothesis generation and validation, in which each disease cohort served as the control for the other. A total of 400 POAG patients and 400 AMD patients were ascertained and genotyped at 500,000 loci. This study identified a novel association of complement component 7 (C7) with POAG. Additionally, central corneal thickness, a known risk factor for POAG, was found to be associated with ribophorin II (RPN2). Linked monogenic loci for POAG and AMD were also evaluated for evidence of association; none was significantly associated, although several yielded putative associations requiring validation. Our data suggest that POAG is more genetically complex than AMD, with no common risk alleles of large effect.
Research Support, N.I.H., Extramural
29
Kwitek AE, Tonellato PJ, Chen D, Gullings-Handley J, Cheng YS, Twigger S, Scheetz TE, Casavant TL, Stoll M, Nobrega MA, Shiozawa M, Soares MB, Sheffield VC, Jacob HJ. Automated construction of high-density comparative maps between rat, human, and mouse. Genome Res 2001; 11:1935-43. [PMID: 11691858 PMCID: PMC311144 DOI: 10.1101/gr.173701] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.2]
Abstract
Animal models have been used primarily as surrogates for humans, having similar disease-based phenotypes. Genomic organization also tends to be conserved between species, leading to the generation of comparative genome maps. The emergence of radiation hybrid (RH) maps, coupled with the large numbers of available Expressed Sequence Tags (ESTs), has revolutionized the way comparative maps can be built. We used publicly available rat, mouse, and human data to identify genes and ESTs with interspecies sequence identity (homology), identified their UniGene relationships, and incorporated their RH map positions to build integrated comparative maps with >2100 homologous UniGenes mapped in more than one species (approximately 6% of all mammalian genes). The generation of these maps is iterative and labor intensive; therefore, we developed a series of computer tools (not described here) based on our algorithm that identifies anchors between species and produces printable and on-line clickable comparative maps that link to a wide variety of useful tools and databases. The maps were constructed using sequence-based comparisons, thus creating "hooks" for further sequence-based annotation of human, mouse, and rat sequences. Currently, this map enables investigators to link the physiology of the rat with the genetics of the mouse and the clinical significance of the human.
research-article
30
Scheetz T, Bartlett JA, Walters JD, Schutte BC, Casavant TL, McCray PB. Genomics-based approaches to gene discovery in innate immunity. Immunol Rev 2002; 190:137-45. [PMID: 12493011 DOI: 10.1034/j.1600-065x.2002.19010.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.1]
Abstract
The completion of draft sequences of the human and mouse genomes offers many opportunities for gene discovery in the field of immunology through the application of the methods of computational genomics. One arm of the innate immune system includes the antimicrobial peptides that protect multicellular organisms from a diverse spectrum of microorganisms. The beta-defensins comprise an important family of mammalian antimicrobial peptides. To better define the beta-defensin gene family, we developed an approach to search genomic databases for conserved motifs present in the beta-defensin family using HMMER, a computational search tool based on hidden Markov models (HMMs), in combination with the Basic Local Alignment Search Tool (BLAST). The approach was first used to identify candidate second-exon coding regions, and later applied to finding associated first exons. This strategy discovered 28 new human and 43 new mouse beta-defensin genes in five syntenic chromosomal regions. Within each syntenic cluster, the gene sequences and organization were similar, suggesting that each cluster pair arose from a common ancestor and was retained because of conserved functions. These findings demonstrate an important proof-of-principle for a genome-wide search strategy to identify genes with conserved structural motifs. Such an approach may be readily adopted to address other questions of relevance to immunology.
Comparative Study
31
Scheetz TE, Trivedi N, Roberts CA, Kucaba T, Berger B, Robinson NL, Birkett CL, Gavin AJ, O'Leary B, Braun TA, Bonaldo MF, Robinson JP, Sheffield VC, Soares MB, Casavant TL. ESTprep: preprocessing cDNA sequence reads. Bioinformatics 2003; 19:1318-24. [PMID: 12874042 DOI: 10.1093/bioinformatics/btg159] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1]
Abstract
MOTIVATION: Large-scale gene discovery projects depend on highly accurate data. The data must be not only trustworthy but also correctly annotated for the features they contain. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing, and these errors further complicate the automated identification of features within ESTs. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. RESULTS: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows a sequence to pass only if it meets various quality criteria. Use of ESTprep has substantially improved the accuracy of EST feature identification and the fidelity of results submitted to GenBank. AVAILABILITY: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html
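The abstract does not list ESTprep's feature models or thresholds, so the Python sketch below only illustrates the general "locate a feature, then pass the read only if quality criteria are met" pattern. The feature sequence, `min_len`, and `max_n_frac` are hypothetical values chosen for illustration, and `prep_read` is not ESTprep's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrepResult:
    passed: bool
    reason: str
    insert: Optional[str] = None  # sequence remaining after the located feature is trimmed


def prep_read(read: str, feature: str, min_len: int = 100, max_n_frac: float = 0.03) -> PrepResult:
    """Hypothetical ESTprep-style check: locate a feature, trim it, then apply quality criteria."""
    read = read.upper()
    pos = read.find(feature.upper())
    if pos < 0:
        return PrepResult(False, "feature not found")
    insert = read[pos + len(feature):]  # keep only the sequence downstream of the feature
    if len(insert) < min_len:
        return PrepResult(False, "insert too short")
    if insert.count("N") / len(insert) > max_n_frac:  # fraction of ambiguous base calls
        return PrepResult(False, "too many ambiguous bases")
    return PrepResult(True, "ok", insert)


# Made-up read: a 6-bp stand-in feature (GAATTC) followed by 121 bases.
result = prep_read("GGAATTCC" + "ACGT" * 30, feature="GAATTC")
print(result.passed, len(result.insert))  # True 121
```

Returning a reason string rather than a bare boolean keeps rejected reads auditable, which matters when the goal is the fidelity of what is eventually submitted.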
Comparative Study
32
Monson ET, de Klerk K, Gaynor SC, Wagner AH, Breen ME, Parsons M, Casavant TL, Zandi PP, Potash JB, Willour VL. Whole-gene sequencing investigation of SAT1 in attempted suicide. Am J Med Genet B Neuropsychiatr Genet 2016; 171:888-95. [PMID: 27229768 PMCID: PMC5814250 DOI: 10.1002/ajmg.b.32462] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/21/2016] [Accepted: 05/11/2016]
Abstract
Suicidal behavior imposes a tremendous cost, with current US estimates reporting approximately 1.3 million suicide attempts and more than 40,000 suicide deaths each year. Several recent research efforts have identified an association between suicidal behavior and the expression level of the spermidine/spermine N1-acetyltransferase 1 (SAT1) gene. To date, several SAT1 genetic variants have been inconsistently associated with altered gene expression and/or directly with suicidal behavior. To clarify the role SAT1 genetic variation plays in suicidal behavior risk, we present a whole-gene sequencing effort of SAT1 in 476 bipolar disorder subjects with a history of suicide attempt and 473 subjects with bipolar disorder but no suicide attempts. Agilent SureSelect target enrichment was used to sequence all exons, introns, promoter regions, and putative regulatory regions identified from the ENCODE project within 10 kb of SAT1. Individual variant, haplotype, and collapsing variant tests were performed. Our results identified no variant or assessed region of SAT1 that showed a significant association with attempted suicide, nor did any assessment show evidence for replication of previously reported associations. Overall, no evidence for SAT1 sequence variation contributing to the risk for attempted suicide could be identified. It is possible that past associations of SAT1 expression with suicidal behavior arise from variation not captured in this study, or that causal variants in the region are too rare to be detected within our sample. Larger sample sizes and broader sequencing efforts will likely be required to identify the source of SAT1 expression level associations with suicidal behavior. © 2016 Wiley Periodicals, Inc.
research-article
33
Scheetz TE, Laffin JJ, Berger B, Holte S, Baumes SA, Brown R, Chang S, Coco J, Conklin J, Crouch K, Donohue M, Doonan G, Estes C, Eyestone M, Fishler K, Gardiner J, Guo L, Johnson B, Keppel C, Kreger R, Lebeck M, Marcelino R, Miljkovich V, Perdue M, Qui L, Rehmann J, Reiter RS, Rhoads B, Schaefer K, Smith C, Sunjevaric I, Trout K, Wu N, Birkett CL, Bischof J, Gackle B, Gavin A, Grundstad AJ, Mokrzycki B, Moressi C, O'Leary B, Pedretti K, Roberts C, Robinson NL, Smith M, Tack D, Trivedi N, Kucaba T, Freeman T, Lin JJC, Bonaldo MF, Casavant TL, Sheffield VC, Soares MB. High-throughput gene discovery in the rat. Genome Res 2004; 14:733-41. [PMID: 15060017 PMCID: PMC383320 DOI: 10.1101/gr.1414204] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1]
Abstract
The rat is an important animal model for human diseases and is widely used in physiology. In this article we present a new strategy for gene discovery based on the production of ESTs from serially subtracted and normalized cDNA libraries, and we describe its application for the development of a comprehensive nonredundant collection of rat ESTs. Our new strategy appears to yield substantially more EST clusters per EST sequenced than do previous approaches that did not use serial subtraction. However, multiple rounds of library subtraction resulted in high frequencies of otherwise rare internally primed cDNAs, defining the limits of this powerful approach. To date, we have generated >200,000 3' ESTs from >100 cDNA libraries representing a wide range of tissues and developmental stages of the laboratory rat. Most importantly, we have contributed to approximately 50,000 rat UniGene clusters. We have identified, arrayed, and derived 5' ESTs from >30,000 unique rat cDNA clones. Complete information, including radiation hybrid mapping data, is also maintained locally at http://genome.uiowa.edu/clcg.html. All of the sequences described in this article have been submitted to the dbEST division of the NCBI.
Research Support, U.S. Gov't, P.H.S.
34
Scheetz TE, Zabner J, Welsh MJ, Coco J, Eyestone MDF, Bonaldo M, Kucaba T, Casavant TL, Soares MB, McCray PB. Large-scale gene discovery in human airway epithelia reveals novel transcripts. Physiol Genomics 2004; 17:69-77. [PMID: 14701920 DOI: 10.1152/physiolgenomics.00188.2003] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1]
Abstract
The airway epithelium represents an important barrier between the host and the environment. It is a first site of contact with pathogens, particulates, and other stimuli, and has evolved the means to dynamically respond to these challenges. In an effort to define the transcript profile of airway epithelia, we created and sequenced cDNA libraries from cystic fibrosis (CF) and non-CF epithelia and from human lung tissue. Sequencing of these libraries produced approximately 53,000 3'-expressed sequence tags (3'-ESTs). From these, a nonredundant UniGene set of more than 19,000 sequences was generated. Despite the relatively small contribution of airway epithelia to the total mass of the lung, focused gene discovery in this tissue yielded novel results. The ESTs included several thousand transcripts (6,416) not previously identified from cDNA sequences as expressed in the lung. Among the abundant transcripts were several genes involved in host defense. Most importantly, the set also included 879 3'-ESTs that appear to be novel sequences not previously represented in the National Center for Biotechnology Information UniGene collection. This UniGene set should be useful for studies of pulmonary diseases involving the airway epithelium including cystic fibrosis, respiratory infections and asthma. It also provides a reagent for large-scale expression profiling.
Research Support, U.S. Gov't, P.H.S.
35
Keen HL, Halabi CM, Beyer AM, de Lange WJ, Liu X, Maeda N, Faraci FM, Casavant TL, Sigmund CD. Bioinformatic analysis of gene sets regulated by ligand-activated and dominant-negative peroxisome proliferator-activated receptor gamma in mouse aorta. Arterioscler Thromb Vasc Biol 2009; 30:518-25. [PMID: 20018933 DOI: 10.1161/atvbaha.109.200733] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4]
Abstract
OBJECTIVE: Drugs that activate peroxisome proliferator-activated receptor (PPAR) gamma improve glucose sensitivity and lower blood pressure, whereas dominant-negative mutations in PPARgamma cause severe insulin resistance and hypertension. We hypothesize that these PPARgamma mutants regulate target genes in the direction opposite to ligand-mediated activation, and we tested this hypothesis on a genomewide scale. METHODS AND RESULTS: We integrated gene expression data in aorta specimens from mice treated with the PPARgamma ligand rosiglitazone with data from mice containing a globally expressed knockin of the PPARgamma P465L dominant-negative mutation. We also integrated our data with publicly available data sets containing the following: (1) gene expression profiles in many human tissues, (2) PPARgamma target genes in 3T3-L1 adipocytes, and (3) experimentally validated PPARgamma binding sites throughout the genome. Many classic PPARgamma target genes were induced by rosiglitazone and repressed by dominant-negative PPARgamma. A similar pattern was observed for about 90% of the gene sets regulated by both rosiglitazone and dominant-negative PPARgamma. Genes exhibiting this pattern of contrasting regulation were significantly enriched for nearby PPARgamma binding sites. CONCLUSIONS: These results provide convincing evidence that the PPARgamma P465L mutation causes transcriptional effects that are opposite to those mediated by PPARgamma ligand, thus validating mice carrying the mutation as a model of PPARgamma interference.
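The "contrasting regulation" idea can be made concrete with a per-gene sketch: among genes changed by both treatments, count the share whose log2 fold changes have opposite signs. The paper works at the level of gene sets and does not give its cutoffs, so the `min_abs_lfc` threshold, the per-gene framing, and the fold-change values below are all hypothetical.

```python
def contrasting_fraction(rosi: dict[str, float], dn: dict[str, float], min_abs_lfc: float = 0.5) -> float:
    """Fraction of genes changed by both treatments whose log2 fold changes have opposite signs.

    rosi: log2 fold change with the ligand (rosiglitazone) vs. control
    dn:   log2 fold change in the dominant-negative knock-in vs. control
    """
    shared = [g for g in rosi.keys() & dn.keys()
              if abs(rosi[g]) >= min_abs_lfc and abs(dn[g]) >= min_abs_lfc]
    if not shared:
        return 0.0
    opposite = sum(1 for g in shared if rosi[g] * dn[g] < 0)  # induced by one, repressed by the other
    return opposite / len(shared)


# Hypothetical log2 fold changes for illustration only.
rosi = {"GeneA": 2.1, "GeneB": 1.4, "GeneC": 0.8, "GeneD": -1.0}
dn = {"GeneA": -1.5, "GeneB": -0.9, "GeneC": 0.7, "GeneE": 1.2}
print(round(contrasting_fraction(rosi, dn), 3))  # 0.667
```

Only genes present in both profiles and above the cutoff in both are counted, mirroring the abstract's restriction to sets "regulated by both" treatments.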
Research Support, Non-U.S. Gov't
36
Scheetz TE, Raymond MR, Nishimura DY, McClain A, Roberts C, Birkett C, Gardiner J, Zhang J, Butters N, Sun C, Kwitek-Black A, Jacob H, Casavant TL, Soares MB, Sheffield VC. Generation of a high-density rat EST map. Genome Res 2001; 11:497-502. [PMID: 11230173 PMCID: PMC311028 DOI: 10.1101/gr.gr-1516r] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.9]
Abstract
We have developed a high-density EST map of the rat, consisting of >11,000 ESTs. These ESTs were placed on a radiation hybrid framework map of genetic markers spanning all 20 rat autosomes, plus the X chromosome. The framework maps have a total size of approximately 12,400 cR, giving an average correspondence of 240 kb/cR. The frameworks are all LOD 3 chromosomal maps consisting of 775 radiation-hybrid-mapped genetic markers and ESTs. To date, we have generated radiation-hybrid-mapping data for >14,000 novel ESTs identified by our Rat Gene Discovery and Mapping Project (http://ratEST.uiowa.edu), from which we have placed >11,000 on our framework maps. To minimize mapping errors, ESTs were mapped in duplicate and consensus RH vectors produced for use in the placement procedure. This EST map was then used to construct high-density comparative maps between rat and human and rat and mouse. These maps will be a useful resource for positional cloning of genes for rat models of human diseases and in the creation and verification of a tiling set of map order for the upcoming rat-genome sequencing.
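The two map statistics can be cross-checked against each other. Multiplying the total framework size by the average physical-to-RH ratio gives the physical span the framework maps imply, about 3 Gb, which is on the order of a mammalian genome. The reading that the 240 kb/cR figure was obtained by dividing a genome-size estimate by the total map length is an assumption; the abstract does not say how the average was computed.

```latex
\[
12{,}400\ \mathrm{cR} \times 240\ \mathrm{kb/cR} \approx 2.98 \times 10^{6}\ \mathrm{kb} \approx 3.0\ \mathrm{Gb}
\]
```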
research-article
37
Gavin AJ, Scheetz TE, Roberts CA, O'Leary B, Braun TA, Sheffield VC, Soares MB, Robinson JP, Casavant TL. Pooled library tissue tags for EST-based gene discovery. Bioinformatics 2002; 18:1162-6. [PMID: 12217907 DOI: 10.1093/bioinformatics/18.9.1162] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8]
Abstract
MOTIVATION: In gene discovery projects based on EST sequencing, effective post-sequencing identification methods are important for determining the tissue sources of ESTs within pooled cDNA libraries. In the past, such identification efforts have had higher-than-necessary failure rates because of errors within the subsequence containing the oligo tag intended to define the tissue source of each EST. RESULTS: A large-scale EST-based gene discovery program at The University of Iowa has led to the creation of UITagCreator, a software method for creating large sets of synthetic tissue identification tags. The identification tags provide error detection and correction capability and, in conjunction with automated annotation software, substantially improve the accurate identification of the tissue source in the presence of sequencing and base-calling errors. These identification rates compare favorably with past paradigms. AVAILABILITY: The UITagCreator source code and installation instructions, along with detection software usable with the created tag sets, are freely available at http://genome.uiowa.edu/pubsoft/software.html CONTACT: tomc@eng.uiowa.edu
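The abstract does not describe how UITagCreator builds its tags, so the following is a generic sketch of error-correcting tag design, not the tool's method. It greedily collects DNA tags whose pairwise Hamming distance is at least 3, which lets a nearest-tag decoder correct any single substitution. The tag length, distance, and set size are hypothetical, and Hamming distance only models substitutions, not the insertions and deletions that also occur in sequencing reads.

```python
from itertools import product
from typing import Optional

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of substitutions between two equal-length tags."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_tags(length: int, min_dist: int, max_tags: int) -> list[str]:
    """Greedily collect tags (in lexicographic order) whose pairwise Hamming distance is >= min_dist."""
    tags: list[str] = []
    for candidate in map("".join, product(BASES, repeat=length)):
        if all(hamming(candidate, t) >= min_dist for t in tags):
            tags.append(candidate)
            if len(tags) == max_tags:
                break
    return tags


def decode(observed: str, tags: list[str], max_errors: int) -> Optional[str]:
    """Map an observed tag to the unique closest valid tag, or None if too distant or tied."""
    ranked = sorted((hamming(observed, t), t) for t in tags)
    best_dist, best_tag = ranked[0]
    if best_dist > max_errors:
        return None  # error detected but not correctable under this policy
    if len(ranked) > 1 and ranked[1][0] == best_dist:
        return None  # ambiguous: two tags are equally close
    return best_tag


# With min_dist=3, any single substitution is corrected back to the original tag.
tags = build_tags(length=6, min_dist=3, max_tags=16)
true_tag = tags[1]
observed = true_tag[:-1] + ("A" if true_tag[-1] != "A" else "C")  # simulate one base-call error
print(decode(observed, tags, max_errors=1) == true_tag)  # True
```

Returning `None` instead of guessing keeps uncorrectable or ambiguous reads flagged for review rather than silently assigned to the wrong tissue.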
38
Taylor KR, Deluca AP, Shearer AE, Hildebrand MS, Black-Ziegelbein EA, Anand VN, Sloan CM, Eppsteiner RW, Scheetz TE, Huygen PLM, Smith RJH, Braun TA, Casavant TL. AudioGene: predicting hearing loss genotypes from phenotypes to guide genetic screening. Hum Mutat 2013; 34:539-45. [PMID: 23280582 DOI: 10.1002/humu.22268] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 07/06/2012] [Accepted: 12/18/2012]
Abstract
Autosomal dominant nonsyndromic hearing loss (ADNSHL) is a common and often progressive sensory deficit. ADNSHL displays a high degree of genetic heterogeneity and varying rates of progression. Accurate, comprehensive, and cost-effective genetic testing facilitates genetic counseling and provides valuable prognostic information to affected individuals. In this article, we describe the algorithm underlying AudioGene, a software system employing machine-learning techniques that utilizes phenotypic information derived from audiograms to predict the genetic cause of hearing loss in persons segregating ADNSHL. Our data show that AudioGene has an accuracy of 68% in predicting the causative gene within its top three predictions, as compared with 44% for a majority classifier. We also show that AudioGene remains effective for audiograms with high levels of clinical measurement noise. We identify audiometric outliers for each genetic locus and hypothesize that outliers may reflect modifying genetic effects. As personalized genomic medicine becomes more common, AudioGene will be increasingly useful as a phenotypic filter to assess pathogenicity of variants identified by massively parallel sequencing.
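The 68% versus 44% comparison is a top-three accuracy: a case counts as correct if the causative gene appears anywhere among the first three ranked predictions. The sketch below shows how such a metric and a majority-classifier baseline could be computed. Defining the baseline as "always predict the three most frequent training loci" is an assumption about how a majority classifier extends to top-three ranking, and the toy labels are hypothetical, unrelated to AudioGene's data.

```python
from collections import Counter


def top_k_accuracy(ranked: list[list[str]], truth: list[str], k: int = 3) -> float:
    """Share of cases whose true locus appears among the first k ranked predictions."""
    hits = sum(1 for preds, locus in zip(ranked, truth) if locus in preds[:k])
    return hits / len(truth)


def majority_top_k(train_labels: list[str], k: int = 3) -> list[str]:
    """Baseline that ignores the audiogram and always predicts the k most frequent training loci."""
    return [locus for locus, _ in Counter(train_labels).most_common(k)]


# Toy, hypothetical labels (not AudioGene data).
train = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"]
truth = ["A", "D", "D", "B"]
model_ranked = [["A", "B", "C"], ["D", "A", "B"], ["B", "A", "D"], ["C", "B", "A"]]
baseline = majority_top_k(train)                      # ['A', 'B', 'C']
print(top_k_accuracy(model_ranked, truth))             # 1.0
print(top_k_accuracy([baseline] * len(truth), truth))  # 0.5
```

The baseline matters because with few common loci, simply guessing the most frequent ones already scores well; a useful predictor has to beat it.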
Research Support, N.I.H., Extramural
39
Zhao SH, Simmons DG, Cross JC, Scheetz TE, Casavant TL, Soares MB, Tuggle CK. PLET1 (C11orf34), a highly expressed and processed novel gene in pig and mouse placenta, is transcribed but poorly spliced in human. Genomics 2004; 84:114-25. [PMID: 15203209 DOI: 10.1016/j.ygeno.2004.02.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 12/07/2003] [Accepted: 02/12/2004]
Abstract
Sequencing of porcine cDNAs identified a novel EST with high frequency in placenta tissue. Full-length PLET1 (placenta-expressed transcript 1, also called C11orf34) matched a mouse cDNA and many bovine and mouse ESTs but no human transcripts or ESTs. However, the porcine cDNA matched several putative exons within a human genomic DNA fragment on chromosome 11. This human locus is in a region of conserved synteny with pig chromosome 9, to which the porcine gene was subsequently mapped. RNA blot hybridization showed that this gene had high expression in porcine and mouse conceptus and throughout placenta development. In situ hybridization using mouse placenta showed PLET1 expression in trophoblast cells of the labyrinth, as well as in spongiotrophoblast and glycogen trophoblast cells. However, no expression of PLET1 was detected by RNA blot analysis of human placenta, although RT-PCR analysis detected very small amounts of partially spliced RNA that were significantly less abundant than the RNA levels in mouse placenta. Donor and acceptor splicing site sequences in the exons of the human gene are poorly conserved and may be the cause of inefficient splicing found specifically in human tissue. Our data correct GenomeScan annotation of this region of the human genome and describe functional gene discovery in mammals not recognized in human EST projects.
40
Bonaldo MF, Bair TB, Scheetz TE, Snir E, Akabogu I, Bair JL, Berger B, Crouch K, Davis A, Eyestone ME, Keppel C, Kucaba TA, Lebeck M, Lin JL, de Melo AIR, Rehmann J, Reiter RS, Schaefer K, Smith C, Tack D, Trout K, Sheffield VC, Lin JJC, Casavant TL, Soares MB. 1274 full-open reading frames of transcripts expressed in the developing mouse nervous system. Genome Res 2004; 14:2053-63. [PMID: 15489326 PMCID: PMC528920 DOI: 10.1101/gr.2601304] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8]
Abstract
As part of the trans-National Institutes of Health (NIH) Mouse Brain Molecular Anatomy Project (BMAP), and in close coordination with the NIH Mammalian Gene Collection Program (MGC), we initiated a large-scale project to clone, identify, and sequence the complete open reading frame (ORF) of transcripts expressed in the developing mouse nervous system. Here we report the analysis of the ORF sequence of 1274 cDNAs, obtained from 47 full-length-enriched cDNA libraries, constructed by using a novel approach, herein described. cDNA libraries were derived from size-fractionated cytoplasmic mRNA isolated from brain and eye tissues obtained at several embryonic stages and postnatal days. Altogether, including the full-ORF MGC sequences derived from these libraries by the MGC sequencing team, NIH_BMAP full-ORF sequences correspond to approximately 20% of all transcripts currently represented in mouse MGC. We show that NIH_BMAP clones comprise 68% of mouse MGC cDNAs ≥5 kb, and 54% of those ≥4 kb, as of March 15, 2004. Importantly, we identified transcripts, among the 1274 full-ORF sequences, that are exclusively or predominantly expressed in brain and eye tissues, many of which encode yet uncharacterized proteins.
Research Support, U.S. Gov't, P.H.S.
41
Rendleman MC, Buatti JM, Braun TA, Smith BJ, Nwakama C, Beichel RR, Brown B, Casavant TL. Machine learning with the TCGA-HNSC dataset: improving usability by addressing inconsistency, sparsity, and high-dimensionality. BMC Bioinformatics 2019; 20:339. [PMID: 31208324 PMCID: PMC6580485 DOI: 10.1186/s12859-019-2929-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/08/2018] [Accepted: 06/04/2019]
Abstract
Background: In the era of precision oncology and publicly available datasets, the amount of information available for each patient case has dramatically increased. From clinical variables and PET-CT radiomics measures to DNA-variant and RNA expression profiles, such a wide variety of data presents a multitude of challenges. Large clinical datasets often contain sparsely and/or inconsistently populated fields. Corresponding sequencing profiles can suffer from the problem of high-dimensionality, where making useful inferences can be difficult without correspondingly large numbers of instances. In this paper we report a novel deployment of machine learning techniques to handle data sparsity and high dimensionality, while evaluating potential biomarkers in the form of unsupervised transformations of RNA data. We apply preprocessing, MICE imputation, and sparse principal component analysis (SPCA) to improve the usability of more than 500 patient cases from the TCGA-HNSC dataset for enhancing future oncological decision support for Head and Neck Squamous Cell Carcinoma (HNSCC). Results: Imputation was shown to improve the prognostic ability of sparse clinical treatment variables. SPCA transformation of RNA expression variables reduced runtime for RNA-based models, though changes to classifier performance were not significant. Gene ontology enrichment analysis of gene sets associated with individual sparse principal components (SPCs) is also reported, showing that both high- and low-importance SPCs were associated with cell death pathways, though the high-importance gene sets were found to be associated with a wider variety of cancer-related biological processes. Conclusions: MICE imputation allowed us to impute missing values for clinically informative features, improving their overall importance for predicting two-year recurrence-free survival by incorporating variance from other clinical variables.
Dimensionality reduction of RNA expression profiles via SPCA reduced both computation cost and model training/evaluation time without affecting classifier performance, allowing researchers to obtain experimental results much more quickly. SPCA simultaneously provided a convenient avenue for consideration of biological context via gene ontology enrichment analysis.
Journal Article
42
Tollefson MR, Gogal RA, Weaver AM, Schaefer AM, Marini RJ, Azaiez H, Kolbe DL, Wang D, Weaver AE, Casavant TL, Braun TA, Smith RJH, Schnieders MJ. Assessing variants of uncertain significance implicated in hearing loss using a comprehensive deafness proteome. Hum Genet 2023; 142:819-834. [PMID: 37086329 PMCID: PMC10182131 DOI: 10.1007/s00439-023-02559-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/23/2023] [Accepted: 04/11/2023]
Abstract
Hearing loss is the leading sensory deficit, affecting ~5% of the population. It exhibits remarkable heterogeneity across 223 genes with 6328 pathogenic missense variants, making deafness-specific expertise a prerequisite for ascribing phenotypic consequences to genetic variants. Deafness-implicated variants are curated in the Deafness Variation Database (DVD) after classification by a genetic hearing loss expert panel and a thorough informatics pipeline. However, seventy percent of the 128,167 missense variants in the DVD are "variants of uncertain significance" (VUS) due to insufficient evidence for classification. Here, we use the deep learning protein prediction algorithm, AlphaFold2, to curate structures for all DVD genes. We refine these structures with global optimization and the AMOEBA force field and use DDGun3D to predict folding free energy differences (∆∆GFold) for all DVD missense variants. We find that 5772 VUSs have a large, destabilizing ∆∆GFold that is consistent with pathogenic variants. When also filtered for CADD scores (> 25.7), we determine that 3456 VUSs are likely pathogenic at a probability of 99.0%. Of the 224 genes in the DVD, 166 genes (74%) exhibit one or more missense variants predicted to cause a pathogenic change in protein folding stability. The VUSs prioritized here affect 119 patients (~3% of cases) sequenced by the OtoSCOPE targeted panel. Approximately half of these patients previously received an inconclusive report, and reclassification of these VUSs as pathogenic provides a new genetic diagnosis for six patients.
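The two-step prioritization (a large destabilizing ∆∆GFold, then a CADD score above 25.7) amounts to a conjunctive filter. In the sketch below only the CADD cutoff comes from the abstract; the destabilization cutoff is a hypothetical placeholder, and because stability predictors differ in sign convention, the sketch simply assumes that larger positive values mean more destabilizing. The variant records are made up.

```python
from dataclasses import dataclass

CADD_CUTOFF = 25.7  # CADD threshold quoted in the abstract
DDG_CUTOFF = 1.0    # hypothetical destabilization cutoff in kcal/mol; not taken from the paper


@dataclass
class Variant:
    name: str
    ddg_destab: float  # assumed sign convention: larger positive value = more destabilizing
    cadd: float


def prioritize(vus: list[Variant]) -> list[str]:
    """Keep VUSs that are both strongly destabilizing and above the CADD cutoff."""
    return [v.name for v in vus if v.ddg_destab >= DDG_CUTOFF and v.cadd > CADD_CUTOFF]


candidates = [
    Variant("v1", ddg_destab=2.3, cadd=28.0),  # passes both filters
    Variant("v2", ddg_destab=2.3, cadd=20.0),  # destabilizing, but CADD too low
    Variant("v3", ddg_destab=0.2, cadd=30.0),  # high CADD, but not destabilizing
]
print(prioritize(candidates))  # ['v1']
```

Requiring both signals is what narrows the 5772 destabilizing VUSs to the smaller likely-pathogenic set: structural evidence alone is not treated as sufficient.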
Article type: research-article | Years since publication: 2 | Citations: 12
43
Bayouth JE, Casavant TL, Graham MM, Sonka M, Muruganandham M, Buatti JM. Image-based biomarkers in clinical practice. Semin Radiat Oncol 2011; 21:157-66. [PMID: 21356483 PMCID: PMC4270476 DOI: 10.1016/j.semradonc.2010.11.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
The growth of functional and metabolically informative imaging is eclipsing anatomic imaging alone in clinical practice. Magnetic resonance (MR)- and positron emission tomography (PET)-based treatment planning and response assessment are recognized as essential components of clinical practice, and they offer the potential for important quantitative analysis. Extracting the greatest benefit from these imaging techniques will require refining the best combinations of multimodality imaging through well-designed clinical trials that use robust image-analysis tools and require substantial computer-based infrastructure. Through these changes and enhancements, image-based biomarkers will enhance clinical decision making and accelerate the progress that is made through clinical trial research.
Article type: Research Support, N.I.H., Extramural | Years since publication: 14 | Citations: 11
44
Levy MA, Freymann JB, Kirby JS, Fedorov A, Fennessy FM, Eschrich SA, Berglund AE, Fenstermacher DA, Tan Y, Guo X, Casavant TL, Brown BJ, Braun TA, Dekker A, Roelofs E, Mountz JM, Boada F, Laymon C, Oborski M, Rubin DL. Informatics methods to enable sharing of quantitative imaging research data. Magn Reson Imaging 2012; 30:1249-56. [PMID: 22770688 PMCID: PMC3466343 DOI: 10.1016/j.mri.2012.04.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 01/31/2012] [Revised: 04/16/2012] [Accepted: 04/18/2012] [Indexed: 10/28/2022]
Abstract
INTRODUCTION: The National Cancer Institute Quantitative Imaging Network (QIN) is a collaborative research network whose goal is to share data, algorithms and research tools to accelerate quantitative imaging research. A challenge is the variability in tools and analysis platforms used in quantitative imaging. Our goal was to understand the extent of this variation and to develop an approach to enable data sharing and to promote reuse of quantitative imaging data in the community.
METHODS: We surveyed the tools currently used by QIN member sites to represent and store their QIN research data, including images, image metadata and clinical data. We identified existing systems and standards for data sharing and their gaps for the QIN use case. We then proposed a system architecture to enable data sharing and collaborative experimentation within the QIN.
RESULTS: Each QIN institution currently uses a variety of tools. We developed a general information system architecture to support the QIN goals. We also describe the remaining architectural gaps that we are addressing to enable members to share research images and image metadata across the network.
CONCLUSIONS: As a research network, the QIN will stimulate quantitative imaging research by pooling data, algorithms and research tools. However, current functional requirements have gaps that future informatics development will need to fill. Special attention must be given to the technical requirements needed to translate these methods into the clinical research workflow to enable validation and qualification of these novel imaging biomarkers.
Article type: Research Support, N.I.H., Extramural | Years since publication: 13 | Citations: 11
45
Walls WD, Moteki H, Thomas TR, Nishio SY, Yoshimura H, Iwasa Y, Frees KL, Nishimura CJ, Azaiez H, Booth KT, Marini RJ, Kolbe DL, Weaver AM, Schaefer AM, Wang K, Braun TA, Usami SI, Barr-Gillespie PG, Richardson GP, Smith RJ, Casavant TL. A comparative analysis of genetic hearing loss phenotypes in European/American and Japanese populations. Hum Genet 2020; 139:1315-1323. [PMID: 32382995 DOI: 10.1007/s00439-020-02174-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/25/2019] [Accepted: 04/29/2020] [Indexed: 01/04/2023]
Abstract
We present detailed comparative analyses to assess population-level differences in patterns of genetic deafness between European/American and Japanese cohorts with non-syndromic hearing loss. One thousand eighty-three audiometric test results (921 European/American and 162 Japanese) from members of 168 families (48 European/American and 120 Japanese) with non-syndromic hearing loss secondary to pathogenic variants in one of three genes (KCNQ4, TECTA, WFS1) were studied. Audioprofile characteristics, specific mutation types, and protein domains were considered in the comparative analyses. Our findings support differences in audioprofiles driven by both mutation type (non-truncating vs. truncating) and ethnic background. The former finding confirms data that ascribe a phenotypic consequence to different mutation types in KCNQ4; the latter finding suggests that there are ethnic-specific effects (genetic and/or environmental) that impact gene-specific audioprofiles for TECTA and WFS1. Identifying the drivers of ethnic differences will refine our understanding of phenotype-genotype relationships and the biology of hearing and deafness.
Article type: Journal Article | Years since publication: 5 | Citations: 9
46
Eppsteiner RW, Shearer AE, Hildebrand MS, Taylor KR, Deluca AP, Scherer S, Huygen P, Scheetz TE, Braun TA, Casavant TL, Smith RJH. Using the phenome and genome to improve genetic diagnosis for deafness. Otolaryngol Head Neck Surg 2012; 147:975-7. [PMID: 22785243 DOI: 10.1177/0194599812454271] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023]
Article type: Research Support, N.I.H., Extramural | Years since publication: 13 | Citations: 8
47
Braun TA, Shankar SP, Davis S, O'Leary B, Scheetz TE, Clark AF, Sheffield VC, Casavant TL, Stone EM. Prioritizing regions of candidate genes for efficient mutation screening. Hum Mutat 2006; 27:195-200. [PMID: 16395665 DOI: 10.1002/humu.20247] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
The availability of the complete sequence of the human genome has dramatically facilitated the search for disease-causing sequence variations. In fact, the rate-limiting step has shifted from the discovery and characterization of candidate genes to the actual screening of human populations and the subsequent interpretation of observed variations. In this study we tested the hypothesis that some segments of candidate genes are more likely than others to contain disease-causing variations and that these segments can be predicted bioinformatically. A bioinformatic technique, prioritization of annotated regions (PAR), was developed to predict the likelihood that a specific coding region of a gene will harbor a disease-causing mutation based on conserved protein functional domains and protein secondary structures. This method was evaluated by using it to analyze 710 genes that collectively harbor 4,498 previously identified mutations. Nearly 50% of the genes were recognized as disease-associated after screening only 9% of the complete coding sequence. The PAR technique identified 90% of the genes as containing at least one mutation, with less than 40% of the screening resources that traditional approaches would require. These results suggest that prioritization strategies such as PAR can accelerate disease-gene identification through more efficient use of screening resources.
Article type: Research Support, Non-U.S. Gov't | Years since publication: 19 | Citations: 8
48
Taylor KR, Booth KT, Azaiez H, Sloan CM, Kolbe DL, Glanz EN, Shearer AE, DeLuca AP, Anand VN, Hildebrand MS, Simpson AC, Eppsteiner RW, Scheetz TE, Braun TA, Huygen PLM, Smith RJH, Casavant TL. Audioprofile Surfaces: The 21st Century Audiogram. Ann Otol Rhinol Laryngol 2015; 125:361-8. [PMID: 26530094 DOI: 10.1177/0003489415614863] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
OBJECTIVE: To present audiometric data in 3 dimensions by considering age as an additional dimension.
METHODS: Audioprofile surfaces (APSs) were fitted to a set of audiograms by plotting each measurement of an audiogram as an independent point in 3 dimensions, with the x, y, and z axes representing frequency, hearing loss in dB, and age, respectively.
RESULTS: APSs were precomputed for 34 loci for use in the Java-based APS viewer, a standalone application. By selecting the APS for the appropriate genetic locus, a clinician can compare this APS-generated average surface with a specific patient's audiogram.
CONCLUSION: Audioprofile surfaces provide an easily interpreted visual representation of a person's hearing acuity relative to others with the same genetic cause of hearing loss. Audioprofile surfaces will support the generation and testing of sophisticated hypotheses to further refine our understanding of the biology of hearing.
Article type: Research Support, Non-U.S. Gov't | Years since publication: 10 | Citations: 7
49
Corrigan RA, Qi G, Thiel AC, Lynn JR, Walker BD, Casavant TL, Lagardere L, Piquemal JP, Ponder JW, Ren P, Schnieders MJ. Implicit Solvents for the Polarizable Atomic Multipole AMOEBA Force Field. J Chem Theory Comput 2021; 17:2323-2341. [PMID: 33769814 DOI: 10.1021/acs.jctc.0c01286] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/19/2022]
Abstract
Computational protein design, ab initio protein/RNA folding, and protein-ligand screening can be too computationally demanding for explicit treatment of solvent. For these applications, implicit solvent offers a compelling alternative, which we describe here for the polarizable atomic multipole AMOEBA force field based on three treatments of continuum electrostatics: numerical solutions to the nonlinear and linearized versions of the Poisson-Boltzmann equation (PBE), the domain-decomposition conductor-like screening model (ddCOSMO) approximation to the PBE, and the analytic generalized Kirkwood (GK) approximation. The continuum electrostatics models are combined with a nonpolar estimator based on novel cavitation and dispersion terms. Electrostatic model parameters are numerically optimized using a least-squares style target function based on a library of 103 small-molecule solvation free energy differences. Mean signed errors for the adaptive Poisson-Boltzmann solver (APBS), ddCOSMO, and GK models are 0.05, 0.00, and 0.00 kcal/mol, respectively, while the mean unsigned errors are 0.70, 0.63, and 0.58 kcal/mol, respectively. Validation of the electrostatic response of the resulting implicit solvents, which are available in the Tinker (or Tinker-HP), OpenMM, and Force Field X software packages, is based on comparisons to explicit solvent simulations for a series of proteins and nucleic acids. Overall, the emergence of performant implicit solvent models for polarizable force fields opens the door to their use for folding and design applications.
Years since publication: 4 | Citations: 7
50
Laffin JJS, Scheetz TE, Bonaldo MDF, Reiter RS, Chang S, Eyestone M, Abdulkawy H, Brown B, Roberts C, Tack D, Kucaba T, Lin JJC, Sheffield VC, Casavant TL, Soares MB. A comprehensive nonredundant expressed sequence tag collection for the developing Rattus norvegicus heart. Physiol Genomics 2004; 17:245-52. [PMID: 14762174 DOI: 10.1152/physiolgenomics.00186.2003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Congenital heart defects affect ∼1,000,000 people in the United States, with 40,000 new births contributing to that number every year. A large percentage of these defects can be attributed to septal defects. We assembled a nonredundant collection of over 12,000 expressed sequence tags (ESTs) from a total of 30,000 ESTs, with the ultimate goal of identifying spatially and/or temporally regulated genes during heart septation. These ESTs were compiled from nonnormalized, normalized, and serially subtracted cDNA libraries derived from two sets of tissue samples. The first includes microdissected rat hearts from embryonic (E) days E13, E15, and E16.5–E18.5 and adult heart. The second includes hearts from embryonic days E17, E19, and E21 and postnatal (P) days P1, P12, P74, and P200. Over 6,000 novel ESTs were identified in the libraries derived from these two sets of tissues, all of which have been contributed to the NCBI rat UniGene collection. It is anticipated that such EST and cDNA clone resources will prove invaluable to gene expression studies aimed at the understanding of the molecular mechanisms underlying heart septation defects.
Years since publication: 21 | Citations: 7