101
|
Abstract
This chapter introduces the problem of ancestral sequence reconstruction: given a set of extant orthologous genomic DNA sequences (or even whole genomes), together with a phylogenetic tree relating these sequences, predict the DNA sequence of all ancestral species in the tree. Blanchette et al. (1) have shown that for certain sets of species (in particular, for eutherian mammals), very accurate reconstruction can be obtained. We explain the main steps involved in this process, including multiple sequence alignment, insertion and deletion inference, substitution inference, and gene arrangement inference. We also describe a simulation-based procedure to assess the accuracy of the reconstructed sequences. The whole reconstruction process is illustrated using a set of mammalian sequences from the CFTR region.
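As a rough illustration of the substitution-inference step, the sketch below reconstructs ancestral bases for a single alignment column with Fitch parsimony. This is a stand-in technique chosen for brevity, not necessarily the method used in the chapter; the function name `fitch_column`, the nested-tuple tree encoding, and the four-species example are all assumptions made for the example.

```python
def fitch_column(tree, leaf_states):
    """Reconstruct ancestral bases for one alignment column (Fitch parsimony).

    tree        -- binary tree as nested 2-tuples of leaf names (assumed encoding)
    leaf_states -- dict mapping each leaf name to its observed base
    Returns (states, changes): a dict node -> chosen base, and the
    minimum number of substitutions implied by the column.
    """
    sets = {}
    changes = 0

    def bottom_up(node):
        # Leaves carry their observed base; internal nodes take the
        # intersection of their children's sets, or the union if disjoint.
        nonlocal changes
        if isinstance(node, str):
            sets[node] = {leaf_states[node]}
            return sets[node]
        left, right = (bottom_up(child) for child in node)
        common = left & right
        if common:
            sets[node] = common
        else:
            sets[node] = left | right
            changes += 1  # a disjoint pair implies one substitution
        return sets[node]

    states = {}

    def top_down(node, parent_state=None):
        # Keep the parent's base when it is allowed here; otherwise pick
        # one allowed base (alphabetically first, for determinism).
        options = sets[node]
        state = parent_state if parent_state in options else min(options)
        states[node] = state
        if not isinstance(node, str):
            for child in node:
                top_down(child, state)

    bottom_up(tree)
    top_down(tree)
    return states, changes


# Hypothetical column: mouse carries G, the other three species carry A.
tree = (("human", "chimp"), ("mouse", "rat"))
column = {"human": "A", "chimp": "A", "mouse": "G", "rat": "A"}
states, changes = fitch_column(tree, column)
```

A full reconstruction would also have to handle alignment gaps, insertions and deletions, and branch lengths, which this sketch ignores.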
|
102
|
Abstract
The UC Santa Cruz Genome Browser provides a number of resources that can be used for phylogenomic studies, including (1) whole-genome sequence data from a number of vertebrate species, (2) pairwise alignments of the human genome sequence to a number of other vertebrate genomes, (3) a simultaneous alignment of 17 vertebrate genomes (most of them incompletely sequenced) that covers all of the human sequence, (4) several independent sets of multiple alignments covering 1% of the human genome (ENCODE regions), (5) extensive sequence annotation for interpreting those sequences and alignments, and (6) sequence, alignments, and annotations from certain other species, including an alignment of nine insect genomes. We illustrate the use of these resources in the context of assigning rare genomic changes to the branch of the phylogenetic tree where they appear to have occurred, or of looking for evidence supporting a particular possible tree topology. Sample source code for performing such studies is available.
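The idea of assigning a rare genomic change to a branch can be sketched as follows. This is not the sample source code mentioned in the abstract; it is a minimal illustration that assumes the derived state (for example, an insertion) has already been polarized against an outgroup and scored as present or absent in every species. The function names, the nested-tuple tree encoding, and the example species are hypothetical.

```python
def leaf_set(node):
    """Return the set of leaf names below a node (nested tuples of strings)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """Yield every node of the tree together with its leaf set."""
    yield node, leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def branch_for_change(tree, carriers):
    """Return the node whose leaves are exactly the species carrying the
    derived state, i.e. the branch on which a single change explains the
    data. Return None when no clade matches, meaning the change conflicts
    with this topology (or arose more than once)."""
    carriers = frozenset(carriers)
    for node, members in clades(tree):
        if members == carriers:
            return node
    return None


# Hypothetical four-species tree with dog as the outgroup.
tree = ((("human", "chimp"), "mouse"), "dog")
supported = branch_for_change(tree, {"human", "chimp"})
conflicting = branch_for_change(tree, {"human", "mouse"})
```

Counting how many independent changes map cleanly onto each candidate topology, and how many conflict with it, gives a simple tally of support. Missing data and alignment errors would need separate handling.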
|
103
|
Abstract
The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/.
|
104
|
28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res 2007; 17:1797-808. [PMID: 17984227 PMCID: PMC2099589 DOI: 10.1101/gr.6761107] [Citation(s) in RCA: 207] [Impact Index Per Article: 12.2] [Received: 06/04/2007] [Accepted: 08/30/2007] [Indexed: 01/17/2023]
Abstract
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
|
105
|
Recommendations from an international expert panel on the use of neoadjuvant (primary) systemic treatment of operable breast cancer: new perspectives 2006. Ann Oncol 2007; 18:1927-34. [DOI: 10.1093/annonc/mdm201] [Citation(s) in RCA: 296] [Impact Index Per Article: 17.4] [Indexed: 11/13/2022]
|
106
|
Abstract
A full understanding of primate morphological and genomic evolution requires the identification of their closest living relative. In order to resolve the ancestral relationships among primates and their closest relatives, we searched multispecies genome alignments for phylogenetically informative rare genomic changes within the superordinal group Euarchonta, which includes the orders Primates, Dermoptera (colugos), and Scandentia (treeshrews). We also constructed phylogenetic trees from 14 kilobases of nuclear genes for representatives from most major primate lineages, both extant colugos, and multiple treeshrews, including the pentail treeshrew, Ptilocercus lowii, the only living member of the family Ptilocercidae. A relaxed molecular clock analysis including Ptilocercus suggests that treeshrews arose approximately 63 million years ago. Our data show that colugos are the closest living relatives of primates and indicate that their divergence occurred in the Cretaceous.
|
107
|
|
108
|
Abstract
Although the application of sequencing-by-synthesis techniques to DNA extracted from bones has revolutionized the study of ancient DNA, it has been plagued by large fractions of contaminating environmental DNA. The genetic analyses of hair shafts could be a solution: We present 10 previously unexamined Siberian mammoth (Mammuthus primigenius) mitochondrial genomes, sequenced with up to 48-fold coverage. The observed levels of damage-derived sequencing errors were lower than those observed in previously published frozen bone samples, even though one of the specimens was >50,000 14C years old and another had been stored for 200 years at room temperature. The method therefore sets the stage for molecular-genetic analysis of museum collections.
|
109
|
Lattice gas modeling of nanowhisker growth. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 76:031601. [PMID: 17930250 DOI: 10.1103/physreve.76.031601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2007] [Indexed: 05/25/2023]
Abstract
Building upon the ideas of Gerischer et al., we have developed a cellular automaton for the growth dynamics of nanowhiskers. We present two models for the whisker growth. The first is a simple extension of the surface model, whereas the second includes diffusion on the rim of the whiskers. Results for one-dimensional calculations are presented and discussed, together with a comparison between the two models and with experimental results as well.
|
110
|
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res 2007; 17:760-74. [PMID: 17567995 PMCID: PMC1891336 DOI: 10.1101/gr.6034307] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.0] [Indexed: 01/10/2023]
Abstract
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
|
111
|
Abstract
Identification of functional genomic regions using interspecies comparison will be most effective when the full span of relationships between genomic function and evolutionary constraint is utilized. We find that sets of putative transcriptional regulatory sequences, defined by ENCODE experimental data, have a wide span of evolutionary histories, ranging from stringent constraint shown by deep phylogenetic comparisons to recent selection on lineage-specific elements. This diversity of evolutionary histories can be captured, at least in part, by the suite of available comparative genomics tools, especially after correction for regional differences in the neutral substitution rate. Putative transcriptional regulatory regions show alignability in different clades, and the genes associated with them are enriched for distinct functions. Some of the putative regulatory regions show evidence for recent selection, including a primate-specific, distal promoter that may play a novel role in regulation.
|
112
|
A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly. Genome Res 2007; 17:960-4. [PMID: 17568012 PMCID: PMC1891355 DOI: 10.1101/gr.5578007] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.5] [Indexed: 11/24/2022]
Abstract
The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2(ENCODE), that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2(ENCODE) allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2(ENCODE) to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2(ENCODE) with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2(ENCODE) and the screencasts can be accessed at http://g2.bx.psu.edu.
|
113
|
Performing armchair roundoff analyses of statistical algorithms. Commun Stat Simul Comput 2007. [DOI: 10.1080/03610917808812074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
114
|
Ratiometric dosing of irinotecan (IRI) and floxuridine (FLOX) in a phase I trial: A new approach for enhancing the activity of combination chemotherapy. J Clin Oncol 2007. [DOI: 10.1200/jco.2007.25.18_suppl.2549] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
Abstract
2549 Background: Like many pairs of chemotherapy agents, the combination of IRI and FLOX displays ratio-dependent activity in vitro. CPX-1, a liposome formulation of IRI:FLOX, was developed to maintain a synergistic 1:1 molar ratio in vivo, was highly active in preclinical models, and was evaluated in a phase 1 trial (CLTR0104–101). Methods: Doses were escalated from 30 U/m2 (1 U = 1 mg IRI + 0.36 mg FLOX) to 270 U/m2 given on days 1 and 15 of each 28-day cycle. Adult patients (pts) with advanced solid tumors, ECOG PS<2, and adequate bone marrow, liver, and renal function were eligible; 4 pts per cohort. After defining the MTD, additional pts with CRC were enrolled (extension phase). IRI completed greater than 12 months prior to this trial was allowed in the absence of resistance to IRI. PK was done on days 1 and 15 of the first cycle. Results: Safety: The dose escalation phase enrolled 24 pts in 6 cohorts and added 2 pts in the 5th cohort (210 U/m2; the MTD) after noting dose-limiting diarrhea (3 pts) and neutropenia (1 pt), including one death from dehydration and renal failure due to prolonged diarrhea (gr3) and vomiting (gr2) at 270 U/m2. An additional 7 pts with CRC received 210 U/m2 in the extension phase. Grade 3/4 adverse events included diarrhea, nausea, vomiting, neutropenia, and thrombocytopenia, with most occurring at 270 U/m2. No new toxicities were observed for this combination. Response: 30/33 pts were evaluable, with 2 confirmed PRs (NSCLC and CRC), 21 SD, and 7 PD. Median PFS was 5.4 mos. (0.3–11.8 mos.) in 15 pts with CRC. PK: All pts maintained synergistic plasma IRI:FLOX ratios for 24 h. IRI and FLOX AUCs (0-inf) were greater for CPX-1 than expected for conventional drugs. AUCs for SN-38 and 5FU at 210 U/m2 were 0.8 ± 0.1 and 10 ± 8.7 μg-hr/mL, respectively, indicating bioavailability for both drugs. Conclusion: CPX-1 was well tolerated in the outpatient setting and evidence of anti-tumor activity was obtained. This is the first clinical evaluation of ratiometric dosing in which a synergistic drug ratio, pre-selected in vitro based on optimal anti-tumor activity, was maintained systemically to enhance therapeutic benefit.
|
115
|
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007; 447:799-816. [PMID: 17571346 PMCID: PMC2212820 DOI: 10.1038/nature05874] [Citation(s) in RCA: 3782] [Impact Index Per Article: 222.5] [Indexed: 01/17/2023]
Abstract
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
|
116
|
SU-FF-T-384: Statistical Analysis of a System for Radiation Treatment Positioning Accuracy. Med Phys 2007. [DOI: 10.1118/1.2761109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
117
|
In-hospital Course of Initial CPR Survivors. Acad Emerg Med 2007. [DOI: 10.1197/j.aem.2007.03.1188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
118
|
Delirium Tremens: An Analysis of Factors Associated with Mortality. Acad Emerg Med 2007. [DOI: 10.1197/j.aem.2007.03.834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
119
|
Abstract
The completion of the draft sequence of the rhesus macaque genome allowed us to study the genomic composition and evolution of transposable elements in this representative of the Old World monkey lineage, a group of diverse primates closely related to humans. The L1 family of long interspersed elements appears to have evolved as a single lineage, and Alu elements have evolved into four currently active lineages. We also found evidence of elevated horizontal transmissions of retroviruses and the absence of DNA transposon activity in the Old World monkey lineage. In addition, approximately 100 precursors of composite SVA (short interspersed element, variable number of tandem repeat, and Alu) elements were identified, with the majority being shared by the common ancestor of humans and rhesus macaques. Mobile elements compose roughly 50% of primate genomes, and our findings illustrate their diversity and strong influence on genome evolution between closely related species.
|
120
|
Abstract
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
|
121
|
Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res 2007; 17:413-21. [PMID: 17322288 PMCID: PMC1832088 DOI: 10.1101/gr.5918807] [Citation(s) in RCA: 316] [Impact Index Per Article: 18.6] [Received: 09/01/2006] [Accepted: 12/20/2006] [Indexed: 11/24/2022]
Abstract
The phylogeny of placental mammals is a critical framework for choosing future genome sequencing targets and for resolving the ancestral mammalian genome at the nucleotide level. Despite considerable recent progress defining superordinal relationships, several branches remain poorly resolved, including the root of the placental tree. Here we analyzed the genome sequence assemblies of human, armadillo, elephant, and opossum to identify informative coding indels that would serve as rare genomic changes to infer early events in placental mammal phylogeny. We also expanded our species sampling by including sequence data from >30 ongoing genome projects, followed by PCR and sequencing validation of each indel in additional taxa. Our data provide support for a sister-group relationship between Afrotheria and Xenarthra (the Atlantogenata hypothesis), which is in turn the sister-taxon to Boreoeutheria. We failed to recover any indels in support of a basal position for Xenarthra (Epitheria), which is suggested by morphology and a recent retroposon analysis, or a hypothesis with Afrotheria basal (Exafricoplacentalia), which is favored by phylogenetic analysis of large nuclear gene data sets. In addition, we identified two retroposon insertions that also support Atlantogenata and none for the alternative hypotheses. A revised molecular timescale based on these phylogenetic inferences suggests Afrotheria and Xenarthra diverged from other placental mammals approximately 103 (95-114) million years ago. We discuss the impacts of this topology on earlier phylogenetic reconstructions and repeat-based inferences of phylogeny.
|
122
|
Comparative genomics to find function in noncoding DNA. Blood Cells Mol Dis 2007. [DOI: 10.1016/j.bcmd.2006.10.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
123
|
Validation of predicted erythroid cis-regulatory modules. Blood Cells Mol Dis 2007. [DOI: 10.1016/j.bcmd.2006.10.158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
124
|
HbVar database for human hemoglobin variants and thalassemia mutations. Blood Cells Mol Dis 2007. [DOI: 10.1016/j.bcmd.2006.10.109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
125
|
Abstract
HbVar (http://globin.bx.psu.edu/hbvar) is a locus-specific database (LSDB) developed in 2001 by a multi-center academic effort to provide timely information on the genomic sequence changes leading to hemoglobin variants and all types of thalassemia and hemoglobinopathies. Database records include extensive phenotypic descriptions, biochemical and hematological effects, associated pathology, and ethnic occurrence, accompanied by mutation frequencies and references. In addition to the regular updates to entries, we report significant advances and updates, which can be useful not only for HbVar users but also for other LSDB development and curation in general. The query page provides more functionality in a simpler, more user-friendly format, and known single nucleotide polymorphisms in the human alpha- and beta-globin loci are provided automatically. Population-specific beta-thalassemia mutation frequencies for 31 population groups have been added and/or modified, and the previously reported delta- and alpha-thalassemia mutation frequency data from 10 population groups have also been incorporated. In addition, an independent flat-file database, named XPRbase (http://www.goldenhelix.org/xprbase), has been developed and linked to the main HbVar web page to provide a succinct listing of 51 experimental protocols available for globin gene mutation screening. These updates significantly augment the database profile and quality of information provided, which should increase the already high impact of the HbVar database, while its combination with the powerful UCSC Genome Browser and the ITHANET web portal paves the way for drawing connections of clinical importance, that is, from genome to function to phenotype.
|
126
|
Abstract
PhenCode (Phenotypes for ENCODE; http://www.bx.psu.edu/phencode) is a collaborative, exploratory project to help understand phenotypes of human mutations in the context of sequence and functional data from genome projects. Currently, it connects human phenotype and clinical data in various locus-specific databases (LSDBs) with data on genome sequences, evolutionary history, and function from the ENCODE project and other resources in the UCSC Genome Browser. Initially, we focused on a few selected LSDBs covering genes encoding alpha- and beta-globins (HBA, HBB), phenylalanine hydroxylase (PAH), blood group antigens (various genes), androgen receptor (AR), cystic fibrosis transmembrane conductance regulator (CFTR), and Bruton's tyrosine kinase (BTK), but we plan to include additional loci of clinical importance, ultimately genomewide. We have also imported variant data and associated OMIM links from Swiss-Prot. Users can find interesting mutations in the UCSC Genome Browser (in a new Locus Variants track) and follow links back to the LSDBs for more detailed information. Alternatively, they can start with queries on mutations or phenotypes at an LSDB and then display the results at the Genome Browser to view complementary information such as functional data (e.g., chromatin modifications and protein binding from the ENCODE consortium), evolutionary constraint, regulatory potential, and/or any other tracks they choose. We present several examples illustrating the power of these connections for exploring phenotypes associated with functional elements, and for identifying genomic data that could help to explain clinical phenotypes.
|
127
|
Abstract
Comparative analysis of DNA sequence from multiple species can provide insights into the function and evolutionary processes that shape genomes. The University of California Santa Cruz (UCSC) Genome Bioinformatics group has developed several tools and methodologies in its study of comparative genomics, many of which have been incorporated into the UCSC Genome Browser (http://genome.ucsc.edu), an easy-to-use online tool for browsing genomic data and aligned annotation "tracks" in a single window. The comparative genomics annotations in the browser include pairwise alignments, which aid in the identification of orthologous regions between species, and conservation tracks that show measures of evolutionary conservation among sets of multiply aligned species, highlighting regions of the genome that may be functionally important. A related tool, the UCSC Table Browser, provides a simple interface for querying, analyzing, and downloading the data underlying the Genome Browser annotation tracks. Here, we describe a procedure for examining a genomic region of interest in the Genome Browser, analyzing characteristics of the region, filtering the data, and downloading data sets for further study.
|
128
|
|
129
|
Experimental validation of predicted mammalian erythroid cis-regulatory modules. Genome Res 2006; 16:1480-92. [PMID: 17038566 PMCID: PMC1665632 DOI: 10.1101/gr.5353806] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 04/06/2006] [Accepted: 06/07/2006] [Indexed: 11/25/2022]
Abstract
Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%-100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
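The two-property rule described in this abstract (positive RP score plus a conserved GATA-1 motif) might be sketched as below. This is an illustrative filter, not the authors' pipeline: it assumes the binding-site consensus is WGATAR, that the aligned rows are gap-free strings of equal length, and that "conserved" means the motif starts at the same alignment column in every species. All function names and the `rp_threshold` parameter are hypothetical.

```python
import re

# Assumed consensus WGATAR (W = A/T, R = A/G) and its reverse complement
# YTATCW. Lookaheads allow overlapping matches to be found.
GATA_FWD = re.compile(r"(?=[AT]GATA[AG])")
GATA_REV = re.compile(r"(?=[CT]TATC[AT])")


def motif_starts(row):
    """Return the set of start columns of exact consensus matches in one row."""
    row = row.upper()
    return {
        match.start()
        for pattern in (GATA_FWD, GATA_REV)
        for match in pattern.finditer(row)
    }


def has_conserved_gata(aligned_rows):
    """True if some column starts an exact consensus match in every row."""
    shared = None
    for row in aligned_rows:
        starts = motif_starts(row)
        shared = starts if shared is None else shared & starts
    return bool(shared)


def predict_crm(rp_score, aligned_rows, rp_threshold=0.0):
    """Predict a CRM when RP is above the threshold and a motif is conserved."""
    return rp_score > rp_threshold and has_conserved_gata(aligned_rows)


# Hypothetical two-species alignment with a motif at column 2 in both rows.
rows = ["CCAGATAAGG", "CCTGATAGGG"]
prediction = predict_crm(0.05, rows)
```

Raising `rp_threshold` would mimic the observation that validation rates increase at higher RP, at the cost of fewer predictions.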
|
130
|
ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res 2006; 16:1596-604. [PMID: 17053093 PMCID: PMC1665643 DOI: 10.1101/gr.4537706] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.6] [Indexed: 02/01/2023]
Abstract
Genomic sequence signals, such as base composition, presence of particular motifs, or evolutionary constraint, have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).
Collapse
|
131
|
Abstract
This article analyzes mammalian genome rearrangements at higher resolution than has been published to date. We identify 3171 intervals, covering approximately 92% of the human genome, within which we find no rearrangements larger than 50 kilobases (kb) in the lineages leading to human, mouse, rat, and dog from their most recent common ancestor. Combining intervals that are adjacent in all contemporary species produces 1338 segments that may contain large insertions or deletions but that are free of chromosome fissions or fusions as well as inversions or translocations >50 kb in length. We describe a new method for predicting the ancestral order and orientation of those intervals from their observed adjacencies in modern species. We combine the results from this method with data from chromosome painting experiments to produce a map of an early mammalian genome that accounts for 96.8% of the available human genome sequence data. The precision is further increased by mapping inversions as small as 31 bp. Analysis of the predicted evolutionary breakpoints in the human lineage confirms certain published observations but disagrees with others. Although only a few mammalian genomes are currently sequenced to high precision, our theoretical analyses and computer simulations indicate that our results are reasonably accurate and that they will become highly accurate in the foreseeable future. Our methods were developed as part of a project to reconstruct the genome sequence of the last common ancestor of humans, dogs, and most other placental mammals.
|
132
|
Abstract
Although ancient DNA (aDNA) miscoding lesions have been studied since the earliest days of the field, their nature remains a source of debate. A variety of conflicting hypotheses exist about which miscoding lesions constitute true aDNA damage as opposed to PCR polymerase amplification error. Furthermore, considerable disagreement and speculation exist on which specific damage events underlie observed miscoding lesions. The root of the problem is that it has previously been difficult to assemble sufficient data to test the hypotheses, and near-impossible to accurately determine the specific strand of origin of observed damage events. With the advent of emulsion-based clonal amplification (emPCR) and sequencing-by-synthesis technology, this has changed. In this paper we demonstrate how data produced on the Roche GS20 genome sequencer can determine miscoding lesion strands of origin, and subsequently be interpreted to enable characterization of the aDNA damage behind the observed phenotypes. Through comparative analyses on 390,965 bp of modern chloroplast and 131,474 bp of ancient woolly mammoth GS20 sequence data we conclusively demonstrate that, in this sample at least (a permafrost-preserved specimen), Type 2 (cytosine→thymine/guanine→adenine) miscoding lesions represent the overwhelming majority of damage-derived miscoding lesions. Additionally, we show that an as yet unidentified guanine→adenine analogue modification, not the conventionally argued cytosine→uracil deamination, underpins a significant proportion of Type 2 damage. How widespread these implications are for aDNA will become apparent as future studies analyse data recovered from a wider range of substrates.
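The Type 2 definition in this abstract (cytosine→thymine or guanine→adenine) can be illustrated with a small tally over a pairwise alignment. The sketch below is a minimal illustration only: the function name `count_lesions` and the toy sequences are hypothetical, it assumes the reference and read are already aligned to equal length, and it does not attempt the strand-of-origin analysis that the paper describes.

```python
from collections import Counter

# Type 2 miscoding lesions as defined in the abstract: C->T or G->A.
TYPE2 = {("C", "T"), ("G", "A")}


def count_lesions(reference: str, read: str) -> Counter:
    """Tally mismatches between two pre-aligned, equal-length sequences.

    Mismatches are labelled "type2" (C->T or G->A) or "other".
    Gap ("-") and ambiguous ("N") positions are skipped.
    """
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    counts: Counter = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base == read_base:
            continue
        if ref_base in "-N" or read_base in "-N":
            continue
        kind = "type2" if (ref_base, read_base) in TYPE2 else "other"
        counts[kind] += 1
    return counts


# Toy example (hypothetical sequences, not data from the study).
counts = count_lesions("ACGTACGT", "ATATACGA")
type2_fraction = counts["type2"] / sum(counts.values())
```

In a real analysis the mismatch counts would also have to be separated from sequencing and amplification error, which this sketch ignores.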
|
133
|
Phase 1 study of CPX-1, a fixed ratio formulation of irinotecan (IRI) and floxuridine (FLOX), in patients with advanced solid tumors. J Clin Oncol 2006. [DOI: 10.1200/jco.2006.24.18_suppl.2014]
Abstract
Background: In vitro studies have shown that varying the ratio of individual agents in drug combinations can result in synergistic, additive or antagonistic activity against tumor cells. CPX-1 is a liposomal formulation of IRI and FLOX in a fixed 1:1 molar ratio which was selected as optimal in vitro and confirmed to be synergistic in vivo in preclinical tumor models. CPX-1 overcomes the dissimilar pharmacokinetics (PK) of the individual drugs, enables sustained maintenance of this ratio after IV administration, and was evaluated in a Phase I open-label, dose-escalation study. Methods: Starting dose was 30 U/m² (1 Unit of CPX-1 contains 1 mg IRI + 0.36 mg FLOX) given on days 1 and 15 of each 28-day cycle. Dose escalation was by modified Fibonacci with 4 subjects/cohort. Eligibility included: age ≥ 18 years; advanced solid tumor; ECOG PS ≤ 2; adequate bone marrow/liver/renal function. PK analysis was done on days 1 and 15 of the first cycle. Results: 26 subjects (16M:10F), median age 54.5 y (21–72), all with prior therapy, enrolled in 6 cohorts with the 5th cohort expanded to 6 subjects. Diagnoses: 8 colorectal, 3 pancreatic, 3 ovarian, 2 breast, 2 gastric, 2 esophageal, 2 sarcomas, 1 renal cell, 1 prostate, 1 NSCLC and 1 sphenoid sinus. Response: 20 subjects evaluable: 2 confirmed PRs (NSCLC 8+ wks; colon 13+ wks, in a patient with prior IRI exposure) and 13 with SD (8–24+ wks). Safety: DLTs were observed at the 6th dose level: 4 subjects with DLTs: 3 diarrhea (one resulting in death due to dehydration/ARF) and one neutropenia. Other possibly related grade 3 and 4 events included one each of: grade 3 diarrhea, grade 3 vomiting, grade 3 neutropenia, grade 3 fatigue, grade 3 compression fracture and arthralgia and pulmonary embolism grade 4. PK: In all 14 subjects analyzed to date the 1:1 molar ratio of IRI to FLOX was maintained for 24 hours and metabolites 5-FU and SN-38 were present in the plasma.
Conclusions: CPX-1 represents a new approach to developing drug combinations in which drug ratios are pre-selected in vitro based on optimal antitumor activity and maintained systemically through pharmacokinetic control. Phase 2 studies are planned with a recommended dose of 210 U/m² of CPX-1.
|
134
|
TU-E-330D-01: TLD-100 Measurement and Assessment of Internal Mouse Dosimetry During Micro-CT Analysis. Med Phys 2006. [DOI: 10.1118/1.2241612]
|
135
|
Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol 2006; 2:e33. [PMID: 16628248 PMCID: PMC1440920 DOI: 10.1371/journal.pcbi.0020033] [Received: 09/08/2005] [Accepted: 03/06/2006]
Abstract
The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.
Structurally functional RNA is a versatile component of the cell that comprises both independent molecules and regulatory elements of mRNA transcripts. The many recent discoveries of functional RNAs, most notably miRNAs, suggest that many more are yet to be found. Computational identification of functional RNAs has traditionally been hampered by the lack of strong sequence signals. However, structural conservation over long evolutionary times creates a characteristic substitution pattern, which can be exploited with the advent of comparative genomics.
The authors have devised a method for identification of functional RNA structures based on phylogenetic analysis of multiple alignments. This method has been used to screen the regions of the human genome that are under strong selective constraints. The result is a set of 48,479 candidate RNA structures. For some classes of known functional RNAs, such as miRNAs and histone 3′UTR stem loops, this set includes nearly all deeply conserved members. The initial large candidate set has been partitioned by size, shape, and genomic location and ranked by score to produce specific lists of top candidates for miRNAs, selenocysteine insertion sites, RNA editing hairpins, and RNAs involved in transcript auto regulation.
|
136
|
A randomized phase III study of carboplatin/paclitaxel combination therapy with or without bexarotene (Targretin®) in previously untreated patients with advanced or metastatic non-small cell lung cancer (NSCLC). Pneumologie 2006. [DOI: 10.1055/s-2006-933784]
|
137
|
Effects of letrozole and anastrozole on ductal carcinoma in situ (DCIS): results from a randomised trial. EJC Suppl 2006. [DOI: 10.1016/s1359-6349(06)80154-0]
|
138
|
Tumour aromatase as measured by immunohistochemistry in patients treated neoadjuvantly with either letrozole or tamoxifen in the P024 randomised trial: correlations with other biomarkers. EJC Suppl 2006. [DOI: 10.1016/s1359-6349(06)80346-0]
|
139
|
Abstract
We sequenced 28 million base pairs of DNA in a metagenomics approach, using a woolly mammoth (Mammuthus primigenius) sample from Siberia. As a result of exceptional sample preservation and the use of a recently developed emulsion polymerase chain reaction and pyrosequencing technique, 13 million base pairs (45.4%) of the sequencing reads were identified as mammoth DNA. Sequence identity between our data and African elephant (Loxodonta africana) was 98.55%, consistent with a paleontologically based divergence date of 5 to 6 million years. The sample includes a surprisingly small diversity of environmental DNAs. The high percentage of endogenous DNA recoverable from this single mammoth would allow for completion of its genome, unleashing the field of paleogenomics.
|
140
|
Phase-field lattice kinetic scheme for the numerical simulation of dendritic growth. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 72:066705. [PMID: 16486096 DOI: 10.1103/physreve.72.066705] [Received: 03/16/2005]
Abstract
A phase-field lattice kinetic model is presented for the numerical simulation of the dendritic growth of a pure crystal in the presence of thermal transport. A finite-difference scheme for the phase field is combined with an explicit lattice kinetic scheme for the temperature field. The resulting scheme is advanced in time with an adaptive time-marching procedure which permits us to achieve long simulation times with larger time steps than explicit finite-difference and previous kinetic methods. The method is demonstrated for the case of dendritic growth of a single crystal over a wide range of Stefan and capillarity numbers.
|
141
|
Abstract
Accessing and analyzing the exponentially expanding genomic sequence and functional data pose a challenge for biomedical researchers. Here we describe an interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results. The heart of Galaxy is a flexible history system that stores the queries from each user; performs operations such as intersections, unions, and subtractions; and links to other computational tools. Galaxy can be accessed at http://g2.bx.psu.edu.
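The set operations this abstract mentions (intersections, unions, and subtractions) are typically applied to lists of genomic intervals. The following is a minimal sketch of those three operations, not Galaxy's actual implementation: it assumes intervals on a single chromosome written as half-open `(start, end)` pairs, and all function names and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and fuse any that overlap or touch."""
    out: List[Interval] = []
    for start, end in sorted(intervals):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Positions covered by a or b."""
    return merge(list(a) + list(b))


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Positions covered by both a and b (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Positions covered by a but not by b."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    j = 0
    for start, end in a:
        cur = start
        # Intervals of b that end before this one starts are never needed again.
        while j < len(b) and b[j][1] <= cur:
            j += 1
        k = j
        while k < len(b) and b[k][0] < end:
            if b[k][0] > cur:
                out.append((cur, b[k][0]))
            cur = max(cur, b[k][1])
            k += 1
        if cur < end:
            out.append((cur, end))
    return out


# Toy coordinates (hypothetical): two annotated features and one conserved block.
features = [(100, 200), (300, 400)]
conserved = [(150, 350)]
shared = intersect(features, conserved)
feature_only = subtract(features, conserved)
covered = union(features, conserved)
```

Half-open coordinates are chosen here because adjacent intervals then merge cleanly; a real tool would also key intervals by chromosome and strand.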
|
142
|
Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 2005; 15:1034-50. [PMID: 16024819 PMCID: PMC1182216 DOI: 10.1101/gr.3715005] [Received: 01/19/2005] [Accepted: 06/02/2005]
Abstract
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
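To make the idea of a two-state HMM segmentation concrete, the sketch below runs Viterbi decoding over a conserved and a nonconserved state. It is a simplified illustration, not the phastCons code: it assumes the per-column log-likelihoods under each state's phylogenetic model have already been computed elsewhere, and the function name, the transition probabilities, and the toy numbers are all hypothetical.

```python
import math
from typing import List, Sequence


def viterbi_two_state(
    loglik_conserved: Sequence[float],
    loglik_neutral: Sequence[float],
    p_stay_conserved: float = 0.9,   # hypothetical self-transition probability
    p_stay_neutral: float = 0.95,    # hypothetical self-transition probability
) -> List[int]:
    """Return the most likely state path (0 = conserved, 1 = nonconserved).

    Inputs are per-alignment-column log-likelihoods under each state's model.
    """
    n = len(loglik_conserved)
    if n != len(loglik_neutral):
        raise ValueError("both likelihood vectors must have the same length")
    if n == 0:
        return []

    emit = (loglik_conserved, loglik_neutral)
    log_trans = (
        (math.log(p_stay_conserved), math.log(1.0 - p_stay_conserved)),
        (math.log(1.0 - p_stay_neutral), math.log(p_stay_neutral)),
    )

    # Uniform prior over the starting state.
    score = [math.log(0.5) + emit[0][0], math.log(0.5) + emit[1][0]]
    backpointers: List[List[int]] = []

    for t in range(1, n):
        new_score = [0.0, 0.0]
        ptr = [0, 0]
        for s in (0, 1):
            candidates = [score[r] + log_trans[r][s] for r in (0, 1)]
            ptr[s] = 0 if candidates[0] >= candidates[1] else 1
            new_score[s] = candidates[ptr[s]] + emit[s][t]
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy input: three columns that fit the conserved model, then three that do not.
path = viterbi_two_state([-1, -1, -1, -5, -5, -5], [-5, -5, -5, -1, -1, -1])
```

Runs of state 0 in the returned path correspond to predicted conserved elements. In a phylo-HMM the emission terms would come from a likelihood calculation on the phylogenetic tree, which is outside the scope of this sketch.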
|
143
|
Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res 2005; 15:1051-60. [PMID: 16024817 PMCID: PMC1182217 DOI: 10.1101/gr.3642605]
Abstract
Techniques of comparative genomics are being used to identify candidate functional DNA sequences, and objective evaluations are needed to assess their effectiveness. Different analytical methods score distinctive features of whole-genome alignments among human, mouse, and rat to predict functional regions. We evaluated three of these methods for their ability to identify the positions of known regulatory regions in the well-studied HBB gene complex. Two methods, multispecies conserved sequences and phastCons, quantify levels of conservation to estimate a likelihood that aligned DNA sequences are under purifying selection. A third function, regulatory potential (RP), measures the similarity of patterns in the alignments to those in known regulatory regions. The methods can correctly identify 50%-60% of noncoding positions in the HBB gene complex as regulatory or nonregulatory, with RP performing better than do other methods. When evaluated by the ability to discriminate genomic intervals, RP reaches a sensitivity of 0.78 and a true discovery rate of approximately 0.6. The performance is better on other reference sets; both phastCons and RP scores can capture almost all regulatory elements in those sets along with approximately 7% of the human genome.
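The two interval-level measures quoted above can be written out explicitly. The sketch below assumes the usual definitions, sensitivity = TP / (TP + FN) and true discovery rate = TP / (TP + FP); the counts are hypothetical values chosen only so that the arithmetic reproduces 0.78 and 0.6, and they are not taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of known regulatory intervals that the score recovers."""
    return true_pos / (true_pos + false_neg)


def true_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted intervals that are known regulatory regions."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for illustration only.
tp, fn, fp = 78, 22, 52
sens = sensitivity(tp, fn)          # 78 / 100
tdr = true_discovery_rate(tp, fp)   # 78 / 130
```

The two measures answer different questions: sensitivity is computed over the known regulatory intervals, whereas the true discovery rate is computed over the predictions, so a score can do well on one and poorly on the other.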
|
144
|
Risk of silicosis in cohorts of Chinese tin and tungsten miners and pottery workers (II): Workplace-specific silica particle surface composition. Am J Ind Med 2005; 48:10-5. [PMID: 15940714 DOI: 10.1002/ajim.20175]
Abstract
BACKGROUND: It is hypothesized that surface occlusion by alumino-silicate affects the toxic activity of silica particles in respirable dust. In conjunction with an epidemiological investigation of silicosis disease risk in Chinese tin and tungsten mine and pottery workplaces, we analyzed respirable silica dusts using multiple-voltage scanning electron microscopy-energy dispersive X-ray spectroscopy (MVSEM-EDS). METHODS: Forty-seven samples of respirable-sized dust were collected on filters from 13 worksites and were analyzed by MVSEM-EDS using high (20 keV) and low (5 keV) electron beam accelerating voltages. Changes in the silicon-to-aluminum X-ray line intensity ratio between the two voltages are compared particle-by-particle with the 90th percentile value of the same measurements for a ground glass homogeneous control sample. This provides an index that distinguishes a silica particle that is homogeneously aluminum-contaminated from a clay-coated silica particle. RESULTS: The average sample percentages of respirable-sized silica particles with alumino-silicate occlusion were: 45% for potteries, 18% for tin mines, and 13% for tungsten mines. The difference between the pottery and the metal mine worksites accounted for one third of an overall chi-square statistic for differences in change in measured silicon fraction between the samples. CONCLUSION: The companion epidemiological study found lower silicosis risk per unit cumulative respirable silica dust exposure for pottery workers compared to metal miners. Using these surface analysis results resolves differences in risk when exposure is normalized to cumulative respirable surface-available silica dust.
|
145
|
Paclitaxel poliglumex (PPX) in combination with carboplatin (carb) for the first-line treatment of patients with advanced non-small cell lung cancer (NSCLC): Preliminary data. J Clin Oncol 2005. [DOI: 10.1200/jco.2005.23.16_suppl.7230]
|
146
|
Generation and annotation of the DNA sequences of human chromosomes 2 and 4. Nature 2005; 434:724-31. [PMID: 15815621 DOI: 10.1038/nature03466] [Received: 10/25/2004] [Accepted: 02/11/2005]
Abstract
Human chromosome 2 is unique to the human lineage in being the product of a head-to-head fusion of two intermediate-sized ancestral chromosomes. Chromosome 4 has received attention primarily related to the search for the Huntington's disease gene, but also for genes associated with Wolf-Hirschhorn syndrome, polycystic kidney disease and a form of muscular dystrophy. Here we present approximately 237 million base pairs of sequence for chromosome 2, and 186 million base pairs for chromosome 4, representing more than 99.6% of their euchromatic sequences. Our initial analyses have identified 1,346 protein-coding genes and 1,239 pseudogenes on chromosome 2, and 796 protein-coding genes and 778 pseudogenes on chromosome 4. Extensive analyses confirm the underlying construction of the sequence, and expand our understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.
MESH Headings
- Animals
- Base Composition
- Base Sequence
- Centromere/genetics
- Chromosomes, Human, Pair 2/genetics
- Chromosomes, Human, Pair 4/genetics
- Conserved Sequence/genetics
- CpG Islands/genetics
- Euchromatin/genetics
- Expressed Sequence Tags
- Gene Duplication
- Genetic Variation/genetics
- Genomics
- Humans
- Molecular Sequence Data
- Physical Chromosome Mapping
- Polymorphism, Genetic/genetics
- Primates/genetics
- Proteins/genetics
- Pseudogenes/genetics
- RNA, Messenger/analysis
- RNA, Messenger/genetics
- RNA, Untranslated/analysis
- RNA, Untranslated/genetics
- Recombination, Genetic/genetics
- Sequence Analysis, DNA
|
147
|
An initial strategy for the systematic identification of functional elements in the human genome by low-redundancy comparative sequencing. Proc Natl Acad Sci U S A 2005; 102:4795-800. [PMID: 15778292 PMCID: PMC555705 DOI: 10.1073/pnas.0409882102]
Abstract
With the recent completion of a high-quality sequence of the human genome, the challenge is now to understand the functional elements that it encodes. Comparative genomic analysis offers a powerful approach for finding such elements by identifying sequences that have been highly conserved during evolution. Here, we propose an initial strategy for detecting such regions by generating low-redundancy sequence from a collection of 16 eutherian mammals, beyond the 7 for which genome sequence data are already available. We show that such sequence can be accurately aligned to the human genome and used to identify most of the highly conserved regions. Although not a long-term substitute for generating high-quality genomic sequences from many mammalian species, this strategy represents a practical initial approach for rapidly annotating the most evolutionarily conserved sequences in the human genome, providing a key resource for the systematic study of human genome function.
|
148
|
Improvements to GALA and dbERGE II: databases featuring genomic sequence alignment, annotation and experimental results. Nucleic Acids Res 2005; 33:D466-70. [PMID: 15608239 PMCID: PMC539999 DOI: 10.1093/nar/gki045] [Received: 08/14/2004] [Revised: 09/28/2004] [Accepted: 09/28/2004]
Abstract
We describe improvements to two databases that give access to information on genomic sequence similarities, functional elements in DNA and experimental results that demonstrate those functions. GALA, the database of Genome ALignments and Annotations, is now a set of interlinked relational databases for five vertebrate species: human, chimpanzee, mouse, rat and chicken. For each species, GALA records pairwise and multiple sequence alignments, scores derived from those alignments that reflect the likelihood of being under purifying selection or being a regulatory element, and extensive annotations such as genes, gene expression patterns and transcription factor binding sites. The user interface supports simple and complex queries, including operations such as subtraction and intersection as well as clustering and finding elements in proximity to features. dbERGE II, the database of Experimental Results on Gene Expression, contains experimental data from a variety of functional assays. Both databases are now run on the DB2 database management system. Improved hardware and tuning have reduced response times and increased querying capacity, while simplified query interfaces will help direct new users through the querying process. Links are available at http://www.bx.psu.edu/.
|
149
|
Abstract
Large tracts of the human genome, known as gene deserts, are devoid of protein-coding genes. Dichotomy in their level of conservation with chicken separates these regions into two distinct categories, stable and variable. The separation is not caused by differences in rates of neutral evolution but instead appears to be related to different biological functions of stable and variable gene deserts in the human genome. Gene Ontology categories of the adjacent genes are strongly biased toward transcriptional regulation and development for the stable gene deserts, and toward distinctively different functions for the variable gene deserts. Stable gene deserts resist chromosomal rearrangements and appear to harbor multiple distant regulatory elements physically linked to their neighboring genes, with the linearity of conservation invariant throughout vertebrate evolution.
|
150
|
Abstract
Multiple-sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes, and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the TBA multi-aligner program for rapid identification of local sequence conservation, and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short- and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multispecies comparisons of the GATA3 gene locus and the identification of elements that are conserved in a different way in avians than in other genomes, allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://www.bx.psu.edu/miller_lab/.
|