1
Genetic analysis of completely sequenced disease-associated MHC haplotypes identifies shuffling of segments in recent human history. PLoS Genet 2006; 2:e9. [PMID: 16440057 PMCID: PMC1331980 DOI: 10.1371/journal.pgen.0020009] [Citation(s) in RCA: 145] [Impact Index Per Article: 8.1] [Received: 09/14/2005] [Accepted: 12/13/2005] [Indexed: 11/23/2022]
Abstract
The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II-related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). 
These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR-DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.
2
Genomic sequence of the class II region of the canine MHC: comparison with the MHC of other mammalian species. Genomics 2005; 85:48-59. [PMID: 15607421 DOI: 10.1016/j.ygeno.2004.09.009] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Received: 04/28/2004] [Accepted: 09/27/2004] [Indexed: 11/21/2022]
Abstract
The domestic dog, Canis familiaris, is an excellent model species in which to study complex inherited diseases, having over 200 recognized breeds, each of which represents a closed gene pool. Overlapping canine genomic BAC clones were sequenced to obtain 711,521 bp of the canine classical and extended MHC class II regions. Analysis and annotation of this sequence reveals that it contains 45 loci, of which 29 are predicted to be functionally expressed. Comparison of the DLA class II sequence with those of the cat, human, and mouse highlights regions of syntenic conservation and species-specific gene rearrangement and duplication and gives an insight into the evolution of the DR region in the order Carnivora. Elucidation of functionally important dog class II genes and the identification of 23 microsatellite markers spanning this region will contribute significantly to the study of canine diseases that have an immune component.
3
Abstract
The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. In addition, it is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions.
4
Abstract
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.
MeSH Headings
- Animals
- Antigens, Neoplasm/genetics
- Centromere/genetics
- Chromosomes, Human, X/genetics
- Chromosomes, Human, Y/genetics
- Contig Mapping
- Crossing Over, Genetic/genetics
- Dosage Compensation, Genetic
- Evolution, Molecular
- Female
- Genetic Linkage/genetics
- Genetics, Medical
- Genomics
- Humans
- Male
- Polymorphism, Single Nucleotide/genetics
- RNA/genetics
- Repetitive Sequences, Nucleic Acid/genetics
- Sequence Analysis, DNA
- Sequence Homology, Nucleic Acid
- Testis/metabolism
5
Identification of mammalian microRNA host genes and transcription units. Genome Res 2004; 14:1902-10. [PMID: 15364901 PMCID: PMC524413 DOI: 10.1101/gr.2722704] [Citation(s) in RCA: 1407] [Impact Index Per Article: 70.4] [Received: 04/28/2004] [Accepted: 07/27/2004] [Indexed: 12/13/2022]
Abstract
To derive a global perspective on the transcription of microRNAs (miRNAs) in mammals, we annotated the genomic position and context of this class of noncoding RNAs (ncRNAs) in the human and mouse genomes. Of the 232 known mammalian miRNAs, we found that 161 overlap with 123 defined transcription units (TUs). We identified miRNAs within introns of 90 protein-coding genes with a broad spectrum of molecular functions, and in both introns and exons of 66 mRNA-like noncoding RNAs (mlncRNAs). In addition, novel families of miRNAs based on host gene identity were identified. The transcription patterns of all miRNA host genes were curated from a variety of sources illustrating spatial, temporal, and physiological regulation of miRNA expression. These findings strongly suggest that miRNAs are transcribed in parallel with their host transcripts, and that the two different transcription classes of miRNAs ('exonic' and 'intronic') identified here may require slightly different mechanisms of biogenesis.
6
Abstract
Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection.
7
Complete MHC haplotype sequencing for common disease gene mapping. Genome Res 2004; 14:1176-87. [PMID: 15140828 PMCID: PMC419796 DOI: 10.1101/gr.2188104] [Citation(s) in RCA: 247] [Impact Index Per Article: 12.4] [Received: 11/21/2003] [Accepted: 02/13/2004] [Indexed: 11/24/2022]
Abstract
The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune disease-associated haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to more than 60. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the limitation of chromosome regions that are refractory to PCR amplification.
8
The DNA sequence and comparative analysis of human chromosome 10. Nature 2004; 429:375-81. [PMID: 15164054 DOI: 10.1038/nature02462] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Received: 12/17/2003] [Accepted: 03/09/2004] [Indexed: 11/08/2022]
Abstract
The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.
9
Abstract
Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb.
10
Organisation of the pantothenate (vitamin B5) biosynthesis pathway in higher plants. Plant J 2004; 37:61-72. [PMID: 14675432 DOI: 10.1046/j.1365-313x.2003.01940.x] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.3] [Indexed: 05/24/2023]
Abstract
Pantothenate (vitamin B5) is the precursor for the biosynthesis of the phosphopantetheine moiety of coenzyme A and acyl carrier protein, and is synthesised in Escherichia coli by four enzymic reactions. Ketopantoate hydroxymethyltransferase (KPHMT) and pantothenate synthetase (PtS) catalyse the first and last steps, respectively. Two genes encoding KPHMT and one for PtS were identified in the Arabidopsis thaliana genome, and cDNAs for all three genes were amplified by PCR. The cDNAs were able to complement their respective E. coli auxotrophs, demonstrating that they encoded functional enzymes. Subcellular localisation of the proteins was investigated using green fluorescent protein (GFP) fusions and confocal microscopy. The two KPHMT-GFP fusion proteins were targeted exclusively to mitochondria, whereas PtS-GFP was found in the cytosol. This implies that there must be transporters for pathway intermediates. KPHMT enzyme activity could be measured in purified mitochondria from both pea leaves and Arabidopsis suspension cultures. We investigated whether Arabidopsis encoded homologues of the remaining two pantothenate biosynthesis enzymes from E. coli, L-aspartate-α-decarboxylase (ADC) and ketopantoate reductase (KPR). No homologue of ADC could be identified using either conventional BLAST or searches with the program FUGUE, in which the structure of the E. coli ADC was compared to all the annotated proteins in Arabidopsis. ADC also appears to be absent from the genome of the yeast Saccharomyces cerevisiae by the same criteria. In contrast, a putative Arabidopsis oxidoreductase with some similarity to KPR was identified with FUGUE.
11
Abstract
Chromosome 6 is a metacentric chromosome that constitutes about 6% of the human genome. The finished sequence comprises 166,880,988 base pairs, representing the largest chromosome sequenced so far. The entire sequence has been subjected to high-quality manual annotation, resulting in the evidence-supported identification of 1,557 genes and 633 pseudogenes. Here we report that at least 96% of the protein-coding genes have been identified, as assessed by multi-species comparative sequence analysis, and provide evidence for the presence of further, otherwise unsupported exons/genes. Among these are genes directly implicated in cancer, schizophrenia, autoimmunity and many other diseases. Chromosome 6 harbours the largest transfer RNA gene cluster in the genome; we show that this cluster co-localizes with a region of high transcriptional activity. Within the essential immune loci of the major histocompatibility complex, we find HLA-B to be the most polymorphic gene on chromosome 6 and in the human genome.
12
Abstract
Fifty years after the publication of the structure of DNA, the whole human genome sequence will be officially finished. This achievement marks the beginning of the task of cataloguing every human gene and identifying the function and expression pattern of each. Currently, researchers estimate that there are about 30,000 human genes and that approximately 70% of these can be automatically predicted using a combination of ab initio and similarity-based programs. However, to investigate every gene's function experimentally, the research community requires high-quality annotation of alternative splicing, pseudogenes, and promoter regions, which can only be provided by manual intervention. Manual curation of the human genome will be a long-term project, as experimental data are continually produced to confirm or refine the predictions, and new features such as noncoding RNAs and enhancers have not yet been fully identified. Such a highly curated human gene set, made publicly available, will be a great asset for the experimental community and for future comparative genome projects.
13
Crystal structure of Escherichia coli ketopantoate reductase at 1.7 Å resolution and insight into the enzyme mechanism. Biochemistry 2001; 40:14493-500. [PMID: 11724562 DOI: 10.1021/bi011020w] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
Ketopantoate reductase (KPR, EC 1.1.1.169) catalyzes the NADPH-dependent reduction of ketopantoate to pantoate on the pantothenate (vitamin B5) biosynthetic pathway. The Escherichia coli panE gene encoding KPR was cloned and expressed at high levels as the native and selenomethionine-substituted (SeMet) proteins. Both native and SeMet recombinant proteins were purified to homogeneity in three chromatographic steps. The wild-type enzyme was found to have a Km(NADPH) of 20 µM, a Km(ketopantoate) of 60 µM, and a kcat of 40 s⁻¹. Regular prismatic KPR crystals were prepared using the hanging-drop technique. They belonged to the tetragonal space group P4₂2₁2, with cell parameters a = b = 103.7 Å and c = 55.7 Å, accommodating one enzyme molecule per asymmetric unit. The structure of KPR was determined by the multiwavelength anomalous dispersion method using the SeMet protein, for which data were collected to 2.3 Å resolution. The native data were collected to 1.7 Å resolution and used to refine the final structure. The secondary structure comprises 12 α-helices, three 3₁₀-helices, and 11 β-strands. The enzyme is monomeric and has two domains separated by a cleft. The N-terminal domain has an α/β-fold of the Rossmann type. The C-terminal domain (residues 170-291) is composed of eight α-helices. KPR is shown to be a member of the 6-phosphogluconate dehydrogenase C-terminal domain-like superfamily. A model for the enzyme-NADPH-ketopantoate ternary complex provides a rationale for kinetic data reported for specific site-directed mutants.
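As a worked illustration of the kinetic constants reported in this abstract, the specificity constant kcat/Km can be computed for each substrate. This sketch assumes simple Michaelis-Menten behaviour toward each substrate with the other held saturating. That is an assumption for illustration only; the abstract does not state the kinetic model used.

```latex
\begin{aligned}
v &= \frac{k_{\mathrm{cat}}\,[\mathrm{E}]_0\,[\mathrm{S}]}{K_M + [\mathrm{S}]} \\[4pt]
\frac{k_{\mathrm{cat}}}{K_M^{\mathrm{ketopantoate}}} &= \frac{40\ \mathrm{s^{-1}}}{60\times10^{-6}\ \mathrm{M}} \approx 6.7\times10^{5}\ \mathrm{M^{-1}\,s^{-1}} \\[4pt]
\frac{k_{\mathrm{cat}}}{K_M^{\mathrm{NADPH}}} &= \frac{40\ \mathrm{s^{-1}}}{20\times10^{-6}\ \mathrm{M}} = 2.0\times10^{6}\ \mathrm{M^{-1}\,s^{-1}}
\end{aligned}
```

When [S] is much lower than K_M, the rate law reduces to v ≈ (kcat/Km)[E]₀[S]. This is why the ratio is the usual basis for comparing how efficiently an enzyme uses different substrates.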
14
Plant viral leaders influence expression of a reporter gene in tobacco. Plant Mol Biol 1993; 23:97-109. [PMID: 8219060 DOI: 10.1007/bf00021423] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.7] [Indexed: 05/22/2023]
Abstract
In order to optimise expression of a foreign protein in transgenic plants, we investigated the potential benefits of including a viral untranslated leader sequence within a plant transformation vector. A variety of 5' leaders, including the tobacco mosaic virus (TMV) leader sequence and 31 nucleotides of the cauliflower mosaic virus (CaMV) 35S RNA leader, were compared. Viral leader constructs employing the 35S promoter and the reporter beta-glucuronidase (GUS) were tested by electroporation into tobacco mesophyll protoplasts and against a cointroduced chloramphenicol acetyl transferase (CAT) gene in transgenic tobacco leaves. In the transient assay system, GUS activities from the viral leaders were compared with those from either a short, random leader or a translational fusion of the CaMV 19S RNA ORF VI to GUS. A two- to three-fold enhanced level of expression resulted when these leaders were substituted with either the 35S RNA or the TMV leader sequences. This enhancement was further increased, to four- to five-fold, by inclusion of four or seven of the bases from the 35S transcription initiation site adjacent to the TMV leader. In transgenic tobacco the improved GUS levels were maintained from constructs including either the TMV leader (eight-fold) or this sequence with the addition of the 35S transcription initiation site bases (ten-fold). A comparison of GUS enzyme amounts with GUS mRNA amounts, using the CAT gene as an internal standard, revealed that TMV leader-bearing mRNA was translated four- to six-fold more efficiently than the random leader control.
MeSH Headings
- Base Sequence
- Caulimovirus/genetics
- Cloning, Molecular
- DNA, Viral
- Genes, Reporter
- Glucuronidase/genetics
- Glucuronidase/metabolism
- Molecular Sequence Data
- Plants, Genetically Modified/genetics
- Plants, Genetically Modified/metabolism
- Plants, Genetically Modified/microbiology
- Plants, Toxic
- Promoter Regions, Genetic
- Protein Biosynthesis
- RNA, Messenger/analysis
- RNA, Messenger/genetics
- RNA, Viral/genetics
- Restriction Mapping
- Nicotiana/genetics
- Nicotiana/metabolism
- Nicotiana/microbiology
- Tobacco Mosaic Virus/genetics
- Transformation, Genetic