1
Šimon M, Mikec Š, Atanur SS, Konc J, Morton NM, Horvat S, Kunej T. Whole genome sequencing of mouse lines divergently selected for fatness (FLI) and leanness (FHI) revealed several genetic variants as candidates for novel obesity genes. Genes Genomics 2024; 46:557-575. [PMID: 38483771 PMCID: PMC11024027 DOI: 10.1007/s13258-024-01507-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 02/25/2024] [Indexed: 04/18/2024]
Abstract
BACKGROUND Analysing genomes of animal model organisms is widely used for understanding the genetic basis of complex traits and diseases such as obesity. Only a few mouse models of obesity exist, however, and these lack lean counterparts. OBJECTIVE To analyse genetic differences in the unique mouse models of polygenic obesity (Fat line) and leanness (Lean line) originating from the same base population and established by divergent selection over more than 60 generations. METHODS Genetic variability was analysed using whole genome sequencing (WGS). Variants were identified with GATK and annotated with Ensembl VEP. g.Profiler, WebGestalt, and KEGG were used for GO and pathway enrichment analysis. miRNA seed regions were obtained with miRPathDB 2.0, LncRRIsearch was used to predict targets of identified lncRNAs, and genes influencing adipose tissue amount were searched using the IMPC database. RESULTS WGS analysis revealed 6.3 million SNPs, of which 1.3 million were new. Thousands of potentially impactful SNPs were identified, including within 24 genes related to adipose tissue amount. SNP density was highest in pseudogenes and regulatory RNAs. The Lean line carries SNP rs248726381 in the seed region of mmu-miR-3086-3p, which may affect fatty acid metabolism. KEGG analysis showed deleterious missense variants in immune response and diabetes genes, with food perception pathways being most enriched. Gene prioritisation considering SNP GERP scores, variant consequences, and allele comparison with other mouse lines identified seven novel obesity candidate genes: 4930441H08Rik, Aff3, Fam237b, Gm36633, Pced1a, Tecrl, and Zfp536. CONCLUSION WGS revealed many genetic differences between the lines that accumulated over the selection period, including variants with potential negative impacts on gene function. Given the increasing availability of mouse strains and genetic polymorphism catalogues, the study is a valuable resource for researchers to study obesity.
Affiliation(s)
- Martin Šimon
- Chair of Genetics, Animal Biotechnology and Immunology, Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domžale, 1230, Slovenia
- Špela Mikec
- Chair of Genetics, Animal Biotechnology and Immunology, Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domžale, 1230, Slovenia
- Santosh S Atanur
- Faculty of Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, London, SW7 2AZ, UK
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Janez Konc
- Laboratory for Molecular Modeling, National Institute of Chemistry, Ljubljana, 1000, Slovenia
- Nicholas M Morton
- The Queen's Medical Research Institute, Centre for Cardiovascular Science, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Simon Horvat
- Chair of Genetics, Animal Biotechnology and Immunology, Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domžale, 1230, Slovenia
- Tanja Kunej
- Chair of Genetics, Animal Biotechnology and Immunology, Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Domžale, 1230, Slovenia
2
Abstract
Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (∼15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.
3
Chen J, Zhang X, Jing R, Blair MW, Mao X, Wang S. Cloning and genetic diversity analysis of a new P5CS gene from common bean (Phaseolus vulgaris L.). Theor Appl Genet 2010; 120:1393-404. [PMID: 20143043 DOI: 10.1007/s00122-010-1263-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 04/27/2009] [Accepted: 12/23/2009] [Indexed: 05/11/2023]
Abstract
Delta(1)-pyrroline-5-carboxylate synthetase (P5CS) is the rate-limiting enzyme involved in the biosynthesis of proline in plants. Using the 3' rapid amplification of cDNA ends (3'-RACE) approach, a 2,246-bp cDNA sequence was obtained from common bean (Phaseolus vulgaris L.) and designated PvP5CS2; it differs from another P5CS gene that we cloned previously from common bean (PvP5CS). The predicted amino acid sequence of PvP5CS2 has an overall 93.2% identity to GmP5CS (Glycine max L. P5CS). However, PvP5CS2 shows only 83.7% identity in amino acid sequence to PvP5CS, suggesting that PvP5CS2 represents a homolog of the soybean P5CS gene. Abundant indels (insertion and deletion events) and SNPs (single nucleotide polymorphisms) were found in the cloned PvP5CS2 genome sequence when comparing 24 cultivated and 3 wild common bean accessions, and these in turn reflected aspects of common bean evolution. Sequence alignment showed that genotypes from the same gene pool had similar nucleotide variation, while genotypes from different gene pools had distinctly different nucleotide variation for PvP5CS2. Furthermore, diversity along the gene sequence was not evenly distributed, being low in the glutamic-γ-semialdehyde dehydrogenase catalyzing region, moderate in the Glu-5-kinase catalyzing region and high in the intervening region. Neutrality tests showed that PvP5CS2 was a conserved gene undergoing negative selection. A new marker (Pv97) was developed for genetic mapping of PvP5CS2 based on an indel between DOR364 and G19833 sequences, and the gene was located between markers Bng126 and BMd045 on chromosome b01. The relationship of PvP5CS2 and a previously cloned pyrroline-5-carboxylate synthetase gene, as well as the implications of this work for selecting for drought tolerance in common bean, are discussed.
Affiliation(s)
- Jibao Chen
- National Key Facility for Crop Gene Resources and Genetic Improvement, Key Laboratory of Crop Germplasm and Biotechnology, Institute of Crop Sciences, The Chinese Academy of Agricultural Sciences, Ministry of Agriculture, 100081, Beijing, China.
4
Chandrasekar A, Riju A, Sithara K, Anoop S, Eapen SJ. Identification of single nucleotide polymorphism in ginger using expressed sequence tags. Bioinformation 2009; 4:119-22. [PMID: 20198184 PMCID: PMC2828891 DOI: 10.6026/97320630004119] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 04/28/2009] [Revised: 05/10/2009] [Accepted: 06/08/2009] [Indexed: 12/05/2022]
Abstract
UNLABELLED Ginger (Zingiber officinale Rosc.; family Zingiberaceae) is a herbaceous perennial whose rhizomes are used as a spice, and it is well known for its medicinal applications. EST-derived SNPs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for the detection of SNPs (single nucleotide polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The 38,139 available ginger EST sequences were mined from dbEST at NCBI. The CAP3 program was used to assemble the EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script AutoSNP version 1.0, which used 31,905 ESTs for detecting SNP and indel sites. We found 64,026 SNP sites and 7,034 indel polymorphisms, with a frequency of 0.84 SNPs per 100 bp. Among the three tissues from which the EST libraries had been generated, rhizomes had the highest frequency (1.08 SNPs/indels per 100 bp), leaves had the lowest (0.63 per 100 bp), and roots were intermediate (0.82 per 100 bp). The transition-to-transversion ratio was 0.90; that is, transversions outnumbered transitions among the detected SNPs. These SNPs can be used as markers for genetic studies. AVAILABILITY The results of the present study are hosted on our web server at www.spices.res.in/spicesnip.
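The two summary statistics reported in this abstract, SNP frequency per 100 bp and the transition/transversion ratio, can be illustrated with a minimal Python sketch. This is not the AutoSNP script; the function names and the toy input are assumptions made purely for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'.

    A transition stays within one base class (purine<->purine or
    pyrimidine<->pyrimidine); a transversion switches class.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    counts = Counter(classify_snp(ref, alt) for ref, alt in snps)
    if counts["transversion"] == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return counts["transition"] / counts["transversion"]


def snps_per_100bp(n_snps, total_bp):
    """SNP density expressed per 100 bp of assembled sequence."""
    return 100.0 * n_snps / total_bp


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    toy_snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "A")]
    print("Ts/Tv:", ts_tv_ratio(toy_snps))
    print("SNPs per 100 bp:", snps_per_100bp(84, 10_000))
```

A ratio below 1, such as the 0.90 reported here, means more transversions than transitions were called.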
Affiliation(s)
- Arumugam Chandrasekar
- Bioinformatics Centre, Indian Institute of Spices Research, P.B. No. 1701, Marikunnu P.O., Calicut 673012, Kerala, India
- Aikkal Riju
- Bioinformatics Centre, Indian Institute of Spices Research, P.B. No. 1701, Marikunnu P.O., Calicut 673012, Kerala, India
- Kandiyl Sithara
- Kandiyl House, Makkada (P.O.), Kakkodi, Calicut 673617, Kerala, India
- Sahadevan Anoop
- Ottankulam, Athicode (P.O.), Palakkad (Dist.), Kerala 678554, India
- Santhosh J Eapen
- Bioinformatics Centre, Indian Institute of Spices Research, P.B. No. 1701, Marikunnu P.O., Calicut 673012, Kerala, India
5
Williams JL, Dunner S, Valentini A, Mazza R, Amarger V, Checa ML, Crisà A, Razzaq N, Delourme D, Grandjean F, Marchitelli C, García D, Pérez Gomez R, Negrini R, Ajmone Marsan P, Levéziel H. Discovery, characterization and validation of single nucleotide polymorphisms within 206 bovine genes that may be considered as candidate genes for beef production and quality. Anim Genet 2009; 40:486-91. [DOI: 10.1111/j.1365-2052.2009.01874.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
6
Morais DD, Harrison PM. Genomic evidence for non-random endemic populations of decaying exons from mammalian genes. BMC Genomics 2009; 10:309. [PMID: 19594905 PMCID: PMC2718932 DOI: 10.1186/1471-2164-10-309] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/16/2009] [Accepted: 07/13/2009] [Indexed: 11/13/2022]
Abstract
Background Functional diversification of genes in mammalian genomes is engendered by a number of processes, e.g., gene duplication and alternative splicing. Gene duplication is classically discussed as leading to neofunctionalization (generation of new functions), subfunctionalization (generation of a varied function), or pseudogenization (loss of the gene and its function). Results Here, we focus on the process of pseudogenization, but specifically for individual exons from genes. It is at present unclear to what extent pseudogenization of individual exon duplications affects gene evolution, i.e., is it a random phenomenon, or is it associated with specific types of genes and encoded proteins, and positions in gene structures? We gathered genomic evidence for pseudogenic exons (ΨEs, i.e., exons disabled by frameshifts and premature stop codons), to examine for significant trends in their distribution across four mammalian genomes (specifically human, cow, mouse and rat). Across these four genomes, we observed a consistent population of ΨEs, associated with 0.4–1.0% of genes. These ΨE populations exhibit codon substitution patterns that are typical of an endemic population of decaying sequences. In human, ΨEs have significant over-representation for functional categories related to 'ion binding' and 'nucleic-acid binding', compared to duplicated exons in general. Also, ΨEs tend to be associated with some protein domains that are abundant generally, e.g., Zinc-finger and immunoglobulin protein domains, but not others, e.g., EGF-like domains. Positionally, ΨEs are also significantly associated with the 5' end of genes, but despite this, individual stop codons are positioned so that there is significant avoidance of potential targeting to nonsense-mediated decay. 
In human, ΨEs are often associated with alternative splicing (in 22 out of 284 genes with ΨEs in their milieu), and can have different parts of their sequence differentially spliced in alternative transcripts. Some unusual cases of ΨEs embedded within 5' and 3' non-coding exons are observed. Conclusion Our results indicate the types of genes that harbour ΨEs, and demonstrate that ΨEs have non-random distribution within gene structures. These ΨEs may function in gene regulation through generation of transcribed pseudogenes, or regulatory alternate transcripts.
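The working definition used in this abstract, an exon copy disabled by frameshifts or premature stop codons, can be made concrete with a short Python sketch. It is not the authors' pipeline: the function names, the gapped pairwise-alignment input, and the rule that any gap run whose length is not a multiple of three counts as a frameshift are simplifying assumptions (compensating indels, for instance, are ignored).

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_premature_stop(seq, frame=0):
    """True if an in-frame stop codon occurs before the final codon.

    Gap characters ('-') are stripped first, so an aligned sequence
    can be passed directly.
    """
    seq = seq.upper().replace("-", "")
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # The last codon is allowed to be a stop; anything earlier is premature.
    return any(codon in STOP_CODONS for codon in codons[:-1])


def has_frameshift(parent_aln, copy_aln):
    """True if the alignment contains a gap run not divisible by three.

    Gaps in the copy are deletions and gaps in the parent are insertions
    in the copy; either kind shifts the reading frame unless its length
    is a multiple of three.
    """
    if len(parent_aln) != len(copy_aln):
        raise ValueError("aligned sequences must have equal length")
    for aligned in (parent_aln, copy_aln):
        for gap_run in re.findall(r"-+", aligned):
            if len(gap_run) % 3 != 0:
                return True
    return False


def is_disabled(parent_aln, copy_aln):
    """Flag a copy as a candidate pseudogenic exon under the rules above."""
    return has_frameshift(parent_aln, copy_aln) or has_premature_stop(copy_aln)


if __name__ == "__main__":
    # Hypothetical toy alignment: a one-base deletion in the copy.
    print(is_disabled("ATGGCCAAAGGG", "ATGGCC-AAGGG"))
```

A real analysis would start from genome-scale alignments of duplicated exons to their parent exons; the sketch only shows the disablement test itself.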
Affiliation(s)
- David Delima Morais
- Department of Biology, McGill University, Stewart Biology Building, 1205 Docteur Penfield Ave, Montreal, QC, H3A 1B1, Canada.
7
Mukherjee S, Sarkar-Roy N, Wagener DK, Majumder PP. Signatures of natural selection are not uniform across genes of innate immune system, but purifying selection is the dominant signature. Proc Natl Acad Sci U S A 2009; 106:7073-8. [PMID: 19359493 PMCID: PMC2678448 DOI: 10.1073/pnas.0811357106] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.7] [Received: 11/08/2008] [Indexed: 12/21/2022]
Abstract
We tested the opposing views concerning evolution of genes of the innate immune system that (i) being evolutionarily ancient, the system may have been highly optimized by natural selection and therefore should be under purifying selection, and (ii) the system may be plastic and continuing to evolve under balancing selection. We have resequenced 12 important innate-immunity genes (CAMP, DEFA4, DEFA5, DEFA6, DEFB1, MBL2, and TLRs 1, 2, 4, 5, 6, and 9) in healthy volunteers (n = 171) recruited from a region of India with high microbial load. We have compared these data with those of European-Americans (EUR) and African-Americans (AFR). We have found that most of the human haplotypes are many mutational steps away from the ancestral (chimpanzee) haplotypes, indicating that humans may have had to adapt to new pathogens. The haplotype structures in India are significantly different from those of EUR and AFR populations, indicating local adaptation to pathogens. In these genes, there is (i) generally an excess of rare variants, (ii) high, but variable, degrees of extended haplotype homozygosity, (iii) low tolerance to nonsynonymous changes, and (iv) essentially one or a few high-frequency haplotypes, with star-like phylogenies of other infrequent haplotypes radiating from the modal haplotypes. Purifying selection operating on these innate immunity genes is the most parsimonious explanation. This genetic surveillance system recognizes motifs in pathogens that are perhaps conserved across a broad range of pathogens. Hence, functional constraints are imposed on mutations that diminish the ability of these proteins to detect pathogens.
Affiliation(s)
- Souvik Mukherjee
- The Chatterjee Group—Indian Statistical Institute Centre for Population Genomics, Institute of Molecular Medicine, Kolkata 700091, India
- Neeta Sarkar-Roy
- The Chatterjee Group—Indian Statistical Institute Centre for Population Genomics, Institute of Molecular Medicine, Kolkata 700091, India
- Partha P. Majumder
- The Chatterjee Group—Indian Statistical Institute Centre for Population Genomics, Institute of Molecular Medicine, Kolkata 700091, India
- Human Genetics Unit, Indian Statistical Institute, Kolkata 700108, India
8
Guo Y, Jamison DC. The distribution of SNPs in human gene regulatory regions. BMC Genomics 2005; 6:140. [PMID: 16209714 PMCID: PMC1260019 DOI: 10.1186/1471-2164-6-140] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.2] [Received: 03/16/2005] [Accepted: 10/06/2005] [Indexed: 11/25/2022]
Abstract
Background As a result of high-throughput genotyping methods, millions of human genetic variants have been reported in recent years. To efficiently identify those with significant biological functions, a practical strategy is to concentrate on variants located in important sequence regions such as gene regulatory regions. Results Analysis of the most common type of variant, single nucleotide polymorphisms (SNPs), shows that in gene promoter regions more SNPs occur in close proximity to transcriptional start sites than in regions further upstream, and a disproportionate number of those SNPs represent nucleotide transversions. Additionally, the number of SNPs found in the predicted transcription factor binding sites is higher than in non-binding site sequences. Conclusion Current information about transcription factor binding site sequence patterns may not be exhaustive, and SNPs may be actively involved in influencing gene expression by affecting the transcription factor binding sites.
Affiliation(s)
- Yongjian Guo
- School of Computational Sciences, George Mason University, Manassas, VA 20110, USA
- Virginia Bioinformatics Institute, Bioinformatics Facility I (0477), Virginia Tech, Blacksburg, VA 24060, USA
- D Curtis Jamison
- School of Computational Sciences, George Mason University, Manassas, VA 20110, USA
9
Zheng D, Zhang Z, Harrison PM, Karro J, Carriero N, Gerstein M. Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J Mol Biol 2005; 349:27-45. [PMID: 15876366 DOI: 10.1016/j.jmb.2005.02.072] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.3] [Received: 10/20/2004] [Revised: 02/16/2005] [Accepted: 02/23/2005] [Indexed: 02/06/2023]
Abstract
Pseudogenes are inheritable genetic elements formally defined by two properties: their similarity to functioning genes and their presumed lack of activity. However, their precise characterization, particularly with respect to the latter quality, has proven elusive. An opportunity to explore this issue arises from the recent emergence of tiling-microarray data showing that intergenic regions (containing pseudogenes) are transcribed to a great degree. Here we focus on the transcriptional activity of pseudogenes on human chromosome 22. First, we integrated several sets of annotation to define a unified list of 525 pseudogenes on the chromosome. To characterize these further, we developed a comprehensive list of genomic features based on conservation in related organisms, expression evidence, and the presence of upstream regulatory sites. Of the 525 unified pseudogenes we could confidently classify 154 as processed and 49 as duplicated. Using data from tiling microarrays, especially from recent high-resolution oligonucleotide arrays, we found some evidence that up to a fifth of the 525 pseudogenes are potentially transcribed. Expressed sequence tag (EST) comparison further validated a number of these, and overall we found 17 pseudogenes with strong support for transcription. In particular, one of the pseudogenes with both EST and microarray evidence for transcription turned out to be a duplicated pseudogene in the cat eye syndrome critical region. Although we could not identify a meaningful number of transcription factor-binding sites (based on chromatin immunoprecipitation-chip data) near pseudogenes, we did find that approximately 12% of the pseudogenes had upstream CpG islands. Finally, analysis of corresponding syntenic regions in the mouse, rat and chimp genomes indicates, as previously suggested, that pseudogenes are less conserved than genes, but more preserved than the intergenic background (all annotation is available from http://www.pseudogene.org).
Affiliation(s)
- Deyou Zheng
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
10
Abstract
Pseudogenes are considered as genomic fossils: disabled copies of functional genes that were once active in the ancient genome. Recently, whole-genome computational approaches have revealed thousands of pseudogenes in the genomes of the human and other eukaryotes. Identification of these pseudogenes can improve the accuracy of gene annotation. It also offers new insight on the evolutionary history and the stability of the genome as a whole.
Affiliation(s)
- ZhaoLei Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06520-8114, USA
11
Basu A, Chaudhuri P, Majumder PP. Identification of polymorphic motifs using probabilistic search algorithms. Genome Res 2005; 15:67-77. [PMID: 15632091 PMCID: PMC540278 DOI: 10.1101/gr.2358005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/15/2004] [Accepted: 10/21/2004] [Indexed: 01/12/2023]
Abstract
The problem of identifying motifs comprising nucleotides at a set of polymorphic DNA sites, not necessarily contiguous, arises in many human genetic problems. However, when the sites are not contiguous, no efficient algorithm exists for polymorphic motif identification. A search based on complete enumeration is computationally inefficient. We have developed probabilistic search algorithms to discover motifs of known or unknown lengths. We have developed statistical tests of significance for assessing a motif discovery, and a statistical criterion for simultaneously estimating motif length and discovering it. We have tested these algorithms on various synthetic data sets and have shown that they are very efficient, in the sense that the "true" motifs can be detected in the vast majority of replications and in a small number of iterations. Additionally, we have applied them to some real data sets and have shown that they are able to identify known motifs. In certain applications, it is pertinent to find motifs that contain contrasting nucleotides at the sites included in the motif (e.g., motifs identified in case-control association studies). For this, we have suggested appropriate modifications. Using simulations, we have discovered that the success rate of identification of the correct motif is high in case-control studies except when relative risks are small. Our analyses of evolutionary data sets resulted in the identification of some motifs that appear to have important implications on human evolutionary inference. These algorithms can easily be implemented to discover motifs from multilocus genotype data by simple numerical recoding of genotypes.
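The abstract's remark that complete enumeration is computationally inefficient can be illustrated with a rough count of the search space. The sketch below is not the authors' algorithm; it only counts candidates, and both the function names and the default of two alleles per site (biallelic SNPs) are assumptions made for illustration.

```python
from math import comb


def count_candidate_motifs(n_sites, motif_len, alleles_per_site=2):
    """Count (site subset, allele pattern) pairs for one motif length.

    Choosing motif_len non-contiguous sites out of n_sites gives
    C(n_sites, motif_len) subsets, and each chosen site can carry any
    of alleles_per_site nucleotides.
    """
    return comb(n_sites, motif_len) * alleles_per_site ** motif_len


def count_all_lengths(n_sites, max_len, alleles_per_site=2):
    """Total candidates when the motif length is unknown (1..max_len)."""
    return sum(
        count_candidate_motifs(n_sites, k, alleles_per_site)
        for k in range(1, max_len + 1)
    )


if __name__ == "__main__":
    for n_sites in (20, 50, 100):
        print(n_sites, count_candidate_motifs(n_sites, 5))
```

With 50 biallelic sites and a motif of length 5, an exhaustive search already has 67,800,320 candidates to score, and the count grows combinatorially with the number of sites, which is the motivation the abstract gives for a probabilistic search.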
Affiliation(s)
- Analabha Basu
- Human Genetics Unit, Indian Statistical Institute, Kolkata, 700108 India
12
Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M, Weissman S, Snyder M. The transcriptional activity of human Chromosome 22. Genes Dev 2003; 17:529-40. [PMID: 12600945 PMCID: PMC195998 DOI: 10.1101/gad.1055203] [Citation(s) in RCA: 237] [Impact Index Per Article: 11.3] [Received: 10/31/2002] [Accepted: 12/24/2002] [Indexed: 01/09/2023]
Abstract
A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)(+) RNA. We found that many of the known, related and predicted genes are expressed. More importantly, our study reveals twice as many transcribed bases as have been reported previously. Many of the newly discovered expressed fragments were verified by RNA blot analysis and a novel technique called differential hybridization mapping (DHM). Interestingly, a significant fraction of these novel fragments are expressed antisense to previously annotated introns. The coding potential of these novel expressed regions is supported by their sequence conservation in the mouse genome. This study has greatly increased our understanding of the biological information encoded on a human chromosome. To facilitate the dissemination of these results to the scientific community, we have developed a comprehensive Web resource to present the findings of this study and other features of human Chromosome 22 at http://array.mbb.yale.edu/chr22.
Affiliation(s)
- John L Rinn
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
13
Harrison PM, Gerstein M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 2002; 318:1155-74. [PMID: 12083509 DOI: 10.1016/s0022-2836(02)00109-2] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts (i.e. genes and pseudogenes). Surveys of genomes have revealed that, in every organism, there are always a few large families and many small ones, with the overall distribution following a power-law. This commonality is equally true for both genes and pseudogenes, and exists despite the fact that the specific families that are enlarged differ greatly between organisms. Furthermore, because of family structure there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome. Pseudogenes in prokaryotes represent families that are in the process of being dispensed with. In particular, the genome sequences of certain pathogenic bacteria (Mycobacterium leprae, Yersinia pestis and Rickettsia prowazekii) show how an organism can undergo reductive evolution on a large scale (i.e. the dying out of families) as a result of niche change. There appears to be less pressure to delete pseudogenes in eukaryotes. These can be divided into two varieties, duplicated and processed, where the latter involves reverse transcription from an mRNA intermediate. We discuss these collectively in yeast, worm, fly, and human. The fly has few pseudogenes apparently because of its high rate of genomic DNA deletion. In the other three organisms, the distribution of pseudogenes on the chromosome and amongst different families is highly non-uniform. Pseudogenes tend not to occur in the middle of chromosome arms, and tend to be associated with lineage-specific (as opposed to highly conserved) families that have environmental-response functions. This may be because, rather than being dead, they may form a reservoir of diverse "extra parts" that can be resurrected to help an organism adapt to its surroundings. 
In yeast, there may be a novel mechanism involving the [PSI+] prion that potentially enables this resurrection. In worm, the pseudogenes tend to arise out of families (e.g. chemoreceptors) that are greatly expanded in it compared to the fly. The human genome stands out in having many processed pseudogenes. These have a character very different from those of the duplicated variety, to a large extent just representing random insertions. Thus, their occurrence tends to be roughly in proportion to the amount of mRNA for a particular protein and to reflect the extent of the intergenic sequences. Further information about pseudogenes is available at http://genecensus.org/pseudogene
Affiliation(s)
- Paul M Harrison
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA