101
Lavoie H, Debeane F, Trinh QD, Turcotte JF, Corbeil-Girard LP, Dicaire MJ, Saint-Denis A, Pagé M, Rouleau GA, Brais B. Polymorphism, shared functions and convergent evolution of genes with sequences coding for polyalanine domains. Hum Mol Genet 2003; 12:2967-79. [PMID: 14519685 DOI: 10.1093/hmg/ddg329] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.0] [Indexed: 01/25/2023]
Abstract
Mutations causing expansions of polyalanine domains are responsible for nine hereditary diseases. Other GC-rich sequences coding for some polyalanine domains were found to be polymorphic in humans. These observations prompted us to identify all sequences in the human genome coding for polyalanine stretches longer than four alanines and establish their degree of polymorphism. We identified 494 annotated human proteins containing 604 polyalanine domains. Thirty-two percent (31/98) of tested sequences coding for more than seven alanines were polymorphic. The length of the polyalanine-coding sequence and its GCG or GCC repeat content are the major predictors of polymorphism. GCG codons are over-represented in human polyalanine coding sequences. Our data suggest that GCG and GCC codons play a key role in polyalanine-coding sequence appearance and polymorphism. The grouping by shared function of polyalanine-containing proteins in Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans shows that the majority are involved in transcriptional regulation. Phylogenetic analyses of HOX, GATA and EVX protein families demonstrate that polyalanine domains arose independently in different members of these families, suggesting that convergent molecular evolution may have played a role. Finally, polyalanine domains in vertebrates are conserved between mammals and are rarer and shorter in Gallus gallus and Danio rerio. Together, our results show that the polymorphic nature of sequences coding for polyalanine domains makes them prime candidates for mutations in hereditary diseases and suggest that they have appeared in many different protein families through convergent evolution.
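The screening criterion in this abstract (stretches longer than four alanines, with attention to GCG and GCC codon content) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences and the regular-expression approach are all assumptions made for illustration.

```python
import re
from collections import Counter

# The four alanine codons (GCN).
ALANINE_CODONS = {"GCA", "GCC", "GCG", "GCT"}


def find_polyalanine_domains(protein_seq, min_len=5):
    """Return (start, length) for each run of at least min_len alanines.

    min_len=5 corresponds to "longer than four alanines".
    """
    pattern = re.compile(r"A{%d,}" % min_len)
    return [(m.start(), len(m.group()))
            for m in pattern.finditer(protein_seq.upper())]


def alanine_codon_usage(cds):
    """Count alanine codons in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in ALANINE_CODONS)


# Hypothetical toy inputs, not real proteins or genes.
toy_protein = "MK" + "A" * 7 + "GLST" + "A" * 4 + "PQ" + "A" * 11 + "R"
domains = find_polyalanine_domains(toy_protein)
usage = alanine_codon_usage("ATGGCGGCGGCCGCATAA")
```

In the toy protein, the run of four alanines is skipped because it does not exceed the threshold, while the runs of seven and eleven are reported.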
Affiliation(s)
- Hugo Lavoie
- Laboratoire de Neurogénétique, Centre de Recherche du Centre Hospitalier de l'Université de Montréal, Québec, Canada
102
Brentani H, Caballero OL, Camargo AA, da Silva AM, da Silva WA, Dias Neto E, Grivet M, Gruber A, Guimaraes PEM, Hide W, Iseli C, Jongeneel CV, Kelso J, Nagai MA, Ojopi EPB, Osorio EC, Reis EMR, Riggins GJ, Simpson AJG, de Souza S, Stevenson BJ, Strausberg RL, Tajara EH, Verjovski-Almeida S, Acencio ML, Bengtson MH, Bettoni F, Bodmer WF, Briones MRS, Camargo LP, Cavenee W, Cerutti JM, Coelho Andrade LE, Costa dos Santos PC, Ramos Costa MC, da Silva IT, Estécio MRH, Sa Ferreira K, Furnari FB, Faria M, Galante PAF, Guimaraes GS, Holanda AJ, Kimura ET, Leerkes MR, Lu X, Maciel RMB, Martins EAL, Massirer KB, Melo ASA, Mestriner CA, Miracca EC, Miranda LL, Nobrega FG, Oliveira PS, Paquola ACM, Pandolfi JRC, Campos Pardini MIDM, Passetti F, Quackenbush J, Schnabel B, Sogayar MC, Souza JE, Valentini SR, Zaiats AC, Amaral EJ, Arnaldi LAT, de Araújo AG, de Bessa SA, Bicknell DC, Ribeiro de Camaro ME, Carraro DM, Carrer H, Carvalho AF, Colin C, Costa F, Curcio C, Guerreiro da Silva IDC, Pereira da Silva N, Dellamano M, El-Dorry H, Espreafico EM, Scattone Ferreira AJ, Ayres Ferreira C, Fortes MAHZ, Gama AH, Giannella-Neto D, Giannella MLCC, Giorgi RR, Goldman GH, Goldman MHS, Hackel C, Ho PL, Kimura EM, Kowalski LP, Krieger JE, Leite LCC, Lopes A, Luna AMSC, Mackay A, Mari SKN, Marques AA, Martins WK, Montagnini A, Mourão Neto M, Nascimento ALTO, Neville AM, Nobrega MP, O'Hare MJ, Otsuka AY, Ruas de Melo AI, Paco-Larson ML, Guimarães Pereira G, Pereira da Silva N, Pesquero JB, Pessoa JG, Rahal P, Rainho CA, Rodrigues V, Rogatto SR, Romano CM, Romeiro JG, Rossi BM, Rusticci M, Guerra de Sá R, Sant' Anna SC, Sarmazo ML, Silva TCDLE, Soares FA, Sonati MDF, de Freitas Sousa J, Queiroz D, Valente V, Vettore AL, Villanova FE, Zago MA, Zalcberg H. The generation and utilization of a cancer-oriented representation of the human transcriptome by using expressed sequence tags. Proc Natl Acad Sci U S A 2003; 100:13418-23. 
[PMID: 14593198 PMCID: PMC263829 DOI: 10.1073/pnas.1233632100] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022]
Abstract
Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two million expressed sequence tags (ESTs) from human tumors and their corresponding normal tissues in the public databases. The data currently define approximately 23,500 genes, of which approximately 1,250 are still represented only by ESTs. Examination of the EST coverage of known cancer-related (CR) genes reveals that <1% do not have corresponding ESTs, indicating that the representation of genes associated with commonly studied tumors is high. The careful recording of the origin of all ESTs we have produced has enabled detailed definition of where the genes they represent are expressed in the human body. More than 100,000 ESTs are available for seven tissues, indicating a surprising variability of gene usage that has led to the discovery of a significant number of genes with restricted expression, and that may thus be therapeutically useful. The ESTs also reveal novel nonsynonymous germline variants (although the one-pass nature of the data necessitates careful validation) and many alternatively spliced transcripts. Although widely exploited by the scientific community, vindicating our totally open source policy, the EST data generated still provide extensive information that remains to be systematically explored, and that may further facilitate progress toward both the understanding and treatment of human cancers.
Affiliation(s)
- Helena Brentani
- Laboratorio de Genética Molecular do Cancer, Departmento de Radiologia, Universidade de São Paulo, Travessa da Rua Dr. Ovídeo Pires de Campos S/N, 4°, Brazil
103
Wang J, Webb G, Cao Y, Steiner DF. Contrasting patterns of expression of transcription factors in pancreatic alpha and beta cells. Proc Natl Acad Sci U S A 2003; 100:12660-5. [PMID: 14557546 PMCID: PMC240674 DOI: 10.1073/pnas.1735286100] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 01/22/2023]
Abstract
Pancreatic alpha and beta cells are derived from the same progenitors but play opposing roles in the control of glucose homeostasis. Disturbances in their function are associated with diabetes mellitus. To identify many of the proteins that define their unique pathways of differentiation and functional features, we have analyzed patterns of gene expression in alphaTC1.6 vs. MIN6 cell lines by using oligonucleotide microarrays. Approximately 9-10% of >11,000 transcripts examined showed significant differences between the two cell types. Of >700 known transcripts enriched in either cell type, transcription factors and their regulators (TFR) was one of the most significantly different categories. Ninety-six members of the basic zipper, basic helix-loop-helix, homeodomain, zinc finger, high mobility group, and other transcription factor families were enriched in alpha cells; in contrast, homeodomain proteins accounted for 51% of a total of 45 TFRs enriched in beta cells. Our analysis thus highlights fundamental differences in expression of TFR subtypes within these functionally distinct islet cell types. Interestingly, the alpha cells appear to express a large proportion of factors associated with progenitor or stem-type cells, perhaps reflecting their earlier appearance during pancreatic development. The implications of these findings for a better understanding of alpha and beta cell dysfunction in diabetes mellitus are also considered.
Affiliation(s)
- Jie Wang
- Departments of Biochemistry and Molecular Biology and Medicine and The Howard Hughes Medical Institute, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637
- Gene Webb
- Departments of Biochemistry and Molecular Biology and Medicine and The Howard Hughes Medical Institute, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637
- Yun Cao
- Departments of Biochemistry and Molecular Biology and Medicine and The Howard Hughes Medical Institute, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637
- Donald F. Steiner
- Departments of Biochemistry and Molecular Biology and Medicine and The Howard Hughes Medical Institute, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637
- To whom correspondence should be addressed. E-mail:
104
Abstract
Fifty years after the publication of the structure of DNA, the whole human genome sequence will be officially finished. This achievement marks the beginning of the task to catalogue every human gene and identify the function and expression pattern of each. Currently, researchers estimate that there are about 30,000 human genes and that approximately 70% of these can be automatically predicted using a combination of ab initio and similarity-based programs. However, to experimentally investigate every gene's function, the research community requires a high-quality annotation of alternative splicing, pseudogenes, and promoter regions that can only be provided by manual intervention. Manual curation of the human genome will be a long-term project, as experimental data are continually produced to confirm or refine the predictions, and new features such as noncoding RNAs and enhancers have not been fully identified. Such a highly curated human gene set made publicly available will be a great asset for the experimental community and for future comparative genome projects.
Affiliation(s)
- Jennifer L Ashurst
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom.
105
Lombardi MP, van den Hoff MJB, Ruijter JM, Luijerink M, Buffing AA, Markman MW, Moorman AFM, Lekanne Deprez RH. Expression analysis of subtractively enriched libraries (EASEL): a widely applicable approach to the identification of differentially expressed genes. J Biochem Biophys Methods 2003; 57:17-33. [PMID: 12834960 DOI: 10.1016/s0165-022x(03)00083-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
A variety of methods for high-throughput analysis of differential gene expression has been developed over the past years. We have implemented the EASEL technique, which adds flexibility, efficiency and wide applicability to these methods. The EASEL procedure is unique in that it integrates several well-established techniques and thereby offers a combination of subtractive hybridization of 3' cDNA ends with macroarray analysis and Serial Analysis of Gene Expression (SAGE). In addition, once a set of interesting, differentially expressed genes is identified, the material required for follow-up studies to test the hypothesis that a gene is truly involved in the process of interest is readily available. In this report, we first present a step-by-step validation of the procedure, since several of the incorporated steps had to be tailored to meet specific requirements and implied drastic modifications of the original methods. Secondly, we applied EASEL to the identification of up-regulated gene products in the outflow tract region of the embryonic rat heart. Here we provide evidence that at least two of the differentially expressed genes detected, the follistatin-like protein gene and the membrane type 1 metalloproteinase gene, are selectively up-regulated in the outflow tract, suggesting their involvement in the development of this region during embryogenesis.
Affiliation(s)
- M Paola Lombardi
- Experimental and Molecular Cardiology Group, Cardiovascular Research Institute Amsterdam, Academic Medical Centre, Amsterdam, The Netherlands
106
107
Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R. Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proc Natl Acad Sci U S A 2003; 100:7383-8. [PMID: 12771380 PMCID: PMC165884 DOI: 10.1073/pnas.1132171100] [Citation(s) in RCA: 143] [Impact Index Per Article: 6.8] [Indexed: 01/15/2023]
Abstract
Pinus taeda L. (loblolly pine) and Arabidopsis thaliana differ greatly in form, ecological niche, evolutionary history, and genome size. Arabidopsis is a small, herbaceous, annual dicotyledon, whereas pines are large, long-lived, coniferous forest trees. Such diverse plants might be expected to differ in a large number of functional genes. We have obtained and analyzed 59,797 expressed sequence tags (ESTs) from wood-forming tissues of loblolly pine and compared them to the gene sequences inferred from the complete sequence of the Arabidopsis genome. Approximately 50% of pine ESTs have no apparent homologs in Arabidopsis or any other angiosperm in public databases. When evaluated by using contigs containing long, high-quality sequences, we find a higher level of apparent homology between the inferred genes of these two species. For those contigs 1,100 bp or longer, approximately 90% have an apparent Arabidopsis homolog (E value < 10^-10). Pines and Arabidopsis last shared a common ancestor approximately 300 million years ago. Few genes would be expected to retain high sequence similarity for this time if they did not have essential functions. These observations suggest substantial conservation of gene sequence in seed plants.
Affiliation(s)
- Matias Kirst
- Functional Genomics and Genetics Graduate Program, North Carolina State University, Campus Box 7614, Raleigh, NC 27695, USA
108
Kiyosawa H, Yamanaka I, Osato N, Kondo S, Hayashizaki Y. Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res 2003; 13:1324-34. [PMID: 12819130 PMCID: PMC403655 DOI: 10.1101/gr.982903] [Citation(s) in RCA: 201] [Impact Index Per Article: 9.6] [Indexed: 02/04/2023]
Abstract
We have used the FANTOM2 mouse cDNA set (60,770 clones), public mRNA data, and mouse genome sequence data to identify 2481 pairs of sense-antisense transcripts and 899 further pairs of nonantisense bidirectional transcription based upon genomic mapping. The analysis greatly expands the number of known examples of sense-antisense transcript and nonantisense bidirectional transcription pairs in mammals. The FANTOM2 cDNA set appears to contain substantially large numbers of noncoding transcripts suitable for antisense transcript analysis. The average proportion of loci encoding sense-antisense transcript and nonantisense bidirectional transcription pairs on autosomes was 15.1 and 5.4%, respectively. Those on the X chromosome were 6.3 and 4.2%, respectively. Sense-antisense transcript pairs, rather than nonantisense bidirectional transcription pairs, may be less prevalent on the X chromosome, possibly due to X chromosome inactivation. Sense and antisense transcripts tended to be isolated from the same libraries, whereas nonantisense bidirectional transcription pairs were not apparently coregulated. The existence of large numbers of natural antisense transcripts implies that the regulation of gene expression by antisense transcripts is more common than previously recognized. The viewer showing mapping patterns of sense-antisense transcript pairs and nonantisense bidirectional transcription pairs on the genome and other related statistical data is available on our Web site.
Affiliation(s)
- Hidenori Kiyosawa
- Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
109
Zhang L, Pavlovic V, Cantor CR, Kasif S. Human-mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Res 2003; 13:1190-202. [PMID: 12743024 PMCID: PMC403647 DOI: 10.1101/gr.703903] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Received: 08/09/2002] [Accepted: 02/03/2003] [Indexed: 11/24/2022]
Abstract
The identification of genes in the human genome remains a challenge, as the actual predictions appear to disagree tremendously and vary dramatically on the basis of the specific gene-finding methodology used. Because the pattern of conservation in coding regions is expected to be different from intronic or intergenic regions, a comparative computational analysis can lead, in principle, to an improved computational identification of genes in the human genome by using a reference, such as the mouse genome. However, this comparative methodology critically depends on three important factors: (1) the selection of the most appropriate reference genome (in particular, it is not clear whether the mouse is at the correct evolutionary distance from the human to provide sufficiently distinctive conservation levels in different genomic regions), (2) the selection of comparative features that provide the most benefit to gene recognition, and (3) the selection of an evidence integration architecture that effectively interprets the comparative features. We address the first question by a novel evolutionary analysis that allows us to explicitly correlate the performance of the gene recognition system with the evolutionary distance (time) between the two genomes. Our simulation results indicate that there is a wide range of reference genomes at different evolutionary time points that appear to deliver reasonable comparative prediction of human genes. In particular, the evolutionary time between human and mouse generally falls in the region of good performance; however, better accuracy might be achieved with a reference genome further than mouse. To address the second question, we propose several natural comparative measures of conservation for identifying exons and exon boundaries. Finally, we experiment with Bayesian networks for the integration of comparative and compositional evidence.
Affiliation(s)
- Lingang Zhang
- Center for Advanced Biotechnology, Boston University, Boston, Massachusetts 02215, USA
110
Bornholdt S, Röhl T. Self-organized critical neural networks. Phys Rev E Stat Nonlin Soft Matter Phys 2003; 67:066118. [PMID: 16241315 DOI: 10.1103/physreve.67.066118] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.8] [Received: 05/02/2002] [Revised: 04/11/2003] [Indexed: 05/04/2023]
Abstract
A mechanism for self-organization of the degree of connectivity in model neural networks is studied. Network connectivity is regulated locally on the basis of an order parameter of the global dynamics, which is estimated from an observable at the single synapse level. This principle is studied in a two-dimensional neural network with randomly wired asymmetric weights. In this class of networks, network connectivity is closely related to a phase transition between ordered and disordered dynamics. A slow topology change is imposed on the network through a local rewiring rule motivated by activity-dependent synaptic development: neighbor neurons whose activity is correlated develop, on average, a new connection, while uncorrelated neighbors tend to disconnect. As a result, robust self-organization of the network towards the order-disorder transition occurs. Convergence is independent of initial conditions, robust against thermal noise, and does not require fine tuning of parameters.
111
Carninci P, Waki K, Shiraki T, Konno H, Shibata K, Itoh M, Aizawa K, Arakawa T, Ishii Y, Sasaki D, Bono H, Kondo S, Sugahara Y, Saito R, Osato N, Fukuda S, Sato K, Watahiki A, Hirozane-Kishikawa T, Nakamura M, Shibata Y, Yasunishi A, Kikuchi N, Yoshiki A, Kusakabe M, Gustincich S, Beisel K, Pavan W, Aidinis V, Nakagawara A, Held WA, Iwata H, Kono T, Nakauchi H, Lyons P, Wells C, Hume DA, Fagiolini M, Hensch TK, Brinkmeier M, Camper S, Hirota J, Mombaerts P, Muramatsu M, Okazaki Y, Kawai J, Hayashizaki Y. Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res 2003; 13:1273-89. [PMID: 12819125 PMCID: PMC403712 DOI: 10.1101/gr.1119703] [Citation(s) in RCA: 142] [Impact Index Per Article: 6.8] [Indexed: 11/24/2022]
Abstract
We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched in full-length by Cap-Trapper, and in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3'-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5'-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5'-end clusters identify regions that are potential promoters for 8637 known genes and 5'-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete.
Affiliation(s)
- Piero Carninci
- Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
112
Li JB, Lin S, Jia H, Wu H, Roe BA, Kulp D, Stormo GD, Dutcher SK. Analysis of Chlamydomonas reinhardtii genome structure using large-scale sequencing of regions on linkage groups I and III. J Eukaryot Microbiol 2003; 50:145-55. [PMID: 12836870 DOI: 10.1111/j.1550-7408.2003.tb00109.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]
Abstract
Chlamydomonas reinhardtii is a unicellular green alga that has been used as a model organism for the study of flagella and basal bodies as well as photosynthesis. This report analyzes finished genomic DNA sequence for 0.5% of the nuclear genome. We have used three gene prediction programs as well as EST and protein homology data to estimate the total number of genes in Chlamydomonas to be between 12,000 and 16,400. Chlamydomonas appears to have many more genes than any other unicellular organism sequenced to date. Twenty-seven percent of the predicted genes have significant identity to both ESTs and known proteins in other organisms, 32% of the predicted genes have significant identity to ESTs alone, and 14% have significant similarity to known proteins in other organisms. For gene prediction in Chlamydomonas, GreenGenie appeared to have the highest sensitivity and specificity at the exon level, scoring 71% and 82%, respectively. Two new alternative splicing events were predicted by aligning Chlamydomonas ESTs to the genomic sequence. Finally, recombination differs between the two sequenced contigs. The 350-kb linkage group III contig is devoid of recombination, while the linkage group I contig is 30 map units long over 33 kb.
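The abstract does not define its exon-level metrics. The sketch below assumes a common gene-prediction convention: an exon counts as correct only if both boundaries match the annotation exactly, sensitivity is the share of annotated exons recovered, and specificity is the share of predicted exons that are correct. The function name and the coordinates are hypothetical.

```python
def exon_level_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) for exons given as (start, end) pairs.

    Assumed convention: exact boundary match; specificity here is the
    fraction of predicted exons that are correct.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    exact = annotated_set & predicted_set
    sensitivity = len(exact) / len(annotated_set) if annotated_set else 0.0
    specificity = len(exact) / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Hypothetical coordinates: one exon has a shifted start, one is spurious.
annotated = [(100, 200), (300, 450), (600, 700), (900, 1000)]
predicted = [(100, 200), (300, 450), (610, 700), (900, 1000), (1200, 1300)]
sn, sp = exon_level_accuracy(annotated, predicted)
```

With these toy inputs three of four annotated exons are recovered and three of five predictions are correct, so a single boundary error lowers both scores.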
Affiliation(s)
- Jin Billy Li
- Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63110, USA
113
Norbert PW, Roses AD. Pharmacogenetics and pharmacogenomics: recent developments, their clinical relevance and some ethical, social, and legal implications. J Mol Med (Berl) 2003; 81:135-40. [PMID: 12755119 DOI: 10.1007/s00109-002-0415-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/26/2022]
Abstract
In recent debates on novel procedures of molecular medicine, pharmacogenomics is attracting more and more attention as a genotype-based approach for improving the safety and efficacy of therapeutic substances. Promoted by basic knowledge generated in the field of medical genomics, facilitated by novel technological tools for mapping genetic variation in individuals, and supported by results of initial clinical studies linking specific genotypes to metabolic characteristics of individuals important for assessing drug response, procedures of pharmacogenetics and pharmacogenomics are now starting to have a significant impact on clinical research and development and on medical practice. In this situation, assessing the goals, risk, and benefits of pharmacogenetics and pharmacogenomics is essential for the medically successful, ethically justifiable, and socially acceptable implementation of genotype-based diagnosis and pharmacotherapy. We discuss the current state of the art in pharmacogenetics and pharmacogenomics and introduce a model for evidence-based assessment of its goals, risk, and benefits. We differentiate here between pragmatic and normative issues in the development of pharmacogenomics in order to contrast prevailing, insufficiently interest-based modes of public technology assessment with the evidence-based mode that can be established as part of clinical study design. Finally, we provide a framework for the analysis of social accountability that can be used for technology development and technology assessment with regard to pharmacogenomics in particular and molecular medicine in general.
114
Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M, Weissman S, Snyder M. The transcriptional activity of human Chromosome 22. Genes Dev 2003; 17:529-40. [PMID: 12600945 PMCID: PMC195998 DOI: 10.1101/gad.1055203] [Citation(s) in RCA: 237] [Impact Index Per Article: 11.3] [Received: 10/31/2002] [Accepted: 12/24/2002] [Indexed: 01/09/2023]
Abstract
A DNA microarray representing nearly all of the unique sequences of human Chromosome 22 was constructed and used to measure global-transcriptional activity in placental poly(A)(+) RNA. We found that many of the known, related and predicted genes are expressed. More importantly, our study reveals twice as many transcribed bases as have been reported previously. Many of the newly discovered expressed fragments were verified by RNA blot analysis and a novel technique called differential hybridization mapping (DHM). Interestingly, a significant fraction of these novel fragments are expressed antisense to previously annotated introns. The coding potential of these novel expressed regions is supported by their sequence conservation in the mouse genome. This study has greatly increased our understanding of the biological information encoded on a human chromosome. To facilitate the dissemination of these results to the scientific community, we have developed a comprehensive Web resource to present the findings of this study and other features of human Chromosome 22 at http://array.mbb.yale.edu/chr22.
Affiliation(s)
- John L Rinn
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut 06520-8103, USA
115
Brown WRA, Hubbard SJ, Tickle C, Wilson SA. The chicken as a model for large-scale analysis of vertebrate gene function. Nat Rev Genet 2003; 4:87-98. [PMID: 12560806 DOI: 10.1038/nrg998] [Citation(s) in RCA: 121] [Impact Index Per Article: 5.8] [Indexed: 12/18/2022]
Affiliation(s)
- William R A Brown
- Institute of Genetics, Nottingham University, Queen's Medical Centre, Nottingham NG7 2UH, UK
116
Chuang TJ, Lin WC, Lee HC, Wang CW, Hsiao KL, Wang ZH, Shieh D, Lin SC, Ch'ang LY. A complexity reduction algorithm for analysis and annotation of large genomic sequences. Genome Res 2003; 13:313-22. [PMID: 12566410 PMCID: PMC420370 DOI: 10.1101/gr.313703] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
Abstract
DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexity reduction algorithm for sequence analysis (CRASA) that enables direct alignment of cDNA sequences to the genome. This method features a progressive data structure in hierarchical orders to facilitate a fast and efficient search mechanism. CRASA implementation was tested with already annotated genomic sequences in two benchmark data sets and compared with 15 annotation programs (10 ab initio and 5 homology-based approaches) against the EST database. By the use of layered noise filters, the complexity of CRASA-matched data was reduced exponentially. The results from the benchmark tests showed that CRASA annotation excelled in both the sensitivity and specificity categories. When CRASA was applied to the analysis of human Chromosomes 21 and 22, an additional 83 potential genes were identified. With its large-scale processing capability, CRASA can be used as a robust tool for genome annotation with high accuracy by matching the EST sequences precisely to the genomic sequences.
MESH Headings
- Algorithms
- Chromosomes, Human, Pair 21/genetics
- Chromosomes, Human, Pair 22/genetics
- DNA/analysis
- DNA/genetics
- DNA, Complementary/analysis
- DNA, Complementary/genetics
- Exons/genetics
- Expressed Sequence Tags
- Genes/genetics
- Genome, Human
- Humans
- Pseudogenes/genetics
- Reproducibility of Results
- Sensitivity and Specificity
- Sequence Alignment/methods
- Sequence Analysis, DNA/methods
- Sequence Analysis, DNA/trends
- Sequence Homology, Nucleic Acid
Affiliation(s)
- Trees-Juen Chuang
- Bioinformatics Research Center, Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan
|
117
|
Sorek R, Safer HM. A novel algorithm for computational identification of contaminated EST libraries. Nucleic Acids Res 2003; 31:1067-74. [PMID: 12560505 PMCID: PMC149192 DOI: 10.1093/nar/gkg170] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Abstract
A key goal of the Human Genome Project was to understand the complete set of human proteins, the proteome. Since the genome sequence by itself is not sufficient for predicting new genes and alternative splicing events that lead to new proteins, expressed sequence tags (ESTs) are used as the primary tool for these purposes. The high prevalence of artifacts in dbEST, however, often leads to invalid predictions. Here we describe a novel method for recognizing genomic DNA contamination and other artifacts that cannot be identified using current EST cleaning techniques. Our method uses the alignment of the entire set of ESTs to the human genome to identify highly contaminated EST libraries. We discovered 53 highly contaminated libraries and a subset of 24 766 ESTs from these libraries that probably represent contamination with genomic DNA, pre-mRNA, and ESTs that span non-canonical introns. Although this is only a small fraction of the entire EST dataset, each contaminating sequence could create a spurious transcript prediction. Indeed, in the clustering and assembly tool that we used, these sequences would have caused incorrect inference of 9575 new splice variants and 6370 new genes. Conclusions based on EST analysis, including prediction of alternative splicing, should be re-evaluated in light of these results. Our method, along with the identified set of contaminated sequences, will be essential for applications that depend on large EST datasets.
Affiliation(s)
- Rotem Sorek
- Compugen Ltd, 72 Pinchas Rosen Street, Tel Aviv 69512, Israel.
|
118
|
Brown S, Chang JL, Sadee W, Babbitt PC. A semiautomated approach to gene discovery through expressed sequence tag data mining: discovery of new human transporter genes. AAPS PharmSci 2003; 5:E1. [PMID: 12713273 PMCID: PMC2751469 DOI: 10.1208/ps050101] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Abstract
Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.
Affiliation(s)
- Shoshana Brown
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Jean L. Chang
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Whitehead Institute/MIT Center for Genome Research, 320 Charles St., 02141 Cambridge, MA
- Wolfgang Sadee
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Ohio State University Medical Center, 333 W. 10th Ave., 43210-1239 Columbus, OH
- Patricia C. Babbitt
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, 94143 San Francisco, CA
|
119
|
McCarter JP, Mitreva MD, Martin J, Dante M, Wylie T, Rao U, Pape D, Bowers Y, Theising B, Murphy CV, Kloek AP, Chiapelli BJ, Clifton SW, Bird DM, Waterston RH. Analysis and functional classification of transcripts from the nematode Meloidogyne incognita. Genome Biol 2003; 4:R26. [PMID: 12702207 PMCID: PMC154577 DOI: 10.1186/gb-2003-4-4-r26] [Citation(s) in RCA: 115] [Impact Index Per Article: 5.5] [Received: 12/10/2002] [Revised: 02/17/2003] [Accepted: 02/28/2003] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Plant parasitic nematodes are major pathogens of most crops. Molecular characterization of these species as well as the development of new techniques for control can benefit from genomic approaches. As an entrée to characterizing plant parasitic nematode genomes, we analyzed 5,700 expressed sequence tags (ESTs) from second-stage larvae (L2) of the root-knot nematode Meloidogyne incognita. RESULTS From these, 1,625 EST clusters were formed and classified by function using the Gene Ontology (GO) hierarchy and the Kyoto KEGG database. L2 larvae, which represent the infective stage of the life cycle before plant invasion, express a diverse array of ligand-binding proteins and abundant cytoskeletal proteins. L2 are structurally similar to Caenorhabditis elegans dauer larva and the presence of transcripts encoding glyoxylate pathway enzymes in the M. incognita clusters suggests that root-knot nematode larvae metabolize lipid stores while in search of a host. Homology to other species was observed in 79% of translated cluster sequences, with the C. elegans genome providing more information than any other source. In addition to identifying putative nematode-specific and Tylenchida-specific genes, sequencing revealed previously uncharacterized horizontal gene transfer candidates in Meloidogyne with high identity to rhizobacterial genes including homologs of nodL acetyltransferase and novel cellulases. CONCLUSIONS With sequencing from plant parasitic nematodes accelerating, the approaches to transcript characterization described here can be applied to more extensive datasets and also provide a foundation for more complex genome analyses.
Affiliation(s)
- James P McCarter
- Genome Sequencing Center, Department of Genetics, Box 8501, Washington University School of Medicine, St. Louis, MO 63108, USA.
|
120
|
Collins JE, Goward ME, Cole CG, Smink LJ, Huckle EJ, Knowles S, Bye JM, Beare DM, Dunham I. Reevaluating human gene annotation: a second-generation analysis of chromosome 22. Genome Res 2003; 13:27-36. [PMID: 12529303 PMCID: PMC430954 DOI: 10.1101/gr.695703] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.1] [Indexed: 11/25/2022]
Abstract
We report a second-generation gene annotation of human chromosome 22. Using expressed sequence databases, comparative sequence analysis, and experimental verification, we have extended genes, fused previously fragmented structures, and identified new genes. The total length in exons of annotation was increased by 74% over our previously published annotation and includes 546 protein-coding genes and 234 pseudogenes. Thirty-two potential protein-coding annotations are partial copies of other genes, and may represent duplications on an evolutionary path to change or loss of function. We also identified 31 non-protein-coding transcripts, including 16 possible antisense RNAs. By extrapolation, we estimate the human genome contains 29,000-36,000 protein-coding genes, 21,300 pseudogenes, and 1500 antisense RNAs. We suggest that our revised annotation criteria provide a paradigm for future annotation of the human genome.
Affiliation(s)
- John E Collins
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
121
|
Gray SG, Iglesias AH, Teh BT, Dangond F. Modulation of splicing events in histone deacetylase 3 by various extracellular and signal transduction pathways. Gene Expr 2003; 11:13-21. [PMID: 12691522 PMCID: PMC5991154 DOI: 10.3727/000000003783992342] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Accepted: 10/16/2002] [Indexed: 11/24/2022]
Abstract
Within the context of the chromatin environment histone deacetylases are important transcriptional regulators. Three classes of human histone deacetylases have currently been identified on the basis of their similarity to yeast proteins. The class I enzymes contain four members: HDACs 1-3 and HDAC8. Of these, HDAC3 is known to generate transcript variants with altered amino-terminal regions. Here we describe the identification of a novel splice variant of HDAC3, in which exon 3 is alternatively spliced from the messenger RNA transcript. We show that this human HDAC3 splice transcript is upregulated by treatments with histone deacetylase inhibitors. We also demonstrate evidence of splicing events in murine HDAC3 as a response to various signals, including switching between splice transcript isoforms following treatments with kinase inhibitors or by osmotic shock. In contrast, such switching events were not observed in human cells. These results indicate that differential pathways in mouse and human may control the regulation of HDAC3, and that splice variants may play important roles in responding to exogenous stimuli that act via signal transduction pathways.
Affiliation(s)
- S. G. Gray
- *Van Andel Research Institute, Laboratory for Cancer Research, 333 Bostwick NE, Grand Rapids, MI 49503
- A. H. Iglesias
- †Laboratory of Transcriptional and Immune Regulation, Center for Neurologic Diseases, Brigham and Women’s Hospital Laboratories, 65 Landsdowne Street, Cambridge, MA 02139
- B. T. Teh
- *Van Andel Research Institute, Laboratory for Cancer Research, 333 Bostwick NE, Grand Rapids, MI 49503
- F. Dangond
- †Laboratory of Transcriptional and Immune Regulation, Center for Neurologic Diseases, Brigham and Women’s Hospital Laboratories, 65 Landsdowne Street, Cambridge, MA 02139
- Address correspondence to F. Dangond, Laboratory of Transcriptional and Immune Regulation, Center for Neurologic Diseases, Brigham and Women’s Hospital Laboratories, 65 Landsdowne Street, 3rd Floor, Cambridge, MA 02139. Tel: (617) 768-8591; Fax: (617) 768-8595; E-mail:
|
122
|
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigó R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES. Initial sequencing and comparative analysis of the mouse genome. Nature 2002; 420:520-62. [PMID: 12466850 DOI: 10.1038/nature01262] [Citation(s) in RCA: 4860] [Impact Index Per Article: 220.9] [Received: 09/18/2002] [Accepted: 10/31/2002] [Indexed: 12/18/2022]
Abstract
The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
MESH Headings
- Animals
- Base Composition
- Chromosomes, Mammalian/genetics
- Conserved Sequence/genetics
- CpG Islands/genetics
- Evolution, Molecular
- Gene Expression Regulation
- Genes/genetics
- Genetic Variation/genetics
- Genome
- Genome, Human
- Genomics
- Humans
- Mice/classification
- Mice/genetics
- Mice, Knockout
- Mice, Transgenic
- Models, Animal
- Multigene Family/genetics
- Mutagenesis
- Neoplasms/genetics
- Physical Chromosome Mapping
- Proteome/genetics
- Pseudogenes/genetics
- Quantitative Trait Loci/genetics
- RNA, Untranslated/genetics
- Repetitive Sequences, Nucleic Acid/genetics
- Selection, Genetic
- Sequence Analysis, DNA
- Sex Chromosomes/genetics
- Species Specificity
- Synteny
|
123
|
Geary RL, Wong JM, Rossini A, Schwartz SM, Adams LD. Expression profiling identifies 147 genes contributing to a unique primate neointimal smooth muscle cell phenotype. Arterioscler Thromb Vasc Biol 2002; 22:2010-6. [PMID: 12482827 DOI: 10.1161/01.atv.0000038147.93527.35] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Indexed: 11/16/2022]
Abstract
OBJECTIVE This study represents the first in an effort to systematically characterize different intimas by using expression array analysis. METHODS AND RESULTS We compared smooth muscle cells (SMCs) of the neointima formed 4 weeks after aortic grafting with those from normal aorta and vena cava from cynomolgus monkeys. Hybridization to cDNA arrays identified subsets of 147 and 45 genes differentially expressed in the neointima versus the aorta and vena cava, respectively. The expression pattern differentiating neointima from aortic SMCs was characterized largely by suppression. Only 13 genes were induced in the neointima: 7 encoded matrix proteins (6 collagens and 1 versican) and 2 encoded inducers of matrix synthesis (osteoblast-specific factor-2/Cbfa1 and connective tissue growth factor). The genes suppressed most in the neointima included the regulator of G-protein signaling-5, SPARClike-1/hevin, and nonmuscle myosin heavy chain-B. A smaller gene set differentiated the neointima from the vena cava. Most were induced (39 of 45 genes), and overlap with the neointima-aorta set was significant (10 of 13 genes). Array results were validated with Northern analysis, in situ hybridization, or immunohistochemistry. CONCLUSIONS These data underscore the importance of matrix synthesis in neointimal maturation, and novel genes, newly associated with neointimal SMCs (regulator of G-protein signaling-5 and osteoblast-specific factor-2/Cbfa1), have raised new hypotheses regarding the pathogenesis of intimal hyperplasia.
MESH Headings
- Animals
- Aorta/chemistry
- Aorta/metabolism
- Aorta/transplantation
- Blotting, Northern/methods
- Chondroitin Sulfate Proteoglycans/genetics
- Collagen Type I/genetics
- Gene Expression Profiling/methods
- Gene Expression Profiling/statistics & numerical data
- Genes/genetics
- Iliac Artery/chemistry
- Iliac Artery/metabolism
- In Situ Hybridization/methods
- Lectins, C-Type
- Macaca fascicularis
- Muscle, Smooth, Vascular/chemistry
- Muscle, Smooth, Vascular/cytology
- Muscle, Smooth, Vascular/metabolism
- Myosin Heavy Chains/genetics
- Oligonucleotide Array Sequence Analysis/methods
- Oligonucleotide Array Sequence Analysis/statistics & numerical data
- Phenotype
- RGS Proteins/genetics
- RNA, Ribosomal, 28S/genetics
- Tunica Intima/chemistry
- Tunica Intima/metabolism
- Venae Cavae/chemistry
- Venae Cavae/metabolism
- Venae Cavae/transplantation
- Versicans
Affiliation(s)
- Randolph L Geary
- Department of Surgery, Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA.
|
124
|
Kan Z, States D, Gish W. Selecting for functional alternative splices in ESTs. Genome Res 2002; 12:1837-45. [PMID: 12466287 PMCID: PMC187565 DOI: 10.1101/gr.764102] [Citation(s) in RCA: 141] [Impact Index Per Article: 6.4] [Received: 01/25/2002] [Accepted: 09/30/2002] [Indexed: 11/24/2022]
Abstract
The expressed sequence tag (EST) collection in dbEST provides an extensive resource for detecting alternative splicing on a genomic scale. Using genomically aligned ESTs, a computational tool (TAP) was used to identify alternative splice patterns for 6400 known human genes from the RefSeq database. With sufficient EST coverage, one or more alternatively spliced forms could be detected for nearly all genes examined. To identify high (>95%) confidence observations of alternative splicing, splice variants were clustered on the basis of having mutually exclusive structures, and sample statistics were then applied. Through this selection, alternative splices expected at a frequency of >5% within their respective clusters were seen for only 17%-28% of genes. Although intron retention events (potentially unspliced messages) had been seen for 36% of the genes overall, the same statistical selection yielded reliable cases of intron retention for <5% of genes. For high-confidence alternative splices in the human ESTs, we also noted significantly higher rates both of cross-species conservation in mouse ESTs and of validation in the GenBank mRNA collection. We suggest quantitative analytical approaches such as these can aid in selecting useful targets for further experimental characterization and in so doing may help elucidate the mechanisms and biological implications of alternative splicing.
Affiliation(s)
- Zhengyan Kan
- Department of Genetics, Washington University, St. Louis, Missouri 63110, USA
|
125
|
Murphy D. Gene expression studies using microarrays: principles, problems, and prospects. Adv Physiol Educ 2002; 26:256-270. [PMID: 12443997 DOI: 10.1152/advan.00043.2002] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.1] [Indexed: 05/24/2023]
Abstract
A number of mammalian genomes having been sequenced, an important next step is to catalog the expression patterns of all transcription units in health and disease by use of microarrays. Such discovery programs are crucial to our understanding of the gene networks that control developmental, physiological, and pathological processes. However, despite the excitement, the full promise of microarray technology has yet to be realized, as the superficial simplicity of the concept belies considerable problems. Microarray technology is very new; methodologies are still evolving, common standards have yet to be established, and many problems with experimental design and variability have still to be fully understood and overcome. This review will describe the time course of a microarray experiment: RNA isolation from sample, target preparation, hybridization to the microarray probe, data capture, and bioinformatic analysis. For each stage, the advantages and disadvantages of competing techniques are compared, and inherent sources of error are identified and discussed.
Affiliation(s)
- David Murphy
- University of Bristol Research Centre for Neuroendocrinology, Bristol Royal Infirmary, Bristol BS2 8HW, England.
|
126
|
Boardman PE, Sanz-Ezquerro J, Overton IM, Burt DW, Bosch E, Fong WT, Tickle C, Brown WRA, Wilson SA, Hubbard SJ. A comprehensive collection of chicken cDNAs. Curr Biol 2002; 12:1965-9. [PMID: 12445392 DOI: 10.1016/s0960-9822(02)01296-4] [Citation(s) in RCA: 268] [Impact Index Per Article: 12.2] [Indexed: 11/28/2022]
Abstract
Birds have played a central role in many biological disciplines, particularly ecology, evolution, and behavior. The chicken, as a model vertebrate, also represents an important experimental system for developmental biologists, immunologists, cell biologists, and geneticists. However, genomic resources for the chicken have lagged behind those for other model organisms, with only 1845 nonredundant full-length chicken cDNA sequences currently deposited in the EMBL databank. We describe a large-scale expressed-sequence-tag (EST) project aimed at gene discovery in chickens (http://www.chick.umist.ac.uk). In total, 339,314 ESTs have been sequenced from 64 cDNA libraries generated from 21 different embryonic and adult tissues. These were clustered and assembled into 85,486 contiguous sequences (contigs). We find that a minimum of 38% of the contigs have orthologs in other organisms and define an upper limit of 13,000 new chicken genes. The remaining contigs may include novel avian specific or rapidly evolving genes. Comparison of the contigs with known chicken genes and orthologs indicates that 30% include cDNAs that contain the start codon and 20% of the contigs represent full-length cDNA sequences. Using this dataset, we estimate that chickens have approximately 35,000 genes in total, suggesting that this number may be a characteristic feature of vertebrates.
Affiliation(s)
- Paul E Boardman
- Department of Biomolecular Sciences, University of Manchester Institute of Science and Technology, P.O. Box 88, M60 1QD, Manchester, United Kingdom
|
127
|
Abstract
An increasingly popular model of regulation is to represent networks of genes as if they directly affect each other. Although such gene networks are phenomenological because they do not explicitly represent the proteins and metabolites that mediate cell interactions, they are a logical way of describing phenomena observed with transcription profiling, such as those that occur with popular microarray technology. The ability to create gene networks from experimental data and use them to reason about their dynamics and design principles will increase our understanding of cellular function. We propose that gene networks are also a good way to describe function unequivocally, and that they could be used for genome functional annotation. Here, we review some of the concepts and methods associated with gene networks, with emphasis on their construction based on experimental data.
Affiliation(s)
- Paul Brazhnik
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|
128
|
Zhang Z, Harrison P, Gerstein M. Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res 2002; 12:1466-82. [PMID: 12368239 PMCID: PMC187539 DOI: 10.1101/gr.331902] [Citation(s) in RCA: 146] [Impact Index Per Article: 6.6] [Received: 04/03/2002] [Accepted: 08/12/2002] [Indexed: 11/24/2022]
Abstract
Mammals have 79 ribosomal proteins (RP). Using a systematic procedure based on sequence-homology, we have comprehensively identified pseudogenes of these proteins in the human genome. Our assignments are available at http://www.pseudogene.org or http://bioinfo.mbb.yale.edu/genome/pseudogene. In total, we found 2090 processed pseudogenes and 16 duplications of RP genes. In relation to the matching parent protein, each of the processed pseudogenes has an average relative sequence length of 97% and an average sequence identity of 76%. A small number (258) of them do not contain obvious disablements (stop codons or frameshifts) and, therefore, could be mistaken as functional genes, and 178 are disrupted by one or more repetitive elements. On average, processed pseudogenes have a longer truncation at the 5' end than the 3' end, consistent with the target-primed-reverse-transcription (TPRT) mechanism. Interestingly, on chromosome 16, an RPL26 processed pseudogene was found in the intron region of a functional RPS2 gene. The large-scale distribution of RP pseudogenes throughout the genome appears to result, chiefly, from random insertions with the numbers on each chromosome, consequently, proportional to its size. In contrast to RP genes, the RP pseudogenes have the highest density in GC-intermediate regions (41%-46%) of the genome, with the density pattern being between that of LINEs and Alus. This can be explained by a negative selection theory as we observed that GC-rich RP pseudogenes decay faster in GC-poor regions. Also, we observed a correlation between the number of processed pseudogenes and the GC content of the associated functional gene, i.e., relatively GC-poor RPs have more processed pseudogenes. This ranges from 145 pseudogenes for RPL21 down to 3 pseudogenes for RPL14. We were able to date the RP pseudogenes based on their sequence divergence from present-day RP genes, finding an age distribution similar to that for Alus. 
The distribution is consistent with a decline in retrotransposition activity in the hominid lineage during the last 40 Myr. We discuss the implications for retrotransposon stability and genome dynamics based on these new findings.
Affiliation(s)
- Zhaolei Zhang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
|
129
|
Elalouf JM, Aude JC, Billon E, Cheval L, Doucet A, Virlon B. Renal transcriptomes: segmental analysis of differential expression. Exp Nephrol 2002; 10:75-81. [PMID: 11937754 DOI: 10.1159/000049902] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
Abstract
BACKGROUND/AIMS Progress accomplished by complete genomes and cDNA-sequencing projects calls for methods that fully use these resources to study gene expression patterns in characterized cell populations. However, since the number of functional genes cannot be readily inferred from the genomic sequence, it is highly desirable to use methods that enable the study of both known and unknown genes. METHODS The method of serial analysis of gene expression provides short diagnostic cDNA tags without bias towards known genes. In addition, the frequency of each tag in the library conveys quantitative information on gene expression. A microassay was set up to perform serial analysis of gene expression in minute samples such as those obtained by microdissecting nephron segments. RESULTS Studies carried out in the thick ascending limb of Henle's loop and the collecting duct of the mouse kidney provided expression data for several thousand genes. Known markers were found appropriately enriched, and several of the transcripts specific to the thick ascending limb or collecting duct had no database match. CONCLUSIONS The microassay for serial analysis of gene expression makes possible large-scale quantitative measurements of mRNA levels in nephron segments. The comprehensive picture generated by analyzing both known and unknown transcripts in defined cell populations should help to discover genes with dedicated functions.
Affiliation(s)
- Jean-Marc Elalouf
- Département de Biologie Cellulaire et Moléculaire, Service de Biologie Cellulaire, CNRS URA 1859, CEA SACLAY, F-91191 Gif-sur-Yvette, France.
|
130
|
Bailey SN, Wu RZ, Sabatini DM. Applications of transfected cell microarrays in high-throughput drug discovery. Drug Discov Today 2002; 7:S113-8. [PMID: 12546876 DOI: 10.1016/s1359-6446(02)02386-3] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
DNA microarrays and, more recently, protein microarrays, have become important tools for high-throughput genomic and proteomic studies. Transfected cell microarrays are a complementary technique in which array features comprise clusters of cells overexpressing defined cDNAs. Complementary DNAs cloned in expression vectors are printed on microscope slides, which become living arrays after the addition of a lipid transfection reagent and adherent mammalian cells. This article discusses two potential uses of cell microarrays in drug discovery: as a method of screening for gene products involved in biological processes of pharmaceutical interest and as in situ protein microarrays for the development and assessment of leads.
Affiliation(s)
- Steve N Bailey
- Whitehead Institute of Biomedical Research, Cambridge, MA 02142, USA
|
131
|
|
132
|
Tra J, Kondo T, Lu Q, Kuick R, Hanash S, Richardson B. Infrequent occurrence of age-dependent changes in CpG island methylation as detected by restriction landmark genome scanning. Mech Ageing Dev 2002; 123:1487-503. [PMID: 12425956 DOI: 10.1016/s0047-6374(02)00080-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
Hypermethylation of CpG islands, resulting in the inactivation of tumor suppressor genes, is an early event in the development of some malignancies. Recent studies suggest that this abnormal methylation may be a function of aging. The number of CpG islands that methylate with age is unknown. We used restriction landmark genome scanning (RLGS) to approximate the extent to which CpG islands change methylation status during aging. Comparison of more than 2000 loci in T lymphocytes isolated from newborn, middle-aged, and elderly people revealed that 29 loci (approximately 1%) changed methylation status during aging, with 23 increasing methylation and six decreasing. The same subset also changed methylation status with age in the esophagus, lung, and pancreas, but in variable directions. Virtual genome scanning identified one of these loci as a member of the forkhead family, recently implicated in aging, and another as an EST fragment. The methylation status of both correlated with level of expression. Confirming studies in multiple tissues from normal and DNMT1(+/-) mice demonstrated only one age-dependent change in the methylation of more than 2000 loci, occurring in liver and kidney. These results indicate that the methylation status of the majority of CpG islands in both mice and humans is tightly controlled during aging, and that changes are infrequent and in humans confined to a specific subset of genes.
Affiliation(s)
- John Tra
- Department of Pediatrics and Infectious Diseases, University of Michigan, Ann Arbor, MI 48109, USA
|
133
|
Saccone S, Pavlicek A, Federico C, Paces J, Bernard G. Genes, isochores and bands in human chromosomes 21 and 22. Chromosome Res 2002; 9:533-9. [PMID: 11721952 DOI: 10.1023/a:1012443217627] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
The recently available DNA sequences from chromosomes 21 and 22 enabled us to define the relationships of different band types with isochores and with gene concentration and to compare these relationships with previous results. We showed that chromosomal bands appear as Giemsa or Reverse bands depending not on their absolute GC level, but on the composition GC level relative to those of adjacent contiguous bands. We also demonstrated that the GC-richest, and gene-richest H3+ bands are characterized by a lower DNA compaction compared with the GC-poorest, gene-poorest L1+ bands. Moreover, our results indicate that the human genome contains about 30,000 genes.
Affiliation(s)
- S Saccone
- Dipartimento di Protezione e Valorizzazione Agroalimentare, University of Bologna, Reggio Emilia, Italy
|
134
|
Goldie-Cregan LC, Croager EJ, Abraham LJ. Characterization of the murine CD30 ligand (CD153) gene: gene structure and expression. Tissue Antigens 2002; 60:139-46. [PMID: 12392508 DOI: 10.1034/j.1399-0039.2002.600204.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
CD153 (CD30 ligand) has been described as a 40-kDa type II transmembrane glycoprotein belonging to the TNF superfamily and is expressed primarily by activated T cells, B cells and monocytes. In this study, we have determined that the murine CD153 gene consists of four exons, with three intervening introns, spaced over approximately 26 kb of genomic sequence. Sequence analysis of the murine CD153 promoter and 5' flanking region revealed the presence of a TATA box element immediately upstream of two tsp sites, together with putative binding motifs for a variety of lymphoid-specific transcription factors. 5'RACE analysis of LPS-stimulated RAW264.7 macrophage cDNA identified at least four transcriptional start sites for murine CD153, with two sites occurring downstream of the previously predicted translation initiation codon. Additionally, 5' RACE analysis identified multiple murine CD153 polyadenylation sites. Our results indicate that primary murine CD153 transcripts may vary from 26 kb to approximately 28 kb in length.
|
135
|
Kochiwa H, Suzuki R, Washio T, Saito R, Bono H, Carninci P, Okazaki Y, Miki R, Hayashizaki Y, Tomita M. Inferring alternative splicing patterns in mouse from a full-length cDNA library and microarray data. Genome Res 2002; 12:1286-93. [PMID: 12176936 PMCID: PMC186638 DOI: 10.1101/gr.220302] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
Although many studies on alternative splicing of specific genes have been reported in the literature, the general mechanism that regulates alternative splicing has not been clearly understood. In this study, we systematically aligned each pair of the 21,076 cDNA sequences of Mus musculus, searched for putative alternative splicing patterns, and constructed a list of potential alternative splicing sites. Two cDNAs are suspected to be alternatively spliced and originating from a common gene if they share most of their region with a high degree of sequence homology, but parts of the sequences are very distinctive or deleted in either cDNA. The list contains the following information: (1) tissue, (2) developmental stage, (3) sequences around splice sites, (4) the length of each gapped region, and (5) other comments. The list is available at http://www.bioinfo.sfc.keio.ac.jp/intron. Our results have predicted a number of unreported alternatively spliced genes, some of which are expressed only in a specific tissue or at a specific developmental stage.
Affiliation(s)
- Hiromi Kochiwa
- Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa 252-8520, Japan
|
136
|
Weissenbach J. Human genome project: past, present, future. Ernst Schering Research Foundation Workshop 2002:1-9. [PMID: 11859560 DOI: 10.1007/978-3-662-04667-8_1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/30/2022]
|
137
|
Roest Crollius H, Jaillon O, Bernot A, Pelletier E, Dasilva C, Bouneau L, Burge C, Yeh RF, Quetier F, Saurin W, Weissenbach J. Genome-wide comparisons between human and tetraodon. Ernst Schering Research Foundation Workshop 2002:11-29. [PMID: 11859562 DOI: 10.1007/978-3-662-04667-8_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 01/05/2023]
|
138
|
REPLY. Plast Reconstr Surg 2002. [DOI: 10.1097/00006534-200206000-00075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
139
|
Reymond A, Camargo AA, Deutsch S, Stevenson BJ, Parmigiani RB, Ucla C, Bettoni F, Rossier C, Lyle R, Guipponi M, de Souza S, Iseli C, Jongeneel CV, Bucher P, Simpson AJG, Antonarakis SE. Nineteen additional unpredicted transcripts from human chromosome 21. Genomics 2002; 79:824-32. [PMID: 12036297 DOI: 10.1006/geno.2002.6781] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.9] [Indexed: 01/01/2023]
Abstract
The identification of all human chromosome 21 (HC21) genes is a necessary step in understanding the molecular pathogenesis of trisomy 21 (Down syndrome). The first analysis of the sequence of 21q included 127 previously characterized genes and predicted an additional 98 novel anonymous genes. Recently we evaluated the quality of this annotation by characterizing a set of HC21 open reading frames (C21orfs) identified by mapping spliced expressed sequence tags (ESTs) and predicted genes (PREDs), identified only in silico. This study underscored the limitations of in silico-only gene prediction, as many PREDs were incorrectly predicted. To refine the HC21 annotation, we have developed a reliable algorithm to extract and stringently map sequences that contain bona fide 3' transcript ends to the genome. We then created a specific 21q graphical display allowing an integrated view of the data that incorporates new ESTs as well as features such as CpG islands, repeats, and gene predictions. Using these tools we identified 27 new putative genes. To validate these, we sequenced previously cloned cDNAs and carried out RT-PCR, 5'- and 3'-RACE procedures, and comparative mapping. These approaches substantiated 19 new transcripts, thus increasing the HC21 gene count by 9.5%. These transcripts were likely not previously identified because they are small and encode small proteins. We also identified four transcriptional units that are spliced but contain no obvious open reading frame. The HC21 data presented here further emphasize that current gene prediction algorithms miss a substantial number of transcripts that nevertheless can be identified using a combination of experimental approaches and multiple refined algorithms.
Affiliation(s)
- Alexandre Reymond
- Division of Medical Genetics, University of Geneva Medical School, 1211 Geneva, Switzerland
|
140
|
|
141
|
Harrison PM, Gerstein M. Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol 2002; 318:1155-74. [PMID: 12083509 DOI: 10.1016/s0022-2836(02)00109-2] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
Protein families can be used to understand many aspects of genomes, both their "live" and their "dead" parts (i.e. genes and pseudogenes). Surveys of genomes have revealed that, in every organism, there are always a few large families and many small ones, with the overall distribution following a power-law. This commonality is equally true for both genes and pseudogenes, and exists despite the fact that the specific families that are enlarged differ greatly between organisms. Furthermore, because of family structure there is great redundancy in proteomes, a fact linked to the large number of dispensable genes for each organism and the small size of the minimal, indispensable sub-proteome. Pseudogenes in prokaryotes represent families that are in the process of being dispensed with. In particular, the genome sequences of certain pathogenic bacteria (Mycobacterium leprae, Yersinia pestis and Rickettsia prowazekii) show how an organism can undergo reductive evolution on a large scale (i.e. the dying out of families) as a result of niche change. There appears to be less pressure to delete pseudogenes in eukaryotes. These can be divided into two varieties, duplicated and processed, where the latter involves reverse transcription from an mRNA intermediate. We discuss these collectively in yeast, worm, fly, and human. The fly has few pseudogenes apparently because of its high rate of genomic DNA deletion. In the other three organisms, the distribution of pseudogenes on the chromosome and amongst different families is highly non-uniform. Pseudogenes tend not to occur in the middle of chromosome arms, and tend to be associated with lineage-specific (as opposed to highly conserved) families that have environmental-response functions. This may be because, rather than being dead, they may form a reservoir of diverse "extra parts" that can be resurrected to help an organism adapt to its surroundings. 
In yeast, there may be a novel mechanism involving the [PSI+] prion that potentially enables this resurrection. In worm, the pseudogenes tend to arise out of families (e.g. chemoreceptors) that are greatly expanded in it compared to the fly. The human genome stands out in having many processed pseudogenes. These have a character very different from those of the duplicated variety, to a large extent just representing random insertions. Thus, their occurrence tends to be roughly in proportion to the amount of mRNA for a particular protein and to reflect the extent of the intergenic sequences. Further information about pseudogenes is available at http://genecensus.org/pseudogene
Affiliation(s)
- Paul M Harrison
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA
|
142
|
Qiu P, Benbow L, Liu S, Greene JR, Wang L. Analysis of a human brain transcriptome map. BMC Genomics 2002; 3:10. [PMID: 11955288 PMCID: PMC103672 DOI: 10.1186/1471-2164-3-10] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Received: 12/21/2001] [Accepted: 04/16/2002] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genome wide transcriptome maps can provide tools to identify candidate genes that are over-expressed or silenced in certain disease tissue and increase our understanding of the structure and organization of the genome. Expressed Sequence Tags (ESTs) from the public dbEST and proprietary Incyte LifeSeq databases were used to derive a transcript map in conjunction with the working draft assembly of the human genome sequence. RESULTS Examination of ESTs derived from brain tissues (excluding brain tumor tissues) suggests that these genes are distributed on chromosomes in a non-random fashion. Some regions on the genome are dense with brain-enriched genes while some regions lack brain-enriched genes, suggesting a significant correlation between distribution of genes along the chromosome and tissue type. ESTs from brain tumor tissues have also been mapped to the human genome working draft. We reveal that some regions enriched in brain genes show a significant decrease in gene expression in brain tumors, and, conversely that some regions lacking in brain genes show an increased level of gene expression in brain tumors. CONCLUSIONS This report demonstrates a novel approach for tissue specific transcriptome mapping using EST-based quantitative assessment.
Affiliation(s)
- Ping Qiu
- Bioinformatics Group and Human Genomic Research Department, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, New Jersey 07033, USA
- Lawrence Benbow
- Bioinformatics Group and Human Genomic Research Department, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, New Jersey 07033, USA
- Suxing Liu
- Tumor Biology Department, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, New Jersey 07033, USA
- Jonathan R Greene
- Bioinformatics Group and Human Genomic Research Department, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, New Jersey 07033, USA
- Luquan Wang
- Bioinformatics Group and Human Genomic Research Department, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, New Jersey 07033, USA
|
143
|
Loy AL, Goodnow CC. Novel approaches for identifying genes regulating lymphocyte development and function. Curr Opin Immunol 2002; 14:260-5. [PMID: 11869902 DOI: 10.1016/s0952-7915(02)00331-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022]
Abstract
The draft sequence of the human and mouse genomes provides an unparalleled opportunity for understanding the genetic control of immune-cell development. Strategies can begin with a gene sequence and pursue a putative immune-system function by employing mRNA-expression profiling or creating gene knockouts in embryonic stem cells. The latter can be produced by utilising the Cre/Lox system, a tetracycline operon, a gene-trap method or chemical mutagenesis. Alternatively, mutant phenotypes (derived using the mutagen ethylnitrosourea) can be traced back to gene sequences.
Affiliation(s)
- Adèle L Loy
- Australian Cancer Research Foundation (ACRF) Genetics Laboratory and Medical Genome Centre, John Curtin School of Medical Research, Australian National University, Canberra, Australia.
|
144
|
Eley GD, Reiter JL, Pandita A, Park S, Jenkins RB, Maihle NJ, James CD. A chromosomal region 7p11.2 transcript map: its development and application to the study of EGFR amplicons in glioblastoma. Neuro Oncol 2002; 4:86-94. [PMID: 11916499 PMCID: PMC1920657 DOI: 10.1093/neuonc/4.2.86] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.8] [Received: 10/11/2001] [Accepted: 01/02/2002] [Indexed: 11/12/2022] Open
Abstract
Cumulative information available about the organization of amplified chromosomal regions in human tumors suggests that the amplification repeat units, or amplicons, can be of a simple or complex nature. For the former, amplified regions generally retain their native chromosomal configuration and involve a single amplification target sequence. For complex amplicons, amplified DNAs usually undergo substantial reorganization relative to the normal chromosomal regions from which they evolve, and the regions subject to amplification may contain multiple target sequences. Previous efforts to characterize the 7p11.2 epidermal growth factor receptor (EGFR) amplicon in glioblastoma have relied primarily on the use of markers positioned by linkage analysis and/or radiation hybrid mapping, both of which are known to have the potential for being inaccurate when attempting to order loci over relatively short (<1 Mb) chromosomal regions. Due to the limited resolution of genetic maps that have been established through the use of these approaches, we have constructed a 2-Mb bacterial and P1-derived artificial chromosome (BAC-PAC) contig for the EGFR region and have applied markers positioned on its associated physical map to the analysis of 7p11.2 amplifications in a series of glioblastomas. Our data indicate that EGFR is the sole amplification target within the mapped region, although there are several additional 7p11.2 genes that can be coamplified and overexpressed with EGFR. Furthermore, these results are consistent with EGFR amplicons retaining the same organization as the native chromosome 7p11.2 region from which they are derived.
Affiliation(s)
- Greg D Eley
- Department of Laboratory Medicine and Pathology and Tumor Biology Program, Mayo Clinic, Rochester, MN 55905, USA
|
145
|
Korke R, Rink A, Seow TK, Chung MCM, Beattie CW, Hu WS. Genomic and proteomic perspectives in cell culture engineering. J Biotechnol 2002; 94:73-92. [PMID: 11792453 DOI: 10.1016/s0168-1656(01)00420-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
In the last few years, the number of biologics produced by mammalian cells has been steadily increasing. The advances in cell culture engineering science have contributed significantly to this increase. A common path of product and process development has emerged in the last decade and the host cell lines frequently used have converged to only a few. Selection of cell clones, their adaptation to a desired growth environment, and improvement of their productivity have been key to developing a new process. However, the fundamental understanding of changes during the selection and adaptation process is still lacking. Some cells may undergo irreversible alteration at the genome level, some may exhibit changes in their gene expression pattern, while others may incur neither genetic reconstruction nor gene expression changes, but only modulation of various fluxes by changing nutrient/metabolite concentrations and enzyme activities. It is likely that the selection of cell clones and their adaptation to various culture conditions may involve alterations not only in cellular machinery directly related to the selected marker or adapted behavior, but also those which may or may not be essential for selection or adaptation. The genomic and proteomic research tools enable one to globally survey the alterations at mRNA and protein levels and to unveil their regulation. Undoubtedly, a better understanding of these cellular processes at the molecular level will lead to a better strategy for 'designing' producing cells. Herein the genomic and proteomic tools are briefly reviewed and their impact on cell culture engineering is discussed.
Affiliation(s)
- Rashmi Korke
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, MN, USA
|
146
|
Abstract
Inherited diseases are associated with profound phenotypic variability, which is affected strongly by genetic modifiers. The splicing machinery could be one such modifying system, through a mechanism involving splicing motifs and their interaction with a complex repertoire of splicing factors. Mutations in splicing motifs and changes in levels of splicing factors can result in different splicing patterns. Changes in the level of normal transcripts or in the relative pattern of different mRNA isoforms affect disease expression, leading to phenotypic variability. Here, we discuss the splicing machinery in terms of its significance in disease severity and its potential role as a genetic modifier.
Affiliation(s)
- Malka Nissim-Rafinia
- Dept of Genetics, The Life Sciences Institute, The Hebrew University, 91904, Jerusalem, Israel
|
147
|
Lee Y, Sultana R, Pertea G, Cho J, Karamycheva S, Tsai J, Parvizi B, Cheung F, Antonescu V, White J, Holt I, Liang F, Quackenbush J. Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res 2002; 12:493-502. [PMID: 11875039 PMCID: PMC155294 DOI: 10.1101/gr.212002] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.5] [Indexed: 01/07/2023]
Abstract
Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA; <http://www.tigr.org/tdb/toga/toga.shtml>) database to provide a cross-reference between fully and partially sequenced eukaryotic transcribed sequences. Starting with the assembled expressed sequence tag (EST) and gene sequences that comprise the 28 TIGR Gene Indices, we used high-stringency pair-wise sequence searches and a reflexive, transitive closure process to associate sequence-specific best hits, generating 32,652 tentative ortholog groups (TOGs). This has allowed us to identify putative orthologs and paralogs for known genes, as well as those that exist only as uncharacterized ESTs and to provide links to additional information including genome sequence and mapping data. TOGA provides an important new resource for the analysis of gene function in eukaryotes. In addition, an analysis of the most widely represented sequences can begin to provide insight into eukaryotic biological processes.
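The grouping step described in this abstract (pairwise best hits merged by transitive closure into tentative ortholog groups) can be illustrated with a minimal Python sketch. It is not the TOGA implementation: the function name, the input format (a list of directed `(query, best_hit)` pairs), and the reading of "reflexive" as "reciprocal best hit" are all assumptions made for illustration.

```python
from collections import defaultdict


def tentative_ortholog_groups(best_hits):
    """Group sequences connected by reciprocal best hits (hypothetical helper).

    best_hits: iterable of directed (query_id, best_hit_id) pairs.
    Returns a sorted list of groups, each a sorted list of sequence ids.
    """
    pairs = set(best_hits)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, hit in pairs:
        # Assumption: only reciprocal (A->B and B->A) hits link two sequences.
        if (hit, query) in pairs:
            parent[find(query)] = find(hit)

    # Transitive closure: every sequence sharing a root is in one group.
    groups = defaultdict(set)
    for seq in list(parent):
        groups[find(seq)].add(seq)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


hits = [
    ("hs1", "mm1"), ("mm1", "hs1"),   # reciprocal: human <-> mouse
    ("mm1", "rn1"), ("rn1", "mm1"),   # reciprocal: mouse <-> rat
    ("hs2", "mm2"),                   # one-directional, ignored
]
groups = tentative_ortholog_groups(hits)
print(groups)
```

Union-find is used here only because it makes the transitive merge cheap; any connected-components routine over the reciprocal-hit graph would give the same groups.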
Affiliation(s)
- Yuandan Lee
- The Institute for Genomic Research, Rockville, Maryland 20850, USA
|
148
|
Harrison PM, Kumar A, Lang N, Snyder M, Gerstein M. A question of size: the eukaryotic proteome and the problems in defining it. Nucleic Acids Res 2002; 30:1083-90. [PMID: 11861898 PMCID: PMC101239 DOI: 10.1093/nar/30.5.1083] [Citation(s) in RCA: 129] [Impact Index Per Article: 5.9] [Received: 09/28/2001] [Revised: 12/20/2001] [Accepted: 01/02/2002] [Indexed: 11/14/2022] Open
Abstract
We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the 'current' proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences ('the orfome'). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, relating to both genome-annotation and genome evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, in the first instance, for the human genome, the relationship between gene number and proteome size is far from simple. We survey the current estimates for the numbers of human genes and, from this, we estimate a range for the size of the human proteome. The determination of this is substantially hampered by the unknown extent of the cohort of pseudogenes ('dead' genes), in combination with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orfome)
Affiliation(s)
- Paul M Harrison
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, PO Box 208114, New Haven, CT 06520-8114, USA
|
149
|
Camargo AA, de Souza SJ, Brentani RR, Simpson AJG. Human gene discovery through experimental definition of transcribed regions of the human genome. Curr Opin Chem Biol 2002; 6:13-6. [PMID: 11827817 DOI: 10.1016/s1367-5931(01)00279-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/17/2022]
Abstract
The sequencing of the human genome has failed to realize its primary goal: the identification of all human genes. We have learned that genes can only be identified with certainty within this vast and information-sparse structure by comparison with transcript sequences. Significantly more sequence data of this kind is required before we can claim to have deciphered our genetic blueprint.
Affiliation(s)
- Anamaria A Camargo
- The Ludwig Institute for Cancer Research, Rua Professor Antonio Prudente, 109, 4th floor, São Paulo, 01509-010, SP, Brazil
|
150
|
Harrison PM, Hegyi H, Balasubramanian S, Luscombe NM, Bertone P, Echols N, Johnson T, Gerstein M. Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res 2002; 12:272-80. [PMID: 11827946 PMCID: PMC155275 DOI: 10.1101/gr.207102] [Citation(s) in RCA: 151] [Impact Index Per Article: 6.9] [Indexed: 11/25/2022]
Abstract
We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 nonprocessed pseudogenes, and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to approximately 20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have looked at the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes. 
Overall, the families in both processed and nonprocessed pseudogene populations occur according to a similar power-law distribution as that found for the occurrence of gene families, with a few big families and many small ones. The processed population is, in particular, enriched in highly expressed ribosomal-protein sequences (approximately 20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages, observing a high degree of similarity between "ancient" and "modern" subpopulations. This may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
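The annotation rule stated in this abstract (a candidate must carry an obvious disablement, and is called processed when a continuous span of homology covers more than 70% of the closest matching protein or when there is evidence of polyadenylation) can be written as a small decision function. The sketch below is only an illustration of that rule under stated assumptions: the `Candidate` fields, the function name, and the `"not_pseudogene"` label are hypothetical and do not come from the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one genomic region matched to a protein."""
    protein_length: int            # length of closest matching protein (aa)
    longest_continuous_match: int  # longest continuous homology span (aa)
    has_disablement: bool          # mid-sequence stop codon or frameshift
    has_polya_evidence: bool       # evidence of polyadenylation


def classify(candidate, span_threshold=0.70):
    """Apply the abstract's rule; the 0.70 threshold is the stated >70% cut-off."""
    if not candidate.has_disablement:
        # Assumption: regions without a disablement are not annotated at all.
        return "not_pseudogene"
    coverage = candidate.longest_continuous_match / candidate.protein_length
    if coverage > span_threshold or candidate.has_polya_evidence:
        return "processed"
    return "nonprocessed"


examples = [
    Candidate(300, 250, True, False),   # 83% continuous span
    Candidate(300, 120, True, True),    # short span but polyadenylation
    Candidate(300, 120, True, False),   # short span, no polyadenylation
    Candidate(300, 290, False, False),  # no disablement
]
labels = [classify(c) for c in examples]
print(labels)
```

The counts reported in the abstract are internally consistent with this two-way split plus immunoglobulin segments: 189 + 195 + 70 = 454 = 190 + 264.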
MESH Headings
- Chromosome Mapping/methods
- Chromosomes, Human, Pair 21/genetics
- Chromosomes, Human, Pair 22/genetics
- Evolution, Molecular
- Fossils
- Genes, Immunoglobulin
- Genes, Overlapping
- Genome, Human
- Humans
- Multigene Family
- Pseudogenes
- RNA Processing, Post-Transcriptional/genetics
- Sequence Analysis, DNA/statistics & numerical data
Affiliation(s)
- Paul M Harrison
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520-8114, USA
|