1
Cameron RA, Kudtarkar P, Gordon SM, Worley KC, Gibbs RA. Do echinoderm genomes measure up? Mar Genomics 2015; 22:1-9. [PMID: 25701080 PMCID: PMC4489978 DOI: 10.1016/j.margen.2015.02.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 12/09/2014] [Revised: 02/05/2015] [Accepted: 02/06/2015] [Indexed: 11/19/2022]
Abstract
Echinoderm genome sequences are a corpus of useful information about a clade of animals that serve as research models in fields ranging from marine ecology to cell and developmental biology. Genomic information from echinoids has contributed to insights into the gene interactions that drive the developmental process at the molecular level. Such insights often rely heavily on genomic information and the kinds of questions that can be asked thus depend on the quality of the sequence information. Here we describe the history of echinoderm genomic sequence assembly and present details about the quality of the data obtained. All of the sequence information discussed here is posted on the echinoderm information web system, Echinobase.org.
Affiliation(s)
- R Andrew Cameron
- Division of Biology 139-74, California Institute of Technology, Pasadena, CA, USA
- Parul Kudtarkar
- Division of Biology 139-74, California Institute of Technology, Pasadena, CA, USA
- Susan M Gordon
- Division of Biology 139-74, California Institute of Technology, Pasadena, CA, USA
- Kim C Worley
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
2
Andrikou C, Iovene E, Rizzo F, Oliveri P, Arnone MI. Myogenesis in the sea urchin embryo: the molecular fingerprint of the myoblast precursors. EvoDevo 2013; 4:33. [PMID: 24295205 PMCID: PMC4175510 DOI: 10.1186/2041-9139-4-33] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 07/31/2013] [Accepted: 10/02/2013] [Indexed: 01/01/2023]
Abstract
Background In sea urchin larvae the circumesophageal fibers form a prominent muscle system of mesodermal origin. Although the morphology and later development of this muscle system have been well described, little is known about the molecular signature of these cells or their precise origin in the early embryo. As an invertebrate deuterostome that is more closely related to the vertebrates than other commonly used model systems in myogenesis, the sea urchin fills an important phylogenetic gap and provides a unique perspective on the evolution of muscle cell development. Results Here, we present a comprehensive description of the development of the sea urchin larval circumesophageal muscle lineage beginning with its mesodermal origin using high-resolution localization of the expression of several myogenic transcriptional regulators and differentiation genes. A few myoblasts are bilaterally distributed at the oral vegetal side of the tip of the archenteron and first appear at the late gastrula stage. The expression of the differentiation genes Myosin Heavy Chain, Tropomyosin I and II, as well as the regulatory genes MyoD2, FoxF, FoxC, FoxL1, Myocardin, Twist, and Tbx6 uniquely identifies these cells. Interestingly, evolutionarily conserved myogenic factors such as Mef2, MyoR and Six1/2 are not expressed in sea urchin myoblasts but are found in other mesodermal domains of the tip of the archenteron. The regulatory states of these domains were characterized in detail. Moreover, using a combinatorial analysis of gene expression we followed the development of the FoxF/FoxC positive cells from the onset of expression to the end of gastrulation. Our data allowed us to build a complete map of the Non-Skeletogenic Mesoderm at the very early gastrula stage, in which specific molecular signatures identify the precursors of different cell types.
Among them, a small group of cells within the FoxY domain, which also express FoxC and SoxE, have been identified as plausible myoblast precursors. Together, these data support a very early gastrula stage segregation of the myogenic lineage. Conclusions From this analysis, we are able to precisely define the regulatory and differentiation signatures of the circumesophageal muscle in the sea urchin embryo. Our findings have important implications in understanding the evolution of development of the muscle cell lineage at the molecular level. The data presented here suggest a high level of conservation of the myogenic specification mechanisms across wide phylogenetic distances, but also reveal clear cases of gene cooption.
Affiliation(s)
- Maria Ina Arnone
- Cellular and Developmental Biology, Stazione Zoologica Anton Dohrn, Napoli 80121, Italy.
3
Vaughn R, Garnhart N, Garey JR, Thomas WK, Livingston BT. Sequencing and analysis of the gastrula transcriptome of the brittle star Ophiocoma wendtii. EvoDevo 2012; 3:19. [PMID: 22938175 PMCID: PMC3492025 DOI: 10.1186/2041-9139-3-19] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 04/04/2012] [Accepted: 07/13/2012] [Indexed: 01/22/2023]
Abstract
BACKGROUND The gastrula stage represents the point in development at which the three primary germ layers diverge. At this point the gene regulatory networks that specify the germ layers are established and the genes that define the differentiated states of the tissues have begun to be activated. These networks have been well-characterized in sea urchins, but not in other echinoderms. Embryos of the brittle star Ophiocoma wendtii share a number of developmental features with sea urchin embryos, including the ingression of mesenchyme cells that give rise to an embryonic skeleton. Notable differences are that no micromeres are formed during cleavage divisions and no pigment cells are formed during development to the pluteus larval stage. More subtle changes in timing of developmental events also occur. To explore the molecular basis for the similarities and differences between these two echinoderms, we have sequenced and characterized the gastrula transcriptome of O. wendtii. METHODS Development of Ophiocoma wendtii embryos was characterized and RNA was isolated from the gastrula stage. A transcriptome database was generated from this RNA and was analyzed using a variety of methods to identify transcripts expressed and to compare those transcripts to those expressed at the gastrula stage in other organisms. RESULTS Using existing databases, we identified brittle star transcripts that correspond to 3,385 genes, including 1,863 genes shared with the sea urchin Strongylocentrotus purpuratus gastrula transcriptome. We characterized the functional classes of genes present in the transcriptome and compared them to those found in this sea urchin. We then examined those members of the germ-layer specific gene regulatory networks (GRNs) of S. purpuratus that are expressed in the O. wendtii gastrula.
Our results indicate that there is a shared 'genetic toolkit' central to the echinoderm gastrula, a key stage in embryonic development, though there are also differences that reflect changes in developmental processes. CONCLUSIONS The brittle star expresses genes representing all functional classes at the gastrula stage. Brittle stars and sea urchins have comparable numbers of each class of genes and share many of the genes expressed at gastrulation. Examination of the brittle star genes in which sea urchin orthologs are utilized in germ layer specification reveals a relatively higher level of conservation of key regulatory components compared to the overall transcriptome. We also identify genes that were either lost or whose temporal expression has diverged from that of sea urchins.
Affiliation(s)
- Roy Vaughn
- Department of Biological Sciences, California State University Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90815, USA.
4
Tassanakajon A, Klinbunga S, Paunglarp N, Rimphanitchayakit V, Udomkit A, Jitrapakdee S, Sritunyalucksana K, Phongdara A, Pongsomboon S, Supungul P, Tang S, Kuphanumart K, Pichyangkura R, Lursinsap C. Penaeus monodon gene discovery project: the generation of an EST collection and establishment of a database. Gene 2006; 384:104-12. [PMID: 16945489 DOI: 10.1016/j.gene.2006.07.012] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.1] [Received: 01/17/2006] [Revised: 06/28/2006] [Accepted: 07/12/2006] [Indexed: 11/15/2022]
Abstract
A large-scale expressed sequence tag (EST) sequencing project was undertaken for the purpose of gene discovery in the black tiger shrimp Penaeus monodon. Initially, 15 cDNA libraries were constructed from different tissues (eyestalk, hepatopancreas, haematopoietic tissue, haemocyte, lymphoid organ, and ovary) of shrimp, reared under normal or stress conditions, to identify tissue-specific genes and genes responding to infection and heat stress. A total of 10,100 clones were analyzed by single-pass sequencing from the 5' end. Clustering and assembling of these ESTs resulted in a total of 4845 unique sequences with 917 overlapping contigs and 3928 singletons. The redundancy of each cDNA library ranged from 13.4% to 61.3% with an overall redundancy of 61.1%. About half of these ESTs (2365 clones, 48.8%) showed significant homology (BLASTX, e-values <10(-4)) to known genes. A high proportion of P. monodon ESTs was most similar to the predicted protein sequences from various organisms, e.g. Homo sapiens (9%), Mus musculus (7%), Drosophila (6%), Gallus sp. (6%), and Anopheles (5%). Only 6% showed the highest similarity to other known genes from shrimp due to the limited sequence entries of the species in the public database. Several tissue-specific transcripts were identified as well as the candidate genes that may be implicated in the immune response. In addition, bioinformatic mining of microsatellites from the P. monodon ESTs identified 997 unique microsatellite-containing ESTs, in which 74 loci resided within genes of known function. Consequently, the P. monodon EST database was established. The EST sequence data and the BLAST results were stored and made available through a web-accessible database. This EST database provides a useful resource for gene identification and functional genomic studies of shrimp.
Affiliation(s)
- Anchalee Tassanakajon
- Shrimp Molecular Biology and Genomics Laboratory, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand.
5
Illiger J, Herwig R, Steinfath M, Przewieslik T, Elge T, Bull C, Radelof U, Lehrach H, Janitz M. Establishment of T cell-specific and natural killer cell-specific unigene sets: towards high-throughput genomics of leukaemia. Eur J Immunogenet 2004; 31:253-7. [PMID: 15548262 DOI: 10.1111/j.1365-2370.2004.00483.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
We report the establishment of highly non-redundant unigene sets consisting of cDNA clones derived from T lymphocytes and natural killer cells. The sets consist of 10 506 and 13 409 clones, respectively, arrayed on nylon membranes in duplicate. The sets provide an excellent tool for genome-wide gene expression analysis studies in immunology research.
Affiliation(s)
- J Illiger
- Max Planck Institute for Molecular Genetics, Berlin, Germany
6
Flowers VL, Courteau GR, Poustka AJ, Weng W, Venuti JM. Nodal/activin signaling establishes oral-aboral polarity in the early sea urchin embryo. Dev Dyn 2004; 231:727-40. [PMID: 15517584 DOI: 10.1002/dvdy.20194] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Indexed: 11/12/2022]
Abstract
Components of the Wnt signaling pathway are involved in patterning the sea urchin primary or animal-vegetal (AV) axis, but the molecular cues that pattern the secondary embryonic axis, the aboral/oral (AO) axis, are not known. In an analysis of signaling molecules that influence patterning along the sea urchin embryonic axes, we found that members of the activin subfamily of transforming growth factor-beta (TGF-beta) signaling molecules influence the establishment of AO polarities in the early embryo. Injection of activin mRNAs into fertilized eggs or treatment with exogenously applied recombinant activin altered the allocation of ectodermal fates and ventralized the embryo. The phenotypes observed resemble the ventralized phenotype previously reported for NiCl2, a known disrupter of AO patterning. Sensitivity to exogenous activin occurs between fertilization and the late blastula stage, which is also the time of highest NiCl2 sensitivity. These results argue that specification of fates along the embryonic AO axis involves TGF-beta signaling. To further examine TGF-beta signaling in these embryos, we cloned an endogenous TGF-beta from sea urchin embryos that is a member of the activin subfamily, SpNodal, and show through gain of function analysis that it recapitulates results obtained with exogenous activins and NiCl2. The expression pattern of SpNodal is consistent with a role for nodal signaling in the establishment of fates along the AO axis. Loss of function experiments using SpNodal antisense morpholinos also support a role for SpNodal in the establishment of the AO axis.
Affiliation(s)
- Vera Lynn Flowers
- Department of Cell Biology and Anatomy, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112-1393, USA
7
Cameron RA, Oliveri P, Wyllie J, Davidson EH. cis-Regulatory activity of randomly chosen genomic fragments from the sea urchin. Gene Expr Patterns 2004; 4:205-13. [PMID: 15161101 DOI: 10.1016/j.modgep.2003.08.007] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 06/06/2003] [Revised: 08/28/2003] [Accepted: 08/28/2003] [Indexed: 11/21/2022]
Abstract
To determine the frequency and variety of cis-regulatory elements that function during embryonic development of Strongylocentrotus purpuratus, we constructed a GFP expression vector, which includes a promiscuous basal promoter from the endo16 gene, in which to test the activity of randomly chosen genomic DNA fragments. This vector was demonstrated to serve as a cis-regulatory element trap. We used it to carry out an initial test for the occurrence of elements that would promote GFP expression in this genome. In the screen reported here 108 different randomly chosen DNA fragments (av. 3.8 kb) were inserted in the vector, and each was injected into > 200 zygotes. Surprisingly, 13% of the fragments tested yielded detectable levels of GFP expression in the recipient embryos. Specific patterns observed included expression in endoderm, in aboral ectoderm, and in pigment cells. The majority of active constructs expressed GFP in all spatial domains of the embryo. Elements with detectable cis-regulatory activity in the embryo occur in the sample screened, on the average, about every 30 kb, and the genome must include many thousands of such elements. On further analysis one isolate was shown to contain a gut specific element as well as one that controls expression in the secondary mesenchyme cells.
Affiliation(s)
- R Andrew Cameron
- Division of Biology 156-29, California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, USA.
8
Jackman WR, Mougey JM, Panopoulou GD, Kimmel CB. crabp and maf highlight the novelty of the amphioxus club-shaped gland. Acta Zool 2004. [DOI: 10.1111/j.0001-7272.2004.00161.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
9
Poustka AJ, Kühn A, Radosavljevic V, Wellenreuther R, Lehrach H, Panopoulou G. On the origin of the chordate central nervous system: expression of onecut in the sea urchin embryo. Evol Dev 2004; 6:227-36. [PMID: 15230963 DOI: 10.1111/j.1525-142x.2004.04028.x] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
Abstract
We identified a transcription factor of the onecut class in the sea urchin Strongylocentrotus purpuratus that represents an ortholog of the mammalian gene HNF6, the founding member of the onecut class of proteins. The isolated sea urchin gene, named SpOnecut, encodes a protein of 483 amino acids with one cut domain and a homeodomain. Phylogenetic analysis clearly places the sea urchin gene into this family, most closely related to the ascidian onecut gene HNF-6. Nevertheless, phylogenetic analysis reveals a difficult phylogeny indicating that certain members of the family evolve more rapidly than others and also that the cut domain and homeodomain evolve at a different pace. In fly, worm, ascidian, and teleost fish, the onecut genes isolated so far are exclusively expressed in cells of the central nervous system (CNS), whereas in mammals the two copies of the gene have acquired additional functions in liver and pancreas development. In the sea urchin embryo, expression is first detected in the emerging ciliary band at the late blastula stage. During the gastrula stage, expression is limited to the ciliary band. In the early pluteus stage, SpOnecut is expressed at the apical organ and the elongating arms but continues most prominently in the ciliary band. This is the first gene known that exclusively marks the ciliary band and therein the apical organ in a pluteus larva, whereas chordate orthologs execute essential functions in dorsal CNS development. The significance of this finding for the hypothesis that the ciliary bands and apical organs of the hypothetical "dipleurula"-like chordate ancestor and the chordate/vertebrate CNS are of common origin is discussed.
Affiliation(s)
- Albert J Poustka
- Max Planck Institute for Molecular Genetics, Department of Vertebrate Genomics, Evolution and Development Group, Ihnestrasse 73, 14195 Berlin, Germany.
10
Coward K, Owen H, Poustka AJ, Hibbitt O, Tunwell R, Kubota H, Swann K, Parrington J. Cloning of a novel phospholipase C-delta isoform from pacific purple sea urchin (Strongylocentrotus purpuratus) gametes and its expression during early embryonic development. Biochem Biophys Res Commun 2004; 313:894-901. [PMID: 14706626 DOI: 10.1016/j.bbrc.2003.12.029] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Calcium (Ca(2+)) is a ubiquitous intracellular messenger, controlling a diverse range of cellular processes, including fertilization and development of the embryo. One of the key mechanisms involved in triggering intracellular calcium release is the generation of the second messenger inositol 1,4,5-trisphosphate (IP(3)) by the phospholipase C (PLC) class of enzymes. Although five distinct forms of PLC have been identified in mammals (beta, gamma, delta, epsilon, and zeta), only one, PLCgamma, has thus far been detected in echinoderms. In the present study, we describe the isolation of a cDNA encoding a novel PLC isoform of the delta subclass, PLC-deltasu, from the egg of the Pacific purple sea urchin Strongylocentrotus purpuratus. We also demonstrate the presence of this PLC within the sperm and in the early embryo. The PLC-deltasu cDNA (2.44 kb) encodes a 742 amino acid polypeptide of 84.6 kDa with a pI of 6.04. All of the characteristic domains found in mammalian PLCdelta isoforms (PH domain, EF hands, an X-Y catalytic region, and a C2 domain) are present in PLC-deltasu. A homology search revealed that PLC-deltasu shares most sequence identity with bovine PLCdelta2 (39%). We present evidence that PLC-deltasu is expressed in unfertilized eggs, fertilized eggs, and in the early embryo. In addition to Northern and polymerase chain reaction (PCR) analyses, in situ hybridization experiments further demonstrated that the embryonic regions within which the PLC-deltasu transcript can be detected during early embryonic development are associated with the highest levels of proliferative activity, suggesting a possible involvement with metabolism or cell cycle regulation.
Affiliation(s)
- Kevin Coward
- University Department of Pharmacology, University of Oxford, Mansfield Road, Oxford OX1 3QT, UK
11
Poustka AJ, Groth D, Hennig S, Thamm S, Cameron A, Beck A, Reinhardt R, Herwig R, Panopoulou G, Lehrach H. Generation, annotation, evolutionary analysis, and database integration of 20,000 unique sea urchin EST clusters. Genome Res 2004; 13:2736-46. [PMID: 14656975 PMCID: PMC403816 DOI: 10.1101/gr.1674103] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Indexed: 11/24/2022]
Abstract
Together with the hemichordates, sea urchins represent basal groups of nonchordate invertebrate deuterostomes that occupy a key position in bilaterian evolution. Because sea urchin embryos are also amenable to functional studies, the sea urchin system has emerged as one of the leading models for the analysis of the function of genomic regulatory networks that control development. We have analyzed a total of 107,283 cDNA clones of libraries that span the development of the sea urchin Strongylocentrotus purpuratus. Normalization by oligonucleotide fingerprinting, EST sequencing and sequence clustering resulted in an EST catalog comprising 20,000 unique genes or gene fragments. Around 7000 of the unique EST consensus sequences were associated with molecular and developmental functions. Phylogenetic comparison of the identified genes to the genome of the urochordate Ciona intestinalis indicates that at least one quarter of the genes thought to be chordate specific were already present at the base of deuterostome evolution. Comparison of the number of gene copies in sea urchins to those in chordates and vertebrates indicates that the sea urchin genome has not undergone extensive gene or complete genome duplications. The established unique gene set represents an essential tool for the annotation and assembly of the forthcoming sea urchin genome sequence. All cDNA clones and filters of all analyzed libraries are available from the resource center of the German genome project at http://www.rzpd.de.
Affiliation(s)
- Albert J Poustka
- Evolution and Development Group, Max Planck Institute for Molecular Genetics, Department of Vertebrate Genomics, 14195 Berlin, Germany.
12
Abstract
It may safely be predicted that GRN analysis will become increasingly important. It will come to underlie the causal study of development, the major effort underway to understand the regulatory code built into animal genomes and also the evolution of these genomes. Partly by serendipity, sea urchin embryos turn out to be a superb experimental material for GRN analysis. Their natural properties have, in turn, influenced the predilections of those who work on them, and between them and us, so to speak, this is now a developmental system of which we are rapidly gaining an unusually complete understanding. The causal linkages that control development of the whole embryo will be revealed, leading all the way from the heritable genomic regulatory code to the events of embryology. The fundamental experimental operation is the perturbation analysis: Here is where causality permeates the exploration. We have in this chapter summarized in some detail the requirements for perturbation GRN analysis in sea urchin embryos. But that is not all, nor is it enough to enable the assembly of a GRN: What is required is the combined application of elegant computational methods, of gene regulation molecular biology, of genomic sequence data, and of experimental embryology. As the results crystallize together, we can begin to see how far this powerful combination of methods and ideas is going to carry us.
Affiliation(s)
- Paola Oliveri
- Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
13
Affiliation(s)
- R Andrew Cameron
- Division of Biology and the Center for Computational Regulatory Genomics, Beckman Institute, California Institute of Technology, Pasadena, California 91125, USA
14
Chu Z, Peng K, Zhang L, Zhou B, Wei J, Wang S. Construction and characterization of a normalized whole-life-cycle cDNA library of rice. Chin Sci Bull 2003. [DOI: 10.1007/bf03183288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
15
Sharan R, Elkon R, Shamir R. Cluster analysis and its applications to gene expression data. Ernst Schering Res Found Workshop 2002:83-108. [PMID: 12061008 DOI: 10.1007/978-3-662-04747-7_5] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Affiliation(s)
- R Sharan
- School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel.
16
Fuchs T, Malecova B, Linhart C, Sharan R, Khen M, Herwig R, Shmulevich D, Elkon R, Steinfath M, O'Brien JK, Radelof U, Lehrach H, Lancet D, Shamir R. DEFOG: a practical scheme for deciphering families of genes. Genomics 2002; 80:295-302. [PMID: 12213199 DOI: 10.1006/geno.2002.6830] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
We developed a novel efficient scheme, DEFOG (for "deciphering families of genes"), for determining sequences of numerous genes from a family of interest. The scheme provides a powerful means to obtain a gene family composition in species for which high-throughput genomic sequencing data are not available. DEFOG uses two key procedures. The first is a novel algorithm for designing highly degenerate primers based on a set of known genes from the family of interest. These primers are used in PCR reactions to amplify the members of the gene family. The second combines oligofingerprinting of the cloned PCR products with clustering of the clones based on their fingerprints. By selecting members from each cluster, a low-redundancy clone subset is chosen for sequencing. We applied the scheme to the human olfactory receptor (OR) genes. OR genes constitute the largest gene superfamily in the human genome, as well as in the genomes of other vertebrate species. DEFOG almost tripled the size of the initial repertoire of human ORs in a single experiment, and only 7% of the PCR clones had to be sequenced. Extremely high degeneracies, reaching over a billion combinations of distinct PCR primer pairs, proved to be very effective and yielded only 0.4% nonspecific products.
Affiliation(s)
- Tania Fuchs
- Department of Molecular Genetics and the Crown Human Genome Center, The Weizmann Institute of Science, Rehovot 76100, Israel
17
Kessler MM, Willins DA, Zeng Q, Del Mastro RG, Cook R, Doucette-Stamm L, Lee H, Caron A, McClanahan TK, Wang L, Greene J, Hare RS, Cottarel G, Shimer GH. The use of direct cDNA selection to rapidly and effectively identify genes in the fungus Aspergillus fumigatus. Fungal Genet Biol 2002; 36:59-70. [PMID: 12051895 DOI: 10.1016/s1087-1845(02)00002-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
Aspergillus fumigatus is one of the causes of invasive lung disease in immunocompromised individuals. To rapidly identify genes in this fungus, including potential targets for chemotherapy, diagnostics, and vaccine development, we constructed cDNA libraries. We began with non-normalized libraries, then to improve this approach we constructed a normalized cDNA library using direct cDNA selection. Normalization resulted in a reduction of the frequency of clones with highly expressed genes and an enrichment of underrepresented cDNAs. Expressed sequence tags generated from both the original and the normalized libraries were compared with the genomes of Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Candida albicans, indicating that a large proportion of A. fumigatus genes do not have orthologs in these fungal species. This method allowed the expeditious identification of genes in a fungal pathogen. The same approach can be applied to other human or plant pathogens to rapidly identify genes without the need for genomic sequence information.
18
Peterson DG, Schulze SR, Sciara EB, Lee SA, Bowers JE, Nagel A, Jiang N, Tibbitts DC, Wessler SR, Paterson AH. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Genome Res 2002; 12:795-807. [PMID: 11997346 PMCID: PMC186575 DOI: 10.1101/gr.226102] [Citation(s) in RCA: 105] [Impact Index Per Article: 4.8] [Indexed: 11/25/2022]
Abstract
Cot-based sequence discovery represents a powerful means by which both low-copy and repetitive sequences can be selectively and efficiently fractionated, cloned, and characterized. Based upon the results of a Cot analysis, hydroxyapatite chromatography was used to fractionate sorghum (Sorghum bicolor) genomic DNA into highly repetitive (HR), moderately repetitive (MR), and single/low-copy (SL) sequence components that were consequently cloned to produce HRCot, MRCot, and SLCot genomic libraries. Filter hybridization (blotting) and sequence analysis both show that the HRCot library is enriched in sequences traditionally found in high-copy number (e.g., retroelements, rDNA, centromeric repeats), the SLCot library is enriched in low-copy sequences (e.g., genes and "nonrepetitive ESTs"), and the MRCot library contains sequences of moderate redundancy. The Cot analysis suggests that the sorghum genome is approximately 700 Mb (in agreement with previous estimates) and that HR, MR, and SL components comprise 15%, 41%, and 24% of sorghum DNA, respectively. Unlike previously described techniques to sequence the low-copy components of genomes, sequencing of Cot components is independent of expression and methylation patterns that vary widely among DNA elements, developmental stages, and taxa. High-throughput sequencing of Cot clones may be a means of "capturing" the sequence complexity of eukaryotic genomes at unprecedented efficiency.
Affiliation(s)
- Daniel G Peterson
- Center for Applied Genetic Technologies and Department of Crop and Soil Sciences, University of Georgia, Athens, Georgia 30602, USA.
|
19
|
Davidson EH, Rast JP, Oliveri P, Ransick A, Calestani C, Yuh CH, Minokawa T, Amore G, Hinman V, Arenas-Mena C, Otim O, Brown CT, Livi CB, Lee PY, Revilla R, Rust AG, Pan ZJ, Schilstra MJ, Clarke PJC, Arnone MI, Rowen L, Cameron RA, McClay DR, Hood L, Bolouri H. A genomic regulatory network for development. Science 2002; 295:1669-78. [PMID: 11872831 DOI: 10.1126/science.1069883] [Citation(s) in RCA: 943] [Impact Index Per Article: 42.9] [Indexed: 11/02/2022]
Abstract
Development of the body plan is controlled by large networks of regulatory genes. A gene regulatory network that controls the specification of endoderm and mesoderm in the sea urchin embryo is summarized here. The network was derived from large-scale perturbation analyses, in combination with computational methodologies, genomic data, cis-regulatory analysis, and molecular embryology. The network contains over 40 genes at present, and each node can be directly verified at the DNA sequence level by cis-regulatory analysis. Its architecture reveals specific and general aspects of development, such as how given cells generate their ordained fates in the embryo and why the process moves inexorably forward in developmental time.
Affiliation(s)
- Eric H Davidson
- Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA.
|
20
|
Clark MD, Hennig S, Herwig R, Clifton SW, Marra MA, Lehrach H, Johnson SL. An oligonucleotide fingerprint normalized and expressed sequence tag characterized zebrafish cDNA library. Genome Res 2001; 11:1594-602. [PMID: 11544204 PMCID: PMC311136 DOI: 10.1101/gr.186901] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.7] [Indexed: 01/23/2023]
Abstract
The zebrafish is a powerful system for understanding the vertebrate genome, allowing the combination of genetic, molecular, and embryological analysis. Expressed sequence tags (ESTs) provide a rapid means of identifying an organism's genes for further analysis, but any EST project is limited by the availability of suitable libraries. Such cDNA libraries must be of high quality and provide a high rate of gene discovery. However, commonly used normalization and subtraction procedures tend to select for shorter, truncated, and internally primed inserts, seriously affecting library quality. An alternative procedure is to use oligonucleotide fingerprinting (OFP) to precluster clones before EST sequencing, thereby reducing the re-sequencing of common transcripts. Here, we describe the use of OFP to normalize and subtract 75,000 clones from two cDNA libraries, to a minimal set of 25,102 clones. We generated 25,788 ESTs (11,380 3' and 14,408 5') from over 16,000 of these clones. Clustering of 10,654 high-quality 3' ESTs from this set identified 7232 clusters (likely genes), corresponding to a 68% gene diversity rate, comparable to what has been reported for the best normalized human cDNA libraries, and indicating that the complete set of 25,102 clones contains as many as 17,000 genes. Yet, the library quality remains high. The complete set of 25,102 clones is available for researchers as glycerol stocks, filter sets, and as individual EST clones. These resources have been used for radiation hybrid, genetic, and physical mapping of the zebrafish genome, as well as positional cloning and candidate gene identification, molecular marker, and microarray development.
Affiliation(s)
- M D Clark
- Max-Planck-Institut für Molekulare Genetik, 14195 Berlin, Germany.
|
21
|
Zhu X, Mahairas G, Illies M, Cameron RA, Davidson EH, Ettensohn CA. A large-scale analysis of mRNAs expressed by primary mesenchyme cells of the sea urchin embryo. Development 2001; 128:2615-27. [PMID: 11493577 DOI: 10.1242/dev.128.13.2615] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.9] [Indexed: 11/20/2022]
Abstract
The primary mesenchyme cells (PMCs) of the sea urchin embryo have been an important model system for the analysis of cell behavior during gastrulation. To gain an improved understanding of the molecular basis of PMC behavior, a set of 8293 expressed sequence tags (ESTs) was derived from an enriched population of mid-gastrula stage PMCs. These ESTs represented approximately 1200 distinct proteins, or about 15% of the mRNAs expressed by the gastrula stage embryo. 655 proteins were similar (P < 10⁻⁷ by BLAST comparisons) to other proteins in GenBank, for which some information is available concerning expression and/or function. Another 116 were similar to ESTs identified in other organisms, but not further characterized. We conservatively estimate that sequences encoding at least 435 additional proteins were included in the pool of ESTs that did not yield matches by BLAST analysis. The collection of newly identified proteins includes many candidate regulators of primary mesenchyme morphogenesis, including PMC-specific extracellular matrix proteins, cell surface proteins, spicule matrix proteins and transcription factors. This work provides a basis for linking specific molecular changes to specific cell behaviors during gastrulation. Our analysis has also led to the cloning of several key components of signaling pathways that play crucial roles in early sea urchin development.
Affiliation(s)
- X Zhu
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
22
|
Makabe KW, Kawashima T, Kawashima S, Minokawa T, Adachi A, Kawamura H, Ishikawa H, Yasuda R, Yamamoto H, Kondoh K, Arioka S, Sasakura Y, Kobayashi A, Yagi K, Shojima K, Kondoh Y, Kido S, Tsujinami M, Nishimura N, Takahashi M, Nakamura T, Kanehisa M, Ogasawara M, Nishikata T, Nishida H. Large-scale cDNA analysis of the maternal genetic information in the egg of Halocynthia roretzi for a gene expression catalog of ascidian development. Development 2001; 128:2555-67. [PMID: 11493572 DOI: 10.1242/dev.128.13.2555] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022]
Abstract
The ascidian egg is a well-known mosaic egg. In order to investigate the molecular nature of the maternal genetic information stored in the egg, we have prepared cDNAs from the mRNAs in the fertilized eggs of the ascidian, Halocynthia roretzi. The cDNAs of the ascidian embryo were sequenced, and the localization of individual mRNA was examined in staged embryos by whole-mount in situ hybridization. The data obtained were stored in the database MAGEST (http://www.genome.ad.jp/magest) and further analyzed. A total of 4240 cDNA clones were found to represent 2221 gene transcripts, including at least 934 different protein-coding sequences. The mRNA population of the egg consisted of a low prevalence, high complexity sequence set. The majority of the clones were of the rare sequence class, and of these, 42% of the clones showed significant matches with known peptides, mainly consisting of proteins with housekeeping functions such as metabolism and cell division. In addition, we found cDNAs encoding components involved in different signal transduction pathways and cDNAs encoding nucleotide-binding proteins. Large-scale analyses of the distribution of the RNA corresponding to each cDNA in the eight-cell, 110-cell and early tailbud embryos were simultaneously carried out. These analyses revealed that a small fraction of the maternal RNAs were localized in the eight-cell embryo, and that 7.9% of the clones were exclusively maternal, while 40.6% of the maternal clones showed expression in the later stages. This study provides global insights about the genes expressed during early development.
Affiliation(s)
- K W Makabe
- Department of Zoology, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan.
|
23
|
Rast JP, Amore G, Calestani C, Livi CB, Ransick A, Davidson EH. Recovery of developmentally defined gene sets from high-density cDNA macroarrays. Dev Biol 2000; 228:270-86. [PMID: 11112329 DOI: 10.1006/dbio.2000.9941] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.3] [Indexed: 11/22/2022]
Abstract
New technologies for isolating differentially expressed genes from large arrayed cDNA libraries are reported. These methods can be used to identify genes that lie downstream of developmentally important transcription factors and genes that are expressed in specific tissues, processes, or stages of embryonic development. Though developed for the study of gene expression during the early embryogenesis of the sea urchin Strongylocentrotus purpuratus, these technologies can be applied generally. Hybridization parameters were determined for the reaction of complex cDNA probes to cDNA libraries carried on six nylon filters, each containing duplicate spots from 18,432 bacterial clones (macroarrays). These libraries are of sufficient size to include nearly all genes expressed in the embryo. The screening strategy we have devised is designed to overcome inherent sensitivity limitations of macroarray hybridization and thus to isolate differentially expressed genes that are represented only by low-prevalence mRNAs. To this end, we have developed improved methods for the amplification of cDNA from small amounts of tissue (as little as approximately 300 sea urchin embryos, or 2 × 10⁵ cells, or about 10 ng of mRNA) and for the differential enhancement of probe sequence concentration by subtractive hybridization. Quantitative analysis of macroarray hybridization shows that these probes now suffice for detection of differentially expressed mRNAs down to a level below five molecules per average embryo cell.
Affiliation(s)
- J P Rast
- Division of Biology 156-29, California Institute of Technology, Pasadena, California 91125, USA
|
24
|
Neidhardt L, Gasca S, Wertz K, Obermayr F, Worpenberg S, Lehrach H, Herrmann BG. Large-scale screen for genes controlling mammalian embryogenesis, using high-throughput gene expression analysis in mouse embryos. Mech Dev 2000; 98:77-94. [PMID: 11044609 DOI: 10.1016/s0925-4773(00)00453-6] [Citation(s) in RCA: 68] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022]
Abstract
We have adapted the whole-mount in situ hybridization technique to perform high-throughput gene expression analysis in mouse embryos. A large-scale screen for genes showing specific expression patterns in the mid-gestation embryo was carried out, and a large number of genes controlling development were isolated. From 35760 clones of a 9.5 d.p.c. cDNA library, a total of 5348 cDNAs, enriched for rare transcripts, were selected and analyzed by whole-mount in situ hybridization. Four hundred and twenty-eight clones revealed specific expression patterns in the 9.5 d.p.c. embryo. Of 361 tag-sequenced clones, 198 (55%) represent 154 known mouse genes. Thirty-nine (25%) of the known genes are involved in transcriptional regulation and 33 (21%) in inter- or intracellular signaling. A large number of these genes have been shown to play an important role in embryogenesis. Furthermore, 24 (16%) of the known genes are implicated in human disorders and three others are altered in classical mouse mutations. Similar proportions of regulators of embryonic development and candidates for human disorders or mouse mutations are expected among the 163 new mouse genes isolated. Thus, high-throughput gene expression analysis is suitable for isolating regulators of embryonic development on a large scale, and in the long term, for determining the molecular anatomy of the mouse embryo. This knowledge will provide a basis for the systematic investigation of pattern formation, tissue differentiation and organogenesis in mammals.
Affiliation(s)
- L Neidhardt
- Max-Planck-Institut für Immunbiologie, Abt. Entwicklungsbiologie, Stübeweg 51, 79108, Freiburg, Germany
|
25
|
Cameron RA, Mahairas G, Rast JP, Martinez P, Biondi TR, Swartzell S, Wallace JC, Poustka AJ, Livingston BT, Wray GA, Ettensohn CA, Lehrach H, Britten RJ, Davidson EH, Hood L. A sea urchin genome project: sequence scan, virtual map, and additional resources. Proc Natl Acad Sci U S A 2000; 97:9514-8. [PMID: 10920195 PMCID: PMC16896 DOI: 10.1073/pnas.160261897] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022] Open
Abstract
Results of a first-stage Sea Urchin Genome Project are summarized here. The species chosen was Strongylocentrotus purpuratus, a research model of major importance in developmental and molecular biology. A virtual map of the genome was constructed by sequencing the ends of 76,020 bacterial artificial chromosome (BAC) recombinants (average length, 125 kb). The BAC-end sequence tag connectors (STCs) occur an average of 10 kb apart, and, together with restriction digest patterns recorded for the same BAC clones, they provide immediate access to contigs of several hundred kilobases surrounding any gene of interest. The STCs survey >5% of the genome and provide the estimate that this genome contains approximately 27,350 protein-coding genes. The frequency distribution and canonical sequences of all middle and highly repetitive sequence families in the genome were obtained from the STCs as well. The 500-kb Hox gene complex of this species is being sequenced in its entirety. In addition, arrayed cDNA libraries of >10⁵ clones each were constructed from every major stage of embryogenesis, several individual cell types, and adult tissues and are available to the community. The accumulated STC data and an expanding expressed sequence tag database (at present including >12,000 sequences) have been reported to GenBank and are accessible on public web sites.
Affiliation(s)
- R A Cameron
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA.
|
26
|
Eickhoff H, Schuchhardt J, Ivanov I, Meier-Ewert S, O'Brien J, Malik A, Tandon N, Wolski EW, Rohlfs E, Nyarsik L, Reinhardt R, Nietfeld W, Lehrach H. Tissue gene expression analysis using arrayed normalized cDNA libraries. Genome Res 2000; 10:1230-40. [PMID: 10958641 PMCID: PMC310898 DOI: 10.1101/gr.10.8.1230] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]
Abstract
We have used oligonucleotide-fingerprinting data on 60,000 cDNA clones from two different mouse embryonic stages to establish a normalized cDNA clone set. The normalized set of 5,376 clones represents different clusters and therefore, in almost all cases, different genes. The inserts of the cDNA clones were amplified by PCR and spotted on glass slides. The resulting arrays were hybridized with mRNA probes prepared from six different adult mouse tissues. Expression profiles were analyzed by hierarchical clustering techniques. We have chosen radioactive detection because it combines robustness with sensitivity and allows the comparison of multiple normalized experiments. Sensitive detection combined with highly effective clustering algorithms allowed the identification of tissue-specific expression profiles and the detection of genes specifically expressed in the tissues investigated. The obtained results are publicly available (http://www.rzpd.de) and can be used by other researchers as a digital expression reference.
Affiliation(s)
- H Eickhoff
- Max-Planck-Institut für Molekulare Genetik, 14195 Berlin, Germany.
|
27
|
Hartuv E, Schmitt AO, Lange J, Meier-Ewert S, Lehrach H, Shamir R. An algorithm for clustering cDNA fingerprints. Genomics 2000; 66:249-56. [PMID: 10873379 DOI: 10.1006/geno.2000.6187] [Citation(s) in RCA: 59] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]
Abstract
Clustering large data sets is a central challenge in gene expression analysis. The hybridization of synthetic oligonucleotides to arrayed cDNAs yields a fingerprint for each cDNA clone. Cluster analysis of these fingerprints can identify clones corresponding to the same gene. We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. Unlike other methods, it does not assume that the clusters are hierarchically structured and does not require prior knowledge on the number of clusters. In tests with simulated libraries the algorithm outperformed the Greedy method and demonstrated high speed and robustness to high error rate. Good solution quality was also obtained in a blind test on real cDNA fingerprints.
Affiliation(s)
- E Hartuv
- Department of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel
|
28
|
Herwig R, Poustka AJ, Müller C, Bull C, Lehrach H, O'Brien J. Large-scale clustering of cDNA-fingerprinting data. Genome Res 1999; 9:1093-105. [PMID: 10568749 PMCID: PMC310829 DOI: 10.1101/gr.9.11.1093] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.7] [Indexed: 11/24/2022]
Abstract
Clustering is one of the main mathematical challenges in large-scale gene expression analysis. We describe a clustering procedure based on a sequential k-means algorithm with additional refinements that is able to handle high-throughput data in the order of hundreds of thousands of data items measured on hundreds of variables. The practical motivation for our algorithm is oligonucleotide fingerprinting (a method for simultaneous determination of expression level for every active gene of a specific tissue), although the algorithm can be applied as well to other large-scale projects like EST clustering and qualitative clustering of DNA-chip data. As a pairwise similarity measure between two p-dimensional data points, x and y, we introduce mutual information that can be interpreted as the amount of information about x in y, and vice versa. We show that for our purposes this measure is superior to commonly used metric distances, for example, Euclidean distance. We also introduce a modified version of mutual information as a novel method for validating clustering results when the true clustering is known. The performance of our algorithm with respect to experimental noise is shown by extensive simulation studies. The algorithm is tested on a subset of 2029 cDNA clones coming from 15 different genes from a cDNA library derived from human dendritic cells. Furthermore, the clustering of these 2029 cDNA clones is demonstrated when the entire set of 76,032 cDNA clones is processed.
Affiliation(s)
- R Herwig
- Max-Planck Institut für Molekulare Genetik, Ihnestrasse 73, D-14195 Berlin, Germany.
|