1
Stochastics of Cellular Differentiation Explained by Epigenetics: The Case of T-Cell Differentiation and Functional Plasticity. Scand J Immunol 2017; 86:184-195. [PMID: 28799233 DOI: 10.1111/sji.12589] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/01/2017] [Accepted: 08/06/2017] [Indexed: 12/19/2022]
Abstract
Epigenetic marks, including histone modifications and DNA methylation, are associated with the regulation of gene expression and activity. In addition, an increasing number of non-coding RNAs that regulate gene expression have been identified. Alongside these discoveries, technological advances now allow these mechanisms to be analysed at high resolution, down to the single-cell level. For instance, the assay for transposase-accessible chromatin using sequencing (ATAC-seq) simultaneously probes chromatin accessibility and nucleosome positioning, and thus provides information on two levels of epigenetic regulation. The development and differentiation of T cells into functional subsets, including memory T cells, are dynamic processes driven by environmental signals. Here, we briefly review current knowledge of how epigenetic regulation contributes to subset specification, differentiation and memory development in T cells. Specifically, we focus on epigenetic mechanisms that are differentially active in the two distinct T-cell populations expressing αβ or γδ T-cell receptors. We also discuss examples of epigenetic alterations of T cells in autoimmune diseases. DNA methylation and histone acetylation can be modified by several classes of 'epigenetic modifiers', some of which are in clinical use or in preclinical development. We therefore address the impact of selected epigenetic modifiers on T-cell activation and differentiation, and discuss possible synergies with T cell-based immunotherapeutic strategies.
2
Genome-wide chromatin profiling of Legionella pneumophila-infected human macrophages reveals activation of the pro-bacterial host factor TNFAIP2. Pneumologie 2016. [DOI: 10.1055/s-0036-1584629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
3
Systemic characterization of macrophage phenotypes in allergic airway inflammation. Pneumologie 2016. [DOI: 10.1055/s-0036-1584611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
4
X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes. Mol Psychiatry 2016; 21:133-48. [PMID: 25644381 PMCID: PMC5414091 DOI: 10.1038/mp.2014.193] [Citation(s) in RCA: 208] [Impact Index Per Article: 26.0] [Received: 08/04/2014] [Revised: 11/17/2014] [Accepted: 12/08/2014] [Indexed: 12/27/2022]
Abstract
X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades, more than 100 X-chromosomal ID genes have been identified. Yet a large number of families mapping to the X chromosome remained unresolved, suggesting that more XLID genes or loci are yet to be identified. Here, we investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males had previously tested negative for copy number variations and, by Sanger sequencing, for mutations in a subset of known XLID genes. In total, 745 X-chromosomal genes were screened. After stringent filtering, 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potentially clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein-truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein function, as indicated by electrophysiological studies and by altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function, and in intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.
5
Alternative and classic activation of primary macrophages – a systems biology approach. Pneumologie 2014. [DOI: 10.1055/s-0034-1367782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
6
Alternative and classic activation of primary macrophages – a systems biology approach. Pneumologie 2014. [DOI: 10.1055/s-0033-1363095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
7
Differentially expressed miRNAs after Legionella pneumophila infection of human macrophages. Pneumologie 2014. [DOI: 10.1055/s-0033-1363106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
8
ISCB/SPRINGER series in computational biology. Bioinformatics 2013. [DOI: 10.1093/bioinformatics/btt670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
9
Differentially expressed miRNAs after Legionella pneumophila infection of human macrophages. Pneumologie 2013. [DOI: 10.1055/s-0033-1334623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
10
Alternative and Classic Activation of Primary Human Macrophages - A Systems Biology Approach. Pneumologie 2013. [DOI: 10.1055/s-0033-1334756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
11
Role of TNFAIP2 in Legionella pneumophila-induced pulmonary inflammation. Pneumologie 2013. [DOI: 10.1055/s-0033-1334622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
12
Analyse von microRNA-Regelkreisen in der Legionella pneumophila-Infektion [Analysis of microRNA regulatory circuits in Legionella pneumophila infection]. Pneumologie 2011. [DOI: 10.1055/s-0031-1296144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
13
Systembiologische Analyse der Legionella pneumophila-induzierten microRNA-Expression in humanem Alveolarepithel [Systems biology analysis of Legionella pneumophila-induced microRNA expression in human alveolar epithelium]. Pneumologie 2011. [DOI: 10.1055/s-0031-1272253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
14
Systembiologische Analyse der klassischen und alternativen Makrophagenaktivierung [Systems biology analysis of classical and alternative macrophage activation]. Pneumologie 2011. [DOI: 10.1055/s-0031-1272055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
15
A role for Fra1 in the control of transcriptional network reorganization following ras transformation. Cell Commun Signal 2009. [PMCID: PMC4291840 DOI: 10.1186/1478-811x-7-s1-a8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
16
The BREW workshop series: a stimulating experience in PhD education. Brief Bioinform 2008; 9:250-3. [DOI: 10.1093/bib/bbn002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
17
Exploring potential target genes of signaling pathways by predicting conserved transcription factor binding sites. Bioinformatics 2003; 19 Suppl 2:ii50-6. [PMID: 14534171 DOI: 10.1093/bioinformatics/btg1059] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
Many cellular signaling pathways induce gene expression by activating specific transcription factor complexes. Conventional approaches to predicting transcription factor binding sites yield a notoriously high number of false discoveries. To alleviate this problem, we consider only binding sites that are conserved in human-mouse genomic sequence comparisons. We employ two alternative methods for predicting binding sites: exact matches to validated binding site sequences, and weight matrix scans. We then ask whether there is a characteristic association between a transcription factor, or a set of transcription factors, and a particular group of genes. We test our approach on genes that are induced in dendritic cells in response to LPS exposure, chosen because the underlying signaling pathways are well understood. We demonstrate the benefit of conserved predicted binding sites in interpreting the LPS experiment. Additionally, we find that the two methods for predicting conserved binding sites complement one another. Finally, our results suggest a distinct role for SRF in LPS-induced gene expression.
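The weight-matrix arm of this approach, combined with a conservation filter, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the count matrix, the log-odds threshold, the pseudocount and the gap-free human/mouse alignment are hypothetical simplifications, and the helper names (`log_odds_matrix`, `scan`, `conserved_hits`) are invented for this example.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Turn a position count matrix (one dict of base counts per column) into log2-odds scores."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return start positions of windows whose summed log-odds score reaches the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits

def conserved_hits(human, mouse, matrix, threshold):
    """Keep hits found at the same offset in both sequences.

    Hypothetical simplification: assumes the two sequences are already aligned without gaps.
    """
    return sorted(set(scan(human, matrix, threshold)) & set(scan(mouse, matrix, threshold)))

# Toy example with a hypothetical 'TATA' count matrix.
tata = log_odds_matrix([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(conserved_hits("GCTATAGC", "GCTATAGG", tata, threshold=6.0))
```

A real pipeline would map hits through a genuine human-mouse alignment rather than comparing raw offsets, and would also scan the reverse strand.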
18
Abstract
The CORG resource (Comparative Regulatory Genomics, ) provides extensive cross-species comparisons of promoter regions in particular and of whole gene loci in general. Pairwise and multiple alignments of 10 vertebrate species form the key component of CORG. We implemented a rapid alignment approach based on weight matrix motif anchors to ensure efficient computation and biologically informative alignments. All CORG workbench components have been enhanced for greater flexibility and interactivity. Reference-sequence-based data presentation and analysis were integrated into the well-known, modular Generic Genome Browser framework, where various plugins facilitate online data analysis and integration with static conservation data. The main emphasis was on the design of a new Java Web Start application for comparative data display. Flexible data import and export options for standard formats complete the services provided.
19
Abstract
MOTIVATION Even for the amino acid motifs collected in the Prosite database, there may be chance occurrences, as opposed to occurrences in which the motif is involved in the fold or function of a protein. With recent mathematical advances in assessing the significance of observing such a motif a particular number of times, we can now study the over- or under-representation of particular motifs in a complete genome and attempt to make functional deductions. RESULTS We demonstrate that statistical over- or under-representation of motifs in complete proteomes may indicate whether, in a given organism, we are looking at chance occurrences of the motif or at occurrences numerous enough to suggest a systematic, and thus functionally important, occurrence. This has important implications for databank annotations. AVAILABILITY The complete dataset comprising the plotted statistics of 266 Prosite motifs on 42 proteomes is available at http://algo.inria.fr/nicodeme/proteomes/proteocomp.html. The software used to compute these data has been described by Nicodème (2000, 2001). It is available either by web access as described in those articles or on direct request from Pierre Nicodème.
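To make over- or under-representation concrete, the sketch below compares the observed number of matches of a Prosite-style pattern with the number expected if residues were drawn independently with proteome-wide frequencies, and reports a Poisson z-score. This is a deliberately crude approximation, not the exact counting statistics of Nicodème referred to above, and the function names are hypothetical. The N-glycosylation pattern N-{P}-[ST]-{P} is encoded as a list of allowed-residue sets.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def residue_frequencies(proteins):
    counts = Counter("".join(proteins))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def match_probability(pattern, freqs):
    """Probability that a random window matches, assuming independent residues."""
    p = 1.0
    for allowed in pattern:
        p *= sum(freqs.get(aa, 0.0) for aa in allowed)
    return p

def count_occurrences(pattern, proteins):
    """Count (possibly overlapping) matches of a fixed-length pattern."""
    k = len(pattern)
    return sum(
        1
        for seq in proteins
        for i in range(len(seq) - k + 1)
        if all(seq[i + j] in pattern[j] for j in range(k))
    )

def representation_z(pattern, proteins):
    """Return (observed, expected, z) for one pattern over a set of proteins."""
    freqs = residue_frequencies(proteins)
    windows = sum(max(len(s) - len(pattern) + 1, 0) for s in proteins)
    expected = windows * match_probability(pattern, freqs)
    observed = count_occurrences(pattern, proteins)
    if expected == 0:
        z = 0.0 if observed == 0 else math.inf
    else:
        z = (observed - expected) / math.sqrt(expected)  # Poisson approximation
    return observed, expected, z

# N-{P}-[ST]-{P} as allowed-residue sets, applied to a hypothetical toy "proteome".
n_glyc = [{"N"}, AMINO_ACIDS - {"P"}, {"S", "T"}, AMINO_ACIDS - {"P"}]
print(representation_z(n_glyc, ["NGSANGTA"]))
```

Because overlapping windows are counted, the Poisson approximation becomes rough for self-overlapping patterns; exact methods such as those cited in the abstract account for such dependencies.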
20
SVC: structured visualization of evolutionary sequence conservation. Nucleic Acids Res 2005; 33:W271-3. [PMID: 15991338 PMCID: PMC1160265 DOI: 10.1093/nar/gki589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
We have developed a web application for the detailed analysis and visualization of evolutionary sequence conservation in complex vertebrate genes. Given a pair of orthologous genes, the protein-coding sequences are aligned. When these sequences are mapped back onto their encoding exons in the genomes, a scaffold of the conserved gene structure naturally emerges. Sequence similarity between exons and introns is analysed and embedded into the gene structure scaffold. The visualization on the SVC server provides detailed information about evolutionarily conserved features of these genes. It further allows concise representation of complex splice patterns in the context of evolutionary conservation. A particular application of our tool arises from the fact that around mRNA editing sites both exonic and intronic sequences are highly conserved. This aids in delineation of these sites. SVC is available at .
21
The Helmholtz Network for Bioinformatics: an integrative web portal for bioinformatics resources. Bioinformatics 2004; 20:268-70. [PMID: 14734319 DOI: 10.1093/bioinformatics/btg398] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
SUMMARY The Helmholtz Network for Bioinformatics (HNB) is a joint venture of eleven German bioinformatics research groups that offers convenient access to numerous bioinformatics resources through a single web portal. The 'Guided Solution Finder', which is available through the HNB portal, helps users locate the appropriate resources for their queries by means of a detailed, tree-like questionnaire. Furthermore, automated complex tool cascades ('tasks') involving resources located on different servers have been implemented, allowing users to perform comprehensive data analyses without further manual intervention for data transfer and re-formatting. Currently, automated cascades are provided for the analysis of regulatory DNA segments and for the prediction of protein functional properties. AVAILABILITY The HNB portal is available at http://www.hnbioinfo.de
22
Abstract
MOTIVATION Alternative splicing is currently seen as an explanation for the vast disparity between the number of predicted genes in the human genome and the highly diverse proteome. Mapping expressed sequence tag (EST) consensus sequences derived from the GeneNest database onto the genome provides an efficient way of predicting exon-intron boundaries, gene structure and alternative splicing events. However, the alternative splicing events are obscured by a large number of putatively artificial exon boundaries arising from genomic contamination or alignment errors. The current work describes a methodology for assigning quality values to the predicted exon-intron boundaries. High-quality exon-intron boundaries are used to predict constitutive and alternative splicing, ranked by confidence values, with the aim of facilitating large-scale analysis of alternative splicing and of splicing in general. RESULTS Applying this methodology, constitutive splicing is observed in 33,270 EST clusters, of which 45% are alternatively spliced. For 17 of these splice events, the classification derived from the computed confidence values agrees in 15 cases with RT-PCR experiments performed on 40 different tissue samples. As an application of the confidence measure, an evaluation of the distribution of alternative splicing revealed that the majority of variants correspond to the coding regions of the genes. However, a significant fraction still maps to non-coding regions, indicating a functional relevance of alternative splicing in untranslated regions. AVAILABILITY The predicted alternative splice variants are visualized in the SpliceNest database at http://splicenest.molgen.mpg.de
23
Maximum likelihood estimation of mathematical models for genetic development in gastrointestinal stromal tumors. Pathol Res Pract 2004. [DOI: 10.1016/s0344-0338(04)80768-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
24
An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome. Genome Biol 2003; 5:R3. [PMID: 14709175 PMCID: PMC395735 DOI: 10.1186/gb-2003-5-1-r3] [Citation(s) in RCA: 99] [Impact Index Per Article: 4.7] [Received: 08/22/2003] [Revised: 10/13/2003] [Accepted: 11/19/2003] [Indexed: 11/19/2022]
Abstract
A novel Drosophila microarray constructed on the basis of an integrated in silico/wet biology approach provides evidence for the transcription of approximately 2,600 additional genes. Validation indicates a lower limit of 2,000 novel annotations, thus raising the number of genes that make a fly. Background While the genome sequences for a variety of organisms are now available, the precise number of genes encoded is still a matter of debate. For the human genome, several stringent annotation approaches have resulted in the same number of potential genes, but a careful comparison revealed only limited overlap. This indicates that only the combination of different computational prediction methods and experimental evaluation of such in silico data will provide more complete genome annotations. To capture the gene content of the Drosophila melanogaster genome more completely, we based our new D. melanogaster whole-transcriptome microarray, the Heidelberg FlyArray, on the combination of the Berkeley Drosophila Genome Project (BDGP) annotation and a novel ab initio gene prediction of lower stringency using the Fgenesh software. Results Here we provide evidence for the transcription of approximately 2,600 additional genes predicted by Fgenesh. Validation of the developmental profiling data by RT-PCR and in situ hybridization indicates a lower limit of 2,000 novel annotations, thus substantially raising the number of genes that make a fly. Conclusions The successful design and application of this novel Drosophila microarray on the basis of our integrated in silico/wet biology approach confirms our expectation that in silico approaches alone will always tend to be incomplete. The identification of at least 2,000 novel genes highlights the importance of gathering experimental evidence to discover all genes within a genome.
Moreover, as such an approach is independent of homology criteria, it will allow the discovery of novel genes unrelated to known protein families or those that have not been strictly conserved between species.
25
Abstract
Sequence conservation in non-coding, upstream regions of orthologous genes from human and mouse is likely to reflect common regulatory DNA sites. Motivated by this assumption, we have delineated a catalogue of conserved non-coding sequence blocks and provide the CORG (COmparative Regulatory Genomics) database. The data were computed from statistically significant local suboptimal alignments of the 15 kb regions upstream of the translation start sites of, currently, 10,793 pairs of orthologous genes. The resulting conserved non-coding blocks were annotated with EST matches, for easier detection of non-coding mRNA, and with hits to known transcription factor binding sites. CORG data are accessible from the ENSEMBL web site via a DAS service, as well as through a specially developed web service (http://corg.molgen.mpg.de) for querying and interactively visualizing the conserved blocks and their annotation.
26
Mathematical tree models for cytogenetic development in solid tumors. Verh Dtsch Ges Pathol 2003; 87:188-92. [PMID: 16888912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
We present a new approach for modeling the occurrence of genetic changes in human tumors over time. In solid tumors, data on genetic alterations are usually available only at a single point in time, giving no direct insight into the sequential order of genetic events. In our approach, genetic tumor development and progression are assumed to follow a probabilistic tree model. We use maximum likelihood estimation to reconstruct a tree model for the genetic evolution of a given tumor type. The method is illustrated by an application to cytogenetic data from 173 cases of clear cell renal cell carcinoma, which yields a model for the karyotypic evolution of this tumor.
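One common formulation of such a probabilistic tree, used here purely as an assumption and not necessarily the model of this paper, gives each event a probability of occurring once its parent event has occurred, with a root that is present in every tumor. Under that assumption, the sketch below evaluates the likelihood of each tumor's set of alterations and computes the maximum likelihood edge probabilities for a fixed tree; the search over tree topologies is omitted, and the event labels and cases are hypothetical.

```python
import math

def _log(x):
    """Natural log that returns -inf for zero instead of raising."""
    return math.log(x) if x > 0 else -math.inf

def fit_edge_probabilities(patterns, parent):
    """ML edge probabilities for a fixed tree: the fraction of cases carrying an event
    among cases carrying its parent. Assumes every pattern is consistent with the tree."""
    prob = {}
    for event, par in parent.items():
        eligible = [s for s in patterns if par == "root" or par in s]
        hits = sum(1 for s in eligible if event in s)
        prob[event] = hits / len(eligible) if eligible else 0.0
    return prob

def pattern_log_likelihood(pattern, parent, prob):
    """Log-probability of one case's set of observed events under the tree model."""
    present = set(pattern) | {"root"}
    logl = 0.0
    for event, par in parent.items():
        if event in present:
            if par not in present:
                return -math.inf  # an event without its parent is impossible in a pure tree model
            logl += _log(prob[event])
        elif par in present:
            logl += _log(1.0 - prob[event])  # parent present, but this edge was not taken
    return logl

def data_log_likelihood(patterns, parent, prob):
    return sum(pattern_log_likelihood(s, parent, prob) for s in patterns)

# Hypothetical tree: "-3p" first, then "+5q" and "-14q" independently after it.
parent = {"-3p": "root", "+5q": "-3p", "-14q": "-3p"}
cases = [set(), {"-3p"}, {"-3p", "+5q"}, {"-3p", "-14q"}, {"-3p", "+5q", "-14q"}]
prob = fit_edge_probabilities(cases, parent)
print(prob, data_log_likelihood(cases, parent, prob))
```

Comparing `data_log_likelihood` across candidate topologies is one way such a fixed-tree likelihood could feed into choosing a tree; real cytogenetic data usually also need an error or noise term so that patterns violating the tree do not receive zero probability.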
27
28
Identifying splits with clear separation: a new class discovery method for gene expression data. Bioinformatics 2001; 17 Suppl 1:S107-14. [PMID: 11472999 DOI: 10.1093/bioinformatics/17.suppl_1.s107] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
We present a new class discovery method for microarray gene expression data. Based on a collection of gene expression profiles from different tissue samples, the method searches for binary class distinctions in the set of samples that show clear separation in the expression levels of specific subsets of genes. Several mutually independent class distinctions may be found, which is difficult to achieve with most commonly used clustering algorithms. Each class distinction can be interpreted biologically in terms of its supporting genes. The mathematical characterization of the favored class distinctions is based on statistical concepts. By analyzing three data sets from cancer gene expression studies, we demonstrate that our method can detect biologically relevant structures, for example cancer subtypes, in an unsupervised fashion.
29
Transcription profiling of renal cell carcinoma. Verh Dtsch Ges Pathol 2002; 86:153-64. [PMID: 12647365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2023]
Abstract
AIMS Our aim was to prepare a comprehensive catalogue of the changes in gene expression accompanying the development and progression of renal cell carcinoma, and to correlate these with histopathological, cytogenetic and clinical findings. METHODS mRNA samples from paired neoplastic and non-cancerous human kidney tissue were labelled and hybridized in duplicate against high-density cDNA arrays. Two array technologies were used: 31,500-element transcriptome-wide nylon arrays for hybridization with 37 radioactively labelled sample pairs, and 4200-element kidney- and cancer-specific glass microarrays for hybridization with 19 fluorescently labelled sample pairs. RESULTS We identified more than 1700 cDNA clones that show differential transcription levels in kidney tumor tissue compared with normal kidney tissue. The functional classification of 389 annotated genes provided views of the changes in the activities of specific biological processes in renal cancer. Among the biological processes with a large proportion of up-regulated genes, we found cell adhesion, signal transduction, and nucleotide metabolism. Down-regulated processes included small molecule transport, ion homeostasis, and oxygen and radical metabolism. Furthermore, we explored the feasibility of molecular diagnosis for renal cell tumors using cDNA microarrays on glass slides, investigating the association of transcription levels with tumor type, progression, and a putative prognostic variable. The experimental data are available from the GEO gene expression database (http://www.ncbi.nlm.nih.gov/geo; accession no. GSE3), and a comprehensive presentation of the results is available in the web supplement (http://www.dkfz-heidelberg.de/abt0840/whuber/rcc). CONCLUSION Transcription profiling using high-density cDNA arrays is a powerful method with the potential to improve cancer diagnosis and prognosis.
The identification and classification of differentially transcribed genes, as described in our study, is the beginning of a more complete understanding of kidney cancer.
30
Abstract
We present a database search method based on phylogenetic trees (treesearch). The method is used to search a protein sequence database for homologs of a protein family. In preparation for the search, a phylogenetic tree is constructed from a given multiple alignment of the family. During the search, each database sequence is temporarily inserted into the tree, thus adding a new edge to the tree. Homology between family and sequence is then judged from the length of this edge. Comparing our method with profiles (ISREC pfsearch), two implementations of hidden Markov models (HMMER hmmsearch and SAM hmmscore) and the family pairwise search (FPS) method on 43 families from the SCOP database, based on minimum false-positive counts (min-FPCs), we found a considerable gain in sensitivity. In 69% of the test cases, treesearch showed a min-FPC of at most 50, whereas the two second-best methods (hmmsearch and FPS) reached this performance in only 53% of cases. A similar picture holds for a large range of min-FPC thresholds. The results demonstrate that phylogenetic information can significantly improve the detection of distant homologies and justify our method as a useful alternative to existing methods.
31
Abstract
Microarray analysis has become a widely used tool for the generation of gene expression data on a genomic scale. Although many significant results have been derived from microarray studies, one limitation has been the lack of standards for presenting and exchanging such data. Here we present a proposal, the Minimum Information About a Microarray Experiment (MIAME), that describes the minimum information required to ensure that microarray data can be easily interpreted and that results derived from its analysis can be independently verified. The ultimate goal of this work is to establish a standard for recording and reporting microarray-based gene expression data, which will in turn facilitate the establishment of databases and public repositories and enable the development of data analysis tools. With respect to MIAME, we concentrate on defining the content and structure of the necessary information rather than the technical format for capturing it.
32
Identification and classification of differentially expressed genes in renal cell carcinoma by expression profiling on a global human 31,500-element cDNA array. Genome Res 2001; 11:1861-70. [PMID: 11691851 PMCID: PMC311168 DOI: 10.1101/gr.184501] [Citation(s) in RCA: 153] [Impact Index Per Article: 6.7] [Received: 02/15/2001] [Accepted: 08/07/2001] [Indexed: 11/24/2022]
Abstract
We investigated the changes in gene expression accompanying the development and progression of kidney cancer by use of 31,500-element complementary DNA arrays. We measured expression profiles for paired neoplastic and noncancerous renal epithelium samples from 37 individuals. Using an experimental design optimized for factoring out technological and biological noise, and an adapted statistical test, we found 1738 differentially expressed cDNAs with an expected number of six false positives. Functional annotation of these genes provided views of the changes in the activities of specific biological pathways in renal cancer. Cell adhesion, signal transduction, and nucleotide metabolism were among the biological processes with a large proportion of genes overexpressed in renal cell carcinoma. Down-regulated pathways in the kidney tumor cells included small molecule transport, ion homeostasis, and oxygen and radical metabolism. Our expression profiling data uncovered gene expression changes shared with other epithelial tumors, as well as a unique signature for renal cell carcinoma. [Expression data for the differentially expressed cDNAs are available as a Web supplement at http://www.dkfz-heidelberg.de/abt0840/whuber/rcc.]
33
Abstract
Correspondence analysis is an exploratory computational method for the study of associations between variables. Much like principal component analysis, it displays a low-dimensional projection of the data, e.g., onto a plane. It does this, though, for two variables simultaneously, thus revealing associations between them. Here, we demonstrate the applicability and high value of correspondence analysis for the analysis of microarray data, displaying associations between genes and experiments. To introduce the method, we apply it to the well-known Saccharomyces cerevisiae cell-cycle synchronization data of Spellman et al. [Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. & Futcher, B. (1998) Mol. Biol. Cell 9, 3273-3297], allowing comparison with their visualization of this data set. Furthermore, we apply correspondence analysis to a non-time-series data set of our own, supporting its general applicability to microarray data of different complexity, underlying structure, and experimental strategy (both two-channel fluorescence-tag and radioactive labeling).
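The core computation of correspondence analysis is a singular value decomposition of the matrix of standardized residuals, S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j), where p_ij is the table entry divided by the grand total n and r_i, c_j are the row and column masses. The dependency-free sketch below computes S, the total inertia (chi-square divided by n) and, by power iteration, only the first axis in principal coordinates. It is an illustration, not the authors' software; a full analysis would use a library SVD routine. It assumes a non-negative table with no all-zero rows or columns, for example genes as rows and experiments as columns, and the counts shown are hypothetical.

```python
import math

def standardized_residuals(table):
    """S_ij = (p_ij - r_i * c_j) / sqrt(r_i * c_j); assumes no all-zero rows or columns."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    s = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(c))] for i in range(len(r))]
    return s, r, c

def total_inertia(table):
    """Sum of squared standardized residuals, i.e. chi-square divided by n."""
    s, _, _ = standardized_residuals(table)
    return sum(x * x for row in s for x in row)

def first_axis(table, iterations=200):
    """Leading singular value of S and first-axis principal coordinates of rows and columns."""
    s, r, c = standardized_residuals(table)
    n_rows, n_cols = len(r), len(c)

    def mul_s(v):   # S v
        return [sum(s[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]

    def mul_st(u):  # S^T u
        return [sum(s[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]

    v = [1.0 + 0.1 * j for j in range(n_cols)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = mul_st(mul_s(v))                    # power iteration on S^T S
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break                               # rows and columns are independent
        v = [x / norm for x in w]
    sv = mul_s(v)
    sigma = math.sqrt(sum(x * x for x in sv))
    rows = [sv[i] / math.sqrt(r[i]) for i in range(n_rows)]        # sigma * u_i / sqrt(r_i)
    cols = [sigma * v[j] / math.sqrt(c[j]) for j in range(n_cols)]  # sigma * v_j / sqrt(c_j)
    return sigma, rows, cols

# Hypothetical counts: three genes (rows) measured in two experiments (columns).
table = [[30, 5], [10, 10], [2, 25]]
sigma, rows, cols = first_axis(table)
print(round(sigma ** 2, 4), round(total_inertia(table), 4))
```

Rows and columns with coordinates of the same sign on an axis are associated; plotting the first two axes together gives the joint gene/experiment display the abstract describes. With only two columns there is a single non-trivial axis, so its squared singular value equals the total inertia.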
34
Abstract
The estimation of amino acid replacement frequencies during molecular evolution is crucial for many applications in sequence analysis. Score matrices for database search programs or phylogenetic analysis rely on such models of protein evolution. Pioneering work was done by Dayhoff et al. (1978), who formulated a Markov model of evolution and derived the famous PAM score matrices. Their procedure for estimating amino acid exchange frequencies is restricted to pairs of proteins with a constant and small degree of divergence. Here we present an improved estimator, called the resolvent method, that is not subject to these limitations. This extension of Dayhoff's approach enables us to estimate an amino acid substitution model from alignments of varying degrees of divergence. Extensive simulations show that the new estimator accurately recovers the exchange frequencies among amino acids. Based on the SYSTERS database of aligned protein families (Krause and Vingron, 1998), we recompute a series of score matrices.
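Dayhoff's extrapolation step, which the resolvent method extends, can be sketched as follows: given a one-PAM transition matrix P (P[i][j] being the probability that residue i is replaced by j) and background frequencies f, the PAM-n matrix is P^n and a log-odds score is log2(P^n[i][j] / f[j]); for a reversible model this score matrix is symmetric. The two-letter alphabet and numbers below are hypothetical, the published PAM matrices used scaled and rounded base-10 logarithms rather than bits, and the resolvent estimator itself, which obtains the model from alignments of varying divergence, is not shown.

```python
import math

def mat_mult(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_power(m, n):
    """m**n by repeated squaring; returns the identity for n == 0."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = m
    while n > 0:
        if n & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        n >>= 1
    return result

def log_odds_scores(pam1, freqs, n):
    """PAM-n log-odds in bits: log2(P^n[i][j] / f[j]), with P[i][j] = Pr(i replaced by j)."""
    pn = mat_power(pam1, n)
    return [[math.log2(pn[i][j] / freqs[j]) for j in range(len(freqs))]
            for i in range(len(freqs))]

# Hypothetical two-letter alphabet: 1% change per PAM unit, equal background frequencies.
pam1 = [[0.99, 0.01], [0.01, 0.99]]
freqs = [0.5, 0.5]
for n in (1, 100, 250):
    print(n, [[round(x, 3) for x in row] for row in log_odds_scores(pam1, freqs, n)])
```

As n grows, P^n approaches the background frequencies and all scores shrink toward zero, which is why very distant PAM matrices discriminate poorly.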
35
36
Abstract
In physical mapping, one orders a set of genetic landmarks or a library of cloned DNA fragments according to their position in the genome. Our approach to physical mapping divides the problem into smaller, easier subproblems by partitioning the probe set into independent parts (probe contigs). For this purpose, we introduce a new distance function between probes, the averaged rank distance (ARD), derived from bootstrap resampling of the raw data. The ARD measures the pairwise distances of probes within a contig and smooths the distances of probes across different contigs. It shows distinct jumps at contig borders, which makes it appropriate for contig selection by clustering. We have designed a physical mapping algorithm that exploits these observations and appears to be particularly well suited to delineating reliable contigs. We evaluated our method on data sets from two physical mapping projects. On data from the recently sequenced bacterium Xylella fastidiosa, the probe contig set produced by the new method was evaluated against the probe order derived from the sequence information. Our approach yielded a basically correct contig set. On these data, we also compared our method with an approach that uses the number of supporting clones to determine contigs; our map is much more accurate. Compared with a physical map of Pasteurella haemolytica computed using simulated annealing, the newly computed map is considerably cleaner. The results of our method have already proven helpful for designing experiments aimed at further improving the quality of a map.
37
Abstract
MOTIVATION Noise in database searches resulting from random sequence similarities increases as the databases expand rapidly. The noise problems are not a technical shortcoming of the database search programs but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments. RESULTS We have investigated noise levels in pairwise-alignment-based database searches. The noise levels of 38 releases of the SwissProt database display perfect logarithmic growth with the total length of the databases. Clustering real biological sequences reduces noise levels, but the effect is marginal.
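Standard Karlin-Altschul statistics for local alignments give one way to see why logarithmic growth is expected, though this is offered as background and not as the study's own derivation. The expected number of chance hits with score at least S is E = K·m·n·e^(-λS). The score reached by noise at a fixed E therefore grows like ln(n)/λ in the database length n. The sketch below assumes illustrative ungapped BLOSUM62-like parameters that do not come from the paper.

```python
import math

def noise_score(query_len, db_len, lam, k, evalue=1.0):
    """Score S at which the expected number of chance hits E = K*m*n*exp(-lam*S)
    equals `evalue`; solving gives S = ln(K*m*n/E) / lam."""
    return math.log(k * query_len * db_len / evalue) / lam

# Assumed illustrative parameters (ungapped BLOSUM62-like values), not from the study.
LAM, K = 0.3176, 0.134
for db_len in (10**7, 10**8, 10**9):
    print(db_len, round(noise_score(300, db_len, LAM, K), 1))
```

Each tenfold increase in database length raises the noise score by the same constant, ln(10)/λ, which is what a straight line on a log-length axis looks like.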
38
Abstract
MOTIVATION The technology of hybridization to DNA arrays is used to obtain the expression levels of many different genes simultaneously. It enables searching for genes that are expressed specifically under certain conditions. However, the technology produces large amounts of data demanding computational methods for their analysis. It is necessary to find ways to compare data from different experiments and to consider the quality and reproducibility of the data. RESULTS Data analyzed in this paper have been generated by hybridization of radioactively labeled targets to DNA arrays spotted on nylon membranes. We introduce methods to compare the intensity values of several hybridization experiments. This is essential to find differentially expressed genes or to do pattern analysis. We also discuss possibilities for quality control of the acquired data. AVAILABILITY http://www.dkfz.de/tbi CONTACT M.Vingron@dkfz-heidelberg.de
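The abstract does not spell out its normalization, so the following is only a minimal sketch of one common way to make intensities from separate membranes comparable. Each experiment is scaled to a median of 1, and then per-gene log2 ratios are compared. The 2-fold cutoff and the intensity values are hypothetical, and all intensities are assumed to be positive.

```python
import math
from statistics import median

def median_normalize(intensities):
    """Scale one experiment so its median intensity becomes 1 (intensities assumed > 0)."""
    m = median(intensities)
    return [x / m for x in intensities]

def log2_ratios(exp_a, exp_b):
    """Per-gene log2 ratio between two median-normalized experiments."""
    a, b = median_normalize(exp_a), median_normalize(exp_b)
    return [math.log2(x / y) for x, y in zip(a, b)]

def differentially_expressed(exp_a, exp_b, min_abs_log2=1.0):
    """Indices of genes whose normalized ratio is at least 2-fold in either direction."""
    return [i for i, r in enumerate(log2_ratios(exp_a, exp_b)) if abs(r) >= min_abs_log2]

# Hypothetical spot intensities for four genes on two membranes.
membrane_1 = [100, 200, 400, 800]
membrane_2 = [50, 100, 800, 400]
print(differentially_expressed(membrane_1, membrane_2))  # [2]
```

Without the median step, gene 0 would look 2-fold up on membrane 1 simply because that membrane is brighter overall. After normalization only gene 2 stands out.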
39
40
Abstract
Ordering genetic markers or clones from a genomic library into a physical map is a central problem in genetics. In the presence of errors, no efficient algorithm is known that solves this problem. Building on a standard heuristic algorithm for it, we present a method to construct a confidence neighborhood for a computed solution. We compute a confidence value for putative local solutions derived from bootstrap replicates of the original solution. In the reliable parts, the confidence neighborhood and the computed solution tend to coincide. In regions that are ill-defined by the data, the neighborhood contains additional reasonable alternatives. This makes it possible to design further experiments for the poorly defined regions to improve the quality of the physical map. We analyze our approach by a simulation study and by application to a dataset from the genome of the bacterium Xylella fastidiosa.
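The following is a minimal sketch of one simple confidence value of this kind. It assumes that bootstrap replicate orderings are already available. For each pair of neighbours in the computed solution, it records the fraction of replicates in which the two items are also neighbours, in either orientation. This is a simplification of the paper's confidence neighborhood, which also proposes alternative local orders. The clone names and the 0.5 threshold are hypothetical.

```python
def adjacency_confidence(solution, replicates):
    """For each neighbouring pair in `solution`, the fraction of bootstrap
    replicate orderings in which the two items are also neighbours
    (in either orientation)."""
    def neighbour_pairs(order):
        return {frozenset(pair) for pair in zip(order, order[1:])}

    replicate_pairs = [neighbour_pairs(r) for r in replicates]
    return [
        (a, b, sum(frozenset((a, b)) in pairs for pairs in replicate_pairs) / len(replicate_pairs))
        for a, b in zip(solution, solution[1:])
    ]

def weak_links(confidences, threshold=0.5):
    """Neighbouring pairs whose support falls below `threshold`: candidates for new experiments."""
    return [(a, b) for a, b, c in confidences if c < threshold]

solution = ["c1", "c2", "c3", "c4"]
replicates = [
    ["c1", "c2", "c3", "c4"],
    ["c1", "c2", "c4", "c3"],
    ["c2", "c1", "c3", "c4"],
]
conf = adjacency_confidence(solution, replicates)
```

Here c1/c2 and c3/c4 are supported by every replicate, while the c2/c3 junction is supported by only one in three. That junction is the ill-defined region where additional experiments would help most.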
41
In silico analysis of gene expression patterns during early development of Xenopus laevis. Pacific Symposium on Biocomputing 2000:443-54. [PMID: 10902192 DOI: 10.1142/9789814447331_0042] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Information on where and when an mRNA is present in a given cell is essential to bridge the gap between the DNA sequence of a gene and its physiological function. Therefore, a major component of functional genomics is characterizing the levels and the spatio-temporal domains of gene expression. Currently, only a few specialised public databases store gene expression data, although such databases are needed as a resource for the field. Moreover, computational tools must be developed and assessed for comparing and analysing expression profiles in a way suited to biological interpretation. Here we describe our recent work on developing a gene expression database for the frog Xenopus laevis, and on setting up and using new tools for analysing and comparing gene expression patterns. We used histogram clustering to compare expression profiles at both the gene and the tissue level, using data from the characterization of gene expression during early Xenopus development. This enabled us to draw a tree of tissue relatedness and to identify coexpressed genes by in silico analysis.
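The paper builds its tissue tree with histogram clustering. The sketch below swaps in a simpler, plainly different technique to illustrate the idea of a tree of tissue relatedness: Jaccard distance between the gene sets detected in each tissue, followed by average-linkage (UPGMA-style) agglomeration. The tissue names and gene sets are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of expressed genes."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def average_linkage_tree(profiles):
    """Agglomerative (UPGMA-style average-linkage) tree over tissues.

    profiles: dict mapping tissue name -> set of genes detected in that tissue.
    Returns the tree as nested tuples of tissue names.
    """
    names = list(profiles)
    dist = {frozenset((x, y)): jaccard_distance(profiles[x], profiles[y])
            for x, y in combinations(names, 2)}

    def linkage(members_a, members_b):
        total = sum(dist[frozenset((x, y))] for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    clusters = [(name, [name]) for name in names]  # (subtree, member tissues)
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average pairwise distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]][1], clusters[ij[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

profiles = {
    "notochord": {"g1", "g2", "g3"},
    "somites": {"g1", "g2", "g4"},
    "neural_plate": {"g5", "g6"},
    "eye": {"g5", "g6", "g7"},
}
tree = average_linkage_tree(profiles)
```

Tissues that share most of their detected genes are joined first. The same distance applied to genes (as sets of tissues) instead of tissues would group coexpressed genes.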
42
TIF-IA, the factor mediating growth-dependent control of ribosomal RNA synthesis, is the mammalian homolog of yeast Rrn3p. EMBO Rep 2000; 1:171-5. [PMID: 11265758 PMCID: PMC1084264 DOI: 10.1093/embo-reports/kvd032] [Citation(s) in RCA: 111] [Impact Index Per Article: 4.6] [Received: 05/04/2000] [Revised: 06/19/2000] [Accepted: 06/28/2000] [Indexed: 11/13/2022]
Abstract
Cells carefully modulate the rate of rRNA transcription in order to prevent an overinvestment in ribosome synthesis under less favorable nutritional conditions. In mammals, growth-dependent regulation of RNA polymerase I (Pol I) transcription is mediated by TIF-IA, an essential initiation factor that is active in extracts from growing but not starved or cycloheximide-treated mammalian cells. Here we report the molecular cloning and functional characterization of recombinant TIF-IA, which turns out to be the mammalian homolog of the yeast factor Rrn3p. We demonstrate that TIF-IA interacts with Pol I in the absence of template DNA, augments Pol I transcription in vivo and rescues transcription in extracts from growth-arrested cells in vitro.
43
Abstract
We describe approaches to improve protein detection by postharvest alkylation and subsequent radioactive labeling with either [3H]iodoacetamide or 125I. Database protein sequence analysis suggested that cysteine is not suitable for detecting the entire proteome, but that cysteine-alkylating reagents can increase the number of proteins detectable by iodination chemistry. Proteins were alkylated with beta-(4-hydroxyphenyl)ethyl iodoacetamide or with 1,5-I-AEDANS (the Hudson-Weber reagent). Subsequent iodination using the Iodo-Gen system was found to be most efficient. The enhanced sensitivity obtainable with these approaches is expected to suffice for visualizing the lowest-copy-number proteins from human cells, such as those in clinical samples. However, we argue that significantly improved protein separation methods will be necessary to resolve the large number of proteins expected to be detectable at this sensitivity.
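The kind of database sequence analysis mentioned here can be sketched as a count of how many sequences carry each potential label target. The sketch assumes that radioiodination mainly labels tyrosine, and that alkylating cysteine with an iodinatable reagent adds cysteine as a second target. The paper's actual analysis may have been different. The sequences are hypothetical.

```python
def detectable_fractions(sequences):
    """Fraction of protein sequences containing Cys, Tyr, or either residue.

    Assumption for illustration: radioiodination mainly labels Tyr, and
    alkylating Cys with an iodinatable reagent adds Cys as a second target.
    """
    n = len(sequences)
    return {
        "cys": sum("C" in s for s in sequences) / n,
        "tyr": sum("Y" in s for s in sequences) / n,
        "cys_or_tyr": sum(("C" in s) or ("Y" in s) for s in sequences) / n,
    }

toy_proteome = ["MKCLLA", "MYKLLA", "MKLLAG", "MCYKLA"]  # hypothetical sequences
print(detectable_fractions(toy_proteome))
```

Run on a real proteome, the gap between the "cys" fraction and 1.0 shows why cysteine alone cannot reach every protein. The "cys_or_tyr" fraction shows how much coverage the combined alkylation-plus-iodination strategy could add.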
44
Abstract
Transcriptional profiling on DNA arrays has become a synonym for the type of analyses that aim to understand cellular functioning in a comprehensive manner. In this review, the status of the technology is briefly discussed, with emphasis on some inherent weaknesses and problems.
45
Axeldb: a Xenopus laevis database focusing on gene expression. Nucleic Acids Res 2000; 28:139-40. [PMID: 10592204 PMCID: PMC102398 DOI: 10.1093/nar/28.1.139] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Received: 09/08/1999] [Revised: 09/22/1999] [Accepted: 10/04/1999] [Indexed: 11/13/2022]
Abstract
Axeldb is a database storing and integrating gene expression patterns and DNA sequences identified in a large-scale in situ hybridization study in Xenopus laevis embryos. The data are organised in a format appropriate for comprehensive analysis and enable comparison of images of expression patterns for any given set of genes. Information on literature, cDNA clones and their availability, nucleotide sequences, expression patterns and accompanying pictures is available. Current developments are aimed at interconnection with other databases and the integration of data from the literature. Axeldb is implemented using an ACEDB database system and is available through the web at http://www.dkfz-heidelberg.de/abt0135/axeldb.htm
46
The SYSTERS protein sequence cluster set. Nucleic Acids Res 2000; 28:270-2. [PMID: 10592244 PMCID: PMC102384 DOI: 10.1093/nar/28.1.270] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.6] [Received: 08/09/1999] [Revised: 09/17/1999] [Accepted: 10/04/1999] [Indexed: 11/13/2022]
Abstract
The SYSTERS (short for SYSTEmatic Re-Searching) protein sequence cluster set consists of the classification of all sequences from SWISS-PROT and PIR into disjoint protein family clusters and, hierarchically, into superfamily and subfamily clusters. The cluster set can be searched with a sequence using the SSMAL search tool or a traditional database search tool such as BLAST or FASTA. Additionally, a multiple alignment is generated for each cluster and annotated with domain information from the Pfam database of protein domain families. A taxonomic overview of the organisms covered by a cluster is given based on the NCBI taxonomy. The cluster set is available for querying and browsing at http://www.dkfz-heidelberg.de/tbi/services/cluster/systersform
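To illustrate what partitioning a database into disjoint family clusters means, here is a minimal sketch that uses single-linkage grouping of significant pairwise search hits with a union-find structure. This is not the SYSTERS procedure, which relies on systematic re-searching and is considerably more elaborate. The sequence IDs and hit pairs are hypothetical, and the hits stand for pairs that passed some significance threshold.

```python
def disjoint_clusters(sequence_ids, hits):
    """Group sequences into disjoint clusters: any two linked by a chain of hits
    end up in the same cluster (single linkage via union-find)."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for s in sequence_ids:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())

ids = ["P1", "P2", "P3", "P4", "P5"]
hits = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]  # hypothetical significant hits
families = disjoint_clusters(ids, hits)
```

Every sequence lands in exactly one cluster, which is the "disjoint" property the abstract describes. Plain single linkage is known to chain unrelated families together through multidomain proteins, and this is one reason a real clustering procedure needs more care than this sketch.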
47
48
Abstract
SUMMARY We present a Web server where the SYSTERS cluster set of the non-redundant protein database consisting of sequences from SWISS-PROT and PIR is being made available for querying and browsing. The cluster set can be searched with a new sequence using the SSMAL search tool. Additionally, a multiple alignment is generated for each cluster and annotated with domain information from the Pfam protein family database. AVAILABILITY The server address is http://www.dkfz-heidelberg.de/tbi/services/cluster/systersform
49
Abstract
Several experimental techniques are available nowadays to study the spectrum of genes expressed in a cell at a specific moment. Typically, such methods generate large amounts of expression data that may be hard to interpret. Here we review computational questions and approaches resulting from the various experimental techniques.
50
Abstract
Ribonucleic acid (RNA) is a polymer composed of four bases, denoted A, C, G and U. It is generally a single-stranded molecule whose bases form hydrogen bonds within the same molecule, leading to structure formation. When comparing homologous RNA molecules, it is important to consider both the base sequence and the structure of the molecules. Traditional alignment algorithms can account only for the sequence of bases, not for the base pairings. Considering the structure leads to significant computational problems because of the dependencies introduced by the base pairings. In this paper we address the problem of optimally aligning a given RNA sequence of unknown structure to one of known sequence and structure. We phrase the problem as an integer linear program and then solve it using methods from polyhedral combinatorics. In our computational experiments we could solve large problem instances (23S ribosomal RNA with more than 1400 bases), a size intractable for previous algorithms.
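The dependency created by base pairings shows up clearly in the objective. A known pair (i1, i2) earns a bonus only if both i1 and i2 are aligned to positions in the new sequence that can themselves pair, so the score no longer decomposes position by position. The sketch below writes out one such objective and, purely for illustration, maximizes it by exhaustive search over toy inputs. The paper instead solves an integer linear program with polyhedral methods. The scoring values are hypothetical.

```python
from itertools import combinations

# Watson-Crick and G-U wobble pairs.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def alignments(n, m):
    """Yield every order-preserving set of aligned position pairs (gaps elsewhere)."""
    for k in range(min(n, m) + 1):
        for left in combinations(range(n), k):
            for right in combinations(range(m), k):
                yield list(zip(left, right))

def structural_score(alignment, known_seq, known_pairs, new_seq,
                     match=1.0, mismatch=0.0, gap=-0.5, pair_bonus=2.0):
    """Sequence score plus a bonus for every known base pair (i1, i2) whose
    aligned partners in new_seq can also pair. Scoring values are hypothetical."""
    aligned = dict(alignment)
    score = sum(match if known_seq[i] == new_seq[j] else mismatch for i, j in alignment)
    score += gap * (len(known_seq) + len(new_seq) - 2 * len(alignment))
    for i1, i2 in known_pairs:
        if i1 in aligned and i2 in aligned:
            if (new_seq[aligned[i1]], new_seq[aligned[i2]]) in CAN_PAIR:
                score += pair_bonus
    return score

def best_alignment(known_seq, known_pairs, new_seq):
    """Exhaustive search; exponential, so only usable for toy inputs."""
    return max(alignments(len(known_seq), len(new_seq)),
               key=lambda a: structural_score(a, known_seq, known_pairs, new_seq))

best = best_alignment("GAC", [(0, 2)], "GAC")
```

Exhaustive enumeration grows combinatorially with sequence length, which is why instances the size of a 23S rRNA call for an ILP formulation rather than this approach.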