1
|
Li J, Zou Q, Yuan L. A review from biological mapping to computation-based subcellular localization. Mol Ther Nucleic Acids 2023; 32:507-521. [PMID: 37215152 PMCID: PMC10192651 DOI: 10.1016/j.omtn.2023.04.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Subcellular localization is crucial to the study of viruses and diseases. Specifically, research on protein subcellular localization can help identify clues linking viruses and host cells that can aid in the design of targeted drugs. Research on RNA subcellular localization is significant for human diseases (such as Alzheimer's disease and colon cancer). To date, the published reviews of protein subcellular localization are outdated, and reviews of RNA subcellular localization are not comprehensive. Therefore, we collated the most up-to-date literature on protein and RNA subcellular localization to help researchers understand changes in the field. Extensive methods for constructing subcellular localization models have also been summarized, which can help readers understand the changes in the application of biotechnology and computer science in subcellular localization research and explore how to use biological data to construct improved subcellular localization models. This paper is the first review to cover both protein subcellular localization and RNA subcellular localization. We urge researchers from biology and computational biology to jointly pay attention to transformation patterns, interrelationships, differences, and causality of protein subcellular localization and RNA subcellular localization.
Affiliation(s)
- Jing Li
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
- School of Biomedical Sciences, University of Hong Kong, Hong Kong, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang 324000, China
- Lei Yuan
- Department of Hepatobiliary Surgery, Quzhou People's Hospital, 100 Minjiang Main Road, Quzhou, Zhejiang 324000, China
|
2
|
van der Wal T, Lambooij JP, van Amerongen R. TMEM98 is a negative regulator of FRAT mediated Wnt/ß-catenin signalling. PLoS One 2020; 15:e0227435. [PMID: 31961879 PMCID: PMC6974163 DOI: 10.1371/journal.pone.0227435] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/02/2019] [Accepted: 12/18/2019] [Indexed: 12/19/2022] Open
Abstract
Wnt/ß-catenin signalling is crucial for maintaining the balance between cell proliferation and differentiation, both during tissue morphogenesis and in tissue maintenance throughout postnatal life. Whereas the signalling activities of the core Wnt/ß-catenin pathway components are understood in great detail, far less is known about the precise role and regulation of the many different modulators of Wnt/ß-catenin signalling that have been identified to date. Here we describe TMEM98, a putative transmembrane protein of unknown function, as an interaction partner and regulator of the GSK3-binding protein FRAT2. We show that TMEM98 reduces FRAT2 protein levels and, accordingly, inhibits the FRAT2-mediated induction of ß-catenin/TCF signalling. We also characterize the intracellular trafficking of TMEM98 in more detail and show that it is recycled between the plasma membrane and the Golgi. Together, our findings not only reveal a new layer of regulation for Wnt/ß-catenin signalling, but also a new biological activity for TMEM98.
Affiliation(s)
- Tanne van der Wal
- Section of Molecular Cytology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, the Netherlands
- Van Leeuwenhoek Centre for Advanced Microscopy, University of Amsterdam, Amsterdam, the Netherlands
- Jan-Paul Lambooij
- Division of Molecular Genetics, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Renée van Amerongen
- Section of Molecular Cytology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, the Netherlands
- Van Leeuwenhoek Centre for Advanced Microscopy, University of Amsterdam, Amsterdam, the Netherlands
|
3
|
Snow CJ, Dar A, Dutta A, Kehlenbach RH, Paschal BM. Defective nuclear import of Tpr in Progeria reflects the Ran sensitivity of large cargo transport. J Cell Biol 2013; 201:541-57. [PMID: 23649804 PMCID: PMC3653351 DOI: 10.1083/jcb.201212117] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Indexed: 12/22/2022]
Abstract
Nuclear transport of large protein cargoes such as Tpr is more sensitive to the alteration of the ratio of nuclear to cytoplasmic Ran that occurs in Progeria. The RanGTPase acts as a master regulator of nucleocytoplasmic transport by controlling assembly and disassembly of nuclear transport complexes. RanGTP is required in the nucleus to release nuclear localization signal (NLS)–containing cargo from import receptors, and, under steady-state conditions, Ran is highly concentrated in the nucleus. We previously showed the nuclear/cytoplasmic Ran distribution is disrupted in Hutchinson-Gilford Progeria syndrome (HGPS) fibroblasts that express the Progerin form of lamin A, causing a major defect in nuclear import of the protein, translocated promoter region (Tpr). In this paper, we show that Tpr import was mediated by the most abundant import receptor, KPNA2, which binds the bipartite NLS in Tpr with nanomolar affinity. Analyses including NLS swapping revealed Progerin did not cause global inhibition of nuclear import. Rather, Progerin inhibited Tpr import because transport of large protein cargoes was sensitive to changes in the Ran nuclear/cytoplasmic distribution that occurred in HGPS. We propose that defective import of large protein complexes with important roles in nuclear function may contribute to disease-associated phenotypes in Progeria.
Affiliation(s)
- Chelsi J Snow
- Center for Cell Signaling, University of Virginia, Charlottesville, VA 22903, USA
|
4
|
Guo L, Yang LY, Fan C, Chen GD, Wu F. Novel roles of Vmp1: inhibition metastasis and proliferation of hepatocellular carcinoma. Cancer Sci 2012; 103:2110-9. [PMID: 22971212 DOI: 10.1111/cas.12025] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 07/15/2012] [Revised: 08/27/2012] [Accepted: 09/02/2012] [Indexed: 12/22/2022] Open
Abstract
Hepatocellular carcinoma (HCC) is one of the most deadly human cancers because of its high incidence of metastasis. Despite extensive efforts, therapies against metastasis of HCC remain underdeveloped. Vacuole membrane protein 1 (Vmp1) was recently identified to be involved in cancer-relevant processes; however, its expression, clinical significance and biological function in HCC progression are still unknown. Therefore, we evaluated the expression of Vmp1 in human HCC specimens. To functionally characterize Vmp1 in HCC, we upregulated its expression in HCCLM3 cells using a plasmid transfection approach, following which both in vitro and in vivo models were used to elucidate its role. A significant downregulation of Vmp1 was found in human HCC tissues and closely correlated with multiple tumor nodes, absence of capsular formation, vein invasion and poor prognosis of HCC. Such expression was verified with HCC cell lines including HepG2, MHCC97-L and HCCLM3, and the Vmp1 expression levels negatively correlated with metastatic potential. Interestingly, upregulation of Vmp1 significantly affects proliferation, migration, invasion and adhesion of HCCLM3 cells. Using a mouse model, we demonstrated that upregulation of Vmp1 was associated with suppression of growth and pulmonary metastases of HCC. Therefore, our data suggest Vmp1 is a novel prognostic marker and potential therapeutic target for metastasis of HCC.
Affiliation(s)
- Lei Guo
- Liver Cancer Laboratory, Department of Surgery, Xiangya Hospital, Central South University, Changsha, China
|
5
|
Nanni L, Lumini A. Ensemble of Neural Networks for Automated Cell Phenotype Image Classification. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Subcellular location is related to the knowledge of the spatial distribution of a protein within the cell. The knowledge of the location of all proteins is crucial for several applications ranging from early diagnosis of a disease to monitoring of therapeutic effectiveness of drugs. This chapter focuses on the study of machine learning techniques for cell phenotype image classification and is aimed at pointing out some of the advantages of using a multi-classifier system instead of a stand-alone method to solve this difficult classification problem. The main problems and solutions proposed in this field are discussed and a new approach is proposed based on an ensemble of neural networks trained by local and global features. Finally, the most used benchmarks for this problem are presented and an experimental comparison among several state-of-the-art approaches is reported, which makes it possible to quantify the performance improvement obtained by the approach proposed in this chapter.
|
6
|
Wälde S, Thakar K, Hutten S, Spillner C, Nath A, Rothbauer U, Wiemann S, Kehlenbach RH. The nucleoporin Nup358/RanBP2 promotes nuclear import in a cargo- and transport receptor-specific manner. Traffic 2011; 13:218-33. [PMID: 21995724 DOI: 10.1111/j.1600-0854.2011.01302.x] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 06/15/2011] [Revised: 10/11/2011] [Accepted: 10/11/2011] [Indexed: 12/31/2022]
Abstract
In vertebrates, the nuclear pore complex (NPC), the gate for transport of macromolecules between the nucleus and the cytoplasm, consists of approximately 30 different nucleoporins (Nups). The Nup and SUMO E3-ligase Nup358/RanBP2 is the major component of the cytoplasmic filaments of the NPC. In this study, we perform a structure-function analysis of Nup358 and describe its role in nuclear import of specific proteins. In a screen for nuclear proteins that accumulate in the cytoplasm upon Nup358 depletion, we identified proteins that were able to interact with Nup358 in a receptor-independent manner. These included the importin α/β-cargo DBC-1 (deleted in breast cancer 1) and DMAP-1 (DNA methyltransferase 1 associated protein 1). Strikingly, a short N-terminal fragment of Nup358 was sufficient to promote import of DBC-1, whereas DMAP-1 required a larger portion of Nup358 for stimulated import. Neither the interaction of RanGAP with Nup358 nor its SUMO-E3 ligase activity was required for nuclear import of all tested cargos. Together, Nup358 functions as a cargo- and receptor-specific assembly platform, increasing the efficiency of nuclear import of proteins through various mechanisms.
Affiliation(s)
- Sarah Wälde
- Department of Biochemistry I, Faculty of Medicine, Georg-August-University of Göttingen, Humboldtallee 23, 37073, Göttingen, Germany
| | | | | | | | | | | | | | | |
Collapse
|
7
|
Fagerberg L, Stadler C, Skogs M, Hjelmare M, Jonasson K, Wiking M, Åbergh A, Uhlén M, Lundberg E. Mapping the Subcellular Protein Distribution in Three Human Cell Lines. J Proteome Res 2011; 10:3766-77. [DOI: 10.1021/pr200379a] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 12/15/2022]
Affiliation(s)
- Linn Fagerberg
- School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, SE-106 91, Sweden
- Charlotte Stadler
- Science for Life Laboratory, Royal Institute of Technology (KTH), Stockholm, SE-171 65, Sweden
- Marie Skogs
- Science for Life Laboratory, Royal Institute of Technology (KTH), Stockholm, SE-171 65, Sweden
- Martin Hjelmare
- Science for Life Laboratory, Royal Institute of Technology (KTH), Stockholm, SE-171 65, Sweden
- Kalle Jonasson
- School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, SE-106 91, Sweden
- Mikaela Wiking
- Science for Life Laboratory, Royal Institute of Technology (KTH), Stockholm, SE-171 65, Sweden
- Annica Åbergh
- Science for Life Laboratory, Royal Institute of Technology (KTH), Stockholm, SE-171 65, Sweden
- Mathias Uhlén
- School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, SE-106 91, Sweden
- Science for Life Laboratory, Royal Institute of Technology (KTH), Stockholm, SE-171 65, Sweden
- Emma Lundberg
- Science for Life Laboratory, Royal Institute of Technology (KTH), Stockholm, SE-171 65, Sweden
|
8
|
Groen AJ, Lilley KS. Proteomics of total membranes and subcellular membranes. Expert Rev Proteomics 2011; 7:867-78. [PMID: 21142888 DOI: 10.1586/epr.10.85] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
Membrane proteins are key molecules in the cell and are important targets for drug development. Much effort has, therefore, been directed towards research of this group of proteins, but their hydrophobic nature can make working with them challenging. Here we discuss methodologies used in the study of the membrane proteome, specifically discussing approaches that circumvent technical issues specific to the membrane. In addition, we review several techniques used for visualization, qualification, quantitation and localization of membrane proteins. The combination of the techniques we describe holds great promise to allow full characterization of the membrane proteome and to map the dynamic changes within it essential for cellular function.
Affiliation(s)
- Arnoud J Groen
- Cambridge Centre for Proteomics, Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Cambridge, UK
|
9
|
|
10
|
Baras A, Moskaluk CA. Intracellular localization of GASP/ECOP/VOPP1. J Mol Histol 2010; 41:153-64. [PMID: 20571887 DOI: 10.1007/s10735-010-9272-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 03/30/2010] [Accepted: 05/25/2010] [Indexed: 11/30/2022]
Abstract
Vesicular Over-expressed in cancer Prosurvival Protein 1 (VOPP1), also known as Glioblastoma Amplified and Secreted Protein and EGFR-Coamplified and Over-expressed Protein has been previously shown to be over-expressed in human glioblastoma multiforme and squamous cell carcinoma. Additionally, previous experimental work suggests that it confers a prosurvival cellular phenotype. A query of a public database of gene expression profiling data (Oncomine) shows that the VOPP1 transcript is also highly expressed in several other common human cancers, including breast carcinoma, pancreatic carcinoma, and lymphoma. Analysis of VOPP1 sequence structure shows both a signal sequence and a transmembrane domain, and examination of a public microarray dataset for endoplasmic reticulum (ER)-bound mRNA transcripts is consistent with the VOPP1 protein product being synthesized into the ER. Immunoblot analysis of cell culture and conditioned media confirms that the protein product is not secreted and is retained intracellularly. VOPP1 protein tagged with a fluorescence reporter, as well as antibody-mediated visualization of recombinant and native forms of the protein reveals an intracellular vesicular pattern of localization. Co-localization experiments reveal that VOPP1 vesicles do not co-localize with mitochondria or peroxisomes, but show partial co-localization with perinuclear lysosomes. Additionally, markers of endocytosis and autophagy show partial perinuclear co-localization, suggesting that VOPP1-containing vesicles enter final common pathways of the lysosomal system. These findings throw into doubt the hypothesis that VOPP1 interacts directly with cytoplasmic mediators of the NF kappa B pathway, and suggest that the prosurvival phenotype conferred by this gene product is mediated by other mechanisms.
Affiliation(s)
- Alexander Baras
- Department of Pathology, University of Virginia, Charlottesville, 22908, USA.
|
11
|
|
12
|
Schulze JO, Quedenau C, Roske Y, Adam T, Schüler H, Behlke J, Turnbull AP, Sievert V, Scheich C, Mueller U, Heinemann U, Büssow K. Structural and functional characterization of human Iba proteins. FEBS J 2008; 275:4627-40. [DOI: 10.1111/j.1742-4658.2008.06605.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
|
13
|
Tsai YS, Chung IF, Simpson JC, Lee MI, Hsiung CC, Chiu TY, Kao LS, Chiu TC, Lin CT, Lin WC, Liang SF, Lin CC. Automated recognition system to classify subcellular protein localizations in images of different cell lines acquired by different imaging systems. Microsc Res Tech 2008; 71:305-14. [PMID: 18069668 DOI: 10.1002/jemt.20555] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Abstract
Systemic analysis of subcellular protein localization (location proteomics) provides clues for understanding gene functions and physiological condition of the cells. However, recognition of cell images of subcellular structures highly depends on experience and becomes the rate-limiting step when classifying subcellular protein localization. Several research groups have extracted specific numerical features for the recognition of subcellular protein localization, but these recognition systems are restricted to images of a single particular cell line acquired by one specific imaging system and have not been applied to recognize a range of cell image sources. In this study, we establish a single system for automated subcellular structure recognition to identify cell images from various sources. Two different sources of cell images, 317 Vero (http://gfp-cdna.embl.de) and 875 CHO cell images of subcellular structures, were used to train and test the system. When the system was trained by a single source of images, the recognition rate was high and specific to the trained source. The system trained by the CHO cell images gave high average recognition accuracy for CHO cells of 96%, but this was reduced to 46% with Vero images. When we trained the system using a mixture of CHO and Vero cell images, the average recognition accuracy reached 86.6% for both CHO and Vero cell images. The system can reject images with low confidence and identify the cell images correctly recognized to avoid manual reconfirmation. In summary, we have established a single system that can recognize subcellular protein localizations from two different sources for location-proteomic studies.
Affiliation(s)
- Yuh-Show Tsai
- Department of Biomedical Engineering, Chung Yuan Christian University, Jhongli, Taiwan, Republic of China
|
14
|
A reliable method for cell phenotype image classification. Artif Intell Med 2008; 43:87-97. [DOI: 10.1016/j.artmed.2008.03.005] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Received: 09/10/2007] [Revised: 02/28/2008] [Accepted: 03/10/2008] [Indexed: 11/19/2022]
|
15
|
Kohl T, Schmidt C, Wiemann S, Poustka A, Korf U. Automated production of recombinant human proteins as resource for proteome research. Proteome Sci 2008; 6:4. [PMID: 18226205 PMCID: PMC2266735 DOI: 10.1186/1477-5956-6-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 09/14/2007] [Accepted: 01/28/2008] [Indexed: 02/01/2023] Open
Abstract
Background: An arbitrary set of 96 human proteins was selected and tested to set-up a fully automated protein production strategy, covering all steps from DNA preparation to protein purification and analysis. The target proteins are encoded by functionally uncharacterized open reading frames (ORF) identified by the German cDNA consortium. Fusion proteins were produced in E. coli with four different fusion tags and tested in five different purification strategies depending on the respective fusion tag. The automated strategy relies on standard liquid handling and clone picking equipment. Results: A robust automated strategy for the production of recombinant human proteins in E. coli was established based on a set of four different protein expression vectors resulting in NusA/His, MBP/His, GST and His-tagged proteins. The yield of soluble fusion protein was correlated with the induction temperature and the respective fusion tag. NusA/His and MBP/His fusion proteins are best expressed at low temperature (25°C), whereas the yield of soluble GST fusion proteins was higher when protein expression was induced at elevated temperature. In contrast, the induction of soluble His-tagged fusion proteins was independent of the temperature. Amylose was not found useful for affinity-purification of MBP/His fusion proteins in a high-throughput setting, and metal chelating chromatography is recommended instead. Conclusion: Soluble fusion proteins can be produced in E. coli in sufficient qualities and μg/ml culture quantities for downstream applications like microarray-based assays, and studies on protein-protein interactions employing a fully automated protein expression and purification strategy. Future applications might include the optimization of experimental conditions for the large-scale production of soluble recombinant proteins from libraries of open reading frames.
|
16
|
Towards defining the nuclear proteome. Genome Biol 2008; 9:R15. [PMID: 18211718 PMCID: PMC2395251 DOI: 10.1186/gb-2008-9-1-r15] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 08/15/2007] [Revised: 12/19/2007] [Accepted: 01/23/2008] [Indexed: 11/17/2022] Open
Abstract
Direct evidence is reported for 2,568 mammalian proteins within the nuclear proteome, consisting of at least 14% of the entire proteome. Background: The nucleus is a complex cellular organelle and accurately defining its protein content is essential before any systematic characterization can be considered. Results: We report direct evidence for 2,568 mammalian proteins within the nuclear proteome: the nuclear subcellular localization of 1,529 proteins based on a high-throughput subcellular localization protocol of full-length proteins and an additional 1,039 proteins for which clear experimental evidence is documented in published literature. This is direct evidence that the nuclear proteome consists of at least 14% of the entire proteome. This dataset was used to evaluate computational approaches designed to identify additional nuclear proteins. Conclusion: This represents direct experimental evidence that the nuclear proteome consists of at least 14% of the entire proteome. This high-quality nuclear proteome dataset was used to evaluate computational approaches designed to identify additional nuclear proteins. Based on this analysis, researchers can determine the stringency and types of lines of evidence they consider to infer the size and complement of the nuclear proteome.
|
17
|
Sprenger J, Lynn Fink J, Karunaratne S, Hanson K, Hamilton NA, Teasdale RD. LOCATE: a mammalian protein subcellular localization database. Nucleic Acids Res 2007; 36:D230-3. [PMID: 17986452 PMCID: PMC2238969 DOI: 10.1093/nar/gkm950] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.9] [Indexed: 11/24/2022] Open
Abstract
LOCATE is a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of mouse and human proteins. Over the past 2 years, the data in LOCATE have grown substantially. The database now contains high-quality localization data for 20% of the mouse proteome and general localization annotation for nearly 36% of the mouse proteome. The proteome annotated in LOCATE is from the RIKEN FANTOM Consortium Isoform Protein Sequence sets which contains 58 128 mouse and 64 637 human protein isoforms. Other additions include computational subcellular localization predictions, automated computational classification of experimental localization image data, prediction of protein sorting signals and third party submission of literature data. Collectively, this database provides localization proteome for individual subcellular compartments that will underpin future systematic investigations of these regions. It is available at http://locate.imb.uq.edu.au/
Affiliation(s)
- Josefine Sprenger
- ARC Centre of Excellence in Bioinformatics, Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland 4072, Australia
|
18
|
Lin CC, Tsai YS, Lin YS, Chiu TY, Hsiung CC, Lee MI, Simpson JC, Hsu CN. Boosting multiclass learning with repeating codes and weak detectors for protein subcellular localization. Bioinformatics 2007; 23:3374-81. [DOI: 10.1093/bioinformatics/btm497] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open
|
19
|
Sigal A, Danon T, Cohen A, Milo R, Geva-Zatorsky N, Lustig G, Liron Y, Alon U, Perzov N. Generation of a fluorescently labeled endogenous protein library in living human cells. Nat Protoc 2007; 2:1515-27. [PMID: 17571059 DOI: 10.1038/nprot.2007.197] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Indexed: 11/09/2022]
Abstract
We present a protocol to tag proteins expressed from their endogenous chromosomal locations in individual mammalian cells using central dogma tagging. The protocol can be used to build libraries of cell clones, each expressing one endogenous protein tagged with a fluorophore such as the yellow fluorescent protein. Each round of library generation produces 100-200 cell clones and takes about 1 month. The protocol integrates procedures for high-throughput single-cell cloning using flow cytometry, high-throughput cDNA generation and 3' rapid amplification of cDNA ends, semi-automatic protein localization screening using fluorescent microscopy and freezing cells in 96-well format.
Affiliation(s)
- Alex Sigal
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel.
|
20
|
del Val C, Ernst P, Falkenhahn M, Fladerer C, Glatting KH, Suhai S, Hotz-Wagenblatt A. ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ. Nucleic Acids Res 2007; 35:W444-50. [PMID: 17526514 PMCID: PMC1933246 DOI: 10.1093/nar/gkm364] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
The wealth of transcript information that has been made publicly available in recent years has led to large pools of individual web sites offering access to bioinformatics software. However, finding out which services exist, what they can or cannot do, how to use them and how to feed results from one service to the next one in the right format can be very time and resource consuming, especially for non-experts. Automating this task, we present a suite of protein annotation pipelines (tasks) developed at the German Cancer Research Centre (DKFZ) oriented to protein annotation by homology (ProtSweep), by domain analysis (DomainSweep), and by secondary structure elements (2Dsweep). The aim of these tasks is to perform an exhaustive structural and functional analysis employing a wide variety of methods in combination with the most updated public databases. The three servers are available for academic users at the HUSAR open server http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar/
Affiliation(s)
- C del Val
- DKFZ, German Cancer Research Center, Division of Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany.
|
21
|
Sauermann M, Hahne F, Schmidt C, Majety M, Rosenfelder H, Bechtel S, Huber W, Poustka A, Arlt D, Wiemann S. High-throughput flow cytometry-based assay to identify apoptosis-inducing proteins. J Biomol Screen 2007; 12:510-20. [PMID: 17478479 DOI: 10.1177/1087057107301271] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022]
Abstract
After sequencing the human genome, the challenge ahead is to systematically analyze the functions and disease relation of the proteins encoded. Here the authors describe the application of a flow cytometry-based high-throughput assay to screen for apoptosis-activating proteins in transiently transfected cells. The assay is based on the detection of activated caspase-3 with a specific antibody, in cells overexpressing proteins tagged C- or N-terminally with yellow fluorescent protein. Fluorescence intensities are measured using a flow cytometer integrated with a high-throughput autosampler. The applicability of this screen has been tested in a pilot screen with 200 proteins. The candidate proteins were all verified in an independent microscopy-based nuclear fragmentation assay, finally resulting in the identification of 6 apoptosis inducers.
Affiliation(s)
- Mamatha Sauermann
- Division of Molecular Genome Analysis, German Cancer Research Centre, Heidelberg, Germany
|
22
|
Hamilton NA, Pantelic RS, Hanson K, Teasdale RD. Fast automated cell phenotype image classification. BMC Bioinformatics 2007; 8:110. [PMID: 17394669 PMCID: PMC1847687 DOI: 10.1186/1471-2105-8-110] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.9] [Received: 12/20/2006] [Accepted: 03/30/2007] [Indexed: 11/10/2022] Open
Abstract
Background The genomic revolution has led to rapid growth in sequencing of genes and proteins, and attention is now turning to the function of the encoded proteins. In this respect, microscope imaging of a protein's sub-cellular localisation is proving invaluable, and recent advances in automated fluorescent microscopy allow protein localisations to be imaged in high throughput. Hence there is a need for large scale automated computational techniques to efficiently quantify, distinguish and classify sub-cellular images. While image statistics have proved highly successful in distinguishing localisation, commonly used measures suffer from being relatively slow to compute, and often require cells to be individually selected from experimental images, thus limiting both throughput and the range of potential applications. Here we introduce threshold adjacency statistics, the essence of which is to threshold the image and to count the number of above threshold pixels with a given number of above threshold pixels adjacent. These novel measures are shown to distinguish and classify images of distinct sub-cellular localization with high speed and accuracy without image cropping. Results Threshold adjacency statistics are applied to classification of protein sub-cellular localization images. They are tested on two image sets (available for download), one for which fluorescently tagged proteins are endogenously expressed in 10 sub-cellular locations, and another for which proteins are transfected into 11 locations. For each image set, a support vector machine was trained and tested. Classification accuracies of 94.4% and 86.6% are obtained on the endogenous and transfected sets, respectively. Threshold adjacency statistics are found to provide comparable or higher accuracy than other commonly used statistics while being an order of magnitude faster to calculate. Further, threshold adjacency statistics in combination with Haralick measures give accuracies of 98.2% and 93.2% on the endogenous and transfected sets, respectively. Conclusion Threshold adjacency statistics have the potential to greatly extend the scale and range of applications of image statistics in computational image analysis. They remove the need for cropping of individual cells from images, and are an order of magnitude faster to calculate than other commonly used statistics while providing comparable or better classification accuracy, both essential requirements for application to large-scale approaches.
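The counting step described in this abstract can be illustrated with a minimal sketch. It assumes a single fixed threshold, an 8-pixel neighbourhood, and normalisation by the number of above-threshold pixels; the abstract does not state how thresholds are chosen or how the counts are scaled, so those choices, the function name `threshold_adjacency_stats`, and the toy image are hypothetical.

```python
from typing import List, Sequence


def threshold_adjacency_stats(image: Sequence[Sequence[float]],
                              threshold: float) -> List[float]:
    """Return 9 values: the fraction of above-threshold pixels that have
    exactly k above-threshold pixels among their 8 neighbours, k = 0..8.
    (Threshold choice and normalisation are assumptions of this sketch.)"""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    # Binary mask: True where the pixel is strictly above the threshold.
    mask = [[image[r][c] > threshold for c in range(cols)] for r in range(rows)]
    counts = [0] * 9
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            neighbours = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    # Pixels outside the image are treated as below threshold.
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        neighbours += 1
            counts[neighbours] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * 9


# Hypothetical 4x4 toy image: a bright 2x2 block plus one bright corner pixel.
toy = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 5],
]
stats = threshold_adjacency_stats(toy, threshold=4)
```

Because the whole image is thresholded and scanned in one pass, no per-cell cropping is needed, which is the property the abstract emphasises. The resulting vector could then be passed to a classifier such as the support vector machine the authors mention.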
Affiliation(s)
- Nicholas A Hamilton
- ARC Centre in Bioinformatics, University of Queensland, Brisbane, Queensland 4072, Australia
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
- Advanced Computational Modelling Centre, University of Queensland, Brisbane, Queensland 4072, Australia
- Radosav S Pantelic
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
- Kelly Hanson
- ARC Centre in Bioinformatics, University of Queensland, Brisbane, Queensland 4072, Australia
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
- Rohan D Teasdale
- ARC Centre in Bioinformatics, University of Queensland, Brisbane, Queensland 4072, Australia
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland 4072, Australia
|
23
|
Mueller M, Martens L, Apweiler R. Annotating the human proteome: Beyond establishing a parts list. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2007; 1774:175-91. [PMID: 17223395 DOI: 10.1016/j.bbapap.2006.11.011] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Received: 07/03/2006] [Revised: 11/16/2006] [Accepted: 11/21/2006] [Indexed: 12/31/2022]
Abstract
The completion of the human genome has shifted the attention from deciphering the sequence to the identification and characterisation of the functional components, including genes. Improved gene prediction algorithms, together with the existing transcript and protein information, have enabled the identification of most exons in a genome. Availability of the 'parts list' has fostered the development of experimental approaches to systematically interrogate gene function on the genome, transcriptome and proteome level. Studying gene function at the protein level is vital to the understanding of how cells perform their functions as variations in protein isoforms and protein quantity which may underlie a change in phenotype can often not be deduced from sequence or transcript level genomics experiments alone. Recent advancements in proteomics have afforded technologies capable of measuring protein expression, post-translational modifications of these proteins, their subcellular localisation and assembly into complexes and pathways. Although an enormous amount of data already exists on the function of many human proteins, much of it is scattered over multiple resources. Public domain databases are therefore required to manage and collate this information and present it to the user community in both a human and machine readable manner. Of special importance here is the integration of heterogeneous data to facilitate the creation of resources that go beyond a mere parts list.
Affiliation(s)
- Michael Mueller
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK
|
24
|
del Val C, Kuryshev VY, Glatting KH, Ernst P, Hotz-Wagenblatt A, Poustka A, Suhai S, Wiemann S. CAFTAN: a tool for fast mapping, and quality assessment of cDNAs. BMC Bioinformatics 2006; 7:473. [PMID: 17064411 PMCID: PMC1636072 DOI: 10.1186/1471-2105-7-473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2006] [Accepted: 10/25/2006] [Indexed: 11/10/2022] Open
Abstract
Background The German cDNA Consortium has been cloning full length cDNAs and continued with their exploitation in protein localization experiments and cellular assays. However, the efficient use of large cDNA resources requires the development of strategies that are capable of a speedy selection of truly useful cDNAs from biological and experimental noise. To this end we have developed a new high-throughput analysis tool, CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their systematic annotation and application in functional genomics. Results CAFTAN is built around the mapping of cDNAs to the genome assembly, and the subsequent analysis of their genomic context. It uses sequence features like the presence and type of PolyA signals, inner and flanking repeats, the GC-content, splice site types, etc. All these features are evaluated in individual tests and classify cDNAs according to their sequence quality and likelihood to have been generated from fully processed mRNAs. Additionally, CAFTAN compares the coordinates of mapped cDNAs with the genomic coordinates of reference sets from publicly available resources (e.g., VEGA, ENSEMBL). This provides detailed information about overlapping exons and the structural classification of cDNAs with respect to the reference set of splice variants. The evaluation of CAFTAN showed that it is able to correctly classify more than 85% of 5950 selected "known protein-coding" VEGA cDNAs as high quality multi- or single-exon. It identified as good 80.6% of the single exon cDNAs and 85% of the multiple exon cDNAs. The program is written in Perl and in a modular way, allowing the adoption of this strategy to other tasks like EST-annotation, or to extend it by adding new classification rules and new organism databases as they become available. We think that it is a very useful program for the annotation and research of unfinished genomes. Conclusion CAFTAN is a high-throughput sequence analysis tool, which performs a fast and reliable quality prediction of cDNAs. Several thousands of cDNAs can be analyzed in a short time, giving the curator/scientist a first quick overview about the quality and the already existing annotation of a set of cDNAs. It supports the rejection of low quality cDNAs and helps in the selection of likely novel splice variants, and/or completely novel transcripts for new experiments.
Affiliation(s)
- Coral del Val
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Dept. Computer Science and Artificial Intelligence, ETSI Informatics University of Granada, C/Daniel Saucedo Aranda s/n 18071, Granada, Spain
- Vladimir Yurjevich Kuryshev
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Karl-Heinz Glatting
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Peter Ernst
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Agnes Hotz-Wagenblatt
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Annemarie Poustka
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Sandor Suhai
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Stefan Wiemann
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
|
25
|
Sigal A, Milo R, Cohen A, Geva-Zatorsky N, Klein Y, Alaluf I, Swerdlin N, Perzov N, Danon T, Liron Y, Raveh T, Carpenter AE, Lahav G, Alon U. Dynamic proteomics in individual human cells uncovers widespread cell-cycle dependence of nuclear proteins. Nat Methods 2006; 3:525-31. [PMID: 16791210 DOI: 10.1038/nmeth892] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Received: 03/22/2006] [Accepted: 05/23/2006] [Indexed: 12/20/2022]
Abstract
We examined cell cycle-dependent changes in the proteome of human cells by systematically measuring protein dynamics in individual living cells. We used time-lapse microscopy to measure the dynamics of a random subset of 20 nuclear proteins, each tagged with yellow fluorescent protein (YFP) at its endogenous chromosomal location. We synchronized the cells in silico by aligning protein dynamics in each cell between consecutive divisions. We observed widespread (40%) cell-cycle dependence of nuclear protein levels and detected previously unknown cell cycle-dependent localization changes. This approach to dynamic proteomics can aid in discovery and accurate quantification of the extensive regulation of protein concentration and localization in individual living cells.
Affiliation(s)
- Alex Sigal
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 76100, Israel
|
26
|
Manjasetty BA, Büssow K, Fieber-Erdmann M, Roske Y, Gobom J, Scheich C, Götz F, Niesen FH, Heinemann U. Crystal structure of Homo sapiens PTD012 reveals a zinc-containing hydrolase fold. Protein Sci 2006; 15:914-20. [PMID: 16522806 PMCID: PMC2242484 DOI: 10.1110/ps.052037006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
Abstract
The human protein PTD012 is the longer product of an alternatively spliced gene and was described to be localized in the nucleus. The X-ray structure analysis at 1.7 Å resolution of PTD012 through SAD phasing reveals a monomeric protein and a novel fold. The shorter splice form was also studied and appears to be unfolded and non-functional. The structure of PTD012 displays an αββα four-layer topology. A metal ion residing between the central β-sheets is partially coordinated by three histidine residues. X-ray absorption near-edge structure (XANES) analysis identifies the PTD012-bound ion as Zn²⁺. Tetrahedral coordination of the ion is completed by the carboxylate oxygen atom of an acetate molecule taken up from the crystallization buffer. The binding of Zn²⁺ to PTD012 is reminiscent of zinc-containing enzymes such as carboxypeptidase, carbonic anhydrase, and β-lactamase. Biochemical assays failed to demonstrate any of these enzyme activities in PTD012. However, PTD012 exhibits ester hydrolase activity on the substrate p-nitrophenyl acetate.
|
27
|
Mehrle A, Rosenfelder H, Schupp I, del Val C, Arlt D, Hahne F, Bechtel S, Simpson J, Hofmann O, Hide W, Glatting KH, Huber W, Pepperkok R, Poustka A, Wiemann S. The LIFEdb database in 2006. Nucleic Acids Res 2006; 34:D415-8. [PMID: 16381901 PMCID: PMC1347501 DOI: 10.1093/nar/gkj139] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
Abstract
LIFEdb integrates data from large-scale functional genomics assays and manual cDNA annotation with bioinformatics gene expression and protein analysis. New features of LIFEdb include (i) an updated user interface with enhanced query capabilities, (ii) a configurable output table and the option to download search results in XML, (iii) the integration of data from cell-based screening assays addressing the influence of protein overexpression on cell proliferation and (iv) the display of the relative expression (‘Electronic Northern’) of the genes under investigation using curated gene expression ontology information. LIFEdb enables researchers to systematically select and characterize genes and proteins of interest, and presents data and information via its user-friendly web-based interface.
Affiliation(s)
- Alexander Mehrle
- Division Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany.
|
28
|
Kolb-Kokocinski A, Mehrle A, Bechtel S, Simpson JC, Kioschis P, Wiemann S, Wellenreuther R, Poustka A. The systematic functional characterisation of Xq28 genes prioritises candidate disease genes. BMC Genomics 2006; 7:29. [PMID: 16503986 PMCID: PMC1431524 DOI: 10.1186/1471-2164-7-29] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 06/03/2005] [Accepted: 02/17/2006] [Indexed: 12/03/2022] Open
Abstract
Background Well known for its gene density and the large number of mapped diseases, the human sub-chromosomal region Xq28 has long been a focus of genome research. Over 40 of approximately 300 X-linked diseases map to this region, and systematic mapping, transcript identification, and mutation analysis have led to the identification of causative genes for 26 of these diseases, leaving another 17 diseases mapped to Xq28 for which the causative gene is still unknown. To expedite disease gene identification, we have initiated the functional characterisation of all known Xq28 genes. Results By using a systematic approach, we describe the Xq28 genes by RNA in situ hybridisation and Northern blotting of the mouse orthologs, as well as subcellular localisation and data mining of the human genes. We have developed a relational web-accessible database with comprehensive query options integrating all experimental data. Using this database, we matched gene expression patterns with affected tissues for 16 of the 17 remaining Xq28-linked diseases for which the causative gene is unknown. Conclusion By using this systematic approach, we have prioritised genes in linkage regions of Xq28-mapped diseases to an amenable number for mutational screens. Our database can be queried by any researcher performing highly specified searches including diseases not listed in OMIM or diseases that might be linked to Xq28 in the future.
Affiliation(s)
- Anja Kolb-Kokocinski
- Division of Molecular Genome Analysis, German Cancer Research Centre (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- Embryo Gene Expression Patterns, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Alexander Mehrle
- Division of Molecular Genome Analysis, German Cancer Research Centre (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- Stephanie Bechtel
- Division of Molecular Genome Analysis, German Cancer Research Centre (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- Jeremy C Simpson
- Cell Biology and Biophysics Programme, EMBL Heidelberg, Meyerhofstrasse 1, 69117 Heidelberg, Germany
- Petra Kioschis
- Institute of Molecular Biology and Cell Culture Technology, Mannheim University of Applied Sciences, Windeckstrasse 110, 68163 Mannheim, Germany
- Stefan Wiemann
- Division of Molecular Genome Analysis, German Cancer Research Centre (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- Ruth Wellenreuther
- Division of Molecular Genome Analysis, German Cancer Research Centre (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
- Annemarie Poustka
- Division of Molecular Genome Analysis, German Cancer Research Centre (DKFZ), Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
|
29
|
Fink JL, Aturaliya RN, Davis MJ, Zhang F, Hanson K, Teasdale MS, Kai C, Kawai J, Carninci P, Hayashizaki Y, Teasdale RD. LOCATE: a mouse protein subcellular localization database. Nucleic Acids Res 2006; 34:D213-7. [PMID: 16381849 PMCID: PMC1347432 DOI: 10.1093/nar/gkj069] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Received: 08/15/2005] [Revised: 10/08/2005] [Accepted: 10/08/2005] [Indexed: 11/14/2022] Open
Abstract
We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for approximately 40% of the mouse proteome. It is available at http://locate.imb.uq.edu.au.
Affiliation(s)
- J Lynn Fink
- ARC Centre in Bioinformatics, University of Queensland, St Lucia, Queensland 4072, Australia.
|
30
|
Guilleaume B, Buness A, Schmidt C, Klimek F, Moldenhauer G, Huber W, Arlt D, Korf U, Wiemann S, Poustka A. Systematic comparison of surface coatings for protein microarrays. Proteomics 2005; 5:4705-12. [PMID: 16267812 DOI: 10.1002/pmic.200401324] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
To process large numbers of samples in parallel is one potential of protein microarrays for research and diagnostics. However, the application of protein arrays is currently hampered by the lack of comprehensive technological knowledge about the suitability of 2-D and 3-D slide surface coatings. We have performed a systematic study to analyze how both surface types perform in combination with different fluorescent dyes to generate significant and reproducible data. In total, we analyzed more than 100 slides containing 1152 spots each. Slides were probed against different monoclonal antibodies (mAbs) and recombinant fusion proteins. We found two surface coatings to be most suitable for protein and antibody (Ab) immobilization. These were further subjected to quantitative analyses by evaluating intraslide and slide-to-slide reproducibilities, and the linear range of target detection. In summary, we demonstrate that only suitable combinations of surface and fluorescent dyes allow the generation of highly reproducible data.
Affiliation(s)
- Birgit Guilleaume
- Department of Molecular Genome Analysis, German Cancer Research Center, 69120 Heidelberg, Germany.
|
31
|
Arlt D, Huber W, Liebel U, Schmidt C, Majety M, Sauermann M, Rosenfelder H, Bechtel S, Mehrle A, Bannasch D, Schupp I, Seiler M, Simpson JC, Hahne F, Moosmayer P, Ruschhaupt M, Guilleaume B, Wellenreuther R, Pepperkok R, Sültmann H, Poustka A, Wiemann S. Functional profiling: from microarrays via cell-based assays to novel tumor relevant modulators of the cell cycle. Cancer Res 2005; 65:7733-42. [PMID: 16140941 DOI: 10.1158/0008-5472.can-05-0642] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
Cancer transcription microarray studies commonly deliver long lists of "candidate" genes that are putatively associated with the respective disease. For many of these genes, no functional information, even less their relevance in pathologic conditions, is established as they were identified in large-scale genomics approaches. Strategies and tools are thus needed to distinguish genes and proteins with mere tumor association from those causally related to cancer. Here, we describe a functional profiling approach, where we analyzed 103 previously uncharacterized genes in cancer relevant assays that probed their effects on DNA replication (cell proliferation). The genes had previously been identified as differentially expressed in genome-wide microarray studies of tumors. Using an automated high-throughput assay with single-cell resolution, we discovered seven activators and nine repressors of DNA replication. These were further characterized for effects on extracellular signal-regulated kinase 1/2 (ERK1/2) signaling (G1-S transition) and anchorage-independent growth (tumorigenicity). One activator and one inhibitor protein of ERK1/2 activation and three repressors of anchorage-independent growth were identified. Data from tumor and functional profiling make these proteins novel prime candidates for further in-depth study of their roles in cancer development and progression. We have established a novel functional profiling strategy that links genomics to cell biology and showed its potential for discerning cancer relevant modulators of the cell cycle in the candidate lists from microarray studies.
Affiliation(s)
- Dorit Arlt
- Division of Molecular Genome Analysis, German Cancer Research Center, Heidelberg, Germany
|
32
|
Wiemann S, Kolb-Kokocinski A, Poustka A. Alternative pre-mRNA processing regulates cell-type specific expression of the IL4l1 and NUP62 genes. BMC Biol 2005; 3:16. [PMID: 16029492 PMCID: PMC1198218 DOI: 10.1186/1741-7007-3-16] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 03/08/2005] [Accepted: 07/19/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Given the complexity of higher organisms, the number of genes encoded by their genomes is surprisingly small. Tissue specific regulation of expression and splicing are major factors enhancing the number of the encoded products. Commonly these mechanisms are intragenic and affect only one gene. RESULTS Here we provide evidence that the IL4I1 gene is specifically transcribed from the apparent promoter of the upstream NUP62 gene, and that the first two exons of NUP62 are also contained in the novel IL4I1_2 variant. While expression of IL4I1 driven from its previously described promoter is found mostly in B cells, the expression driven by the NUP62 promoter is restricted to cells in testis (Sertoli cells) and in the brain (e.g., Purkinje cells). Since NUP62 is itself ubiquitously expressed, the IL4I1_2 variant likely derives from cell type specific alternative pre-mRNA processing. CONCLUSION Comparative genomics suggest that the promoter upstream of the NUP62 gene originally belonged to the IL4I1 gene and was later acquired by NUP62 via insertion of a retroposon. Since both genes are apparently essential, the promoter had to serve two genes afterwards. Expression of the IL4I1 gene from the "NUP62" promoter and the tissue specific involvement of the pre-mRNA processing machinery to regulate expression of two unrelated proteins indicate a novel mechanism of gene regulation.
Affiliation(s)
- Stefan Wiemann
- Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, Heidelberg, 69120, Germany
- Anja Kolb-Kokocinski
- Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, Heidelberg, 69120, Germany
- Annemarie Poustka
- Molecular Genome Analysis, German Cancer Research Center, Im Neuenheimer Feld 580, Heidelberg, 69120, Germany
|
33
|
Büssow K, Scheich C, Sievert V, Harttig U, Schultz J, Simon B, Bork P, Lehrach H, Heinemann U. Structural genomics of human proteins--target selection and generation of a public catalogue of expression clones. Microb Cell Fact 2005; 4:21. [PMID: 15998469 PMCID: PMC1250228 DOI: 10.1186/1475-2859-4-21] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 05/23/2005] [Accepted: 07/05/2005] [Indexed: 11/12/2022] Open
Abstract
Background The availability of suitable recombinant protein is still a major bottleneck in protein structure analysis. The Protein Structure Factory, part of the international structural genomics initiative, targets human proteins for structure determination. It has implemented high throughput procedures for all steps from cloning to structure calculation. This article describes the selection of human target proteins for structure analysis, our high throughput cloning strategy, and the expression of human proteins in Escherichia coli host cells. Results and Conclusion Protein expression and sequence data of 1414 E. coli expression clones representing 537 different proteins are presented. 139 human proteins (18%) could be expressed and purified in soluble form and with the expected size. All E. coli expression clones are publicly available to facilitate further functional characterisation of this set of human proteins.
Affiliation(s)
- Konrad Büssow
- Protein Structure Factory, Heubnerweg 6, 14059 Berlin, Germany
- Max-Planck-Institut für Molekulare Genetik, Ihnestr. 73, 14195 Berlin, Germany
- Christoph Scheich
- Protein Structure Factory, Heubnerweg 6, 14059 Berlin, Germany
- Max-Planck-Institut für Molekulare Genetik, Ihnestr. 73, 14195 Berlin, Germany
- Volker Sievert
- Protein Structure Factory, Heubnerweg 6, 14059 Berlin, Germany
- Max-Planck-Institut für Molekulare Genetik, Ihnestr. 73, 14195 Berlin, Germany
- Ulrich Harttig
- Protein Structure Factory, Heubnerweg 6, 14059 Berlin, Germany
- RZPD German Resource Center for Genome Research GmbH, Heubnerweg 6, 14059 Berlin, Germany
- DIFE, Arthur-Scheunert-Allee 114–116, 14558 Nuthetal, Germany
- Jörg Schultz
- EMBL Heidelberg, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany
- Bernd Simon
- EMBL Heidelberg, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Peer Bork
- EMBL Heidelberg, Meyerhofstr. 1, 69117 Heidelberg, Germany
- Hans Lehrach
- Protein Structure Factory, Heubnerweg 6, 14059 Berlin, Germany
- Max-Planck-Institut für Molekulare Genetik, Ihnestr. 73, 14195 Berlin, Germany
- Udo Heinemann
- Protein Structure Factory, Heubnerweg 6, 14059 Berlin, Germany
- Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Str. 10, 13092 Berlin, Germany
- Institut für Chemie/Kristallographie, Freie Universität, Takustr. 6, 14195 Berlin, Germany
|
34
|
McKinney JL, Murdoch DJ, Wang J, Robinson J, Biltcliffe C, Khan HMR, Walker PM, Savage J, Skerjanc I, Hegele RA. Venn analysis as part of a bioinformatic approach to prioritize expressed sequence tags from cardiac libraries. Clin Biochem 2005; 37:953-60. [PMID: 15498521 DOI: 10.1016/j.clinbiochem.2004.07.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/27/2004] [Revised: 07/06/2004] [Accepted: 07/24/2004] [Indexed: 11/22/2022]
Abstract
OBJECTIVES We needed to sort expressed sequence tags (ESTs) from human cardiac expression libraries. DESIGN AND METHODS We annotated DNA sequence text files of 35,152 cardiac ESTs using our search and annotation tool called Multiblast.pl. We generated lists of the most prevalent ESTs in each library, and using a novel Venn tool, we grouped ESTs that were common to all or exclusive to particular libraries. RESULTS Hypothetical protein KIAA0553 was expressed 120 times among 917 ESTs from an adult cardiac library (13.1%) compared with only once among 8075 ESTs from fetal cardiac libraries (P < 10^-114); this was confirmed using Northern analysis. We collated biochemical features of KIAA0553 and determined DNA polymorphism frequencies. We also used the Venn tool to specify genes that were uniquely expressed in hypertrophic cardiomyocytes. CONCLUSIONS Annotating ESTs and sorting them using Venn analysis can help specify new candidate disease genes from the current lists of "hypothetical proteins".
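The grouping step in this abstract, finding ESTs common to all libraries or exclusive to one, maps naturally onto set operations. The following is a minimal sketch under that assumption; it is not the authors' Venn tool, and the function name `venn_partition`, the library names, and the gene labels are hypothetical.

```python
from typing import Dict, Set, Tuple


def venn_partition(libraries: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (items common to every library, items exclusive to each library)."""
    # Items present in all libraries.
    common = set.intersection(*libraries.values()) if libraries else set()
    exclusive: Dict[str, Set[str]] = {}
    for name, items in libraries.items():
        # Union of every other library; what remains is unique to this one.
        others = set().union(*(s for n, s in libraries.items() if n != name))
        exclusive[name] = items - others
    return common, exclusive


# Hypothetical annotated EST labels per library.
libraries = {
    "adult": {"geneA", "geneB", "geneC"},
    "fetal": {"geneB", "geneD"},
    "hypertrophic": {"geneB", "geneC", "geneE"},
}
common, exclusive = venn_partition(libraries)
```

In this toy input, `geneB` is shared by all three libraries, while each library has exactly one label that appears nowhere else; the exclusive sets are the kind of list one would inspect for library-specific candidates.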
Affiliation(s)
- James L McKinney
- Vascular Biology Group and London Regional Genomics Centre, Robarts Research Institute, London, Ontario, Canada N6A 5K8
|
35
|
Ding L, Sabo A, Berkowicz N, Meyer RR, Shotland Y, Johnson MR, Pepin KH, Wilson RK, Spieth J. EAnnot: a genome annotation tool using experimental evidence. Genome Res 2005; 14:2503-9. [PMID: 15574829 PMCID: PMC534675 DOI: 10.1101/gr.3152604] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
The sequence of any genome becomes most useful for biological experimentation when a complete and accurate gene set is available. Gene prediction programs offer an efficient way to generate an automated gene set. Manual annotation, when performed by experienced annotators, is more accurate and complete than automated annotation. However, it is a laborious and expensive process, and by its nature, introduces a degree of variability not found with automated annotation. EAnnot (Electronic Annotation) is a program originally developed for manually annotating the human genome. It combines the latest bioinformatics tools to extract and analyze a wide range of publicly available data in order to achieve fast and reliable automatic gene prediction and annotation. EAnnot builds gene models based on mRNA, EST, and protein alignments to genomic sequence, attaches supporting evidence to the corresponding genes, identifies pseudogenes, and locates poly(A) sites and signals. Here, we compare manual annotation of human chromosome 6 with annotation performed by EAnnot in order to assess the latter's accuracy. EAnnot can readily be applied to manual annotation of other eukaryotic genomes and can be used to rapidly obtain an automated gene set.
Affiliation(s)
- Li Ding
- Genome Sequencing Center, Washington University School of Medicine, St. Louis, Missouri 63110, USA.
|
36
|
Hidden localization motifs: naturally occurring peroxisomal targeting signals in non-peroxisomal proteins. Genome Biol 2004; 5:R97. [PMID: 15575971 PMCID: PMC545800 DOI: 10.1186/gb-2004-5-12-r97] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Received: 05/25/2004] [Revised: 10/11/2004] [Accepted: 11/09/2004] [Indexed: 11/13/2022] Open
Abstract
Functional but silent peroxisomal targeting signals have been found in non-peroxisomal proteins. This discovery has important implications for sequence-based signal prediction and for evolution. Background: Can sequence segments coding for subcellular targeting or for posttranslational modifications occur in proteins that are not substrates in either of these processes? Although considerable effort has been invested in achieving low false-positive prediction rates, even accurate sequence-analysis tools for the recognition of these motifs generate a small but noticeable number of protein hits that lack the appropriate biological context but cannot be rationalized as false positives. Results: We show that the carboxyl termini of a set of definitely non-peroxisomal proteins with predicted peroxisomal targeting signals interact with the peroxisomal matrix protein receptor peroxin 5 (PEX5) in a yeast two-hybrid test. Moreover, we show that examples of these proteins - chicken lysozyme, human tyrosinase and the yeast mitochondrial ribosomal protein L2 (encoded by MRP7) - are imported into peroxisomes in vivo if their original sorting signals are disguised. We also show that even prokaryotic proteins can contain peroxisomal targeting sequences. Conclusions: Thus, functional localization signals can evolve in unrelated protein sequences as a result of neutral mutations, and subcellular targeting is hierarchically organized, with signal accessibility playing a decisive role. The occurrence of silent functional motifs in unrelated proteins is important for the development of sequence-based function prediction tools and the interpretation of their results. Silent functional signals have the potential to acquire importance in future evolutionary scenarios and in pathological conditions.
|
37
|
Wiemann S, Arlt D, Huber W, Wellenreuther R, Schleeger S, Mehrle A, Bechtel S, Sauermann M, Korf U, Pepperkok R, Sültmann H, Poustka A. From ORFeome to biology: a functional genomics pipeline. Genome Res 2004; 14:2136-44. [PMID: 15489336 PMCID: PMC528930 DOI: 10.1101/gr.2576704] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022]
Abstract
As several model genomes have been sequenced, the elucidation of protein function is the next challenge toward the understanding of biological processes in health and disease. We have generated a human ORFeome resource and established a functional genomics and proteomics analysis pipeline to address the major topics in the post-genome-sequencing era: the identification of human genes and splice forms, and the determination of protein localization, activity, and interaction. Combined with the understanding of when and where gene products are expressed in normal and diseased conditions, we create information that is essential for understanding the interplay of genes and proteins in the complex biological network. We have implemented bioinformatics tools and databases that are suitable to store, analyze, and integrate the different types of data from high-throughput experiments and to include further annotation that is based on external information. All information is presented in a Web database (http://www.dkfz.de/LIFEdb). It is exploited for the identification of disease-relevant genes and proteins for diagnosis and therapy.
Affiliation(s)
- Stefan Wiemann
- Molecular Genome Analysis, German Cancer Research Center, 69120 Heidelberg, Germany.
|
38
|
del Val C, Mehrle A, Falkenhahn M, Seiler M, Glatting KH, Poustka A, Suhai S, Wiemann S. High-throughput protein analysis integrating bioinformatics and experimental assays. Nucleic Acids Res 2004; 32:742-8. [PMID: 14762202 PMCID: PMC373366 DOI: 10.1093/nar/gkh257] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
Abstract
The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administering relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins.
Affiliation(s)
- Coral del Val
- Division of Molecular Biophysics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany.
|