1
Rajab SAS, Andersen LK, Kenter LW, Berlinsky DL, Borski RJ, McGinty AS, Ashwell CM, Ferket PR, Daniels HV, Reading BJ. Combinatorial metabolomic and transcriptomic analysis of muscle growth in hybrid striped bass (female white bass Morone chrysops x male striped bass M. saxatilis). BMC Genomics 2024; 25:580. [PMID: 38858615 PMCID: PMC11165755 DOI: 10.1186/s12864-024-10325-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 04/19/2024] [Indexed: 06/12/2024] Open
Abstract
BACKGROUND Understanding growth regulatory pathways is important in aquaculture, fisheries, and vertebrate physiology generally. Machine learning pattern recognition and sensitivity analysis were employed to examine metabolomic small molecule profiles and transcriptomic gene expression data generated from liver and white skeletal muscle of hybrid striped bass (white bass Morone chrysops x striped bass M. saxatilis) representative of the top and bottom 10 % by body size of a production cohort. RESULTS Larger fish (good-growth) had significantly greater weight, total length, hepatosomatic index, and specific growth rate compared to smaller fish (poor-growth) and also had significantly more muscle fibers of smaller diameter (≤ 20 µm diameter), indicating active hyperplasia. Differences in metabolomic pathways included enhanced energetics (glycolysis, citric acid cycle) and amino acid metabolism in good-growth fish, and enhanced stress, muscle inflammation (cortisol, eicosanoids) and dysfunctional liver cholesterol metabolism in poor-growth fish. The majority of gene transcripts identified as differentially expressed between groups were down-regulated in good-growth fish. Several molecules associated with important growth-regulatory pathways were up-regulated in muscle of fish that grew poorly: growth factors including agt and agtr2 (angiotensins), nicotinic acid (which stimulates growth hormone production), gadd45b, rgl1, zfp36, cebpb, and hmgb1; insulin-like growth factor signaling (igfbp1 and igf1); cytokine signaling (socs3, cxcr4); cell signaling (rgs13, rundc3a), and differentiation (rhou, mmp17, cd22, msi1); mitochondrial uncoupling proteins (ucp3, ucp2); and regulators of lipid metabolism (apoa1, ldlr). Growth factors pttg1, egfr, myc, notch1, and sirt1 were notably up-regulated in muscle of good-growing fish. 
CONCLUSION A combinatorial pathway analysis using metabolomic and transcriptomic data collectively suggested promotion of cell signaling, proliferation, and differentiation in muscle of good-growth fish, whereas muscle inflammation and apoptosis were observed in poor-growth fish, along with elevated cortisol (an anti-inflammatory hormone), perhaps related to muscle wasting, hypertrophy, and inferior growth. These findings provide important biomarkers and mechanisms by which growth is regulated in fishes and other vertebrates.
Affiliation(s)
- Sarah A S Rajab
  - Department of Applied Ecology, North Carolina State University, 100 Eugene Brooks Avenue, Box 7617, Raleigh, NC, 27695, USA
- Linnea K Andersen
  - Department of Applied Ecology, North Carolina State University, 100 Eugene Brooks Avenue, Box 7617, Raleigh, NC, 27695, USA
- Linas W Kenter
  - Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH, USA
- David L Berlinsky
  - Department of Agriculture, Nutrition, and Food Systems, University of New Hampshire, Durham, NH, USA
- Russell J Borski
  - Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Andrew S McGinty
  - North Carolina State University, Pamlico Aquaculture Field Laboratory, Aurora, NC, USA
- Christopher M Ashwell
  - Prestage Department of Poultry Science, North Carolina State University, Raleigh, NC, USA
- Peter R Ferket
  - Prestage Department of Poultry Science, North Carolina State University, Raleigh, NC, USA
- Harry V Daniels
  - Department of Applied Ecology, North Carolina State University, 100 Eugene Brooks Avenue, Box 7617, Raleigh, NC, 27695, USA
- Benjamin J Reading
  - Department of Applied Ecology, North Carolina State University, 100 Eugene Brooks Avenue, Box 7617, Raleigh, NC, 27695, USA
  - North Carolina State University, Pamlico Aquaculture Field Laboratory, Aurora, NC, USA
2
Franz A, Weber AI, Preußner M, Dimos N, Stumpf A, Ji Y, Moreno-Velasquez L, Voigt A, Schulz F, Neumann A, Kuropka B, Kühn R, Urlaub H, Schmitz D, Wahl MC, Heyd F. Branch point strength controls species-specific CAMK2B alternative splicing and regulates LTP. Life Sci Alliance 2023; 6:6/3/e202201826. [PMID: 36543542 PMCID: PMC9772828 DOI: 10.26508/lsa.202201826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022] Open
Abstract
Regulation and functionality of species-specific alternative splicing has remained enigmatic to the present date. Calcium/calmodulin-dependent protein kinase IIβ (CaMKIIβ) is expressed in several splice variants and plays a key role in learning and memory. Here, we identify and characterize several primate-specific CAMK2B splice isoforms, which show altered kinetic properties and changes in substrate specificity. Furthermore, we demonstrate that primate-specific CAMK2B alternative splicing is achieved through branch point weakening during evolution. We show that reducing branch point and splice site strengths during evolution globally renders constitutive exons alternative, thus providing novel mechanistic insight into cis-directed species-specific alternative splicing regulation. Using CRISPR/Cas9, we introduce a weaker, human branch point sequence into the mouse genome, resulting in strongly altered Camk2b splicing in the brains of mutant mice. We observe a strong impairment of long-term potentiation in CA3-CA1 synapses of mutant mice, thus connecting branch point-controlled CAMK2B alternative splicing with a fundamental function in learning and memory.
Affiliation(s)
- Andreas Franz
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of RNA Biochemistry, Berlin, Germany
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of Structural Biochemistry, Berlin, Germany
- A Ioana Weber
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of RNA Biochemistry, Berlin, Germany
- Marco Preußner
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of RNA Biochemistry, Berlin, Germany
- Nicole Dimos
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of Structural Biochemistry, Berlin, Germany
- Alexander Stumpf
  - Neuroscience Research Centre (NWFZ), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Yanlong Ji
  - Bioanalytical Mass Spectrometry Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
  - Hematology/Oncology, Department of Medicine II, Johann Wolfgang Goethe University, Frankfurt am Main, Germany
  - Frankfurt Cancer Institute, Goethe University, Frankfurt am Main, Germany
- Laura Moreno-Velasquez
  - Neuroscience Research Centre (NWFZ), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Anne Voigt
  - Neuroscience Research Centre (NWFZ), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Frederic Schulz
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of RNA Biochemistry, Berlin, Germany
- Alexander Neumann
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of RNA Biochemistry, Berlin, Germany
- Benno Kuropka
  - Freie Universität Berlin, Mass Spectrometry Core Facility (BioSupraMol), Berlin, Germany
- Ralf Kühn
  - Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Genome Engineering & Disease Models, Berlin, Germany
- Henning Urlaub
  - Bioanalytical Mass Spectrometry Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
  - Institute of Clinical Chemistry, University Medical Center Göttingen, Göttingen, Germany
- Dietmar Schmitz
  - Neuroscience Research Centre (NWFZ), Charité - Universitätsmedizin Berlin, Berlin, Germany
- Markus C Wahl
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of Structural Biochemistry, Berlin, Germany
  - Helmholtz-Zentrum Berlin für Materialien und Energie, Macromolecular Crystallography, Berlin, Germany
- Florian Heyd
  - Freie Universität Berlin, Institute of Chemistry and Biochemistry, Laboratory of RNA Biochemistry, Berlin, Germany
3
Shi Y, Yao G, Zhang H, Jia H, Xiong P, He M. Proteome and Transcriptome Analysis of Gonads Reveals Intersex in Gigantidas haimaensis. BMC Genomics 2022; 23:174. [PMID: 35240981 PMCID: PMC8892766 DOI: 10.1186/s12864-022-08407-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Accepted: 02/22/2022] [Indexed: 11/19/2022] Open
Abstract
Sex has proven to be one of the most intriguing areas of research across evolution, development, and ecology. Intersex or sex change occurs frequently in molluscs. The deep-sea mussel Gigantidas haimaensis often dominates within Haima cold seep ecosystems, but details of its reproduction remain unknown. Herein, we conducted a combined proteomic and transcriptomic analysis of G. haimaensis gonads to provide a systematic understanding of sexual development in deep-sea bivalves. A total of 2,452 out of 42,238 genes (5.81%) and 288 out of 7,089 proteins (4.06%) were significantly differentially expressed between ovaries and testes with a false discovery rate (FDR) <0.05. Candidate genes involved in sexual development were identified; among 12 differentially expressed genes between sexes, four ovary-biased genes (β-catenin, fem-1, forkhead box L2 and membrane progestin receptor α) were expressed at significantly higher levels in males than in females. Taking histological characteristics into account, we speculate that the males may be intersex individuals undergoing sex change, and that these genes may be involved in the conversion of testes into female gonads in G. haimaensis. The results suggest that this adaptation may be based on local environmental factors, sedentary lifestyles, and patchy distribution, and sex change may facilitate adaptation to a changing environment and expansion of the population. The findings provide a valuable genetic resource to better understand the mechanisms of sex change and survival strategies in deep-sea bivalves.
Affiliation(s)
- Yu Shi
  - CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, 164 West Xingang Road, Guangzhou, 510301, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, 511458, China
- Gaoyou Yao
  - CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, 164 West Xingang Road, Guangzhou, 510301, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Hua Zhang
  - CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, 164 West Xingang Road, Guangzhou, 510301, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, 511458, China
- Huixia Jia
  - CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, 164 West Xingang Road, Guangzhou, 510301, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Panpan Xiong
  - CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, 164 West Xingang Road, Guangzhou, 510301, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Maoxian He
  - CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, 164 West Xingang Road, Guangzhou, 510301, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, 511458, China
4
Pillai J, Chincholkar T, Dixit R, Pandey M. A systematic review of proteomic biomarkers in oral squamous cell cancer. World J Surg Oncol 2021; 19:315. [PMID: 34711249 PMCID: PMC8555221 DOI: 10.1186/s12957-021-02423-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/03/2021] [Accepted: 10/06/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Head and neck squamous cell cancer (HNSCC) is the most common cancer associated with chewing tobacco worldwide. Because it is divided into sites and subsites, it does not rank among the top 10 cancers. The most common subsite is oral cancer. At the time of diagnosis, more than 50% of patients with oral squamous cell cancers (OSCC) had advanced disease, indicating the lack of availability of early detection and risk assessment biomarkers. The development and discovery of new protein biomarkers will aid early diagnosis and targeted treatment, and ultimately lead to a good prognosis. METHODS This systematic review was performed as per PRISMA guidelines. All relevant studies assessing characteristics of oral cancer and proteomics were considered for analysis. Only human studies published in English were included, and abstracts, incomplete articles, and cell line or animal studies were excluded. RESULTS A total of 308 articles were found, of which 112 were found to be relevant after exclusion. The present review focuses on techniques of cancer proteomics and discovery of biomarkers using these techniques. The signature of protein expression may be used to predict drug response and clinical course of disease, and such knowledge could be used to individualize therapy. CONCLUSIONS Prospective use of these markers in the clinical setting will enable early detection, prediction of response to treatment, improvement in treatment selection, and early detection of tumor recurrence for disease monitoring. However, most of these markers for OSCC are yet to be validated.
Affiliation(s)
- Ruhi Dixit
  - Department of Surgical Oncology, Institute of Medical Sciences, Banaras Hindu University, Varanasi, 221 005, India
- Manoj Pandey
  - Department of Surgical Oncology, Institute of Medical Sciences, Banaras Hindu University, Varanasi, 221 005, India
5
Prensner JR, Enache OM, Luria V, Krug K, Clauser KR, Dempster JM, Karger A, Wang L, Stumbraite K, Wang VM, Botta G, Lyons NJ, Goodale A, Kalani Z, Fritchman B, Brown A, Alan D, Green T, Yang X, Jaffe JD, Roth JA, Piccioni F, Kirschner MW, Ji Z, Root DE, Golub TR. Noncanonical open reading frames encode functional proteins essential for cancer cell survival. Nat Biotechnol 2021; 39:697-704. [PMID: 33510483 PMCID: PMC8195866 DOI: 10.1038/s41587-020-00806-2] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 02/18/2020] [Accepted: 12/16/2020] [Indexed: 01/30/2023]
Abstract
Although genomic analyses predict many noncanonical open reading frames (ORFs) in the human genome, it is unclear whether they encode biologically active proteins. Here we experimentally interrogated 553 candidates selected from noncanonical ORF datasets. Of these, 57 induced viability defects when knocked out in human cancer cell lines. Following ectopic expression, 257 showed evidence of protein expression and 401 induced gene expression changes. Clustered regularly interspaced short palindromic repeat (CRISPR) tiling and start codon mutagenesis indicated that their biological effects required translation as opposed to RNA-mediated effects. We found that one of these ORFs, G029442 (renamed glycine-rich extracellular protein-1, GREP1), encodes a secreted protein highly expressed in breast cancer, and its knockout in 263 cancer cell lines showed preferential essentiality in breast cancer-derived lines. The secretome of GREP1-expressing cells has an increased abundance of the oncogenic cytokine GDF15, and GDF15 supplementation mitigated the growth-inhibitory effect of GREP1 knockout. Our experiments suggest that noncanonical ORFs can express biologically active proteins that are potential therapeutic targets.
Affiliation(s)
- John R. Prensner
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215
  - Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115
- Oana M. Enache
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Victor Luria
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Karsten Krug
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Karl R. Clauser
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amir Karger
  - IT-Research Computing, Harvard Medical School, Boston, MA, USA, 02115
- Li Wang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Vickie M. Wang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Ginevra Botta
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Amy Goodale
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Zohra Kalani
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Adam Brown
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Douglas Alan
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Thomas Green
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Xiaoping Yang
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Jacob D. Jaffe
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Present address: Inzen Therapeutics, Cambridge, MA, 02139, USA
- Federica Piccioni
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Present address: Merck Research Laboratories, Boston, MA, 02115, USA
- Marc W. Kirschner
  - Department of Systems Biology, Harvard Medical School, Boston, MA, 02115, USA
- Zhe Ji
  - Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611
  - Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL 60628
- David E. Root
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
- Todd R. Golub
  - Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02215
  - Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, 02115
  - Corresponding author: Todd R. Golub, MD, Chief Scientific Officer, Broad Institute of Harvard and MIT, Room 4013, 415 Main Street, Cambridge, MA, 02142; Phone: 617-714-7050
6
Czajeczny D, Kabzińska K, Wójciak RW. From great genetics to neuropsychology – outline of the research on the association between microbiota and human behaviour. Postępy Mikrobiologii - Advancements of Microbiology 2020. [DOI: 10.21307/pm-2020.59.1.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
7
Abstract
High-throughput sequencing-based methods and their applications in the study of transcriptomes have revolutionized our understanding of alternative splicing. Networks of functionally coordinated and biologically important alternative splicing events continue to be discovered in an ever-increasing diversity of cell types in the context of physiologically normal and disease states. These studies have been complemented by efforts directed at defining sequence codes governing splicing and their cognate trans-acting factors, which have illuminated important combinatorial principles of regulation. Additional studies have revealed critical roles of position-dependent, multivalent protein-RNA interactions that direct splicing outcomes. Investigations of evolutionary changes in RNA binding proteins, splice variants, and associated cis elements have further shed light on the emergence, mechanisms, and functions of splicing networks. Progress in these areas has emphasized the need for a coordinated, community-based effort to systematically address the functions of individual splice variants associated with normal and disease biology.
8
Innovating the Concept and Practice of Two-Dimensional Gel Electrophoresis in the Analysis of Proteomes at the Proteoform Level. Proteomes 2019; 7:proteomes7040036. [PMID: 31671630 PMCID: PMC6958347 DOI: 10.3390/proteomes7040036] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 06/21/2019] [Revised: 09/15/2019] [Accepted: 10/28/2019] [Indexed: 12/21/2022] Open
Abstract
Two-dimensional gel electrophoresis (2DE) is an important and well-established technical platform enabling extensive top-down proteomic analysis. However, the long-held but now largely outdated conventional concepts of 2DE have clearly impacted its application to in-depth investigations of proteomes at the level of protein species/proteoforms. It is time to popularize a new concept of 2DE for proteomics. With the development and enrichment of the proteome concept, any given “protein” is now recognized to consist of a series of proteoforms. Thus, it is the proteoform, rather than the canonical protein, that is the basic unit of a proteome, and each proteoform has a specific isoelectric point (pI) and relative mass (Mr). Accordingly, using 2DE, each proteoform can routinely be resolved and arrayed according to its different pI and Mr. Each detectable spot contains multiple proteoforms derived from the same gene, as well as from different genes. Proteoforms derived from the same gene are distributed into different spots in a 2DE pattern. High-resolution 2DE is thus actually an initial level of separation to address proteome complexity and is effectively a pre-fractionation method prior to analysis using mass spectrometry (MS). Furthermore, stable isotope-labeled 2DE coupled with high-sensitivity liquid chromatography-tandem MS (LC-MS/MS) has tremendous potential for the large-scale detection, identification, and quantification of the proteoforms that constitute proteomes.
9
Hatje K, Mühlhausen S, Simm D, Kollmar M. The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays 2019; 41:e1900066. [PMID: 31544971 DOI: 10.1002/bies.201900066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/15/2019] [Revised: 08/07/2019] [Indexed: 12/19/2022]
Abstract
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy by combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (they can be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternatively spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanisms and protein structure has been incorporated into an algorithm to predict neighboring homologous exons, often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome is necessitating distinctive annotations incorporating differences between individuals and between populations.
Affiliation(s)
- Klas Hatje
  - Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstr. 124, 4070, Basel, Switzerland
- Stefanie Mühlhausen
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Dominic Simm
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
  - Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
- Martin Kollmar
  - Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
10
Zhang Z, Ruan H, Liu CJ, Ye Y, Gong J, Diao L, Guo AY, Han L. tRic: a user-friendly data portal to explore the expression landscape of tRNAs in human cancers. RNA Biol 2019; 17:1674-1679. [PMID: 31432762 DOI: 10.1080/15476286.2019.1657744] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/17/2022] Open
Abstract
Transfer RNAs (tRNAs) play critical roles in human cancer. Currently, no database provides the expression landscape and clinical relevance of tRNAs across a variety of human cancers. Utilizing miRNA-seq data from The Cancer Genome Atlas, we quantified the relative expression of tRNA genes and merged them into the codon level and amino acid level across 31 cancer types. The expression of tRNAs is associated with clinical features including patient smoking history, overall survival, disease stage, subtype, and grade. We further analysed codon frequency and amino acid frequency for each protein coding gene and linked alterations of tRNA expression with protein translational efficiency. We include these data resources in a user-friendly data portal, tRic (tRNA in cancer, https://hanlab.uth.edu/tRic/ or http://bioinfo.life.hust.edu.cn/tRic/), which can be of significant interest to the research community.
Affiliation(s)
- Zhao Zhang
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hang Ruan
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, USA
- Chun-Jie Liu
  - Department of Bioinformatics and Systems Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
- Youqiong Ye
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jing Gong
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, USA
- Lixia Diao
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- An-Yuan Guo
  - Department of Bioinformatics and Systems Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, PR China
- Leng Han
  - Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
11
Abstract
Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry-based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, "How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?" We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.
12
Zhang Z, Ye Y, Gong J, Ruan H, Liu CJ, Xiang Y, Cai C, Guo AY, Ling J, Diao L, Weinstein JN, Han L. Global analysis of tRNA and translation factor expression reveals a dynamic landscape of translational regulation in human cancers. Commun Biol 2018; 1:234. [PMID: 30588513 PMCID: PMC6303286 DOI: 10.1038/s42003-018-0239-8] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 07/19/2018] [Accepted: 11/27/2018] [Indexed: 12/14/2022] Open
Abstract
The protein translational system, including transfer RNAs (tRNAs) and several categories of enzymes, plays a key role in regulating cell proliferation. Translation dysregulation also contributes to cancer development, though relatively little is known about the changes that occur to the translational system in cancer. Here, we present global analyses of tRNAs and three categories of enzymes involved in translational regulation in ~10,000 cancer patients across 31 cancer types from The Cancer Genome Atlas. By analyzing the expression levels of tRNAs at the gene, codon, and amino acid levels, we identified unequal alterations in tRNA expression, likely due to the uneven distribution of tRNAs decoding different codons. We find that overexpression of tRNAs recognizing codons with a low observed-over-expected ratio may overcome the translational bottleneck in tumorigenesis. We further observed overall overexpression and amplification of tRNA modification enzymes, aminoacyl-tRNA synthetases, and translation factors, which may play synergistic roles with overexpression of tRNAs to activate the translational systems across multiple cancer types.
Affiliation(s)
- Zhao Zhang
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Youqiong Ye
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Jing Gong
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Hang Ruan
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Chun-Jie Liu
- Department of Bioinformatics and Systems Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology Wuhan, 430074 Hubei, People’s Republic of China
| | - Yu Xiang
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - Chunyan Cai
- Department of Internal Medicine, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
| | - An-Yuan Guo
- Department of Bioinformatics and Systems Biology, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Key Laboratory of Molecular Biophysics of the Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology Wuhan, 430074 Hubei, People’s Republic of China
| | - Jiqiang Ling
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742 USA
| | - Lixia Diao
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
| | - John N. Weinstein
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030 USA
| | - Leng Han
- Department of Biochemistry and Molecular Biology, McGovern Medical School at The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
- Center for Precision Health, The University of Texas Health Science Center at Houston, Houston, TX 77030 USA
|
13
|
Sandhu C, Qureshi A, Emili A. Panomics for Precision Medicine. Trends Mol Med 2017; 24:85-101. [PMID: 29217119 DOI: 10.1016/j.molmed.2017.11.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 09/11/2017] [Revised: 11/11/2017] [Accepted: 11/13/2017] [Indexed: 12/24/2022]
Abstract
Medicine is poised to undergo a digital transformation. High-throughput platforms are creating terabytes of genomic, transcriptomic, proteomic, and metabolomic data. The challenge is to interpret these data in a meaningful manner - to uncover relationships that are not readily apparent between molecular profiles and states of health or disease. This will require the development of novel data pipelines and computational tools. The combined analysis of multi-dimensional data is referred to as 'panomics'. The ultimate hope of integrative panomics is that it will lead to the discovery and application of novel markers and targeted therapeutics that drive forward a new era of 'precision medicine' where inter-individual variation is accounted for in the treatment of patients.
Affiliation(s)
| | - Alia Qureshi
- Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Andrew Emili
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
|
14
|
Mittal P, Klingler-Hoffmann M, Arentz G, Zhang C, Kaur G, Oehler MK, Hoffmann P. Proteomics of endometrial cancer diagnosis, treatment, and prognosis. Proteomics Clin Appl 2015; 10:217-29. [DOI: 10.1002/prca.201500055] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/08/2015] [Revised: 08/13/2015] [Accepted: 11/02/2015] [Indexed: 11/08/2022]
Affiliation(s)
- Parul Mittal
- Adelaide Proteomics Centre; School of Biological Sciences; The University of Adelaide; Adelaide Australia
- Institute for Photonics and Advanced Sensing (IPAS); The University of Adelaide; Adelaide Australia
| | - Manuela Klingler-Hoffmann
- Adelaide Proteomics Centre; School of Biological Sciences; The University of Adelaide; Adelaide Australia
- Institute for Photonics and Advanced Sensing (IPAS); The University of Adelaide; Adelaide Australia
| | - Georgia Arentz
- Adelaide Proteomics Centre; School of Biological Sciences; The University of Adelaide; Adelaide Australia
- Institute for Photonics and Advanced Sensing (IPAS); The University of Adelaide; Adelaide Australia
| | - Chao Zhang
- Adelaide Proteomics Centre; School of Biological Sciences; The University of Adelaide; Adelaide Australia
- Institute for Photonics and Advanced Sensing (IPAS); The University of Adelaide; Adelaide Australia
| | - Gurjeet Kaur
- Institute for Research in Molecular Medicine; Universiti Sains Malaysia; Minden Pulau Pinang Malaysia
| | - Martin K. Oehler
- Department of Gynaecological Oncology; Royal Adelaide Hospital; North Terrace Adelaide Australia
| | - Peter Hoffmann
- Adelaide Proteomics Centre; School of Biological Sciences; The University of Adelaide; Adelaide Australia
|
15
|
Richards S. It's more than stamp collecting: how genome sequencing can unify biological research. Trends Genet 2015; 31:411-21. [PMID: 26003218 DOI: 10.1016/j.tig.2015.04.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 12/12/2014] [Revised: 04/16/2015] [Accepted: 04/17/2015] [Indexed: 10/23/2022]
Abstract
The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and more broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa, as well as discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to 'big science' survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for research communities most in need.
Affiliation(s)
- Stephen Richards
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
|
16
|
Dippold RP, Fisher SA. A bioinformatic and computational study of myosin phosphatase subunit diversity. Am J Physiol Regul Integr Comp Physiol 2014; 307:R256-70. [PMID: 24898838 PMCID: PMC4121627 DOI: 10.1152/ajpregu.00145.2014] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 04/10/2014] [Accepted: 05/25/2014] [Indexed: 01/01/2023]
Abstract
Variability in myosin phosphatase (MP) subunits may provide specificity in signaling pathways that regulate muscle tone. We utilized public databases and computational algorithms to investigate the phylogenetic diversity of MP regulatory (PPP1R12A-C) and inhibitory (PPP1R14A-D) subunits. The comparison of exonic coding sequences and expression data confirmed or refuted the existence of isoforms and their tissue-specific expression in different model organisms. The comparison of intronic and exonic sequences identified potential expressional regulatory elements. As examples, smooth muscle MP regulatory subunit (PPP1R12A) is highly conserved through evolution. Its alternative exon E24 is present in fish through mammals with two invariant features: 1) a reading frame shift generating a premature termination codon and 2) a hexanucleotide sequence adjacent to the 3' splice site hypothesized to be a novel suppressor of exon splicing. A characteristic of the striated muscle MP regulatory subunit (PPP1R12B) locus is numerous and phylogenetically variable transcriptional start sites. In fish this locus only codes for the small (M21) subunit, suggesting the primordial function of this gene. Inhibitory subunits show little intragenic variability; their diversity is thought to have arisen by expansion and tissue-specific expression of different gene family members. We demonstrate differences in the regulatory landscape between smooth muscle enriched (PPP1R14A) and more ubiquitously expressed (PPP1R14B) family members and identify deeply conserved intronic sequence and predicted transcriptional cis-regulatory elements. This bioinformatic and computational study has uncovered a number of attributes of MP subunits that supports selection of ideal model organisms and testing of hypotheses regarding their physiological significance and regulated expression.
Affiliation(s)
- Rachael P Dippold
- Department of Medicine, Cardiology, University of Maryland Baltimore, Baltimore, Maryland
| | - Steven A Fisher
- Department of Medicine, Cardiology, University of Maryland Baltimore, Baltimore, Maryland
|
17
|
Shabalina SA, Ogurtsov AY, Spiridonov NA, Koonin EV. Evolution at protein ends: major contribution of alternative transcription initiation and termination to the transcriptome and proteome diversity in mammals. Nucleic Acids Res 2014; 42:7132-44. [PMID: 24792168 PMCID: PMC4066770 DOI: 10.1093/nar/gku342] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022] Open
Abstract
Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20984, USA
| | - Aleksey Y Ogurtsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20984, USA
| | - Nikolay A Spiridonov
- Division of Therapeutic Proteins, Center for Drug Evaluation and Research, US Food and Drug Administration, Bethesda, MD 20892, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20984, USA
|
18
|
Roy B, Haupt LM, Griffiths LR. Review: Alternative Splicing (AS) of Genes As An Approach for Generating Protein Complexity. Curr Genomics 2013; 14:182-94. [PMID: 24179441 PMCID: PMC3664468 DOI: 10.2174/1389202911314030004] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 10/23/2012] [Revised: 02/08/2013] [Accepted: 02/25/2013] [Indexed: 12/22/2022] Open
Abstract
Prior to the completion of the human genome project, the human genome was thought to have a greater number of genes as it seemed structurally and functionally more complex than simpler organisms. This, along with the belief in “one gene, one protein”, was demonstrated to be incorrect. The inequality in the ratio of gene to protein formation gave rise to the theory of alternative splicing (AS). AS is a mechanism by which one gene gives rise to multiple protein products. Numerous databases and online bioinformatic tools are available for the detection and analysis of AS. Bioinformatics provides an important approach to study mRNA and protein diversity by various tools such as expressed sequence tag (EST) sequences obtained from completely processed mRNA. Microarrays and deep sequencing approaches also aid in the detection of splicing events. Initially it was postulated that AS occurred only in about 5% of all genes but was later found to be more abundant. Using bioinformatic approaches, the level of AS in human genes was found to be fairly high with 35-59% of genes having at least one AS form. Our ability to determine and predict AS is important as disorders in splicing patterns may lead to abnormal splice variants resulting in genetic diseases. In addition, the diversity of proteins produced by AS poses a challenge for successful drug discovery and therefore a greater understanding of AS would be beneficial.
Affiliation(s)
- Bishakha Roy
- Genomics Research Centre, Griffith Health Institute, Griffith University Gold Coast, Queensland 4222, Australia
|
19
|
Transcriptome sequencing and de novo annotation of the critically endangered Adriatic sturgeon. BMC Genomics 2013; 14:407. [PMID: 23773438 PMCID: PMC3691660 DOI: 10.1186/1471-2164-14-407] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 01/21/2013] [Accepted: 06/04/2013] [Indexed: 12/14/2022] Open
Abstract
Background Sturgeons are a group of Chondrostean fish with very high evolutionary, economic and conservation interest. The eggs of these living fossils represent one of the most highly prized foods of animal origin. The intense fishing pressure on wild stocks to harvest caviar has caused in the last decades a dramatic decline of their distribution and abundance, leading the International Union for Conservation of Nature to list them as the most endangered group of species. As a direct consequence, world-wide efforts have been made to develop sturgeon aquaculture programmes for caviar production. In this context, the characterization of the genes involved in sex determination could provide relevant information for the selective farming of the more profitable females. Results The 454 sequencing of two cDNA libraries from the gonads and brain of one male and one female full-sib A. naccarii yielded 182,066 and 167,776 reads respectively, which, after strict quality control, were iteratively assembled into more than 55,000 high quality ESTs. The average per-base coverage reached by assembling the two libraries was 4X. The multi-step annotation process resulted in 16% successfully annotated sequences with GO terms. We screened the transcriptome for 32 sex-related genes and highlighted 7 genes that are potentially specifically expressed, 5 in males and 2 in females, at the first life stage at which sex is histologically identifiable. In addition we identified 21,791 putative EST-linked SNPs and 5,295 SSRs. Conclusions This study represents the first large-scale release of sturgeon transcriptome information, which we organized into the public database AnaccariiBase, freely available at http://compgen.bio.unipd.it/anaccariibase/. These transcriptomic data represent an important source of information for further studies on sturgeon species.
The hundreds of putative EST-linked molecular markers discovered in this study will be invaluable for sturgeon reintroduction and breeding programs.
|
20
|
Chen G, Wang C, Shi L, Qu X, Chen J, Yang J, Shi C, Chen L, Zhou P, Ning B, Tong W, Shi T. Incorporating the human gene annotations in different databases significantly improved transcriptomic and genetic analyses. RNA (New York, N.Y.) 2013; 19:479-89. [PMID: 23431329 PMCID: PMC3677258 DOI: 10.1261/rna.037473.112] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 11/23/2012] [Accepted: 01/14/2013] [Indexed: 05/18/2023]
Abstract
Human gene annotation is crucial for conducting transcriptomic and genetic studies; however, the impacts of human gene annotations in diverse databases on related studies have been less evaluated. To enable full use of various human annotation resources and better understand the human transcriptome, here we systematically compare the human annotations present in RefSeq, Ensembl (GENCODE), and AceView on diverse transcriptomic and genetic analyses. We found that the human gene annotations in the three databases are far from complete. Although Ensembl and AceView annotated more genes than RefSeq, more than 15,800 genes from Ensembl (or AceView) are within the intergenic and intronic regions of AceView (or Ensembl) annotation. The human transcriptome annotations in RefSeq, Ensembl, and AceView had distinct effects on short-read mapping, gene and isoform expression profiling, and differential expression calling. Furthermore, our findings indicate that the integrated annotation of these databases can obtain a more complete gene set and significantly enhance those transcriptomic analyses. We also observed that many more known SNPs were located within genes annotated in Ensembl and AceView than in RefSeq. In particular, 1033 of 3041 trait/disease-associated SNPs involved in about 200 human traits/diseases that were previously reported to be in RefSeq intergenic regions could be relocated within Ensembl and AceView genes. Our findings illustrate that a more complete transcriptome generated by incorporating human gene annotations in diverse databases can strikingly improve the overall results of transcriptomic and genetic studies.
Affiliation(s)
- Geng Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Charles Wang
- Functional Genomics Core, Beckman Research Institute, City of Hope Comprehensive Cancer Center, Duarte, California 91010, USA
| | - Leming Shi
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Xiongfei Qu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Jiwei Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Jianmin Yang
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Caiping Shi
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Long Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Peiying Zhou
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
| | - Baitang Ning
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas 72079, USA
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai 200241, China
|
21
|
Tan HT, Lee YH, Chung MCM. Cancer proteomics. Mass Spectrometry Reviews 2012; 31:583-605. [PMID: 22422534 DOI: 10.1002/mas.20356] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 08/29/2011] [Revised: 11/16/2011] [Accepted: 11/16/2011] [Indexed: 05/31/2023]
Abstract
Cancer presents high mortality and morbidity globally, largely due to its complex and heterogenous nature, and lack of biomarkers for early diagnosis. A proteomics study of cancer aims to identify and characterize functional proteins that drive the transformation of malignancy, and to discover biomarkers to detect early-stage cancer, predict prognosis, determine therapy efficacy, identify novel drug targets, and ultimately develop personalized medicine. The various sources of human samples such as cell lines, tissues, and plasma/serum are probed by a plethora of proteomics tools to discover novel biomarkers and elucidate mechanisms of tumorigenesis. Innovative proteomics technologies and strategies have been designed for protein identification, quantitation, fractionation, and enrichment to delve deeper into the oncoproteome. In addition, there is the need for high-throughput methods for biomarker validation, and integration of the various platforms of oncoproteome data to fully comprehend cancer biology.
Affiliation(s)
- Hwee Tong Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
|
22
|
Kuznetsov VA, Pickalov VV, Senko OV, Knott GD. Analysis of the evolving proteomes: predictions of the number of protein domains in nature and the number of genes in eukaryotic organisms. J Biol Syst 2012. [DOI: 10.1142/s0218339002000767] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Motivation: Obtaining accurate estimates of the numbers of protein-coding genes and protein domains in a proteome, and the number of protein domains in nature, is a daunting challenge. Computational analysis of the protein domain sets in the proteomes of many species allows us to estimate these numbers and to find their evolutionary relationships. Results: We have analyzed the distributions of the number of occurrences of protein domains in sample proteomes of the 70 fully sequenced genome organisms of three major kingdoms of life: Archaea, Bacteria and Eukaryota. We found that a large fraction of the identified distinct protein domains (i.e., unique domains and homologous domain families) in these 70 proteomes (1051 (23%) out of 4493) are found in at least one organism in each of these kingdoms of life and that 43 (1%) of these domains are common to all the 70 organisms. All the observed domain occurrence frequency distributions for these 70 proteomes are well fitted by a family of Pareto-like functions, associated with the steady state distributions of a linear Markov random process. We present explicit formulas that accurately predict the number of distinct protein domains and the number of protein-coding genes for a given organism as functions of the number of non-redundant domain-to-protein links in the proteomes. These functions allow us to predict that there are 42,740, 27,900, and 21,200 protein-coding genes/open reading frames in the human, A. thaliana, and mouse genomes, respectively. We also estimate that there are 5271, 2955, and 4915 distinct protein domains in the human, A. thaliana, and mouse proteomes, respectively, and about 5500 distinct protein domains in the entire "proteome world".
Affiliation(s)
- VLADIMIR A. KUZNETSOV
- The Laboratory of Integrative and Medical Biophysics, National Institute of Child Health and Human Development, 13 South Drive, Bethesda, MD 20892, USA
| | - VALERY V. PICKALOV
- Institute of Theoretical and Applied Mechanics SB RAS, Novosibirsk, 630090, Russia
| | - OLEG V. SENKO
- Computer Center of Russian Academy of Sciences, Vavilov str. 40, 117967 Moscow, Russia
| | - GARY D. KNOTT
- Civilized Software, Inc., 12109 Heritage Park Circle, Silver Spring, MD 20906, USA
|
23
|
Luo S, Mach J, Abramson B, Ramirez R, Schurr R, Barone P, Copenhaver G, Folkerts O. The cotton centromere contains a Ty3-gypsy-like LTR retroelement. PLoS One 2012; 7:e35261. [PMID: 22536361 DOI: 10.1371/journal.pone.0035261] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 11/18/2011] [Accepted: 03/13/2012] [Indexed: 01/16/2023] Open
Abstract
The centromere is a repeat-rich structure essential for chromosome segregation; with the long-term aim of understanding centromere structure and function, we set out to identify cotton centromere sequences. To isolate centromere-associated sequences from cotton (Gossypium hirsutum), we surveyed tandem and dispersed repetitive DNA in the genus. Centromere-associated elements in other plants include tandem repeats and, in some cases, centromere-specific retroelements. Examination of cotton genomic survey sequences for tandem repeats yielded sequences that did not localize to the centromere. However, among the repetitive sequences we also identified a gypsy-like LTR retrotransposon (Centromere Retroelement Gossypium, CRG) that localizes to the centromere region of all chromosomes in domestic upland cotton, Gossypium hirsutum, the major commercially grown cotton. The location of the functional centromere was confirmed by immunostaining with antiserum to the centromere-specific histone CENH3, which co-localizes with CRG hybridization on metaphase mitotic chromosomes. G. hirsutum is an allotetraploid composed of A and D genomes and CRG is also present in the centromere regions of other AD cotton species. Furthermore, FISH and genomic dot blot hybridization revealed that CRG is found in D-genome diploid cotton species, but not in A-genome diploid species, indicating that this retroelement may have invaded the A-genome centromeres during allopolyploid formation and amplified during evolutionary history. CRG is also found in other diploid Gossypium species, including B and E2 genome species, but not in the C, E1, F, and G genome species tested. Isolation of this centromere-specific retrotransposon from Gossypium provides a probe for further understanding of centromere structure, and a tool for future engineering of centromere mini-chromosomes in this important crop species.
Affiliation(s)
- Song Luo
- Chromatin, Inc., Chicago, Illinois, United States of America
|
24
|
Consiglio A, Carella M, De Caro G, Delle Foglie G, Giovannelli C, Grillo G, Ianigro M, Licciulli F, Palumbo O, Piepoli A, Ranieri E, Liuni S. BEAT: Bioinformatics Exon Array Tool to store, analyze and visualize Affymetrix GeneChip Human Exon Array data from disease experiments. BMC Bioinformatics 2012; 13 Suppl 4:S21. [PMID: 22536968 PMCID: PMC3314565 DOI: 10.1186/1471-2105-13-s4-s21] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND It is known from recent studies that more than 90% of human multi-exon genes are subject to Alternative Splicing (AS), a key molecular mechanism in which multiple transcripts may be generated from a single gene. It is widely recognized that a breakdown in AS mechanisms plays an important role in cellular differentiation and pathologies. Polymerase Chain Reactions, microarrays and sequencing technologies have been applied to the study of transcript diversity arising from alternative expression. Last generation Affymetrix GeneChip Human Exon 1.0 ST Arrays offer a more detailed view of the gene expression profile providing information on the AS patterns. The exon array technology, with more than five million data points, can detect approximately one million exons, and it allows performing analyses at both gene and exon level. In this paper we describe BEAT, an integrated user-friendly bioinformatics framework to store, analyze and visualize exon arrays datasets. It combines a data warehouse approach with some rigorous statistical methods for assessing the AS of genes involved in diseases. Meta statistics are proposed as a novel approach to explore the analysis results. BEAT is available at http://beat.ba.itb.cnr.it. RESULTS BEAT is a web tool which allows uploading and analyzing exon array datasets using standard statistical methods and an easy-to-use graphical web front-end. BEAT has been tested on a dataset with 173 samples and tuned using new datasets of exon array experiments from 28 colorectal cancer and 26 renal cell cancer samples produced at the Medical Genetics Unit of IRCCS Casa Sollievo della Sofferenza. To highlight all possible AS events, alternative names, accession Ids, Gene Ontology terms and biochemical pathways annotations are integrated with exon and gene level expression plots.
The user can customize the results choosing custom thresholds for the statistical parameters and exploiting the available clinical data of the samples for a multivariate AS analysis. CONCLUSIONS Despite exon array chips being widely used for transcriptomics studies, there is a lack of analysis tools offering advanced statistical features and requiring no programming knowledge. BEAT provides a user-friendly platform for a comprehensive study of AS events in human diseases, displaying the analysis results with easily interpretable and interactive tables and graphics.
Affiliation(s)
- Arianna Consiglio
- Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Massimo Carella
- Medical Genetics Unit, Casa Sollievo della Sofferenza IRCCS, San Giovanni Rotondo Foggia, 71013, Italy
- Giorgio De Caro
- Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Gianfranco Delle Foglie
- Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Candida Giovannelli
- Institute of Intelligent Systems for Automation - ISSIA, National Research Council, Bari, 70126, Italy
- Giorgio Grillo
- Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Massimo Ianigro
- Institute of Intelligent Systems for Automation - ISSIA, National Research Council, Bari, 70126, Italy
- Flavio Licciulli
- Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
- Orazio Palumbo
- Medical Genetics Unit, Casa Sollievo della Sofferenza IRCCS, San Giovanni Rotondo Foggia, 71013, Italy
- Ada Piepoli
- Department and Laboratory of Gastroenterology Unit, Casa Sollievo della Sofferenza IRCCS, San Giovanni Rotondo Foggia, 71013, Italy
- Elena Ranieri
- Department of Biomedical Science, University of Foggia, Foggia, 71122, Italy
- Sabino Liuni
- Institute for Biomedical Technologies of Bari - ITB, National Research Council, Bari, 70126, Italy
|
25
|
Zheng Z, Xu F. Neuroplasticity may play a role in inter-individual difference among neuropsychiatric disease treatment efficacy. Dev Psychobiol 2012; 54:369-71. [DOI: 10.1002/dev.20561] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/08/2010] [Accepted: 04/18/2011] [Indexed: 02/04/2023]
|
26
|
Ha HS, Chung WK, Ahn K, Bae JH, Park SJ, Moon JW, Nam KH, Han K, Cho HG, Kim HS. Development of GEBRET: a web-based analysis tool for retroelements in primate genomes. Genes Genomics 2011. [DOI: 10.1007/s13258-011-0103-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
27
|
On parameters of the human genome. J Theor Biol 2011; 288:92-104. [DOI: 10.1016/j.jtbi.2011.07.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 04/13/2011] [Revised: 06/28/2011] [Accepted: 07/21/2011] [Indexed: 02/06/2023]
|
28
|
Rigault P, Boyle B, Lepage P, Cooke JEK, Bousquet J, MacKay JJ. A white spruce gene catalog for conifer genome analyses. PLANT PHYSIOLOGY 2011; 157:14-28. [PMID: 21730200 PMCID: PMC3165865 DOI: 10.1104/pp.111.179663] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Received: 05/09/2011] [Accepted: 06/24/2011] [Indexed: 05/18/2023]
Abstract
Several angiosperm plant genomes, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), and grapevine (Vitis vinifera), have been sequenced, but the lack of reference genomes in gymnosperm phyla reduces our understanding of plant evolution and restricts the potential impacts of genomics research. A gene catalog was developed for the conifer tree Picea glauca (white spruce) through large-scale expressed sequence tag sequencing and full-length cDNA sequencing to facilitate genome characterizations, comparative genomics, and gene mapping. The resource incorporates new and publicly available sequences into 27,720 cDNA clusters, 23,589 of which are represented by full-length insert cDNAs. Expressed sequence tags, mate-pair cDNA clone analysis, and custom sequencing were integrated through an iterative process to improve the accuracy of clustering outcomes. The entire catalog spans 30 Mb of unique transcribed sequence. We estimated that the P. glauca nuclear genome contains up to 32,520 transcribed genes owing to incomplete, partially sequenced, and unsampled transcripts and that its transcriptome could span up to 47 Mb. These estimates are in the same range as the Arabidopsis and rice transcriptomes. Next-generation methods confirmed and enhanced the catalog by providing deeper coverage for rare transcripts, by extending many incomplete clusters, and by augmenting the overall transcriptome coverage to 38 Mb of unique sequence. Genomic sample sequencing at 8.5% of the 19.8-Gb P. glauca genome identified 1,495 clusters representing highly repeated sequences among the cDNA clusters. With a conifer transcriptome in full view, functional and protein domain annotations clearly highlighted the divergences between conifers and angiosperms, likely reflecting their respective evolutionary paths.
|
29
|
Bastepe M. The GNAS Locus: Quintessential Complex Gene Encoding Gsalpha, XLalphas, and other Imprinted Transcripts. Curr Genomics 2011; 8:398-414. [PMID: 19412439 PMCID: PMC2671723 DOI: 10.2174/138920207783406488] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 08/19/2007] [Revised: 09/22/2007] [Accepted: 09/28/2007] [Indexed: 12/14/2022] Open
Abstract
The currently estimated number of genes in the human genome is much smaller than previously predicted. As an explanation for this disparity, most individual genes have multiple transcriptional units that represent a variety of biologically important gene products. GNAS exemplifies a gene of such complexity. One of its products is the alpha-subunit of the stimulatory heterotrimeric G protein (Gsalpha), a ubiquitous signaling protein essential for numerous different cellular responses. Loss-of-function and gain-of-function mutations within Gsalpha-coding GNAS exons are found in various human disorders, including Albright's hereditary osteodystrophy, pseudohypoparathyroidism, fibrous dysplasia of bone, and some tumors of different origin. While Gsalpha expression in most tissues is biallelic, paternal Gsalpha expression is silenced in a small number of tissues, playing an important role in the development of phenotypes associated with GNAS mutations. Additional products derived exclusively from the paternal GNAS allele include XLalphas, a protein partially identical to Gsalpha, and two non-coding RNA molecules, the A/B transcript and the antisense transcript. The maternal GNAS allele leads to NESP55, a chromogranin-like neuroendocrine secretory protein. In vivo animal models have demonstrated the importance of each of the exclusively imprinted GNAS products in normal mammalian physiology. However, although one or more of these products are also disrupted by most naturally occurring GNAS mutations, their roles in disease pathogenesis remain unknown. To further our understanding of the significance of this gene in physiology and pathophysiology, it will be important to elucidate the cellular roles and the mechanisms regulating the expression of each GNAS product.
Affiliation(s)
- Murat Bastepe
- Endocrine Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
|
30
|
Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer. Mol Biol Rep 2011; 39:2981-90. [PMID: 21681423 DOI: 10.1007/s11033-011-1060-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/17/2010] [Accepted: 06/08/2011] [Indexed: 10/18/2022]
Abstract
Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.
|
31
|
Tan U. Uner tan syndrome: history, clinical evaluations, genetics, and the dynamics of human quadrupedalism. Open Neurol J 2010; 4:78-89. [PMID: 21258577 PMCID: PMC3024602 DOI: 10.2174/1874205x01004010078] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 04/03/2010] [Revised: 06/14/2010] [Accepted: 06/15/2010] [Indexed: 11/22/2022] Open
Abstract
This review includes for the first time a dynamical systems analysis of human quadrupedalism in Uner Tan syndrome, which is characterized by habitual quadrupedalism, impaired intelligence, and rudimentary speech. The first family was discovered in a small village near Iskenderun, and families were later found in Adana and two other small villages near Gaziantep and Canakkale. In all the affected individuals dynamic balance was impaired during upright walking, and they habitually preferred walking on all four extremities. MRI scans showed inferior cerebellovermian hypoplasia with slightly simplified cerebral gyri in three of the families, but appeared normal in the fourth. PET scans showed a decreased glucose metabolic activity in the cerebellum, vermis and, to a lesser extent, the cerebral cortex, except for one patient, whose MRI scan also appeared to be normal. All four families had consanguineous marriages in their pedigrees, suggesting autosomal recessive transmission. The syndrome was genetically heterogeneous. Since the initial discoveries more cases have been found, and these exhibit facultative quadrupedal locomotion, and in one case, late childhood onset. It has been suggested that the human quadrupedalism may, at least, be a phenotypic example of reverse evolution. From the viewpoint of dynamic systems theory, it was concluded there may not be a single factor that predetermines human quadrupedalism in Uner Tan syndrome, but that it may involve self-organization, brain plasticity, and rewiring, from the many decentralized and local interactions among neuronal, genetic, and environmental subsystems.
Affiliation(s)
- Uner Tan
- Department of Physiology, Çukurova University, Medical School, 01330 Adana, Turkey
|
32
|
Dunham I, Beare DM, Collins JE. The characteristics of human genes: analysis of human chromosome 22. Comp Funct Genomics 2010; 4:635-46. [PMID: 18629020 PMCID: PMC2447302 DOI: 10.1002/cfg.335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/15/2003] [Revised: 09/04/2003] [Accepted: 09/08/2003] [Indexed: 11/11/2022] Open
Affiliation(s)
- Ian Dunham
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
|
33
|
Simpson AJ, de Souza SJ, Camargo AA, Brentani RR. Definition of the gene content of the human genome: the need for deep experimental verification. Comp Funct Genomics 2010; 2:169-75. [PMID: 18628909 PMCID: PMC2447206 DOI: 10.1002/cfg.81] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/04/2001] [Accepted: 04/05/2001] [Indexed: 11/06/2022] Open
Abstract
Based on the analysis of the drafts of the human genome sequence, it is being speculated that our species may possess an unexpectedly low number of genes. The quality of the drafts, the impossibility of accurate gene prediction and the lack of sufficient transcript sequence data, however, render such speculations very premature. The complexity of human gene structure requires additional and extensive experimental verification of transcripts that may result in major revisions of these early estimates of the number of human genes.
Affiliation(s)
- A J Simpson
- The Ludwig Institute for Cancer Research, Rua Professor Antônio Prudente 109, São Paulo, SP 01509-010, Brazil.
|
34
|
Abstract
Many people expected the question 'How many genes in the human genome?' to be resolved with the publication of the genome sequence in 2001, but estimates continue to fluctuate.
Affiliation(s)
- Mihaela Pertea
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Steven L Salzberg
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
|
35
|
Next generation transcriptomes for next generation genomes using est2assembly. BMC Bioinformatics 2009; 10:447. [PMID: 20034392 PMCID: PMC3087352 DOI: 10.1186/1471-2105-10-447] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 08/03/2009] [Accepted: 12/24/2009] [Indexed: 11/10/2022] Open
Abstract
Background The decreasing costs of capillary-based Sanger sequencing and next generation technologies, such as 454 pyrosequencing, have prompted an explosion of transcriptome projects in non-model species, where even shallow sequencing of transcriptomes can now be used to examine a range of research questions. This rapid growth in data has outstripped the ability of researchers working on non-model species to analyze and mine transcriptome data efficiently. Results Here we present a semi-automated platform 'est2assembly' that processes raw sequence data from Sanger or 454 sequencing into a hybrid de-novo assembly, annotates it and produces GMOD-compatible output, including a SeqFeature database suitable for GBrowse. Users are able to parameterize assembler variables, judge assembly quality and determine the optimal assembly for their specific needs. We used est2assembly to process Drosophila and Bicyclus public Sanger EST data and then compared them to published 454 data as well as eight new insect transcriptome collections. Conclusions Analysis of such a wide variety of data allows us to understand how these new technologies can assist EST project design. We determine that assembler parameterization is as essential as standardized methods to judge the output of EST projects. Further, even shallow sequencing using 454 produces sufficient data to be of wide use to the community. est2assembly is an important tool to assist manual curation for gene models, an important resource in their own right but especially for species which are due to acquire a genome project using Next Generation Sequencing.
|
36
|
Whittle CA, Krochko JE. Transcript profiling provides evidence of functional divergence and expression networks among ribosomal protein gene paralogs in Brassica napus. THE PLANT CELL 2009; 21:2203-19. [PMID: 19706795 PMCID: PMC2751962 DOI: 10.1105/tpc.109.068411] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.5] [Received: 04/30/2009] [Revised: 06/14/2009] [Accepted: 07/15/2009] [Indexed: 05/19/2023]
Abstract
The plant ribosome is composed of 80 distinct ribosomal (r)-proteins. In Arabidopsis thaliana, each r-protein is encoded by two or more highly similar paralogous genes, although only one copy of each r-protein is incorporated into the ribosome. Brassica napus is especially suited to the comparative study of r-protein gene paralogs due to its documented history of genome duplication as well as the recent availability of large EST data sets. We have identified 996 putative r-protein genes spanning 79 distinct r-proteins in B. napus using EST data from 16 tissue collections. A total of 23,408 tissue-specific r-protein ESTs are associated with this gene set. Comparative analysis of the transcript levels for these unigenes reveals that a large fraction of r-protein genes are differentially expressed and that the number of paralogs expressed for each r-protein varies extensively with tissue type in B. napus. In addition, in many cases the paralogous genes for a specific r-protein are not transcribed in concert and have highly contrasting expression patterns among tissues. Thus, each tissue examined has a novel r-protein transcript population. Furthermore, hierarchical clustering reveals that particular paralogs for nonhomologous r-protein genes cluster together, suggesting that r-protein paralog combinations are associated with specific tissues in B. napus and, thus, may contribute to tissue differentiation and/or specialization. Altogether, the data suggest that duplicated r-protein genes undergo functional divergence into highly specialized paralogs and coexpression networks and that, similar to recent reports for yeast, these are likely actively involved in differentiation, development, and/or tissue-specific processes.
|
37
|
Endogenous retroviral LTRs as promoters for human genes: a critical assessment. Gene 2009; 448:105-14. [PMID: 19577618 DOI: 10.1016/j.gene.2009.06.020] [Citation(s) in RCA: 221] [Impact Index Per Article: 14.7] [Received: 05/14/2009] [Revised: 06/10/2009] [Accepted: 06/22/2009] [Indexed: 12/24/2022]
Abstract
Gene regulatory changes are thought to be major factors driving species evolution, with creation of new regulatory regions likely being instrumental in contributing to diversity among vertebrates. There is growing appreciation for the role of transposable elements (TEs) in gene regulation and, indeed, laboratory investigations have confirmed many specific examples of mammalian genes regulated by promoters donated by endogenous retroviruses (ERVs) or other TEs. Bioinformatics studies have revealed hundreds of additional instances where this is likely to be the case. Since the long terminal repeats (LTRs) of retroviruses naturally contain abundant transcriptional regulatory signals, roles for ERV LTRs in regulating mammalian genes are eminently plausible. Moreover, it seems reasonable that exaptation of an LTR regulatory module provides opportunities for evolution of new gene regulatory patterns. In this Review we summarize known examples of LTRs that function as human gene alternative promoters, as well as the evidence that LTR exaptation has resulted in a pattern of novel gene expression significantly different from the pattern before LTR insertion or from that of gene orthologs lacking the LTR. Available data suggest that, while new expression patterns can arise as a result of LTR usage, this situation is relatively rare and is largely restricted to the placenta. In many cases, the LTR appears to be a minor, alternative promoter with an expression pattern similar to that of the native promoter(s) and hence likely exerts a subtle overall effect on gene expression. We discuss these findings and offer evolutionary models to explain these trends.
|
38
|
Nordström KJV, Mirza MAI, Almén MS, Gloriam DE, Fredriksson R, Schiöth HB. Critical evaluation of the FANTOM3 non-coding RNA transcripts. Genomics 2009; 94:169-76. [PMID: 19505569 DOI: 10.1016/j.ygeno.2009.05.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 11/06/2007] [Revised: 05/25/2009] [Accepted: 05/26/2009] [Indexed: 01/15/2023]
Abstract
We studied the genomic positions of 38,129 putative ncRNAs from the RIKEN dataset in relation to protein-coding genes. We found that the dataset has 41% sense, 6% antisense, 24% intronic and 29% intergenic transcripts. Interestingly, 17,678 (47%) of the FANTOM3 transcripts were found to potentially be internally primed from longer transcripts. The highest fraction of these transcripts was found among the intronic transcripts and as many as 77% or 6929 intronic transcripts were both internally primed and unspliced. We defined a filtered subset of 8535 transcripts that did not overlap with protein-coding genes, did not contain ORFs longer than 100 residues and were not internally primed. This dataset contains 53% of the FANTOM3 transcripts associated with known ncRNAs in RNAdb and expands previous similar efforts with 6523 novel transcripts. This bioinformatic filtering of the FANTOM3 non-coding dataset has generated a lead dataset of transcripts without signs of being artefacts, providing a suitable dataset for investigation with hybridization-based techniques.
|
39
|
Illingworth RS, Bird AP. CpG islands--'a rough guide'. FEBS Lett 2009; 583:1713-20. [PMID: 19376112 DOI: 10.1016/j.febslet.2009.04.012] [Citation(s) in RCA: 578] [Impact Index Per Article: 38.5] [Received: 03/09/2009] [Revised: 04/04/2009] [Accepted: 04/06/2009] [Indexed: 02/07/2023]
Abstract
Mammalian genomes are punctuated by DNA sequences containing an atypically high frequency of CpG sites termed CpG islands (CGIs). CGIs generally lack DNA methylation and associate with the majority of annotated gene promoters. Many studies, however, have identified examples of CGI methylation in malignant cells, leading to improper gene silencing. CGI methylation also occurs in normal tissues and is known to function in X-inactivation and genomic imprinting. More recently, differential methylation has been shown between tissues, suggesting a potential role in transcriptional regulation during cell specification. Many of these tissue-specific methylated CGIs localise to regions distal to promoters, the regulatory function of which remains to be determined.
Affiliation(s)
- Robert S Illingworth
- Wellcome Trust Centre for Cell Biology, Michael Swann Building, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, United Kingdom.
|
40
|
Alexandrov NN, Brover VV, Freidin S, Troukhan ME, Tatarinova TV, Zhang H, Swaller TJ, Lu YP, Bouck J, Flavell RB, Feldmann KA. Insights into corn genes derived from large-scale cDNA sequencing. PLANT MOLECULAR BIOLOGY 2009; 69:179-94. [PMID: 18937034 PMCID: PMC2709227 DOI: 10.1007/s11103-008-9415-4] [Citation(s) in RCA: 149] [Impact Index Per Article: 9.9] [Received: 08/15/2008] [Accepted: 10/01/2008] [Indexed: 05/19/2023]
Abstract
We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701-EU977132 (FLI cDNA) and FK944382-FL482108 (EST).
|
41
|
|
42
|
Opsal MA, Lien S, Brenna-Hansen S, Olsen HG, Våge DI. Association analysis of the constructed linkage maps covering TLR2 and TLR4 with clinical mastitis in Norwegian Red cattle. J Anim Breed Genet 2008; 125:110-8. [PMID: 18363976 DOI: 10.1111/j.1439-0388.2007.00704.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
Toll-like receptors (TLRs) are important cell-surface molecules mediating immune responses. Previous studies have identified TLR2 and TLR4 as potential candidate genes for disease resistance. In this study, dense linkage maps comprising single nucleotide polymorphisms (SNPs) have been constructed for the chromosomal regions harbouring TLR2 and TLR4 on bovine chromosomes 17 and 8. The most likely marker orders for both regions were compared with the corresponding human map positions and used to reorder bovine scaffolds available from the bovine genome sequence assembly (Btau_3.1). A combined linkage and linkage disequilibrium method was used to investigate possible associations between the TLR genes and mastitis susceptibility recorded in the Norwegian Red cattle population. The analysis did not detect any significant association between the chromosomal regions surrounding TLR2 and TLR4 and mastitis in Norwegian Red cattle.
Affiliation(s)
- M A Opsal
- Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Aas, Norway.
|
43
|
Salzburger W, Renn SCP, Steinke D, Braasch I, Hofmann HA, Meyer A. Annotation of expressed sequence tags for the East African cichlid fish Astatotilapia burtoni and evolutionary analyses of cichlid ORFs. BMC Genomics 2008; 9:96. [PMID: 18298844 PMCID: PMC2279125 DOI: 10.1186/1471-2164-9-96] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Received: 10/11/2007] [Accepted: 02/25/2008] [Indexed: 11/13/2022] Open
Abstract
Background The cichlid fishes in general, and the exceptionally diverse East African haplochromine cichlids in particular, are famous examples of adaptive radiation and explosive speciation. Here we report the collection and annotation of more than 12,000 expressed sequence tags (ESTs) generated from three different cDNA libraries obtained from the East African haplochromine cichlid species Astatotilapia burtoni and Metriaclima zebra. Results We first annotated more than 12,000 newly generated cichlid ESTs using the Gene Ontology classification system. For evolutionary analyses, we combined these ESTs with all available sequence data for haplochromine cichlids, which resulted in a total of more than 45,000 ESTs. The ESTs represent a broad range of molecular functions and biological processes. We compared the haplochromine ESTs to sequence data available for other fish model systems such as pufferfish (Takifugu rubripes and Tetraodon nigroviridis), trout, and zebrafish. We characterized genes that show a faster or slower rate of base substitutions in haplochromine cichlids compared to other fish species, as this is indicative of a relaxed or reinforced selection regime. Four of these genes showed the signature of positive selection as revealed by calculating Ka/Ks ratios. Conclusion About 22% of the surveyed ESTs were found to have cichlid-specific rate differences, suggesting that these genes might play a role in lineage-specific characteristics of cichlids. We also conclude that the four genes with a Ka/Ks ratio greater than one appear to be good candidate genes for further work on the genetic basis of the evolutionary success of haplochromine cichlid fishes.
Affiliation(s)
- Walter Salzburger
- Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, 78467 Konstanz, Germany.
|
44
|
Liang C, Wang G, Liu L, Ji G, Fang L, Liu Y, Carter K, Webb JS, Dean JFD. ConiferEST: an integrated bioinformatics system for data reprocessing and mining of conifer expressed sequence tags (ESTs). BMC Genomics 2007; 8:134. [PMID: 17535431 PMCID: PMC1894976 DOI: 10.1186/1471-2164-8-134] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 11/02/2006] [Accepted: 05/29/2007] [Indexed: 11/30/2022] Open
Abstract
Background With the advent of low-cost, high-throughput sequencing, the amount of public domain Expressed Sequence Tag (EST) sequence data available for both model and non-model organisms is growing exponentially. While these data are widely used for characterizing various genomes, they also present a serious challenge for data quality control and validation due to their inherent deficiencies, particularly for species without genome sequences. Description ConiferEST is an integrated system for data reprocessing, visualization and mining of conifer ESTs. In its current release, Build 1.0, it houses 172,229 loblolly pine EST sequence reads, which were obtained from reprocessing raw DNA sequencer traces using our software – WebTraceMiner. The trace files were downloaded from NCBI Trace Archive. ConiferEST provides biologists unique, easy-to-use data visualization and mining tools for a variety of putative sequence features including cloning vector segments, adapter sequences, restriction endonuclease recognition sites, polyA and polyT runs, and their corresponding Phred quality values. Based on these putative features, verified sequence features such as 3' and/or 5' termini of cDNA inserts in either sense or non-sense strand have been identified in-silico. Interestingly, only 30.03% of the designated 3' ESTs were found to have an authenticated 5' terminus in the non-sense strand (i.e., polyT tails), while fewer than 5.34% of the designated 5' ESTs had a verified 5' terminus in the sense strand. Such previously ignored features provide valuable insight for data quality control and validation of error-prone ESTs, as well as the ability to identify novel functional motifs embedded in large EST datasets. We found that "double-termini adapters" were effective indicators of potential EST chimeras. 
For all sequences with in silico verified termini, we used InterProScan to assign protein domain signatures, the results of which are available for in-depth exploration using our biologist-friendly web interfaces. Conclusion ConiferEST represents a unique and complementary public resource for EST data integration and mining in conifers by reprocessing raw DNA traces, identifying putative sequence features, and determining and annotating in silico verified features. Seamlessly integrated with other public resources, ConiferEST provides biologists with powerful tools to verify data, visualize abnormalities, including EST chimeras, and explore large EST datasets.
Affiliation(s)
- Chun Liang
  - Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Gang Wang
  - Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Lin Liu
  - Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, Fujian, 361005, China
- Lin Fang
  - Beijing Genomics Institute, Beijing 101300, China
- Yuansheng Liu
  - Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Kikia Carter
  - Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Jason S Webb
  - Department of Botany, Miami University, Oxford, Ohio 45056, USA
- Jeffrey FD Dean
  - Warnell School of Forestry and Natural Resources, University of Georgia, Athens, Georgia 30602, USA
|
45
|
Fierro AC, Thuret R, Coen L, Perron M, Demeneix BA, Wegnez M, Gyapay G, Weissenbach J, Wincker P, Mazabraud A, Pollet N. Exploring nervous system transcriptomes during embryogenesis and metamorphosis in Xenopus tropicalis using EST analysis. BMC Genomics 2007; 8:118. [PMID: 17506875 PMCID: PMC1890556 DOI: 10.1186/1471-2164-8-118] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 11/17/2006] [Accepted: 05/16/2007] [Indexed: 11/26/2022] Open
Abstract
Background The western African clawed frog Xenopus tropicalis is an anuran amphibian species now used as a model in vertebrate comparative genomics. It provides the same advantages as Xenopus laevis but is diploid and has a smaller genome of 1.7 Gbp. Therefore X. tropicalis is more amenable to systematic transcriptome surveys. We initiated a large-scale partial cDNA sequencing project to provide a functional genomics resource on genes expressed in the nervous system during early embryogenesis and metamorphosis in X. tropicalis. Results A gene index was defined and analysed after the collection of over 48,785 high-quality sequences. These partial cDNA sequences were obtained from an embryonic head and retina library (30,272 sequences) and from a metamorphic brain and spinal cord library (27,602 sequences). These ESTs are estimated to represent 9,693 transcripts derived from an estimated 6,000 genes. Comparison of these cDNA sequences with protein databases indicates that 46% contain their start codon. Further annotation included Gene Ontology functional classification, InterPro domain analysis, alternative splicing and non-coding RNA identification. Gene expression profiles were derived from EST counts and used to define transcripts specific to metamorphic stages of development. Moreover, these ESTs allowed identification of a set of 225 polymorphic microsatellites that can be used as genetic markers. Conclusion These cDNA sequences permit in silico cloning of numerous genes and will facilitate studies aimed at deciphering the roles of cognate genes expressed in the nervous system during neural development and metamorphosis. The genomic resources developed to study X. tropicalis biology will accelerate exploration of amphibian physiology and genetics. In particular, the model will facilitate analysis of key questions related to anuran embryogenesis and metamorphosis and its associated regulatory processes.
Affiliation(s)
- Ana C Fierro
  - CNRS UMR 8080, F-91405 Orsay, France
  - Univ Paris Sud, F-91405 Orsay, France
  - Programme d'Épigénomique, Univ Evry, Tour Évry 2, 10è étage, 523 Terrasses de l'Agora, 91034 Evry cedex, France
- Raphaël Thuret
  - CNRS UMR 8080, F-91405 Orsay, France
  - Univ Paris Sud, F-91405 Orsay, France
- Laurent Coen
  - CNRS UMR 5166, Evolution des Régulations Endocriniennes, USM 501, Département Régulations, Développement et Diversité Moléculaire, Muséum National d'Histoire Naturelle, 7 rue Cuvier, 75231 Paris Cedex 5, France
- Muriel Perron
  - CNRS UMR 8080, F-91405 Orsay, France
  - Univ Paris Sud, F-91405 Orsay, France
- Barbara A Demeneix
  - CNRS UMR 5166, Evolution des Régulations Endocriniennes, USM 501, Département Régulations, Développement et Diversité Moléculaire, Muséum National d'Histoire Naturelle, 7 rue Cuvier, 75231 Paris Cedex 5, France
- Maurice Wegnez
  - CNRS UMR 8080, F-91405 Orsay, France
  - Univ Paris Sud, F-91405 Orsay, France
- Gabor Gyapay
  - Genoscope and CNRS UMR 8030, 2 rue Gaston Crémieux CP5706, 91057 Evry, France
- Jean Weissenbach
  - Genoscope and CNRS UMR 8030, 2 rue Gaston Crémieux CP5706, 91057 Evry, France
- Patrick Wincker
  - Genoscope and CNRS UMR 8030, 2 rue Gaston Crémieux CP5706, 91057 Evry, France
- André Mazabraud
  - CNRS UMR 8080, F-91405 Orsay, France
  - Univ Paris Sud, F-91405 Orsay, France
- Nicolas Pollet
  - CNRS UMR 8080, F-91405 Orsay, France
  - Univ Paris Sud, F-91405 Orsay, France
  - Programme d'Épigénomique, Univ Evry, Tour Évry 2, 10è étage, 523 Terrasses de l'Agora, 91034 Evry cedex, France
|
46
|
Georg RC, Gomes SL. Transcriptome analysis in response to heat shock and cadmium in the aquatic fungus Blastocladiella emersonii. EUKARYOTIC CELL 2007; 6:1053-62. [PMID: 17449658 PMCID: PMC1951522 DOI: 10.1128/ec.00053-07] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]
Abstract
The global transcriptional response of the chytridiomycete Blastocladiella emersonii to environmental stress conditions was explored by sequencing a large number of expressed sequence tags (ESTs) from three distinct cDNA libraries, constructed with mRNA extracted from cells exposed to heat shock and different concentrations of cadmium chloride. A total of 6,350 high-quality EST sequences were obtained and assembled into 2,326 putative unigenes, 51% of them not previously described in B. emersonii. An orthologue in another organism could be assigned to approximately 59% of the unigenes, whereas 41% remained without a putative identification; transcripts related to protein folding and antioxidant activity were highly enriched in the stress libraries. A microarray chip was constructed encompassing 3,773 distinct ESTs from the B. emersonii transcriptome presently available, which correspond to a wide range of biological processes. Global gene expression analysis of B. emersonii cells exposed to stress conditions revealed a large number of differentially expressed genes: 122 up- and 60 downregulated genes during heat shock and 189 up- and 110 downregulated genes during exposure to cadmium. The main functional categories represented among the upregulated genes were protein folding and proteolysis, proteins with antioxidant properties, and cellular transport. Interestingly, in response to cadmium stress, B. emersonii cells induced genes encoding six different glutathione S-transferases and six distinct metacaspases, as well as genes coding for several proteins of sulfur amino acid metabolism, indicating that cadmium causes oxidative stress and apoptosis in this fungus. All sequences described in this study have been submitted to the GenBank EST section with the accession numbers EE 730389 to EE 736848.
Affiliation(s)
- Raphaela C Georg
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Av. Prof. Lineu Prestes 748, 05508-000 São Paulo, Brazil
|
47
|
Dragulescu-Andrasi A, Rapireddy S, He G, Bhattacharya B, Hyldig-Nielsen JJ, Zon G, Ly DH. Cell-permeable peptide nucleic acid designed to bind to the 5'-untranslated region of E-cadherin transcript induces potent and sequence-specific antisense effects. J Am Chem Soc 2007; 128:16104-12. [PMID: 17165763 DOI: 10.1021/ja063383v] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.4] [Grants] [Indexed: 11/28/2022]
Abstract
Establishing a general and effective method for regulating gene expression in mammalian systems is important for many aspects of biological and biomedical research. Herein we report the antisense activities of a cell-permeable, guanidine-based peptide nucleic acid (PNA) called GPNA. We show that a GPNA oligomer designed to bind to the transcriptional start-site of the human E-cadherin gene induces potent and sequence-specific antisense effects and is less toxic to the cells than the corresponding PNA-polyarginine conjugate. GPNA confers its silencing effect by blocking protein translation. The findings reported in this study provide a molecular framework for designing the next generation of cell-permeable nucleic acid mimics for regulating gene expression in live cells and intact organisms.
|
48
|
Abstract
DNA microarrays make it possible, for the first time, to record the complete genomic signals that guide the progression of cellular processes. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment, and drug development. This chapter reviews the first data-driven models that were created from these genome-scale data, through adaptations and generalizations of mathematical frameworks from matrix algebra that have proven successful in describing the physical world, in such diverse areas as mechanics and perception: the singular value decomposition model, the generalized singular value decomposition comparative model, and the pseudoinverse projection integrative model. These models provide mathematical descriptions of the genetic networks that generate and sense the measured data, where the mathematical variables and operations represent biological reality. The variables, patterns uncovered in the data, correlate with activities of cellular elements such as regulators or transcription factors that drive the measured signals and cellular states where these elements are active. The operations, such as data reconstruction, rotation, and classification in subspaces of selected patterns, simulate experimental observation of only the cellular programs that these patterns represent. These models are illustrated in the analyses of RNA expression data from yeast and human during their cell cycle programs and DNA-binding data from yeast cell cycle transcription factors and replication initiation proteins. 
Two alternative pictures of RNA expression oscillations during the cell cycle that emerge from these analyses, which parallel well-known designs of physical oscillators, convey the capacity of the models to elucidate the design principles of cellular systems, as well as guide the design of synthetic ones. In these analyses, the power of the models to predict previously unknown biological principles is demonstrated with a prediction of a novel mechanism of regulation that correlates DNA replication initiation with cell cycle-regulated RNA transcription in yeast. These models may become the foundation of a future in which biological systems are modeled as physical systems are today.
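As a rough illustration of the singular value decomposition model and the "data reconstruction in subspaces of selected patterns" operation mentioned in this abstract, the following minimal Python sketch decomposes a small genes-by-arrays matrix and rebuilds it without one pattern. It is not the chapter's implementation: the function name `svd_filter`, its `drop` parameter, and the toy data (a steady-state offset plus one oscillation) are all assumptions made up for this example.

```python
import numpy as np

def svd_filter(expression, drop):
    """Reconstruct a genes x arrays matrix with selected SVD patterns removed.

    `expression` and `drop` are hypothetical names used only in this sketch;
    `drop` lists the indices of the patterns to filter out.
    """
    # Rows of vt are patterns across arrays; columns of u are the
    # corresponding patterns across genes.
    u, s, vt = np.linalg.svd(expression, full_matrices=False)
    # Fraction of the overall signal captured by each pattern.
    fractions = s**2 / np.sum(s**2)
    s_kept = s.copy()
    s_kept[list(drop)] = 0.0  # zero the singular values of dropped patterns
    return u @ np.diag(s_kept) @ vt, fractions, vt

# Toy data (invented): 4 genes measured at 6 time points.
t = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
steady = np.outer([5.0, 5.0, 4.0, 4.0], np.ones_like(t))       # constant offset
oscillation = np.outer([1.0, -1.0, 0.5, -0.5], np.sin(t))      # periodic signal
data = steady + oscillation

# Drop the dominant pattern, which here is the steady-state offset.
filtered, fractions, patterns = svd_filter(data, drop=[0])
print(np.round(fractions, 3))
print(np.allclose(filtered, oscillation))
```

The toy matrix is built so that the two components are orthogonal in both the gene and the array dimension, which is why removing the first pattern recovers the oscillation exactly; real expression data will generally not separate this cleanly.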
Affiliation(s)
- Orly Alter
  - Department of Biomedical Engineering, Institute for Cellular and Molecular Biology and Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX, USA
|
49
|
Opsal MA, Våge DI, Hayes B, Berget I, Lien S. Genomic organization and transcript profiling of the bovine toll-like receptor gene cluster TLR6-TLR1-TLR10. Gene 2006; 384:45-50. [PMID: 16950576 DOI: 10.1016/j.gene.2006.06.027] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 02/17/2006] [Revised: 06/12/2006] [Accepted: 06/30/2006] [Indexed: 11/30/2022]
Abstract
Toll-like receptors (TLRs) are a family of recognition receptors playing a crucial role in the innate immune system. Different combinations of TLRs are thought to be crucial for an effective immune response; thus, insight into the organization and expression of TLRs is important for understanding disease resistance. Mastitis is the most frequent and costly disease in dairy production, and the innate immune system is considered to be important in the first-line defence against this disease. In the present paper we have characterized the genomic organization of TLR6-TLR1-TLR10 in an approximately 50 kb region of bovine chromosome 6, including 5'-untranslated exons not previously described. A method for gene expression analysis was developed and used for transcription profiling of the three paralogous genes in different bovine tissues. The expression analysis showed similar expression profiles for TLR1 and TLR6, which indicates co-regulation of these two genes in cattle. TLR10 had a different expression profile, pointing toward a stronger functional diversification compared to TLR1 and TLR6. The differences in expression are in accordance with the evolutionary history of this gene cluster, where TLR10 diverged from the common ancestral gene before the duplication event that created TLR1 and TLR6.
Affiliation(s)
- Monica Aa Opsal
  - Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Box 5003, N-1432 Aas, Norway
|
50
|
Lopato S, Borisjuk L, Milligan AS, Shirley N, Bazanova N, Parsley K, Langridge P. Systematic identification of factors involved in post-transcriptional processes in wheat grain. PLANT MOLECULAR BIOLOGY 2006; 62:637-53. [PMID: 16941218 DOI: 10.1007/s11103-006-9046-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/08/2006] [Accepted: 07/06/2006] [Indexed: 05/11/2023]
Abstract
Post-transcriptional processing of primary transcripts can significantly affect both the quantity and the structure of mature mRNAs and the corresponding protein products. It is an important mechanism of gene regulation in animals, yeast and plants. Here we have investigated the interactive networks of pre-mRNA processing factors in the developing grain of wheat (Triticum aestivum), one of the world's major food staples. As a first step we isolated a homologue of the plant-specific AtRSZ33 splicing factor, which has been shown to be involved in the early stages of embryo development in Arabidopsis. Real-time PCR showed that the wheat gene, designated TaRSZ38, is expressed mainly in young, developing organs (flowers, root, stem), and expression peaks in immature grain. In situ hybridization and immunodetection revealed preferential abundance of TaRSZ38 in mitotically active tissues of the major storage organ of the grain, the endosperm. The protein encoded by TaRSZ38 was subsequently used as a starting bait in a two-hybrid screen to identify additional factors in grain that are involved in pre-mRNA processing. Most of the identified proteins showed high homology to known splicing factors and splicing-related proteins, supporting a role for TaRSZ38 in spliceosome formation and 5' site selection. Several clones were selected as baits in further yeast two-hybrid screens. In total, cDNAs for 16 proteins were isolated. Among these proteins, TaRSZ22, TaSRp30, TaU1-70K, and the large and small subunits of TaU2AF are wheat homologues of known plant splicing factors. Several additional proteins are novel for plants and show homology to known pre-mRNA splicing, splicing-related and mRNA export factors from yeast and mammals.
Affiliation(s)
- Sergiy Lopato
  - Australian Centre for Plant Functional Genomics, The University of Adelaide, PMB1, Glen Osmond, SA 5064, Australia.
|