1
Shimada MK. Splicing Modulators Are Involved in Human Polyglutamine Diversification via Protein Complexes Shuttling between Nucleus and Cytoplasm. Int J Mol Sci 2023; 24:ijms24119622. [PMID: 37298574 DOI: 10.3390/ijms24119622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/24/2023] [Accepted: 05/30/2023] [Indexed: 06/12/2023] Open
Abstract
Length polymorphisms of polyglutamine (polyQs) in triplet-repeat-disease-causing genes have diversified during primate evolution despite them conferring a risk of human-specific diseases. To explain the evolutionary process of this diversification, there is a need to focus on mechanisms by which rapid evolutionary changes can occur, such as alternative splicing. Proteins that can bind polyQs are known to act as splicing factors and may provide clues about the rapid evolutionary process. PolyQs are also characterized by the formation of intrinsically disordered (ID) regions, so I hypothesized that polyQs are involved in the transportation of various molecules between the nucleus and cytoplasm to regulate mechanisms characteristic of humans such as neural development. To determine target molecules for empirical research to understand the evolutionary change, I explored protein-protein interactions (PPIs) involving the relevant proteins. This study identified pathways related to polyQ binding as hub proteins scattered across various regulatory systems, including regulation via PQBP1, VCP, or CREBBP. Nine ID hub proteins with both nuclear and cytoplasmic localization were found. Functional annotations suggested that ID proteins containing polyQs are involved in regulating transcription and ubiquitination by flexibly changing PPI formation. These findings explain the relationships among splicing complex, polyQ length variations, and modifications in neural development.
Affiliation(s)
- Makoto K Shimada
- Center for Medical Science, Fujita Health University, Toyoake 470-1192, Japan
2
Sahoo TR, Vipsita S, Patra S. Complex Prediction in Large PPI Networks Using Expansion and Stripe of Core Cliques. Interdiscip Sci 2022:10.1007/s12539-022-00541-z. [PMID: 36306022 DOI: 10.1007/s12539-022-00541-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 06/16/2023]
Abstract
The widespread availability and importance of large-scale protein-protein interaction (PPI) data have prompted a flurry of research efforts to understand the organisation of a cell and its functionality by analysing these data at the network level. In the bioinformatics and data mining fields, network clustering has attracted considerable attention as a way to examine a PPI network's topological and functional aspects. Over the last decade, numerous studies have shown the clustering of PPI networks to be an excellent method for discovering functional modules, disclosing the functions of unknown proteins, and other tasks. This research proposes a graph mining approach to detect protein complexes using dense neighbourhoods (highly connected regions) in an interaction graph. Our technique first finds size-3 cliques associated with each edge (protein interaction), and then these core cliques are expanded to form high-density subgraphs. Loosely connected proteins are stripped out from these subgraphs to produce a potential protein complex. Finally, redundancy is removed based on the Jaccard coefficient. Computational results are presented on yeast and human protein interaction datasets to highlight the efficiency of the proposed technique. Protein complexes predicted by the proposed approach have a significantly higher similarity score to the gold standards in the CYC-2008 and CORUM benchmark databases than those predicted by other existing approaches.
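The pipeline described in this abstract (core cliques per edge, expansion, stripping, Jaccard-based redundancy removal) can be illustrated with a much simplified sketch. This is not the authors' implementation: the function names, the `min_frac` stripping rule, and the `max_overlap` cutoff are assumptions chosen for illustration, since the abstract gives no thresholds.

```python
def jaccard(a, b):
    """Jaccard coefficient of two node sets."""
    return len(a & b) / len(a | b)


def predict_complexes(edges, min_frac=0.5, max_overlap=0.8):
    """Toy clique-expansion sketch; min_frac and max_overlap are assumed values."""
    # Build an undirected adjacency map from the interaction list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    candidates = set()
    for u, v in edges:
        common = adj[u] & adj[v]
        if not common:
            continue  # this edge lies in no size-3 clique
        # Expansion step (simplified): merge every triangle that contains the edge.
        cluster = {u, v} | common
        # Stripping step (simplified): drop the most loosely connected node,
        # never the seed edge itself, until the weakest node is dense enough.
        while len(cluster) > 3:
            weakest = min(sorted(cluster - {u, v}),
                          key=lambda n: len(adj[n] & cluster))
            if len(adj[weakest] & cluster) >= min_frac * (len(cluster) - 1):
                break
            cluster.remove(weakest)
        candidates.add(frozenset(cluster))

    # Redundancy removal: visit larger candidates first and keep one only if
    # it does not overlap too strongly with an already kept candidate.
    kept = []
    for c in sorted(candidates, key=lambda s: (-len(s), sorted(s))):
        if all(jaccard(c, k) <= max_overlap for k in kept):
            kept.append(c)
    return kept


# Hypothetical toy network: a 4-clique, a pendant protein E, and a triangle.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("D", "E"), ("X", "Y"), ("Y", "Z"), ("X", "Z")]
toy_complexes = predict_complexes(toy_edges)
```

On the toy network, the pendant protein E is never included because the edge D-E belongs to no triangle, and the two dense groups are reported separately.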
Affiliation(s)
- Swati Vipsita
- CSE, IIIT Bhubaneswar, Gothapatna, Bhubaneswar, Odisha, 751003, India
- Sabyasachi Patra
- CSE, IIIT Bhubaneswar, Gothapatna, Bhubaneswar, Odisha, 751003, India
3
Emamjomeh A, Zahiri J, Asadian M, Behmanesh M, Fakheri BA, Mahdevar G. Identification, Prediction and Data Analysis of Noncoding RNAs: A Review. Med Chem 2019; 15:216-230. [PMID: 30484409 DOI: 10.2174/1573406414666181015151610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/12/2017] [Revised: 06/03/2018] [Accepted: 09/30/2018] [Indexed: 12/13/2022]
Abstract
BACKGROUND Noncoding RNAs (ncRNAs), which play an important role in various cellular processes, are important in medicine as well as in drug design strategies. Different studies have shown that ncRNAs are dysregulated in cancer cells and play an important role in human tumorigenesis. Therefore, it is important to identify and predict such molecules by experimental and computational methods, respectively. To avoid expensive experimental methods, computational algorithms have been developed for accurate and fast prediction of ncRNAs. OBJECTIVE The aim of this review was to introduce the experimental and computational methods used to identify ncRNAs and predict their structure. We also briefly explain the roles of ncRNAs in cellular processes and drug design. METHOD In this survey, we introduce ncRNAs and their roles in biological and medicinal processes. Then, some important laboratory techniques for identifying ncRNAs are studied. Finally, the state-of-the-art models and algorithms are introduced along with important tools and databases. RESULTS The results showed that the integration of experimental and computational approaches improves the identification of ncRNAs. Moreover, the highly accurate databases, algorithms and tools for predicting ncRNAs were compared. CONCLUSION ncRNA prediction is an exciting research field, but it faces several difficulties. It requires accurate and reliable algorithms and tools. The computational costs of such algorithms, including running time and memory usage, are also very important. Finally, some suggestions are presented to improve computational methods for ncRNA gene and structure prediction.
Affiliation(s)
- Abbasali Emamjomeh
- Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Plant Breeding and Biotechnology (PBB), University of Zabol, Zabol, Iran
- Javad Zahiri
- Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Mehrdad Asadian
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
- Mehrdad Behmanesh
- Department of Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Barat A Fakheri
- Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
- Ghasem Mahdevar
- Department of Mathematics, Faculty of Sciences, University of Isfahan, Isfahan, Iran
4
Yang Y, Junjie P, Sanjun C, Ma Y. Long non-coding RNAs in Colorectal Cancer: Progression and Future Directions. J Cancer 2017; 8:3212-3225. [PMID: 29158793 PMCID: PMC5665037 DOI: 10.7150/jca.19794] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 02/24/2017] [Accepted: 08/29/2017] [Indexed: 12/25/2022] Open
Abstract
Identification of the colorectal adenoma-carcinoma sequence with its corresponding genetic and epigenetic alterations has significantly increased our knowledge of the etiopathogenesis of colorectal cancer (CRC). However, the molecular mechanisms of colorectal carcinogenesis and metastasis have not been clearly elucidated. Long non-coding ribonucleic acids (lncRNAs) are key participants in gene regulation rather than “noise”. Accumulating studies have indicated that aberrant expression of lncRNAs is tightly correlated with CRC screening, diagnosis, prognosis and therapeutic outcomes. Our review focuses on recent findings on the involvement of lncRNAs in CRC oncogenesis and the lncRNA-based clinical implications in patients with CRC.
Affiliation(s)
- Yongzhi Yang
- Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, Shanghai, 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Peng Junjie
- Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, Shanghai, 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Cai Sanjun
- Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, Shanghai, 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Yanlei Ma
- Department of Colorectal Surgery, Fudan University Shanghai Cancer Center, Shanghai, 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, 200032, China
5
Jia S, Gao L, Gao Y, Nastos J, Wen X, Zhang X, Wang H. Exploring triad-rich substructures by graph-theoretic characterizations in complex networks. Physica A: Statistical Mechanics and its Applications 2017; 468:53-69. [DOI: 10.1016/j.physa.2016.10.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
6
Abstract
Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.
Affiliation(s)
- Chuming Chen
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Hongzhan Huang
- Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, 19711, USA
- Cathy H Wu
- Center for Bioinformatics and Computational Biology, Department of Computer and Information Sciences, University of Delaware, Newark, DE, 19711, USA
- Protein Information Resource, Department of Biochemistry and Molecular and Cellular Biology, Georgetown University Medical Center, Washington, DC, 20007, USA
7
Yuan J, Yue H, Zhang M, Luo J, Liu L, Wu W, Xiao T, Chen X, Chen X, Zhang D, Xing R, Tong X, Wu N, Zhao J, Lu Y, Guo M, Chen R. Transcriptional profiling analysis and functional prediction of long noncoding RNAs in cancer. Oncotarget 2016; 7:8131-42. [PMID: 26812883 PMCID: PMC4884981 DOI: 10.18632/oncotarget.6993] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 08/10/2015] [Accepted: 01/01/2016] [Indexed: 12/21/2022] Open
Abstract
Long noncoding RNAs (lncRNAs), which are noncoding RNAs (ncRNAs) with length more than 200 nucleotides (nt), have been demonstrated to be involved in various types of cancer. Consequently, it has been frequently discussed that lncRNAs with aberrant expression in cancer serve as potential diagnostic biomarkers and therapeutic targets. However, one major challenge of developing cancer biomarkers is tumor heterogeneity which means that tumor cells show different cellular morphology, metastatic potential as well as gene expression. In this study, a custom designed microarray platform covering both mRNAs and lncRNAs was applied to tumor tissues of gastric, colon, liver and lung. 316 and 157 differentially expressed (DE-) protein coding genes and lncRNAs common to these four types of cancer were identified respectively. Besides, the functional roles of common DE-lncRNAs were inferred based on their expression and genomic position correlation with mRNAs. Moreover, mRNAs and lncRNAs with tissue specificity were also identified, suggesting their particular roles with regard to specific biogenesis and functions of different organs. Based on the large-scale survey of mRNAs and lncRNAs in four types of cancer, this study may offer new biomarkers common or specific for various types of cancer.
Affiliation(s)
- Jiao Yuan
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Haiyan Yue
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Meiying Zhang
- Department of Gastroenterology and Hepatology, Chinese PLA General Hospital, Beijing 100853, China
- Jianjun Luo
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Lihui Liu
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Wu
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Tengfei Xiao
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Xiaowei Chen
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Xiaomin Chen
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Dongdong Zhang
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Rui Xing
- Laboratory of Molecular Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital and Institute, Beijing 100142, China
- Xin Tong
- PLA General Hospital Cancer Center Key Laboratory, Medical School of Chinese PLA, Beijing 100853, China
- Nan Wu
- Laboratory of Molecular Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital and Institute, Beijing 100142, China
- Jian Zhao
- PLA General Hospital Cancer Center Key Laboratory, Medical School of Chinese PLA, Beijing 100853, China
- International Joint Cancer Institute, The Second Military Medical University, Shanghai 200433, China
- Youyong Lu
- Laboratory of Molecular Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital and Institute, Beijing 100142, China
- Mingzhou Guo
- Department of Gastroenterology and Hepatology, Chinese PLA General Hospital, Beijing 100853, China
- Runsheng Chen
- Key Laboratory of RNA Biology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Beijing Key Laboratory of Noncoding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
8
de Rooy DP, Tsonaka R, Andersson ML, Forslind K, Zhernakova A, Frank-Bertoncelj M, de Kovel CG, Koeleman BP, van der Heijde DM, Huizinga TW, Toes RE, Houwing-Duistermaat JJ, Ospelt C, Svensson B, van der Helm-van Mil AH. Genetic Factors for the Severity of ACPA-negative Rheumatoid Arthritis in 2 Cohorts of Early Disease: A Genome-wide Study. J Rheumatol 2015; 42:1383-91. [DOI: 10.3899/jrheum.140741] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Accepted: 02/24/2015] [Indexed: 11/22/2022]
Abstract
Objective. Rheumatoid arthritis (RA) that is negative for anticitrullinated protein antibodies (ACPA) is a subentity of RA, characterized by less severe disease. At the individual level, however, considerable differences in the severity of joint destruction occur. We performed a study on genetic factors underlying the differences in joint destruction in ACPA-negative patients. Methods. A genome-wide association study was done with 262 ACPA-negative patients with early RA included in the Leiden Early Arthritis Clinic and related to radiographic joint destruction over 7 years. Significant single-nucleotide polymorphisms (SNP) were evaluated for association with progression of radiographic joint destruction in 253 ACPA-negative patients with early RA included in the Better Anti-Rheumatic Farmaco Therapy (BARFOT) study. According to the Bonferroni correction for the number of tested SNP, the threshold for significance was p < 2 × 10⁻⁷ in phase 1 and 0.0045 in phase 2. In both cohorts, joint destruction was measured by the Sharp/van der Heijde method with good reproducibility. Results. Thirty-three SNP were associated with severity of joint destruction (p < 2 × 10⁻⁷) in phase 1. In phase 2, rs2833522 (p = 0.0049) showed borderline significance. A combined analysis of both the Leiden and BARFOT datasets of rs2833522 confirmed this association with joint destruction (p = 3.57 × 10⁻⁹); the minor allele (A) was associated with more severe damage (for instance, after 7 years of follow-up, patients carrying AA had 1.22 times more joint damage than patients carrying AG and 1.50 times more joint damage than patients carrying GG). In silico analysis using the ENCODE and Ensembl databases showed the presence of an H3K4me3 histone mark, transcription factors, and long noncoding RNA in the region of rs2833522, an intergenic SNP located between HUNK and SCAF4. Conclusion. Rs2833522 might be associated with the severity of joint destruction in ACPA-negative RA.
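The Bonferroni thresholds quoted in this abstract follow the usual rule of dividing the family-wise error rate by the number of tests. The sketch below only illustrates that arithmetic. The abstract states neither the error rate nor the number of tested SNP, so `alpha = 0.05` and the count of 250,000 tests are assumptions picked because together they reproduce a threshold of 2 × 10⁻⁷.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Assumed values (not given in the abstract): alpha = 0.05, 250,000 tests.
phase1_threshold = bonferroni_threshold(0.05, 250_000)

# Values quoted in the abstract.
phase2_threshold = 0.0045
phase2_p = 0.0049          # rs2833522 in the replication cohort
combined_p = 3.57e-9       # rs2833522 in the combined analysis

# The replication p-value narrowly misses its threshold ("borderline"),
# while the combined p-value passes the genome-wide threshold.
phase2_significant = phase2_p < phase2_threshold
combined_significant = combined_p < 2e-7
```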
9
DeBoever C, Ghia EM, Shepard PJ, Rassenti L, Barrett CL, Jepsen K, Jamieson CHM, Carson D, Kipps TJ, Frazer KA. Transcriptome sequencing reveals potential mechanism of cryptic 3' splice site selection in SF3B1-mutated cancers. PLoS Comput Biol 2015; 11:e1004105. [PMID: 25768983 PMCID: PMC4358997 DOI: 10.1371/journal.pcbi.1004105] [Citation(s) in RCA: 169] [Impact Index Per Article: 16.9] [Received: 08/29/2014] [Accepted: 12/29/2014] [Indexed: 01/12/2023] Open
Abstract
Mutations in the splicing factor SF3B1 are found in several cancer types and have been associated with various splicing defects. Using transcriptome sequencing data from chronic lymphocytic leukemia, breast cancer and uveal melanoma tumor samples, we show that hundreds of cryptic 3’ splice sites (3’SSs) are used in cancers with SF3B1 mutations. We define the necessary sequence context for the observed cryptic 3’SSs and propose that cryptic 3’SS selection is a result of SF3B1 mutations causing a shift in the sterically protected region downstream of the branch point. While most cryptic 3’SSs are present at low frequency (<10%) relative to nearby canonical 3’SSs, we identified ten genes that preferred out-of-frame cryptic 3’SSs. We show that cancers with mutations in the SF3B1 HEAT 5-9 repeats use cryptic 3’SSs downstream of the branch point and provide both a mechanistic model consistent with published experimental data and affected targets that will guide further research into the oncogenic effects of SF3B1 mutation.
A key goal of cancer genomics studies is to identify genes that are recurrently mutated at a rate above background and likely contribute to cancer development. Many such recurrently mutated genes have been identified over the last few years, but we often do not know the underlying mechanisms by which they contribute to cancer growth. Unexpectedly, several genes in the spliceosome, the collection of RNAs and proteins that remove introns from transcribed RNAs, are recurrently mutated in different cancers. Here, we have examined mutations in the splicing factor SF3B1, a key component of the spliceosome, and identified a global splicing defect present in different cancers with SF3B1 mutations by comparing the expression of splice junctions using generalized linear models. While prior studies have reported a limited number of aberrant splicing events in SF3B1-mutated cancers, we have established that SF3B1 mutations are associated with usage of hundreds of atypical splice sites at the 3’ end of the intron. We have identified nucleotide sequence requirements for these cryptic splice sites that are consistent with a proposed mechanistic model. These findings greatly expand our understanding of the effect of SF3B1 mutations on splicing and provide new targets for determining the oncogenic effect of SF3B1 mutations.
Affiliation(s)
- Christopher DeBoever
- Bioinformatics and Systems Biology, University of California San Diego, La Jolla, California, United States of America
- Emanuela M. Ghia
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Peter J. Shepard
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Department of Pediatrics and Rady Children's Hospital, University of California San Diego, La Jolla, California, United States of America
- Laura Rassenti
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Christian L. Barrett
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Department of Pediatrics and Rady Children's Hospital, University of California San Diego, La Jolla, California, United States of America
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, United States of America
- Kristen Jepsen
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, United States of America
- Catriona H. M. Jamieson
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Sanford Consortium for Regenerative Medicine, University of California San Diego, La Jolla, California, United States of America
- Dennis Carson
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Sanford Consortium for Regenerative Medicine, University of California San Diego, La Jolla, California, United States of America
- Thomas J. Kipps
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Kelly A. Frazer
- Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Department of Pediatrics and Rady Children's Hospital, University of California San Diego, La Jolla, California, United States of America
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, United States of America
10
Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet 2014; 46:558-66. [PMID: 24777452 DOI: 10.1038/ng.2965] [Citation(s) in RCA: 209] [Impact Index Per Article: 19.0] [Received: 06/21/2013] [Accepted: 04/02/2014] [Indexed: 12/12/2022]
Abstract
The importance of microRNAs and long noncoding RNAs in the regulation of pluripotency has been documented; however, the noncoding components of stem cell gene networks remain largely unknown. Here we investigate the role of noncoding RNAs in the pluripotent state, with particular emphasis on nuclear and retrotransposon-derived transcripts. We have performed deep profiling of the nuclear and cytoplasmic transcriptomes of human and mouse stem cells, identifying a class of previously undetected stem cell-specific transcripts. We show that long terminal repeat (LTR)-derived transcripts contribute extensively to the complexity of the stem cell nuclear transcriptome. Some LTR-derived transcripts are associated with enhancer regions and are likely to be involved in the maintenance of pluripotency.
11
Bolton KA, Ross JP, Grice DM, Bowden NA, Holliday EG, Avery-Kiejda KA, Scott RJ. STaRRRT: a table of short tandem repeats in regulatory regions of the human genome. BMC Genomics 2013; 14:795. [PMID: 24228761 PMCID: PMC3840602 DOI: 10.1186/1471-2164-14-795] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 05/27/2013] [Accepted: 11/05/2013] [Indexed: 11/22/2022] Open
Abstract
Background Tandem repeats (TRs) are unstable regions commonly found within genomes that have consequences for evolution and disease. In humans, polymorphic TRs are known to cause neurodegenerative and neuromuscular disorders and are also associated with complex diseases such as diabetes and cancer. If present in upstream regulatory regions, TRs can modify chromatin structure and affect transcription, resulting in altered gene expression and protein abundance. The most common TRs are short tandem repeats (STRs), or microsatellites. Promoter-located STRs are considerably more polymorphic than coding-region STRs. As such, they may be a common driver of phenotypic variation. To study STRs located in regulatory regions, we have performed a genome-wide analysis to identify all STRs present in a region that is 2 kilobases upstream and 1 kilobase downstream of the transcription start sites of genes. Results The Short Tandem Repeats in Regulatory Regions Table, STaRRRT, contains the results of the genome-wide analysis, outlining the characteristics of 5,264 STRs present in the upstream regulatory region of 4,441 human genes. Gene set enrichment analysis has revealed significant enrichment for STRs in cellular, transcriptional and neurological system gene promoters and genes important in ion and calcium homeostasis. The set of enriched terms has broad similarity to that seen in coding regions, suggesting that regulatory-region STRs are subject to similar evolutionary pressures as STRs in coding regions and may, like coding-region STRs, have an important role in controlling gene expression. Conclusions STaRRRT is a readily searchable resource for investigating potentially polymorphic STRs that could influence the expression of any gene of interest. The processes and genes enriched for regulatory-region STRs provide potential novel targets for diagnosing and treating disease, and support a role for these STRs in the evolution of the human genome.
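The search region described in this abstract (2 kilobases upstream to 1 kilobase downstream of a transcription start site) and a naive STR scan can be sketched as follows. This is an illustration only, not the STaRRRT pipeline: the function names, the 0-based coordinate convention, the unit length limit of 6 bases, and the minimum of 4 repeats are all assumptions.

```python
import re


def promoter_window(tss, strand, upstream=2000, downstream=1000):
    """Return (start, end) around a TSS; 'upstream' depends on the strand."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    else:
        start, end = tss - downstream, tss + upstream
    return max(start, 0), end


def find_strs(seq, max_unit=6, min_repeats=4):
    """Find perfect tandem repeats; returns (position, unit, repeat_count)."""
    # The lazy quantifier makes the regex try the shortest repeat unit first.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Hypothetical example sequence containing four CAG repeats.
example_hits = find_strs("TTACAGCAGCAGCAGGT")
```

A real analysis would also need to handle imperfect repeats and reverse-strand sequence extraction, which this sketch leaves out.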
Affiliation(s)
- Rodney J Scott
- Centre for Information-Based Medicine, Hunter Medical Research Institute, Newcastle, NSW, Australia.
12
Marx H, Lemeer S, Klaeger S, Rattei T, Kuster B. MScDB: a mass spectrometry-centric protein sequence database for proteomics. J Proteome Res 2013; 12:2386-98. [PMID: 23627461 DOI: 10.1021/pr400215r] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Protein sequence databases are indispensable tools for life science research including mass spectrometry (MS)-based proteomics. In current database construction processes, sequence similarity clustering is used to reduce redundancies in the source data. Albeit powerful, it ignores the peptide-centric nature of proteomic data and the fact that MS is able to distinguish similar sequences. Therefore, we introduce an approach that structures the protein sequence space at the peptide level using theoretical and empirical information from large-scale proteomic data to generate a mass spectrometry-centric protein sequence database (MScDB). The core modules of MScDB are an in-silico proteolytic digest and a peptide-centric clustering algorithm that groups protein sequences that are indistinguishable by mass spectrometry. Analysis of various MScDB use cases against five complex human proteomes resulted in 69 peptide identifications not present in UniProtKB as well as 79 putative single amino acid polymorphisms. MScDB retains ~99% of the identifications in comparison to common databases despite a 3-48% increase in the theoretical peptide search space (but comparable protein sequence space). In addition, MScDB enables cross-species applications such as human/mouse graft models, and our results suggest that the uncertainty in protein assignments to one species can be smaller than 20%.
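The two core modules named in this abstract, an in-silico proteolytic digest and a peptide-centric grouping of indistinguishable proteins, can be illustrated with a minimal sketch. It is not the MScDB algorithm: the abstract does not name the protease, so a trypsin-like rule (cleave after K or R unless followed by P) is assumed, and the function names, the minimum peptide length, and the "identical peptide set" grouping criterion are hypothetical simplifications.

```python
from collections import defaultdict


def tryptic_digest(seq, min_len=6):
    """Assumed trypsin-like rule: cleave after K/R unless the next residue is P."""
    peptides, start = set(), 0
    for i, aa in enumerate(seq):
        is_last = i == len(seq) - 1
        if is_last or (aa in "KR" and seq[i + 1] != "P"):
            peptide = seq[start:i + 1]
            if len(peptide) >= min_len:
                peptides.add(peptide)
            start = i + 1
    return peptides


def group_indistinguishable(proteins, min_len=6):
    """Group protein names whose digests yield the same set of peptides."""
    groups = defaultdict(list)
    for name, seq in proteins.items():
        groups[frozenset(tryptic_digest(seq, min_len))].append(name)
    return sorted(sorted(names) for names in groups.values())


# Hypothetical sequences: P1 and P2 differ only in peptide order.
toy_proteins = {
    "P1": "AAAAAAKCCCCCCR",
    "P2": "CCCCCCRAAAAAAK",
    "P3": "AAAAAAKDDDDDDR",
}
toy_groups = group_indistinguishable(toy_proteins)
```

In the toy input, P1 and P2 fall into one group because a peptide-level search cannot tell them apart, whereas sequence similarity clustering would treat them as distinct entries.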
Affiliation(s)
- Harald Marx
- Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany
13
Hong S, Chen X, Jin L, Xiong M. Canonical correlation analysis for RNA-seq co-expression networks. Nucleic Acids Res 2013; 41:e95. [PMID: 23460206 PMCID: PMC3632131 DOI: 10.1093/nar/gkt145] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 12/31/2022] Open
Abstract
Digital transcriptome analysis by next-generation sequencing discovers substantial mRNA variants. Variation in gene expression underlies many biological processes and holds a key to unravelling the mechanisms of common diseases. However, the current methods for construction of co-expression networks using overall gene expression were originally designed for microarray expression data, and they overlook a large number of variations in gene expression. To use information on exon-level, genomic position-level and allele-specific expression, we develop novel component-based methods, single and bivariate canonical correlation analysis, for construction of co-expression networks with RNA-seq data. To evaluate the performance of our methods for co-expression network inference with RNA-seq data, they are applied to lung squamous cell cancer expression data from the TCGA database and to our bipolar disorder and schizophrenia RNA-seq study. The preliminary results demonstrate that the co-expression networks constructed by canonical correlation analysis and RNA-seq data provide rich genetic and molecular information to gain insight into biological processes and disease mechanisms. Our new methods substantially outperform the current statistical methods for co-expression network construction with microarray expression data or RNA-seq data based on overall gene expression levels.
Affiliation(s)
- Shengjun Hong
- State Key Laboratory of Genetic Engineering and MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai 200433, China
|
14
|
Dayem Ullah AZ, Lemoine NR, Chelala C. A practical guide for the functional annotation of genetic variations using SNPnexus. Brief Bioinform 2013; 14:437-47. [PMID: 23395730 DOI: 10.1093/bib/bbt004] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Broader functional annotation of known as well as putative genetic variations is a valuable means for prioritizing targets in disease studies and large-scale genotyping projects. In this article, we present a practical guide to SNPnexus, a web-based tool that provides an aggregate set of functional annotations for genomic variation data by characterizing related consequences at the transcriptome/proteome levels with in-depth analysis of potential deleterious effects, inferring physical and cytogenetic mapping, reporting related HapMap data, finding overlaps with potential regulatory, structural as well as conserved elements and retrieving links with previously reported genetic disease studies. We focus on the SNPnexus query system, its annotation categories and the biological interpretation of results.
Affiliation(s)
- Abu Z Dayem Ullah
- Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
|
15
|
Wijaya E, Frith MC, Horton P, Asai K. Finding protein-coding genes through human polymorphisms. PLoS One 2013; 8:e54210. [PMID: 23349826 PMCID: PMC3551959 DOI: 10.1371/journal.pone.0054210] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2012] [Accepted: 12/10/2012] [Indexed: 11/29/2022] Open
Abstract
Human gene catalogs are fundamental to the study of human biology and medicine. But they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously-identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs, they can also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average 10% of these genes are located in the genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
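The effect described in this abstract, a polymorphism creating a long ORF that is absent from the reference, can be shown with a small sketch. It is not the authors' pipeline: it scans only the forward strand of a single sequence, ignores alignment and introns, and the function names and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna):
    """Return (start, end) of the longest ATG...stop ORF, or None.

    Forward strand only; end is exclusive and includes the stop codon.
    """
    best = None
    for start in range(len(mrna) - 2):
        if mrna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                if best is None or (pos + 3 - start) > (best[1] - best[0]):
                    best = (start, pos + 3)
                break
    return best


def apply_snp(sequence, position, alt_base):
    """Return the sequence with one base substituted (0-based position)."""
    return sequence[:position] + alt_base + sequence[position + 1:]


# Toy reference: ATG AAA TAG CCC TGA (hypothetical).
reference = "ATGAAATAGCCCTGA"
variant = apply_snp(reference, 6, "C")  # TAG -> CAG removes the early stop

reference_orf = longest_orf(reference)
variant_orf = longest_orf(variant)
```

In this toy case the single substitution removes a premature stop codon, so the ORF extends from 9 to 15 bases. The reverse substitution would illustrate the disrupting case mentioned in the abstract.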
Affiliation(s)
- Edward Wijaya
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Japan.
|
16
|
Abstract
Long non-coding RNAs have emerged as an increasingly well studied subset of non-coding RNAs (ncRNAs) following their recent discovery in a number of organisms including humans and characterization of their functional and regulatory roles in a variety of distinct cellular mechanisms. The recent annotations of long ncRNAs in humans peg their numbers as similar to protein-coding genes. However, despite the rapid advancements in the field, the functional characterization and biological roles of most of the long ncRNAs still remain unidentified, although some candidate long ncRNAs have been extensively studied for their roles in cancers and biological phenomena such as X-inactivation and epigenetic regulation of genes. A number of recent reports suggest an exciting possibility of long ncRNAs mediating host response and immune function, suggesting an elaborate network of regulatory interactions mediated through ncRNAs in infection. The present role of long ncRNAs in host-pathogen cross talk is limited to a handful of mechanistically distinct examples. The current commentary chronicles the findings of these reports on the role of long ncRNAs in infection biology and further highlights the bottlenecks and future directions toward understanding the biological significance of the role of long ncRNAs in infection biology.
Affiliation(s)
- Vinod Scaria
- GN Ramachandran Knowledge Center for Genome Informatics, Institute of Genomics and Integrative Biology, Council of Scientific and Industrial Research Delhi, India
|
17
|
Ross JP, Shaw JM, Molloy PL. Identification of differentially methylated regions using streptavidin bisulfite ligand methylation enrichment (SuBLiME), a new method to enrich for methylated DNA prior to deep bisulfite genomic sequencing. Epigenetics 2012; 8:113-27. [PMID: 23257838 PMCID: PMC3549874 DOI: 10.4161/epi.23330] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
We have developed a method that enriches for methylated cytosines by capturing the fraction of bisulfite-treated DNA with unconverted cytosines. The method, called streptavidin bisulfite ligand methylation enrichment (SuBLiME), involves the specific labeling (using a biotin-labeled nucleotide ligand) of methylated cytosines in bisulfite-converted DNA. This step is then followed by affinity capture, using streptavidin-coupled magnetic beads. SuBLiME is highly adaptable and can be combined with deep sequencing library generation and/or genomic complexity-reduction. In this pilot study, we enriched methylated DNA from Csp6I-cut complexity-reduced genomes of colorectal cancer cell lines (HCT-116, HT-29 and SW-480) and normal blood leukocytes with the aim of discovering colorectal cancer biomarkers. Enriched libraries were sequenced with SOLiD-3 technology. In pairwise comparisons, we scored a total of 1,769 gene loci and 33 miRNA loci as differentially methylated between the cell lines and leukocytes. Of these, 516 loci were differently methylated in at least two promoter-proximal CpG sites over two discrete Csp6I fragments. Identified methylated gene loci were associated with anatomical development, differentiation and cell signaling. The data correlated with good agreement to a number of published colorectal cancer DNA methylation biomarkers and genomic data sets. SuBLiME is effective in the enrichment of methylated nucleic acid and in the detection of known and novel biomarkers.
Affiliation(s)
- Jason P Ross
- Preventative Health National Research Flagship, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Sydney, NSW, Australia.
|
18
|
Imanishi T, Nagai Y, Habara T, Yamasaki C, Takeda JI, Mikami S, Bando Y, Tojo H, Nishimura T. Full-length Transcriptome-based H-InvDB Throws a New Light on Chromosome-centric Proteomics. J Proteome Res 2012; 12:62-6. [DOI: 10.1021/pr300861a] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Affiliation(s)
- Tadashi Imanishi
- Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Department of Molecular Life Science, Division of Basic Medical Science and Molecular Medicine, Tokai University, School of Medicine, Kanagawa, Japan
- Yoko Nagai
- Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Takuya Habara
- Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Chisato Yamasaki
- Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Jun-ichi Takeda
- Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Hiromasa Tojo
- Department of Biophysics and Biochemistry, Osaka University, Graduate School of Medicine, Osaka, Japan
- Toshihide Nishimura
- Biosys Technologies, Inc., Tokyo, Japan
- Department of Surgery I, Tokyo Medical University, Tokyo, Japan
|
19
|
Kikugawa S, Nishikata K, Murakami K, Sato Y, Suzuki M, Altaf-Ul-Amin M, Kanaya S, Imanishi T. PCDq: human protein complex database with quality index which summarizes different levels of evidences of protein complexes predicted from h-invitational protein-protein interactions integrative dataset. BMC Syst Biol 2012; 6 Suppl 2:S7. [PMID: 23282181 PMCID: PMC3521179 DOI: 10.1186/1752-0509-6-s2-s7] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Background Proteins interact with other proteins or biomolecules in complexes to perform cellular functions. Existing protein-protein interaction (PPI) databases and protein complex databases for human proteins are not organized to provide protein complex information or facilitate the discovery of novel subunits. Data integration of PPIs focused specifically on protein complexes, subunits, and their functions. Predicted candidate complexes or subunits are also important for experimental biologists. Description Based on integrated PPI data and literature, we have developed a human protein complex database with a complex quality index (PCDq), which includes both known and predicted complexes and subunits. We integrated six PPI data (BIND, DIP, MINT, HPRD, IntAct, and GNP_Y2H), and predicted human protein complexes by finding densely connected regions in the PPI networks. They were curated with the literature so that missing proteins were complemented and some complexes were merged, resulting in 1,264 complexes comprising 9,268 proteins with 32,198 PPIs. The evidence level of each subunit was assigned as a categorical variable. This indicated whether it was a known subunit, and a specific function was inferable from sequence or network analysis. To summarize the categories of all the subunits in a complex, we devised a complex quality index (CQI) and assigned it to each complex. We examined the proportion of consistency of Gene Ontology (GO) terms among protein subunits of a complex. Next, we compared the expression profiles of the corresponding genes and found that many proteins in larger complexes tend to be expressed cooperatively at the transcript level. The proportion of duplicated genes in a complex was evaluated. Finally, we identified 78 hypothetical proteins that were annotated as subunits of 82 complexes, which included known complexes. 
Of these hypothetical proteins, after our prediction had been made, four were reported to be actual subunits of the assigned protein complexes. Conclusions We constructed a new protein complex database PCDq including both predicted and curated human protein complexes. CQI is a useful source of experimentally confirmed information about protein complexes and subunits. The predicted protein complexes can provide functional clues about hypothetical proteins. PCDq is freely available at http://h-invitational.jp/hinv/pcdq/.
Affiliation(s)
- Shingo Kikugawa
- Integrated Databases and Systems Biology Team, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
|
20
|
Takeda JI, Yamasaki C, Murakami K, Nagai Y, Sera M, Hara Y, Obi N, Habara T, Gojobori T, Imanishi T. H-InvDB in 2013: an omics study platform for human functional gene and transcript discovery. Nucleic Acids Res 2012. [PMID: 23197657 PMCID: PMC3531145 DOI: 10.1093/nar/gks1245] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
Abstract
H-InvDB (http://www.h-invitational.jp/) is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNAs were mapped onto the hg19 reference genome and 43 829 gene loci, including nonprotein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. In fact, 233 genes of the 19 309 turned out to have protein functions in this version of H-InvDB; they were annotated as unknown protein functions in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. It is advantageous that many biologically functional genes are hidden in the H-InvDB unique genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
Affiliation(s)
- Jun-Ichi Takeda
- Integrated Database and Systems Biology Team, Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Aomi 2-4-7, Koto-ku, Tokyo 135-0064, Japan
|
21
|
Belinky F, Bahir I, Stelzer G, Zimmerman S, Rosen N, Nativ N, Dalah I, Iny Stein T, Rappaport N, Mituyama T, Safran M, Lancet D. Non-redundant compendium of human ncRNA genes in GeneCards. Bioinformatics 2012; 29:255-61. [PMID: 23172862 DOI: 10.1093/bioinformatics/bts676] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Non-coding RNA (ncRNA) genes are increasingly acknowledged for their importance in the human genome. However, there is no comprehensive non-redundant database for all such human genes. RESULTS We leveraged the effective platform of GeneCards, the human gene compendium, together with the power of fRNAdb and additional primary sources, to judiciously unify all ncRNA gene entries obtainable from 15 different primary sources. Overlapping entries were clustered to unified locations based on an algorithm employing genomic coordinates. This allowed GeneCards' gamut of relevant entries to rise ∼5-fold, resulting in ∼80,000 human non-redundant ncRNAs, belonging to 14 classes. Such 'grand unification' within a regularly updated data structure will assist future ncRNA research. AVAILABILITY AND IMPLEMENTATION All of these non-coding RNAs are included among the ∼122,500 entries in GeneCards V3.09, along with pertinent annotation, automatically mined by its built-in pipeline from 100 data sources. This information is available at www.genecards.org. CONTACT Frida.Belinky@weizmann.ac.il SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
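The abstract states only that overlapping entries were clustered to unified locations using genomic coordinates; it does not give the algorithm. One common way to do this is a sort-and-sweep interval merge, sketched below as an assumption rather than as the GeneCards method. The tuple layout, the closed-interval convention, the omission of strand, and all identifiers are hypothetical.

```python
def cluster_by_overlap(entries):
    """Cluster entries whose intervals overlap on the same chromosome.

    entries: iterable of (entry_id, chromosome, start, end), with closed
    intervals (assumption). Returns a list of clusters, each a list of ids.
    """
    clusters = []
    current_ids, current_chrom, current_end = [], None, None
    for entry_id, chrom, start, end in sorted(
        entries, key=lambda e: (e[1], e[2], e[3])
    ):
        if chrom == current_chrom and start <= current_end:
            # Overlaps the running cluster: extend it.
            current_ids.append(entry_id)
            current_end = max(current_end, end)
        else:
            # Start a new cluster, saving the previous one if any.
            if current_ids:
                clusters.append(current_ids)
            current_ids, current_chrom, current_end = [entry_id], chrom, end
    if current_ids:
        clusters.append(current_ids)
    return clusters


# Hypothetical entries from different sources.
toy_entries = [
    ("src1:a", "chr1", 100, 200),
    ("src2:b", "chr1", 150, 250),
    ("src1:c", "chr1", 300, 400),
    ("src3:d", "chr2", 120, 180),
]
unified = cluster_by_overlap(toy_entries)
```

Sorting first keeps the sweep linear after the sort. A production version would likely also consider strand and a minimum overlap fraction, neither of which is specified in the abstract.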
Affiliation(s)
- Frida Belinky
- Department of Molecular Genetics, The Weizmann Institute of Science, Rehovot 76100, Israel.
|
22
|
Guo X, Zhang Y, Wang P, Li T, Fu W, Mo X, Shi T, Zhang Z, Chen Y, Ma D, Han W. VSTM1-v2, a novel soluble glycoprotein, promotes the differentiation and activation of Th17 cells. Cell Immunol 2012; 278:136-42. [PMID: 22960280 DOI: 10.1016/j.cellimm.2012.07.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2012] [Revised: 07/06/2012] [Accepted: 07/30/2012] [Indexed: 12/24/2022]
Abstract
Cytokines are soluble proteins that mediate immune reactions and are responsible for communication among immune cells. CD4(+) T cells are the principal sources of cytokines of adaptive immunity. Cytokines play critical roles in the differentiation and effector function of CD4(+) T cells. They also play key roles in diseases, and some of them have been developed into drugs in the forms of recombinant cytokines, soluble receptors and neutralizing antibodies. Therefore, identifying novel potential cytokines is necessary and beneficial for better understanding immunology and enhancing human health. To find novel potential cytokines, we carried out an integrated bioinformatics analysis on the whole human genome. Cytokine candidates were selected for cDNA cloning, sub-cloning, secretion verification, expression profile analysis and functional study. Here, we report a novel soluble protein, VSTM1-v2, which is a classical secretory glycoprotein mainly expressed in immune tissues, and can promote the differentiation and activation of Th17 cells.
Affiliation(s)
- Xiaohuan Guo
- Peking University Center for Human Disease Genomics, Department of Immunology, Key Laboratory of Medical Immunology, Ministry of Health, School of Basic Medical Sciences, Peking University Health Science Center, 38 Xueyuan Road, Beijing 100191, China
|
23
|
Sirota FL, Batagov A, Schneider G, Eisenhaber B, Eisenhaber F, Maurer-Stroh S. Beware of moving targets: reference proteome content fluctuates substantially over the years. J Bioinform Comput Biol 2012; 10:1250020. [PMID: 22867629 DOI: 10.1142/s0219720012500205] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Reference proteomes are generated by increasingly sophisticated annotation pipelines as part of regular genome build releases; yet, the corresponding changes in reference proteomes' content are dramatic. In the history of the NCBI-curated human proteome, the total number of entries has remained roughly constant but approximately half of the proteins from the 2003 build 33 are no longer represented by entries in current releases, while about the same number of new proteins have been added (for sequence identity thresholds 50-90%). Although mostly hypothetical proteins are affected, there are also spectacular cases of entry removal/addition of well studied proteins. The changes between the 2003 and recent human proteomes are in a similar order of magnitude as the differences between recent human and chimpanzee proteome releases. As an application example, we show that the proteome fluctuations affect the interpretation (about 74% of hits) of organelle-specific mass-spectrometry data. Although proteome quality tends to improve with more recent releases as, for example, the fraction of proteins with functional annotation has increased over time, existing evidence implies that, apparently, the proteome content still remains incomplete, not just pertaining to isoforms/sequence variants but also to proteins and their families that are clearly distinct.
Affiliation(s)
- Fernanda L Sirota
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore.
|
24
|
Evolutionary growth process of highly conserved sequences in vertebrate genomes. Gene 2012; 504:1-5. [PMID: 22580082 DOI: 10.1016/j.gene.2012.05.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2011] [Revised: 04/27/2012] [Accepted: 05/02/2012] [Indexed: 11/22/2022]
Abstract
Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since different evolutionary era along the vertebrate lineage.
|
25
|
Krzyzanowski PM, Muro EM, Andrade-Navarro MA. Computational approaches to discovering noncoding RNA. Wiley Interdiscip Rev RNA 2012; 3:567-79. [PMID: 22555938 DOI: 10.1002/wrna.1121] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
New developments are being brought to the field of molecular biology with the mounting evidence that RNA transcripts not translated into protein (noncoding RNAs, ncRNAs) hold a variety of biological functions. Computational discovery of ncRNAs is one of these developments, fueled not only by the urge to characterize these sequences but also by necessity to prioritize ones with the most relevant functions for experimental verification. The heterogeneity in size and mode of activity of ncRNAs is reflected in the corresponding diversity of computational methods for their study. Sequence and structural analysis, conservation across species, and relative position to other genomic elements are being used for ncRNA detection. In addition, the recent development of techniques that allow deep sequencing of cell transcripts either globally or from isolated ncRNA-related material is leading the field toward increased use of such high-throughput data. We expect that imminent breakthroughs will include the classification of newer types of ncRNA and new insights into miRNA and piRNA biology, eventually leading toward the completion of a catalog of all human ncRNAs.
|
26
|
Dayem Ullah AZ, Lemoine NR, Chelala C. SNPnexus: a web server for functional annotation of novel and publicly known genetic variants (2012 update). Nucleic Acids Res 2012; 40:W65-70. [PMID: 22544707 PMCID: PMC3394262 DOI: 10.1093/nar/gks364] [Citation(s) in RCA: 134] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open
Abstract
Broader functional annotation of single nucleotide variations is a valuable means for prioritizing targets in further disease studies and large-scale genotyping projects. We originally developed SNPnexus to assess the potential significance of known and novel SNPs on the major transcriptome, proteome, regulatory and structural variation models in order to identify the phenotypically important variants. Being committed to providing continuous support to the scientific community, we have substantially improved SNPnexus over time by incorporating a broader range of variations such as insertions/deletions, block substitutions, IUPAC codes submission and region-based analysis, expanding the query size limit, and most importantly including additional categories for the assessment of functional impact. SNPnexus provides a comprehensive set of annotations for genomic variation data by characterizing related functional consequences at the transcriptome/proteome levels of seven major annotation systems with in-depth analysis of potential deleterious effects, inferring physical and cytogenetic mapping, reporting information on HapMap genotype/allele data, finding overlaps with potential regulatory elements, structural variations and conserved elements, and retrieving links with previously reported genetic disease studies. SNPnexus has a user-friendly web interface with an improved query structure, enhanced functional annotation categories and flexible output presentation making it practically useful for biologists. SNPnexus is freely available at http://www.snp-nexus.org.
Affiliation(s)
- Abu Z Dayem Ullah
- Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
|
27
|
Bu D, Yu K, Sun S, Xie C, Skogerbø G, Miao R, Xiao H, Liao Q, Luo H, Zhao G, Zhao H, Liu Z, Liu C, Chen R, Zhao Y. NONCODE v3.0: integrative annotation of long noncoding RNAs. Nucleic Acids Res 2011; 40:D210-5. [PMID: 22135294 PMCID: PMC3245065 DOI: 10.1093/nar/gkr1175] [Citation(s) in RCA: 302] [Impact Index Per Article: 21.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
Facilitated by the rapid progress of high-throughput sequencing technology, a large number of long noncoding RNAs (lncRNAs) have been identified in mammalian transcriptomes over the past few years. LncRNAs have been shown to play key roles in various biological processes such as imprinting control, circuitry controlling pluripotency and differentiation, immune responses and chromosome dynamics. Notably, a growing number of lncRNAs have been implicated in disease etiology. With the increasing number of published lncRNA studies, the experimental data on lncRNAs (e.g. expression profiles, molecular features and biological functions) have accumulated rapidly. In order to enable a systematic compilation and integration of this information, we have updated the NONCODE database (http://www.noncode.org) to version 3.0 to include the first integrated collection of expression and functional lncRNA data obtained from re-annotated microarray studies in a single database. NONCODE has a user-friendly interface with a variety of search or browse options, a local Genome Browser for visualization and a BLAST server for sequence-alignment search. In addition, NONCODE provides a platform for the ongoing collation of ncRNAs reported in the literature. All data in NONCODE are open to users, and can be downloaded through the website or obtained through the SOAP API and DAS services.
Affiliation(s)
- Dechao Bu
- Bioinformatics Research Group, Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, PR China
|
28
|
Taniya T, Tanaka S, Yamaguchi-Kabata Y, Hanaoka H, Yamasaki C, Maekawa H, Barrero RA, Lenhard B, Datta MW, Shimoyama M, Bumgarner R, Chakraborty R, Hopkinson I, Jia L, Hide W, Auffray C, Minoshima S, Imanishi T, Gojobori T. A prioritization analysis of disease association by data-mining of functional annotation of human genes. Genomics 2011; 99:1-9. [PMID: 22019378 DOI: 10.1016/j.ygeno.2011.10.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2011] [Revised: 09/16/2011] [Accepted: 10/06/2011] [Indexed: 11/15/2022]
Abstract
Complex diseases result from contributions of multiple genes that act in concert through pathways. Here we present a method to prioritize novel candidates of disease-susceptibility genes depending on the biological similarities to the known disease-related genes. The extent of disease-susceptibility of a gene is prioritized by analyzing seven features of human genes captured in H-InvDB. Taking rheumatoid arthritis (RA) and prostate cancer (PC) as two examples, we evaluated the efficiency of our method. Highly scored genes obtained included TNFSF12 and OSM as candidate disease genes for RA and PC, respectively. Subsequent characterization of these genes based upon an extensive literature survey reinforced the validity of these highly scored genes as possible disease-susceptibility genes. Our approach, Prioritization ANalysis of Disease Association (PANDA), is an efficient and cost-effective method to narrow down a large set of genes into smaller subsets that are most likely to be involved in the disease pathogenesis.
Affiliation(s)
- Takayuki Taniya
- Japan Biological Information Research Center, Japan Biological Informatics Consortium, AIST Bio-IT Research Building 7F, 2-4-7 Aomi, Tokyo 135-0064, Japan
|
29
|
Abstract
Recent advances in high-throughput sequencing have facilitated the genome-wide studies of small non-coding RNAs (sRNAs). Numerous studies have highlighted the role of various classes of sRNAs at different levels of gene regulation and disease. The fast growth of sequence data and the diversity of sRNA species have prompted the need to organise them in annotation databases. There are currently several databases that collect sRNA data. Various tools are provided for access, with special emphasis on the well-characterised family of micro-RNAs. The striking heterogeneity of the new classes of sRNAs and the lack of sufficient functional annotation, however, make integration of these datasets a difficult task. This review describes the currently available databases for human sRNAs that are accessible via the internet, and some of the large datasets for human sRNAs from high-throughput sequencing experiments that are so far only available as supplementary data in publications. Some of the main issues related to the integration and annotation of sRNA datasets are also discussed.
Affiliation(s)
- Eneritz Agirre
- Department of Computational Genomics, Universitat Pompeu Fabra, Dr. Aiguader 88, E08003 Barcelona, Spain
|
30
|
Katayama T, Wilkinson MD, Vos R, Kawashima T, Kawashima S, Nakao M, Yamamoto Y, Chun HW, Yamaguchi A, Kawano S, Aerts J, Aoki-Kinoshita KF, Arakawa K, Aranda B, Bonnal RJ, Fernández JM, Fujisawa T, Gordon PM, Goto N, Haider S, Harris T, Hatakeyama T, Ho I, Itoh M, Kasprzyk A, Kido N, Kim YJ, Kinjo AR, Konishi F, Kovarskaya Y, von Kuster G, Labarga A, Limviphuvadh V, McCarthy L, Nakamura Y, Nam Y, Nishida K, Nishimura K, Nishizawa T, Ogishima S, Oinn T, Okamoto S, Okuda S, Ono K, Oshita K, Park KJ, Putnam N, Senger M, Severin J, Shigemoto Y, Sugawara H, Taylor J, Trelles O, Yamasaki C, Yamashita R, Satoh N, Takagi T. The 2nd DBCLS BioHackathon: interoperable bioinformatics Web services for integrated applications. J Biomed Semantics 2011; 2:4. [PMID: 21806842 PMCID: PMC3170566 DOI: 10.1186/2041-1480-2-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2011] [Accepted: 08/02/2011] [Indexed: 01/19/2023] Open
Abstract
Background: The interaction between biological researchers and the bioinformatics tools they use is still hampered by incomplete interoperability between such tools. To ensure interoperability initiatives are effectively deployed, end-user applications need to be aware of, and support, best practices and standards. Here, we report on an initiative in which software developers and genome biologists came together to explore and raise awareness of these issues: BioHackathon 2009. Results: Developers in attendance came from diverse backgrounds, with experts in Web services, workflow tools, text mining and visualization. Genome biologists provided expertise and exemplar data from the domains of sequence and pathway analysis and glyco-informatics. One goal of the meeting was to evaluate the ability to address real world use cases in these domains using the tools that the developers represented. This resulted in i) a workflow to annotate 100,000 sequences from an invertebrate species; ii) an integrated system for analysis of the transcription factor binding sites (TFBSs) enriched based on differential gene expression data obtained from a microarray experiment; iii) a workflow to enumerate putative physical protein interactions among enzymes in a metabolic pathway using protein structure data; iv) a workflow to analyze glyco-gene-related diseases by searching for human homologs of glyco-genes in other species, such as fruit flies, and retrieving their phenotype-annotated SNPs. Conclusions: Beyond deriving prototype solutions for each use-case, a second major purpose of the BioHackathon was to highlight areas of insufficiency. We discuss the issues raised by our exploration of the problem/solution space, concluding that there are still problems with the way Web services are modeled and annotated, including: i) the absence of several useful data or analysis functions in the Web service "space"; ii) the lack of documentation of methods; iii) lack of compliance with the SOAP/WSDL specification among and between various programming-language libraries; and iv) incompatibility between various bioinformatics data formats. Although it was still difficult to solve real world problems posed to the developers by the biological researchers in attendance because of these problems, we note the promise of addressing these issues within a semantic framework.
Affiliation(s)
- Toshiaki Katayama
- Database Center for Life Science, Research Organization of Information and Systems, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan.
|
31
|
Suzuki R, Tanaka M, Takanashi M, Hussain A, Yuan B, Toyoda H, Kuroda M. Anthocyanidins-enriched bilberry extracts inhibit 3T3-L1 adipocyte differentiation via the insulin pathway. Nutr Metab (Lond) 2011; 8:14. [PMID: 21385419 PMCID: PMC3063807 DOI: 10.1186/1743-7075-8-14] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 12/01/2010] [Accepted: 03/08/2011] [Indexed: 11/26/2022] Open
Abstract
Background: Obesity and metabolic syndrome are important public concerns, and there is increasing demand for effective therapeutic strategies. Flavonoids are expected to improve the risk factors associated with metabolic syndrome. Anthocyanidins are a class of flavonoids well known for their anti-oxidative, anti-inflammatory and anti-tumor properties. However, their effects on adipocytes and molecular systems are not well defined. In this study, we examined the effects of anthocyanidin-enriched bilberry extracts on adipocyte differentiation. Methods: Using the 3T3-L1 cell line, we investigated whether bilberry extracts and anthocyanidins inhibit lipid accumulation during adipogenesis. To identify the most important bilberry-mediated effect, we analyzed the expression of key transcription factors associated with adipocyte differentiation by real-time PCR (RT-PCR). Because the RT-PCR results led us to hypothesize that bilberry extracts and anthocyanidins block insulin signaling, we measured the phosphorylation of tyrosine residues of insulin receptor substrate 1 (IRS1) protein by western blotting. In addition, we compared the whole-genome expression profiles of the early stage of adipocyte differentiation under four different growth conditions (DMSO, bilberry, two anthocyanidins) by microarray analyses and Gene Set Enrichment Analysis (GSEA). Results: Exposure to bilberry extracts and anthocyanidins during adipocyte differentiation inhibited 3T3-L1 differentiation. During this period, bilberry extracts and anthocyanidin significantly decreased the key adipocyte differentiation-associated markers peroxisome proliferator-activated receptor-γ (Pparγ) and sterol regulatory element-binding protein 1c (Srebp1c). Western blotting analysis showed that bilberry extracts and anthocyanidin decreased the phosphorylation of tyrosine residues of IRS1. In addition, microarray experiments and GSEA data revealed significantly altered expression of the known genes of the insulin pathway in cells treated with bilberry extracts or anthocyanidins in the early differentiation stages. Conclusions: Our data demonstrate that anthocyanidin-enriched bilberry extracts strongly inhibit adipocyte differentiation via the insulin pathway. Furthermore, bilberry extracts might be used as a potential complementary treatment for obese patients with metabolic syndrome.
Affiliation(s)
- Rieko Suzuki
- Department of Molecular Pathology, Tokyo Medical University, Tokyo, Japan
- Masami Tanaka
- Department of Molecular Pathology, Tokyo Medical University, Tokyo, Japan
- Aashiq Hussain
- Department of Molecular Pathology, Tokyo Medical University, Tokyo, Japan
- Bo Yuan
- Department of Clinical Molecular Gene, Tokyo University of Pharmacy and Life Science, Tokyo, Japan
- Hiroo Toyoda
- Department of Clinical Molecular Gene, Tokyo University of Pharmacy and Life Science, Tokyo, Japan
- Masahiko Kuroda
- Department of Molecular Pathology, Tokyo Medical University, Tokyo, Japan
|
32
|
Endo T, Ueno K, Yonezawa K, Mineta K, Hotta K, Satou Y, Yamada L, Ogasawara M, Takahashi H, Nakajima A, Nakachi M, Nomura M, Yaguchi J, Sasakura Y, Yamasaki C, Sera M, Yoshizawa AC, Imanishi T, Taniguchi H, Inaba K. CIPRO 2.5: Ciona intestinalis protein database, a unique integrated repository of large-scale omics data, bioinformatic analyses and curated annotation, with user rating and reviewing functionality. Nucleic Acids Res 2010; 39:D807-14. [PMID: 21071393 PMCID: PMC3013717 DOI: 10.1093/nar/gkq1144] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 01/13/2023] Open
Abstract
The Ciona intestinalis protein database (CIPRO) is an integrated protein database for the tunicate species C. intestinalis. The database is unique in two respects: first, because of its phylogenetic position, Ciona is a suitable model for understanding vertebrate evolution; and second, the database includes original large-scale transcriptomic and proteomic data. Ciona intestinalis has also been a favorite of developmental biologists. Therefore, large amounts of data exist on its development and morphology, along with a recent genome sequence and gene expression data. The CIPRO database aims to collect these published data and to provide unique information from unpublished experimental data, such as 3D expression profiling, 2D-PAGE and mass spectrometry-based large-scale analyses at various developmental stages, curated annotation data and various bioinformatic data, to facilitate research in diverse areas, including developmental, comparative and evolutionary biology. For medical and evolutionary research, homologs in humans and major model organisms are intentionally included. The current database is based on a recently developed KH model containing 36 034 unique sequences, but for higher usability it covers all 89 683 known and predicted proteins from all gene models for this species. Of these sequences, more than 10 000 proteins have been manually annotated. Furthermore, to establish a community-supported protein database, these annotations are open to evaluation by users through the CIPRO website. CIPRO 2.5 is freely accessible at http://cipro.ibio.jp/2.5.
Affiliation(s)
- Toshinori Endo
- Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan.
|
33
|
Bareke E, Pierre M, Gaigneaux A, De Meulder B, Depiereux S, Berger F, Habra N, Depiereux E. PathEx: a novel multi factors based datasets selector web tool. BMC Bioinformatics 2010; 11:528. [PMID: 20969778 PMCID: PMC2978222 DOI: 10.1186/1471-2105-11-528] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/14/2010] [Accepted: 10/22/2010] [Indexed: 11/27/2022] Open
Abstract
Background: Microarray experiments have become very popular in life science research. However, if such experiments are only considered independently, the possibilities for analysis and interpretation of many life science phenomena are reduced. The accumulation of publicly available data provides biomedical researchers with a valuable opportunity to either discover new phenomena or improve the interpretation and validation of other phenomena that are partially understood or well known. This can only be achieved by intelligently exploiting this rich mine of information. Description: Considering that technologies like microarrays remain prohibitively expensive for researchers with limited means to order their own experimental chips, it would be beneficial to re-use previously published microarray data. For certain researchers interested in finding gene groups (requiring many replicates), there is a great need for tools to help them select appropriate datasets for analysis. These tools can be effective only if they are able to re-use previously deposited experiments or to create new experiments not initially envisioned by the depositors. However, the generation of new experiments requires that all published microarray data be completely annotated, which is not currently the case. Thus, we propose the PathEx approach. Conclusion: This paper presents PathEx, a human-focused web solution built around a two-component system: one database component, enriched with relevant biological information (expression array, omics data, literature) from different sources, and another component comprising sophisticated web interfaces that allow users to perform complex dataset-building queries on the contents integrated into the PathEx database.
Affiliation(s)
- Eric Bareke
- Molecular Biology Research Unit, University of Namur - FUNDP, Namur, Belgium.
|