1
Sang J, Zou D, Wang Z, Wang F, Zhang Y, Xia L, Li Z, Ma L, Li M, Xu B, Liu X, Wu S, Liu L, Niu G, Li M, Luo Y, Hu S, Hao L, Zhang Z. IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:161-172. [PMID: 32683045 PMCID: PMC7646092 DOI: 10.1016/j.gpb.2018.12.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/08/2018] [Revised: 11/28/2018] [Accepted: 12/29/2018] [Indexed: 12/19/2022]
Abstract
Genome reannotation aims for complete and accurate characterization of gene models and thus is of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to adopt these precious data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on integration of large-scale RNA-seq data and release a new annotation system IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, which is equipped with the improved annotations, bears great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.
Affiliation(s)
- Jian Sang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Dong Zou
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Zhennan Wang
- University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Fan Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Yuansheng Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lin Xia
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhaohua Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lina Ma
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Mengwei Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Bingxiang Xu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaonan Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shuangyang Wu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lin Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Guangyi Niu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Man Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yingfeng Luo
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Songnian Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
- Lili Hao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
2
Database Resources of the BIG Data Center in 2018. Nucleic Acids Res 2019; 46:D14-D20. [PMID: 29036542 PMCID: PMC5753194 DOI: 10.1093/nar/gkx897] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 09/15/2017] [Accepted: 09/23/2017] [Indexed: 02/07/2023] [Open Access]
Abstract
The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn.
3
Hong WJ, Kim YJ, Chandran AKN, Jung KH. Infrastructures of systems biology that facilitate functional genomic study in rice. RICE (NEW YORK, N.Y.) 2019; 12:15. [PMID: 30874968 PMCID: PMC6419666 DOI: 10.1186/s12284-019-0276-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/29/2018] [Accepted: 03/06/2019] [Indexed: 05/08/2023]
Abstract
Rice (Oryza sativa L.) is both a major staple food for the worldwide population and a model crop plant for studying the mode of action of agronomically valuable traits, providing information that can be applied to other crop plants. Owing to the development of high-throughput technologies such as next-generation sequencing and mass spectrometry, a huge mass of multi-omics data in rice has been accumulated. Through the integration of those data, systems biology in rice is becoming more advanced. To facilitate such systemic approaches, we have summarized current resources, such as databases and tools, for systems biology in rice. In this review, we categorize the resources using six omics levels: genomics, transcriptomics, proteomics, metabolomics, integrated omics, and functional genomics. We provide the names, websites, references, working states, and number of citations for each individual database or tool and discuss future prospects for the integrated understanding of rice gene functions.
Affiliation(s)
- Woo-Jong Hong
- Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, 17104, Korea
- Yu-Jin Kim
- Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, 17104, Korea
- Ki-Hong Jung
- Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, 17104, Korea.
4
Song S, Zhang Z. Database Resources in BIG Data Center: Submission, Archiving, and Integration of Big Data in Plant Science. MOLECULAR PLANT 2019; 12:279-281. [PMID: 30716410 DOI: 10.1016/j.molp.2019.01.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/27/2018] [Revised: 01/26/2019] [Accepted: 01/27/2019] [Indexed: 06/09/2023]
Affiliation(s)
- Shuhui Song
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
- Zhang Zhang
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
5
Zhang Z, Zhao W, Xiao J, Bao Y, Wang F, Hao L, Zhu J, Chen T, Zhang S, Chen X, Tang B, Zhou Q, Wang Z, Dong L, Wang Y, Ma Y, Wang F, Zhang Z, Wang Z, Chen M, Tian D, Li C, Dong L, Teng X, Tang B, Du Z, Yuan N, Zeng J, Zhang Z, Wang J, Shi S, Zhang Y, Wang Q, Pan M, Qian Q, Song S, Niu G, Li M, Xia L, Zou D, Zhang Y, Sang J, Li M, Zhang Y, Wang P, Wang F, Zhang Y, Gao Q, Xiao J, Hao L, Liang F, Li M, Zou D, Li R, Liu L, Cao J, Sang J, Zou D, Li M, Abbasi AA, Shireen H, Wang P, Zhang Y, Li Z, Wang Q, Xia L, Xiong Z, Jiang M, Guo T, Li Z, Zhang H, Pan M, Ma L, Li M, Niu G, Xia L, Zou D, Zhang Y, Sang J, Li Z, Gao R, Li R, Zhang T, Bao Y, Zhang Z, Tang B, Zhou Q, Dong L, Li W, Zhang X, Lan L, Zhai S, Bao Y, Zhang Y, Wang G, Zhao W, Sang J, Wang Z, Zou D, Zhang Y, Hao L, Bao Y, Zhang Z, Zhao W, Xiao J, Lan L, Xue Y, Sun Y, Yu L, Zhai S, Sun M, Chen H, Zhang Z, Zhao W, Xiao J, Bao Y, Song S, Hao L, Li R, Ma L, Wang Y, Tang B, Chen M, Hu H, Guo AY, Lin S, Xue Y, Wang C, Xue Y, Ning W, Xue Y, Zhang Y, Xue Y, Luo H, Gao F, Guo Y, Xue Y, Zhang Q, Guo AY, Zhou J, Xue Y, Huang Z, Cui Q, Miao YR, Guo AY, Ruan C, Xue Y, Yuan C, Chen M, Jinpu J, Gao G, Xu H, Xue Y, Li Y, Li CY, Tang Q, Guo AY, Peng D, Deng W. Database Resources of the BIG Data Center in 2019. Nucleic Acids Res 2019; 47:D8-D14. [PMID: 30365034 PMCID: PMC6323991 DOI: 10.1093/nar/gky993] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 09/14/2018] [Revised: 10/08/2018] [Accepted: 10/10/2018] [Indexed: 01/23/2023] [Open Access]
Abstract
The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of multi-omics data generated at unprecedented scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. Resources with significant updates in the past year include BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Science Wikis (a catalog of biological knowledge wikis for community annotations) and IC4R (Information Commons for Rice). Newly released resources include EWAS Atlas (a knowledgebase of epigenome-wide association studies), iDog (an integrated omics data resource for dog) and RNA editing resources (for editome-disease associations and plant RNA editosome, respectively). To promote biodiversity and health big data sharing around the world, the Open Biodiversity and Health Big Data (BHBD) initiative is introduced. All of these resources are publicly accessible at http://bigd.big.ac.cn.
6
Liao P, Li S, Cui X, Zheng Y. A comprehensive review of web-based resources of non-coding RNAs for plant science research. Int J Biol Sci 2018; 14:819-832. [PMID: 29989090 PMCID: PMC6036741 DOI: 10.7150/ijbs.24593] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 12/27/2017] [Accepted: 03/14/2018] [Indexed: 01/06/2023] [Open Access]
Abstract
Non-coding RNAs (ncRNAs) are transcribed from the genome but not translated into proteins. Many ncRNAs are key regulators of plant growth and development, metabolism and stress tolerance. To make web-based ncRNA resources for plant science research more accessible and understandable, we comprehensively reviewed 83 web-based resources of three types: genome databases containing ncRNA data, microRNA (miRNA) databases and long non-coding RNA (lncRNA) databases. To facilitate effective use of these resources, we also suggest preferred miRNA and lncRNA resources for performing meaningful analysis.
Affiliation(s)
- Peiran Liao
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China
- Shipeng Li
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China
- Xiuming Cui
- Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China
- Yunnan key laboratory of Panax notoginseng, Kunming, Yunnan, 650500, China
- Yun Zheng
- Yunnan Key Laboratory of Primate Biomedical Research, Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China
7
Chen Q, Panyam NC, Elangovan A, Verspoor K. BioCreative VI Precision Medicine Track system performance is constrained by entity recognition and variations in corpus characteristics. Database (Oxford) 2018; 2018:5255181. [PMID: 30576491 PMCID: PMC6301335 DOI: 10.1093/database/bay122] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/02/2018] [Revised: 09/24/2018] [Accepted: 10/16/2018] [Indexed: 01/01/2023]
Abstract
Precision medicine aims to provide personalized treatments based on individual patient profiles. One critical step towards precision medicine is leveraging knowledge derived from biomedical publications, a tremendous literature resource presenting the latest scientific discoveries on genes, mutations and diseases. Biomedical natural language processing (BioNLP) plays a vital role in supporting automation of this process. BioCreative VI Track 4 brings community effort to the task of automatically identifying and extracting protein-protein interactions (PPI) affected by mutations (PPIm), important in the precision medicine context for capturing individual genotype variation related to disease. We present the READ-BioMed team's approach to identifying PPIm-related publications and to extracting specific PPIm information from those publications in the context of the BioCreative VI PPIm track. We observe that current BioNLP tools are insufficient to recognise entities for these two tasks; the best existing mutation recognition tool achieves only 55% recall in the document triage training set, while relation extraction performance is limited by the low recall of gene entity recognition. We develop the models accordingly: for document triage, we develop term lists capturing interactions and mutations to complement BioNLP tools, and select effective features via a feature contribution study, whereas an ensemble of BioNLP tools is employed for relation extraction. Our best document triage model achieves an F-score of 66.77%, while our best model for relation extraction achieves an F-score of 35.09% over the final (updated post-task) test set. The characteristics of mutations differ statistically between the training and testing sets, which affects the document triage task. While a vital new direction for biomedical text mining research, this early attempt to tackle the problem of identifying genetic variation of substantial biological significance highlights the importance of representative training data and the cascading impact of tool limitations in a modular system.
Affiliation(s)
- Qingyu Chen
- School of Computing and Information Systems, The University of Melbourne, Parkville VIC Australia
- Nagesh C Panyam
- School of Computing and Information Systems, The University of Melbourne, Parkville VIC Australia
- Aparna Elangovan
- School of Computing and Information Systems, The University of Melbourne, Parkville VIC Australia
- Karin Verspoor
- School of Computing and Information Systems, The University of Melbourne, Parkville VIC Australia
8
Chen T, Li M, He Q, Zou L, Li Y, Chang C, Zhao D, Zhu Y. LiverWiki: a wiki-based database for human liver. BMC Bioinformatics 2017; 18:452. [PMID: 29029599 PMCID: PMC5640914 DOI: 10.1186/s12859-017-1852-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/21/2016] [Accepted: 10/02/2017] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Recent advances in omics technology have produced a large amount of liver-related data. A comprehensive and up-to-date source of liver-related data is needed to allow biologists to access the latest data. However, current liver-related data sources each cover only a specific part of the liver. It is difficult for them to keep pace with the rapid increase of liver-related data available at those data resources. Integrating diverse liver-related data is a critical yet formidable challenge, as it requires sustained human effort. RESULTS We present LiverWiki, a first wiki-based database that integrates liver-related genes, homolog genes, gene expression in microarray and RNA-Seq datasets, proteins, protein interactions, post-translational modifications, associated pathways, diseases, metabolites identified in metabolomics datasets, and literature into an easily accessible and searchable resource for community-driven sharing. LiverWiki houses information in a total of 141,897 content pages, including 19,787 liver-related gene pages, 17,077 homolog gene pages, 50,251 liver-related protein pages, 36,122 gene expression pages, 2067 metabolites identified in the metabolomics datasets, 16,366 disease-related molecules, and 227 liver disease pages. Beyond assisting users in searching, browsing, reviewing and refining the contents of LiverWiki, its most important contribution is to allow the community to create and update biological data on the liver in visible and editable tables. This integrates newly produced data with existing knowledge. Implemented in MediaWiki, LiverWiki provides powerful extensions to support community contributions. CONCLUSIONS The main goal of LiverWiki is to provide the research community with comprehensive liver-related data, as well as to allow the research community to share their liver-related data flexibly and efficiently. It also enables rapid sharing of new discoveries by allowing them to be integrated and shared immediately, rather than relying on expert curators. The database is available online at http://liverwiki.hupo.org.cn/.
Affiliation(s)
- Tao Chen
- Beijing Institute of Life Omics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 33 Life Science Park Rd, Changping District, Beijing, 102206, China
- Mansheng Li
- Beijing Institute of Life Omics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 33 Life Science Park Rd, Changping District, Beijing, 102206, China
- Qiang He
- School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Victoria, 3122, Australia
- Lei Zou
- Institute of Computer Science and Technology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing, 100871, China
- Youhuan Li
- Institute of Computer Science and Technology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing, 100871, China
- Cheng Chang
- Beijing Institute of Life Omics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 33 Life Science Park Rd, Changping District, Beijing, 102206, China
- Dongyan Zhao
- Institute of Computer Science and Technology, Peking University, No.5 Yiheyuan Road Haidian District, Beijing, 100871, China
- Yunping Zhu
- Beijing Institute of Life Omics, State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Radiation Medicine, 33 Life Science Park Rd, Changping District, Beijing, 102206, China.
9
Lopez JR, Erickson JE, Munoz P, Saballos A, Felderhoff TJ, Vermerris W. QTLs Associated with Crown Root Angle, Stomatal Conductance, and Maturity in Sorghum. THE PLANT GENOME 2017; 10. [PMID: 28724080 DOI: 10.3835/plantgenome2016.04.0038] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 05/25/2023]
Abstract
Three factors that directly affect the water inputs in cropping systems are root architecture, length of the growing season, and stomatal conductance to water vapor (). Deeper-rooted cultivars will perform better under water-limited conditions because they can access water stored deeper in the soil profile. Reduced limits transpiration rate () and thus throughout the vegetative phase conserves water that may be used during grain filling in water-limited environments. Additionally, growing early-maturing varieties in regions that rely on soil-stored water is a key water management strategy. To further our understanding of the genetic basis underlying root depth, growing season length, and we conducted a quantitative trait locus (QTL) study. A QTL for crown root angle (a proxy for root depth) new to sorghum was identified in chromosome 3. For , a QTL in chromosome seven was identified. In a follow-up field study it was determined that the QTL for was associated with reduced but not with net carbon assimilation rate () or shoot biomass. No differences in guard-cell length or stomatal density were observed among the lines, leading to the conclusion that the observed differences in must be explained by partial stomatal closure. The well-studied maturity gene was identified in the QTL for maturity. The transgressive segregation of the population was explained by the possible interaction of with other loci. Finally, the most probable position of the genes underlying the QTLs and candidate genes were proposed.
10
The BIG Data Center: from deposition to integration to translation. Nucleic Acids Res 2016; 45:D18-D24. [PMID: 27899658 PMCID: PMC5210546 DOI: 10.1093/nar/gkw1060] [Citation(s) in RCA: 404] [Impact Index Per Article: 50.5] [Received: 09/14/2016] [Revised: 10/19/2016] [Accepted: 10/21/2016] [Indexed: 02/06/2023] [Open Access]
Abstract
Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn.
Affiliation(s)
- BIG Data Center Members
- To whom correspondence should be addressed: Zhang Zhang. Tel: +86 10 8409 7261; Fax: +86 10 8409 7720
11
Abstract
Rice is the most important staple food for a large part of the world's human population and also a key model organism for plant research. Here, we present Information Commons for Rice (IC4R; http://ic4r.org), a rice knowledgebase featuring adoption of an extensible and sustainable architecture that integrates multiple omics data through community-contributed modules. Each module is developed and maintained by a different committed group, deals with data collection, processing and visualization, and delivers data on demand via web services. In the current version, IC4R incorporates a variety of rice data through multiple committed modules, including genome-wide expression profiles derived entirely from RNA-Seq data, genomic variations obtained from re-sequencing data of thousands of rice varieties, plant homologous genes covering multiple diverse plant species, post-translational modifications, rice-related literature and gene annotations contributed by the rice research community. Unlike extant related databases, IC4R is designed for scalability and sustainability and thus also features collaborative integration of rice data and low costs for database update and maintenance. Future directions of IC4R include incorporation of other omics data and association of multiple omics data with agronomically important traits, with the aim of building IC4R into a valuable knowledgebase for both basic and translational research in rice.
12
Zou D, Ma L, Yu J, Zhang Z. Biological databases for human research. GENOMICS PROTEOMICS & BIOINFORMATICS 2015; 13:55-63. [PMID: 25712261 PMCID: PMC4411498 DOI: 10.1016/j.gpb.2015.01.006] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 01/01/2015] [Revised: 01/16/2015] [Accepted: 01/16/2015] [Indexed: 01/01/2023]
Abstract
The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation.
Affiliation(s)
- Dong Zou
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Lina Ma
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Jun Yu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Zhang Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
|
13
|
Ma L, Li A, Zou D, Xu X, Xia L, Yu J, Bajic VB, Zhang Z. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res 2014; 43:D187-92. [PMID: 25399417 PMCID: PMC4383965 DOI: 10.1093/nar/gku1167] [Citation(s) in RCA: 103] [Impact Index Per Article: 10.3] [Indexed: 01/23/2023]
Abstract
Long non-coding RNAs (lncRNAs) perform a diversity of functions in numerous important biological processes and are implicated in many human diseases. In this report we present lncRNAWiki (http://lncrna.big.ac.cn), a wiki-based platform that is open-content and publicly editable and aimed at community-based curation and collection of information on human lncRNAs. Current related databases depend primarily on curation by experts, which makes it laborious to annotate the exponentially accumulating information on lncRNAs and inevitably calls for collective, community-based curation. Unlike existing databases, lncRNAWiki features comprehensive integration of information on human lncRNAs obtained from multiple different resources and allows not only existing lncRNAs to be edited, updated and curated by different users but also the addition of newly identified lncRNAs by any user. It harnesses community collective knowledge in collecting, editing and annotating human lncRNAs and rewards community curation efforts by providing explicit authorship based on quantified contributions. LncRNAWiki relies on the underlying knowledge of the scientific community for collective and collaborative curation of human lncRNAs and thus has the potential to serve as an up-to-date and comprehensive knowledgebase for human lncRNAs.
Affiliation(s)
- Lina Ma
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Ang Li
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Dong Zou
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Xingjian Xu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lin Xia
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Jun Yu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Vladimir B Bajic
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Zhang Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
|
14
|
Zhang Z, Zhu W, Luo J. Bringing biocuration to China. GENOMICS PROTEOMICS & BIOINFORMATICS 2014; 12:153-5. [PMID: 25042682 PMCID: PMC4411340 DOI: 10.1016/j.gpb.2014.07.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/30/2014] [Accepted: 07/10/2014] [Indexed: 11/30/2022]
Abstract
Biocuration involves adding value to biomedical data through standardization, quality control and information transfer (also known as data annotation). It enhances data interoperability and consistency, and is critical in translating biomedical data into scientific discovery. Although China is becoming a leading producer of scientific data, biocuration is still very new to the Chinese biomedical data community. In fact, there is currently no widely acknowledged Chinese equivalent for the word “curation”. Here we propose “审编” (Pinyin: shěn biān) as its Chinese translation, based on the meanings it implies within the biomedical data community. The 8th International Biocuration Conference, to be held in China (http://biocuration2015.tilsi.org) next year, has the potential to raise general awareness in China of the significant role of biocuration in scientific discovery. However, challenges lie ahead in its implementation.
Affiliation(s)
- Zhang Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Weimin Zhu
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing 100730, China; Taicang Institute of Life Sciences Information, Taicang 215400, China
- Jingchu Luo
  - College of Life Sciences and Center for Bioinformatics, Peking University, Beijing 100871, China
|