1
Ara T, Kodama Y, Tokimatsu T, Fukuda A, Kosuge T, Mashima J, Tanizawa Y, Tanjo T, Ogasawara O, Fujisawa T, Nakamura Y, Arita M. DDBJ update in 2023: the MetaboBank for metabolomics data and associated metadata. Nucleic Acids Res 2024; 52:D67-D71. [PMID: 37971299 PMCID: PMC10767850 DOI: 10.1093/nar/gkad1046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 10/21/2023] [Accepted: 10/27/2023] [Indexed: 11/19/2023]
Abstract
The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample information along with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute (EBI). Besides INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open for domestic life science researchers to analyze large-scale sequence data. This paper reports recent updates on the archival databases and the services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample in its metadata description making it suitable for multi-omics large studies. Its collaboration with MetaboLights at EBI brings synergy in locating and reusing public data.
Affiliation(s)
- Takeshi Ara
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yuichi Kodama
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Toshiaki Tokimatsu
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Asami Fukuda
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Takehide Kosuge
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Jun Mashima
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yasuhiro Tanizawa
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Tomoya Tanjo
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Osamu Ogasawara
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Takatomo Fujisawa
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yasukazu Nakamura
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Masanori Arita
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
2
O’Connor LM, O’Connor BA, Zeng J, Lo CH. Data Mining of Microarray Datasets in Translational Neuroscience. Brain Sci 2023; 13:1318. [PMID: 37759919 PMCID: PMC10527016 DOI: 10.3390/brainsci13091318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 09/04/2023] [Accepted: 09/10/2023] [Indexed: 09/29/2023]
Abstract
Data mining involves the computational analysis of a plethora of publicly available datasets to generate new hypotheses that can be further validated by experiments for the improved understanding of the pathogenesis of neurodegenerative diseases. Although the number of sequencing datasets is on the rise, microarray analyses conducted on diverse biological samples represent a large collection of datasets with multiple web-based programs that enable efficient and convenient data analysis. In this review, we first discuss the selection of biological samples associated with neurological disorders, and the possibility of a combination of datasets, from various types of samples, to conduct an integrated analysis in order to achieve a holistic understanding of the alterations in the examined biological system. We then summarize key approaches and studies that have made use of the data mining of microarray datasets to obtain insights into translational neuroscience applications, including biomarker discovery, therapeutic development, and the elucidation of the pathogenic mechanisms of neurodegenerative diseases. We further discuss the gap to be bridged between microarray and sequencing studies to improve the utilization and combination of different types of datasets, together with experimental validation, for more comprehensive analyses. We conclude by providing future perspectives on integrating multi-omics, to advance precision phenotyping and personalized medicine for neurodegenerative diseases.
Affiliation(s)
- Lance M. O’Connor
  - College of Biological Sciences, University of Minnesota, Minneapolis, MN 55455, USA
- Blake A. O’Connor
  - School of Pharmacy, University of Wisconsin, Madison, WI 53705, USA
- Jialiu Zeng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore
- Chih Hung Lo
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 308232, Singapore
3
O'Connor LM, O'Connor BA, Lim SB, Zeng J, Lo CH. Integrative multi-omics and systems bioinformatics in translational neuroscience: A data mining perspective. J Pharm Anal 2023; 13:836-850. [PMID: 37719197 PMCID: PMC10499660 DOI: 10.1016/j.jpha.2023.06.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 11/29/2022] [Revised: 06/20/2023] [Accepted: 06/25/2023] [Indexed: 09/19/2023]
Abstract
Bioinformatic analysis of large and complex omics datasets has become increasingly useful in modern day biology by providing a great depth of information, with its application to neuroscience termed neuroinformatics. Data mining of omics datasets has enabled the generation of new hypotheses based on differentially regulated biological molecules associated with disease mechanisms, which can be tested experimentally for improved diagnostic and therapeutic targeting of neurodegenerative diseases. Importantly, integrating multi-omics data using a systems bioinformatics approach will advance the understanding of the layered and interactive network of biological regulation that exchanges systemic knowledge to facilitate the development of a comprehensive human brain profile. In this review, we first summarize data mining studies utilizing datasets from the individual type of omics analysis, including epigenetics/epigenomics, transcriptomics, proteomics, metabolomics, lipidomics, and spatial omics, pertaining to Alzheimer's disease, Parkinson's disease, and multiple sclerosis. We then discuss multi-omics integration approaches, including independent biological integration and unsupervised integration methods, for more intuitive and informative interpretation of the biological data obtained across different omics layers. We further assess studies that integrate multi-omics in data mining which provide convoluted biological insights and offer proof-of-concept proposition towards systems bioinformatics in the reconstruction of brain networks. Finally, we recommend a combination of high dimensional bioinformatics analysis with experimental validation to achieve translational neuroscience applications including biomarker discovery, therapeutic development, and elucidation of disease mechanisms. 
We conclude by providing future perspectives and opportunities in applying integrative multi-omics and systems bioinformatics to achieve precision phenotyping of neurodegenerative diseases and towards personalized medicine.
Affiliation(s)
- Lance M. O'Connor
  - College of Biological Sciences, University of Minnesota, Minneapolis, MN, 55455, USA
- Blake A. O'Connor
  - School of Pharmacy, University of Wisconsin, Madison, WI, 53705, USA
- Su Bin Lim
  - Department of Biochemistry and Molecular Biology, Ajou University School of Medicine, Suwon, 16499, South Korea
- Jialiu Zeng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
- Chih Hung Lo
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, 308232, Singapore
4
Tanizawa Y, Fujisawa T, Kodama Y, Kosuge T, Mashima J, Tanjo T, Nakamura Y. DNA Data Bank of Japan (DDBJ) update report 2022. Nucleic Acids Res 2022; 51:D101-D105. [PMID: 36420889 PMCID: PMC9825463 DOI: 10.1093/nar/gkac1083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 10/24/2022] [Accepted: 11/22/2022] [Indexed: 11/26/2022]
Abstract
The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) maintains database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), our primary mission is to collect and distribute nucleotide sequence data, as well as their study and sample information, in collaboration with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute. In addition to INSDC resources, the Center operates databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank), and human genetic and phenotypic data (JGA: Japanese Genotype-Phenotype Archive). These databases are built on the supercomputer of the National Institute of Genetics, whose remaining computational capacity is actively utilized by domestic researchers for large-scale biological data analyses. Here, we report our recent updates and the activities of our services.
Affiliation(s)
- Yasuhiro Tanizawa
  - To whom correspondence should be addressed. Tel: +55 981 6859; Fax: +55 981 6889
- Takatomo Fujisawa
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yuichi Kodama
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Takehide Kosuge
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Jun Mashima
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Tomoya Tanjo
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yasukazu Nakamura
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
5
Bhandari N, Walambe R, Kotecha K, Khare SP. A comprehensive survey on computational learning methods for analysis of gene expression data. Front Mol Biosci 2022; 9:907150. [PMID: 36458095 PMCID: PMC9706412 DOI: 10.3389/fmolb.2022.907150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Accepted: 09/28/2022] [Indexed: 09/19/2023]
Abstract
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
Affiliation(s)
- Nikita Bhandari
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rahee Walambe
  - Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
  - Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Satyajeet P. Khare
  - Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
6
Hephzibah Cathryn R, Udhaya Kumar S, Younes S, Zayed H, George Priya Doss C. A review of bioinformatics tools and web servers in different microarray platforms used in cancer research. Advances in Protein Chemistry and Structural Biology 2022; 131:85-164. [PMID: 35871897 DOI: 10.1016/bs.apcsb.2022.05.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 01/26/2023]
Abstract
Over the past decade, conventional lab work strategies have gradually shifted from being limited to a laboratory setting towards a bioinformatics era to help manage and process the vast amounts of data generated by omics technologies. The present work outlines the latest contributions of bioinformatics in analyzing microarray data and their application to cancer. We dissect different microarray platforms and their use in gene expression in cancer models. We highlight how computational advances empowered the microarray technology in gene expression analysis. The study on protein-protein interaction databases classified into primary, derived, meta-database, and prediction databases describes the strategies to curate and predict novel interaction networks in silico. In addition, we summarize the areas of bioinformatics where neural graph networks are currently being used, such as protein functions, protein interaction prediction, and in silico drug discovery and development. We also discuss the role of deep learning as a potential tool in the prognosis, diagnosis, and treatment of cancer. Integrating these resources efficiently, practically, and ethically is likely to be the most challenging task for the healthcare industry over the next decade; however, we believe that it is achievable in the long term.
Affiliation(s)
- R Hephzibah Cathryn
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- S Udhaya Kumar
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
- Salma Younes
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, QU Health, Doha, Qatar
- Hatem Zayed
  - Department of Biomedical Sciences, College of Health and Sciences, Qatar University, QU Health, Doha, Qatar
- C George Priya Doss
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore, India
7
Suzuki T, Ono Y, Bono H. Comparison of Oxidative and Hypoxic Stress Responsive Genes from Meta-Analysis of Public Transcriptomes. Biomedicines 2021; 9:1830. [PMID: 34944646 PMCID: PMC8698900 DOI: 10.3390/biomedicines9121830] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/02/2021] [Revised: 11/23/2021] [Accepted: 11/30/2021] [Indexed: 01/11/2023]
Abstract
Analysis of RNA-sequencing (RNA-seq) data is an effective means to analyze the gene expression levels under specific conditions and discover new biological knowledge. More than 74,000 experimental series with RNA-seq have been stored in public databases as of 20 October 2021. Since this huge amount of expression data accumulated from past studies is a promising source of new biological insights, we focused on a meta-analysis of 1783 runs of RNA-seq data under the conditions of two types of stressors: oxidative stress (OS) and hypoxia. The collected RNA-seq data of OS were organized as the OS dataset to retrieve and analyze differentially expressed genes (DEGs). The OS-induced DEGs were compared with the hypoxia-induced DEGs retrieved from a previous study. The results from the meta-analysis of OS transcriptomes revealed two genes, CRIP1 and CRIP3, which were particularly downregulated, suggesting a relationship between OS and zinc homeostasis. The comparison between meta-analysis of OS and hypoxia showed that several genes were differentially expressed under both stress conditions, and it was inferred that the downregulation of cell cycle-related genes is a mutual biological process in both OS and hypoxia.
8
Okido T, Kodama Y, Mashima J, Kosuge T, Fujisawa T, Ogasawara O. DNA Data Bank of Japan (DDBJ) update report 2021. Nucleic Acids Res 2021; 50:D102-D105. [PMID: 34751405 PMCID: PMC8689959 DOI: 10.1093/nar/gkab995] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/11/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 11/29/2022]
Abstract
The Bioinformation and DDBJ (DNA Data Bank of Japan) Center (DDBJ Center; https://www.ddbj.nig.ac.jp) operates archival databases that collect nucleotide sequences, study and sample information, and distribute them without access restriction to progress life science research as a member of the International Nucleotide Sequence Database Collaboration (INSDC), in collaboration with the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute. Besides the INSDC databases, the DDBJ Center also provides the Genomic Expression Archive for functional genomics data and the Japanese Genotype-phenotype Archive for human data requiring controlled access. Additionally, the DDBJ Center started a new public repository, MetaboBank, for experimental raw data and metadata from metabolomics research in October 2020. In response to the COVID-19 pandemic, the DDBJ Center openly shares SARS-CoV-2 genome sequences in collaboration with Shizuoka Prefecture and Keio University. The operation of DDBJ is based on the National Institute of Genetics (NIG) supercomputer, which is open for large-scale sequence data analysis for life science researchers. This paper reports recent updates on the archival databases and the services of DDBJ.
Affiliation(s)
- Toshihisa Okido
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yuichi Kodama
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Jun Mashima
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Takehide Kosuge
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Takatomo Fujisawa
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Osamu Ogasawara
  - Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
9
ARGEOS: A New Bioinformatic Tool for Detailed Systematics Search in GEO and ArrayExpress. Biology 2021; 10:1026. [PMID: 34681124 PMCID: PMC8533512 DOI: 10.3390/biology10101026] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/20/2021] [Revised: 09/27/2021] [Accepted: 10/04/2021] [Indexed: 12/24/2022]
Abstract
Simple Summary: A systematic search for datasets of transcriptome data is a hefty task. Therefore, we developed the ARGEOS web tool, which simplifies the search and selection of datasets from various public databases. In addition, the service carries out an advanced analysis of a dataset, including collecting detailed protocols, information on the number of datasets, and providing additional reference information. An example of a cell polarization study exemplifies the effectiveness of the tool.
Abstract: Conducting a reanalysis of transcriptome data for studying intracellular signaling or solving other experimental problems is becoming increasingly popular. Gene expression data are archived as microarray or RNA-seq datasets mainly in two public databases: Gene Expression Omnibus (GEO) and ArrayExpress (AE). These databases were not initially intended to systematically search datasets, making it challenging to conduct a secondary study. Therefore, we have created the ARGEOS service, which has the following advantages that facilitate the search: (1) Users can simultaneously send several requests that are supposed to be used for systematic searches, and it is possible to correct the requests; (2) advanced analysis of information about the dataset is available. The service collects detailed protocols, information on the number of datasets, analyzes the availability of raw data, and provides other reference information. All this contributes to both rapid data analysis with the search for the most relevant datasets and to the systematic search with detailed analysis of the information of the datasets. The efficiency of the service is shown in the example of analyzing transcriptome data of activated (polarized) cells. We have performed a systematic search of studies of cell polarization (when cells are exposed to different immune stimuli). The web interface for ARGEOS is user-friendly and straightforward. It can be used by a person who is not familiar with database searching.
10
Dixit NK. Design of Monovalent and Chimeric Tetravalent Dengue Vaccine Using an Immunoinformatics Approach. Int J Pept Res Ther 2021; 27:2607-2624. [PMID: 34602919 PMCID: PMC8475484 DOI: 10.1007/s10989-021-10277-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 08/24/2021] [Indexed: 12/15/2022]
Abstract
An immunoinformatics technique was used to predict a monovalent amide immunogen candidate capable of producing therapeutic antibodies as well as a potent immunogen candidate capable of acting as a universal vaccination against all dengue fever virus serotypes. The capsid protein is an attractive goal for anti-DENV due to its position in the dengue existence cycle. The widely accessible immunological data, advances in antigenic peptide prediction using reverse vaccinology, and the introduction of molecular docking in immunoinformatics have directed vaccine manufacturing. The C-proteins of DENV-1-4 serotypes were known as antigens to assist with logical design. Binding epitopes for TC cells, TH cells, and B cells are predicted from structural dengue virus capsid proteins. Each T cell epitope of C-protein integrated with a B cell as a template was used as a vaccine to produce antibodies against the serotype of the dengue virus. A chimeric tetravalent vaccine was created by combining four vaccines, each representing four dengue serotypes, to serve as a standard vaccine candidate for all four serogroups. The LKRARNRVS, RGFRKEIGR, KNGAIKVLR, and KAINVLRGF from dengue 1, dengue 2, dengue 3, and dengue 4 epitopes may be essential immunotherapeutic representatives for controlling outbreaks.
Affiliation(s)
- Neeraj Kumar Dixit
  - Department of Biotechnology, Saroj Institute of Technology & Management, Lucknow, Uttar Pradesh, India
11
Multi-Omic Meta-Analysis of Transcriptomes and the Bibliome Uncovers Novel Hypoxia-Inducible Genes. Biomedicines 2021; 9:582. [PMID: 34065451 PMCID: PMC8160971 DOI: 10.3390/biomedicines9050582] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/30/2021] [Revised: 05/15/2021] [Accepted: 05/19/2021] [Indexed: 12/30/2022]
Abstract
Hypoxia is a condition in which cells, tissues, or organisms are deprived of sufficient oxygen supply. Aerobic organisms have a hypoxic response system, represented by hypoxia-inducible factor 1-α (HIF1A), to adapt to this condition. Due to publication bias, there has been little focus on genes other than well-known signature hypoxia-inducible genes. Therefore, in this study, we performed a meta-analysis to identify novel hypoxia-inducible genes. We searched publicly available transcriptome databases to obtain hypoxia-related experimental data, retrieved the metadata, and manually curated it. We selected the genes that are differentially expressed by hypoxic stimulation, and evaluated their relevance in hypoxia by performing enrichment analyses. Next, we performed a bibliometric analysis using gene2pubmed data to examine genes that have not been well studied in relation to hypoxia. Gene2pubmed data provides information about the relationship between genes and publications. We calculated and evaluated the number of reports and similarity coefficients of each gene to HIF1A, which is a representative gene in hypoxia studies. In this data-driven study, we report that several genes that were not known to be associated with hypoxia, including the G protein-coupled receptor 146 gene, are upregulated by hypoxic stimulation.
12
Bandeira N, Deutsch EW, Kohlbacher O, Martens L, Vizcaíno JA. Data Management of Sensitive Human Proteomics Data: Current Practices, Recommendations, and Perspectives for the Future. Mol Cell Proteomics 2021; 20:100071. [PMID: 33711481 PMCID: PMC8056256 DOI: 10.1016/j.mcpro.2021.100071] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 12/21/2020] [Revised: 03/01/2021] [Accepted: 03/02/2021] [Indexed: 12/12/2022]
Abstract
Today it is the norm that all relevant proteomics data that support the conclusions in scientific publications are made available in public proteomics data repositories. However, given the increase in the number of clinical proteomics studies, an important emerging topic is the management and dissemination of clinical, and thus potentially sensitive, human proteomics data. Both in the United States and in the European Union, there are legal frameworks protecting the privacy of individuals. Implementing privacy standards for publicly released research data in genomics and transcriptomics has led to processes to control who may access the data, so-called "controlled access" data. In parallel with the technological developments in the field, it is clear that the privacy risks of sharing proteomics data need to be properly assessed and managed. In our view, the proteomics community must be proactive in addressing these issues. Yet a careful balance must be kept. On the one hand, neglecting to address the potential of identifiability in human proteomics data could lead to reputational damage of the field, while on the other hand, erecting barriers to open access to clinical proteomics data will inevitably reduce reuse of proteomics data and could substantially delay critical discoveries in biomedical research. In order to balance these apparently conflicting requirements for data privacy and efficient use and reuse of research efforts through the sharing of clinical proteomics data, development efforts will be needed at different levels including bioinformatics infrastructure, policymaking, and mechanisms of oversight.
Affiliation(s)
- Nuno Bandeira
  - Center for Computational Mass Spectrometry, University of California, San Diego (UCSD), La Jolla, California, USA; Department Computer Science and Engineering, University of California, San Diego (UCSD), La Jolla, California, USA; Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego (UCSD), La Jolla, California, USA
- Oliver Kohlbacher
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany; Quantitative Biology Center, University of Tübingen, Tübingen, Germany; Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany; Institute for Translational Bioinformatics, University Hospital Tübingen, Tübingen, Germany
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium; Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
13
Bono H. Meta-Analysis of Oxidative Transcriptomes in Insects. Antioxidants (Basel) 2021; 10:345. [PMID: 33669076 PMCID: PMC7996572 DOI: 10.3390/antiox10030345] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/01/2021] [Revised: 02/17/2021] [Accepted: 02/22/2021] [Indexed: 11/29/2022]
Abstract
Data accumulation in public databases has resulted in extensive use of meta-analysis, a statistical analysis that combines the results of multiple studies. Oxidative stress occurs when there is an imbalance between free radical activity and antioxidant activity, which can be studied in insects by transcriptome analysis. This study aimed to apply a meta-analysis approach to evaluate insect oxidative transcriptomes using publicly available data. We collected oxidative stress response-related RNA sequencing (RNA-seq) data for a wide variety of insect species, mainly from public gene expression databases, by manual curation. Only RNA-seq data of Drosophila melanogaster were found and were systematically analyzed using a newly developed RNA-seq analysis workflow for species without a reference genome sequence. The results were evaluated by two metric methods to construct a reference dataset for oxidative stress response studies. Many genes were found to be downregulated under oxidative stress and related to organ system process (GO:0003008) and adherens junction organization (GO:0034332) by gene enrichment analysis. A cross-species analysis was also performed. RNA-seq data of Caenorhabditis elegans were curated, since no RNA-seq data of insect species are currently available in public databases. This method, including the workflow developed, represents a powerful tool for deciphering conserved networks in oxidative stress response.
Affiliation(s)
- Hidemasa Bono
- Program of Biomedical Science, Graduate School of Integrated Sciences for Life, Hiroshima University, 3-10-23 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-0046, Japan; Tel.: +81-82-424-4013
- Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan
14
Fukuda A, Kodama Y, Mashima J, Fujisawa T, Ogasawara O. DDBJ update: streamlining submission and access of human data. Nucleic Acids Res 2021; 49:D71-D75. [PMID: 33156332 PMCID: PMC7779041 DOI: 10.1093/nar/gkaa982] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/10/2020] [Revised: 10/09/2020] [Accepted: 10/12/2020] [Indexed: 01/25/2023]
Abstract
The Bioinformation and DDBJ Center (DDBJ Center, https://www.ddbj.nig.ac.jp) provides databases that capture, preserve and disseminate diverse biological data to support research in the life sciences. This center collects nucleotide sequences with annotations, raw sequencing data, and alignment information from high-throughput sequencing platforms, and study and sample information, in collaboration with the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). This collaborative framework is known as the International Nucleotide Sequence Database Collaboration (INSDC). In collaboration with the National Bioscience Database Center (NBDC), the DDBJ Center also provides a controlled-access database, the Japanese Genotype–phenotype Archive (JGA), which archives and distributes human genotype and phenotype data, requiring authorized access. The NBDC formulates guidelines and policies for sharing human data and reviews data submission and use applications. To streamline all of the processes at NBDC and JGA, we have integrated the two systems by introducing a unified login platform with a group structure in September 2020. In addition to the public databases, the DDBJ Center provides a computer resource, the NIG supercomputer, for domestic researchers to analyze large-scale genomic data. This report describes updates to the services of the DDBJ Center, focusing on the NBDC and JGA system enhancements.
Affiliation(s)
- Asami Fukuda
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yuichi Kodama
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Jun Mashima
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Takatomo Fujisawa
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Osamu Ogasawara
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
15
Wang X, Chai Z, Pan G, Hao Y, Li B, Ye T, Li Y, Long F, Xia L, Liu M. ExoBCD: a comprehensive database for exosomal biomarker discovery in breast cancer. Brief Bioinform 2020; 22:5860692. [PMID: 32591816 DOI: 10.1093/bib/bbaa088] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/13/2019] [Revised: 03/08/2020] [Accepted: 04/26/2020] [Indexed: 12/24/2022]
Abstract
Effective and safe implementation of precision oncology for breast cancer is a vital strategy to improve patient outcomes, and it relies on the application of reliable biomarkers. As a 'liquid biopsy' and a novel resource for biomarkers, exosomes provide a promising avenue for the diagnosis and treatment of breast cancer. Although several exosome-related databases have been developed, an integrated database for exosome-based biomarker discovery is still lacking. To this end, a comprehensive database, ExoBCD (https://exobcd.liumwei.org), was constructed by combining robust analysis of four high-throughput datasets, transcriptome validation of 1191 TCGA cases and manual mining of 950 studies. In ExoBCD, approximately 20 900 annotation entries were integrated from 25 external sources and 306 exosomal molecules (49 potential biomarkers and 257 biologically interesting molecules). These 306 molecules fall into three types: 121 mRNAs, 172 miRNAs and 13 lncRNAs. Thus, well-linked information on molecular characteristics, experimental biology, gene expression patterns, overall survival, functional evidence, tumour stage and clinical use was fully integrated. As a proposed data-driven and literature-based paradigm for biomarker discovery, this study also demonstrated a corroborative analysis and identified 36 promising molecules, as well as the most promising prognostic biomarkers, IGF1R and FRS2. Taken together, ExoBCD is the first well-corroborated knowledge base for exosomal studies of breast cancer. It not only lays a foundation for subsequent studies but also strengthens efforts to probe molecular mechanisms, discover biomarkers and develop meaningful clinical uses.
Affiliation(s)
- Xuanyi Wang
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
- Zixuan Chai
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
- Guizhi Pan
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
- Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing, China
- Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, China
- Ting Ye
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
- Yinghong Li
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
- Fei Long
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
- Lixin Xia
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
- Mingwei Liu
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, China
16
Karagkouni D, Paraskevopoulou MD, Tastsoglou S, Skoufos G, Karavangeli A, Pierros V, Zacharopoulou E, Hatzigeorgiou AG. DIANA-LncBase v3: indexing experimentally supported miRNA targets on non-coding transcripts. Nucleic Acids Res 2020; 48:D101-D110. [PMID: 31732741 PMCID: PMC7145509 DOI: 10.1093/nar/gkz1036] [Citation(s) in RCA: 133] [Impact Index Per Article: 33.3] [Received: 09/22/2019] [Revised: 10/16/2019] [Accepted: 11/13/2019] [Indexed: 12/11/2022]
Abstract
DIANA-LncBase v3.0 (www.microrna.gr/LncBase) is a reference repository of experimentally supported miRNA targets on non-coding transcripts. Its third version provides approximately half a million entries, corresponding to ∼240 000 unique tissue- and cell-type-specific miRNA-lncRNA pairs. This compilation of interactions is derived from the manual curation of publications and the analysis of >300 high-throughput datasets. miRNA targets are supported by 14 experimental methodologies, applied to 243 distinct cell types and tissues in human and mouse. The largest part of the database consists of high-confidence, AGO-CLIP-derived miRNA-binding events. LncBase v3.0 is the first relevant database to employ a robust CLIP-Seq-guided algorithm, the microCLIP framework, to analyze 236 AGO-CLIP-Seq libraries and catalogue ∼370 000 miRNA binding events. The database was redesigned from the ground up, providing new functionalities. Known short variant information on >67 000 experimentally supported target sites and lncRNA expression profiles in different cellular compartments are provided to users. Interactive visualization plots, portraying correlations of miRNA-lncRNA pairs as well as lncRNA expression profiles in a wide range of cell types and tissues, are presented for the first time through a dedicated page. LncBase v3.0 constitutes a valuable asset for ncRNA research, providing new insights into the still widely unexplored functions of lncRNAs.
Affiliation(s)
- Dimitra Karagkouni
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Hellenic Pasteur Institute, 11521 Athens, Greece; Department of Computer Science and Biomedical Informatics, Univ. of Thessaly, 351 31 Lamia, Greece
- Maria D Paraskevopoulou
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Hellenic Pasteur Institute, 11521 Athens, Greece
- Spyros Tastsoglou
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Hellenic Pasteur Institute, 11521 Athens, Greece
- Giorgos Skoufos
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Hellenic Pasteur Institute, 11521 Athens, Greece
- Anna Karavangeli
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Department of Computer Science and Biomedical Informatics, Univ. of Thessaly, 351 31 Lamia, Greece
- Vasilis Pierros
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Hellenic Pasteur Institute, 11521 Athens, Greece
- Elissavet Zacharopoulou
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Department of Informatics and Telecommunications, Postgraduate Program: 'Information Technologies in Medicine and Biology', University of Athens, 15784 Athens, Greece
- Artemis G Hatzigeorgiou
- DIANA-Lab, Department of Electrical and Computer Engineering, Univ. of Thessaly, 38221 Volos, Greece; Hellenic Pasteur Institute, 11521 Athens, Greece; Department of Computer Science and Biomedical Informatics, Univ. of Thessaly, 351 31 Lamia, Greece
17
Sayers EW, Beck J, Brister JR, Bolton EE, Canese K, Comeau DC, Funk K, Ketter A, Kim S, Kimchi A, Kitts PA, Kuznetsov A, Lathrop S, Lu Z, McGarvey K, Madden TL, Murphy TD, O'Leary N, Phan L, Schneider VA, Thibaud-Nissen F, Trawick BW, Pruitt KD, Ostell J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2020; 48:D9-D16. [PMID: 31602479 DOI: 10.1093/nar/gkz899] [Citation(s) in RCA: 267] [Impact Index Per Article: 66.8] [Received: 09/16/2019] [Accepted: 10/09/2019] [Indexed: 11/14/2022]
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface, a sequence database search and a gene orthologs page. Additional resources that were updated in the past year include PMC, Bookshelf, My Bibliography, Assembly, RefSeq, viral genomes, the prokaryotic genome annotation pipeline, Genome Workbench, dbSNP, BLAST, Primer-BLAST, IgBLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
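As an illustration of what the abstract means by the E-utilities being the programming interface to Entrez, the following minimal sketch builds an ESearch request URL. It is not taken from the paper: the helper name `build_esearch_url` and the example query term are hypothetical, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities; ESearch is one of its endpoints.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that searches one Entrez database for a term.

    `build_esearch_url` is a hypothetical helper written for this sketch.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes characters such as the brackets in field tags.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # The query term is illustrative only.
    print(build_esearch_url("pubmed", "DDBJ[Title]", retmax=5))
```

Fetching the resulting URL with any HTTP client would return matching record identifiers. Real use should follow NCBI's usage guidelines, which this sketch does not cover.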
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Jeff Beck
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kathi Canese
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Donald C Comeau
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kathryn Funk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Anne Ketter
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Avi Kimchi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Paul A Kitts
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Anatoliy Kuznetsov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Stacy Lathrop
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kelly McGarvey
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Thomas L Madden
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Nuala O'Leary
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Lon Phan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Bart W Trawick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- James Ostell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
18
Bono H. All of gene expression (AOE): An integrated index for public gene expression databases. PLoS One 2020; 15:e0227076. [PMID: 31978081 PMCID: PMC6980531 DOI: 10.1371/journal.pone.0227076] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/20/2019] [Accepted: 12/10/2019] [Indexed: 12/12/2022]
Abstract
Gene expression data have been archived as microarray and RNA-seq datasets in two public databases, Gene Expression Omnibus (GEO) and ArrayExpress (AE). In 2018, the DNA DataBank of Japan started a similar repository called the Genomic Expression Archive (GEA). These databases are useful resources for the functional interpretation of genes, but have been separately maintained and may lack RNA-seq data, while the original sequence data are available in the Sequence Read Archive (SRA). We constructed an index for those gene expression data repositories, called All Of gene Expression (AOE), to integrate publicly available gene expression data. The web interface of AOE can graphically query data in addition to the application programming interface. By collecting gene expression data from RNA-seq in the SRA, AOE also includes data not included in GEO and AE. AOE is accessible as a search tool from the GEA website and is freely available at https://aoe.dbcls.jp/.
Affiliation(s)
- Hidemasa Bono
- Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Japan
19
Jean-Quartier C, Jeanquartier F, Holzinger A. Open Data for Differential Network Analysis in Glioma. Int J Mol Sci 2020; 21:E547. [PMID: 31952211 PMCID: PMC7013918 DOI: 10.3390/ijms21020547] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/27/2019] [Revised: 12/29/2019] [Accepted: 01/03/2020] [Indexed: 12/20/2022]
Abstract
The complexity of cancer diseases demands bioinformatic techniques and translational research based on big data and personalized medicine. Open data enables researchers to accelerate cancer studies, save resources and foster collaboration. Several tools and programming approaches are available for analyzing data, including annotation, clustering, comparison and extrapolation, merging, enrichment, functional association and statistics. We exploit openly available data via cancer gene expression analysis, apply refinement as well as enrichment analysis via gene ontology, and conclude with graph-based visualization of the protein interaction networks involved as a basis for signaling. The different databases allowed for the construction of either huge networks or more specific ones consisting of high-confidence interactions only. Several genes associated with glioma were isolated via a network analysis of top hub nodes as well as via an outlier analysis. The latter approach highlights a mitogen-activated protein kinase, a member of the histone deacetylases and a protein phosphatase as genes uncommonly associated with glioma. Cluster analysis of top hub nodes shows that several identified glioma-associated gene products function within protein complexes, including epidermal growth factors as well as cell cycle proteins and RAS proto-oncogenes. By using selected exemplary tools and open-access resources for cancer research and differential network analysis, we highlight disturbed signaling components in brain cancer subtypes of glioma.
20
Meta-Analysis of Hypoxic Transcriptomes from Public Databases. Biomedicines 2020; 8:biomedicines8010010. [PMID: 31936636 PMCID: PMC7168238 DOI: 10.3390/biomedicines8010010] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/30/2019] [Revised: 12/26/2019] [Accepted: 01/08/2020] [Indexed: 12/16/2022]
Abstract
Hypoxia is the insufficiency of oxygen in the cell, and hypoxia-inducible factors (HIFs) are central regulators of oxygen homeostasis. In order to obtain functional insights into the hypoxic response in a data-driven way, we attempted a meta-analysis of the RNA-seq data from the hypoxic transcriptomes archived in public databases. In view of methodological variability of archived data in the databases, we first manually curated RNA-seq data from appropriate pairs of transcriptomes before and after hypoxic stress. These included 128 human and 52 murine transcriptome pairs. We classified the results of experiments for each gene into three categories: upregulated, downregulated, and unchanged. Hypoxic transcriptomes were then compared between humans and mice to identify common hypoxia-responsive genes. In addition, meta-analyzed hypoxic transcriptome data were integrated with public ChIP-seq data on the known human HIFs, HIF-1 and HIF-2, to provide insights into hypoxia-responsive pathways involving direct transcription factor binding. This study provides a useful resource for hypoxia research. It also demonstrates the potential of a meta-analysis approach to public gene expression databases for selecting candidate genes from gene expression profiles generated under various experimental conditions.
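The three-way classification described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' method: the two-fold threshold, the pseudocount, and the function names `classify_pair` and `summarize_gene` are assumptions introduced here.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_pair(before: float, after: float,
                  fold: float = 2.0, pseudocount: float = 1.0) -> str:
    """Label one before/after expression pair for a single gene.

    The fold threshold and pseudocount are hypothetical defaults.
    """
    # The pseudocount avoids division by zero for unexpressed genes.
    ratio = (after + pseudocount) / (before + pseudocount)
    if ratio >= fold:
        return "upregulated"
    if ratio <= 1.0 / fold:
        return "downregulated"
    return "unchanged"


def summarize_gene(pairs: Iterable[Tuple[float, float]]) -> Counter:
    """Count how often a gene falls into each category across experiments."""
    return Counter(classify_pair(before, after) for before, after in pairs)


if __name__ == "__main__":
    # Made-up expression values (normoxia, hypoxia) for one gene.
    example_pairs = [(10.0, 50.0), (50.0, 10.0), (10.0, 12.0), (5.0, 30.0)]
    print(summarize_gene(example_pairs))
```

Tallying categories per gene across many curated pairs is one simple way to find genes that respond consistently. The scoring actually used in the study may differ.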
21
Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. IPD-IMGT/HLA Database. Nucleic Acids Res 2020; 48:D948-D955. [PMID: 31667505 PMCID: PMC7145640 DOI: 10.1093/nar/gkz950] [Citation(s) in RCA: 309] [Impact Index Per Article: 77.3] [Received: 09/13/2019] [Revised: 10/03/2019] [Accepted: 10/29/2019] [Indexed: 11/14/2022]
Abstract
The IPD-IMGT/HLA Database, http://www.ebi.ac.uk/ipd/imgt/hla/, currently contains over 25 000 allele sequences for 45 genes, which are located within the Major Histocompatibility Complex (MHC) of the human genome. This region is the most polymorphic region of the human genome, and the levels of polymorphism seen exceed those of most other genes. Some of the genes have several thousand variants and are now termed hyperpolymorphic, rather than simply polymorphic. The IPD-IMGT/HLA Database has provided a stable, highly accessible, user-friendly repository for this information, giving the scientific and medical community access to the many variant sequences of this gene system, which are critical for the successful outcome of transplantation. The number of currently known variants, and the dramatic increase in the number of new variants being identified, has necessitated a dedicated resource with custom tools for curation and publication. The challenge for the database is to continue to provide a highly curated database of sequence variants while supporting the increased number of submissions and complexity of sequences. In order to do this, traditional methods of accessing and presenting data will be challenged, and new methods will need to be utilized to keep pace with new discoveries.
Affiliation(s)
- James Robinson
- Anthony Nolan Research Institute, London, UK
- UCL Cancer Institute, University College London (UCL), London, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Steven G E Marsh
- Anthony Nolan Research Institute, London, UK
- UCL Cancer Institute, University College London (UCL), London, UK
22
Ogasawara O, Kodama Y, Mashima J, Kosuge T, Fujisawa T. DDBJ Database updates and computational infrastructure enhancement. Nucleic Acids Res 2020; 48:D45-D50. [PMID: 31724722 PMCID: PMC7145692 DOI: 10.1093/nar/gkz982] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 09/26/2019] [Revised: 10/10/2019] [Accepted: 10/21/2019] [Indexed: 12/30/2022]
Abstract
The Bioinformation and DDBJ Center (https://www.ddbj.nig.ac.jp) in the National Institute of Genetics (NIG) maintains a primary nucleotide sequence database as a member of the International Nucleotide Sequence Database Collaboration (INSDC) in partnership with the US National Center for Biotechnology Information and the European Bioinformatics Institute. The NIG operates the NIG supercomputer as a computational basis for the construction of DDBJ databases and as a large-scale computational resource for Japanese biologists and medical researchers. In order to accommodate the rapidly growing amount of deoxyribonucleic acid (DNA) nucleotide sequence data, NIG replaced its supercomputer system, which is designed for big data analysis of genome data, in early 2019. The new system is equipped with 30 PB of DNA data archiving storage; large-scale parallel distributed file systems (13.8 PB in total) and 1.1 PFLOPS computation nodes and graphics processing units (GPUs). Moreover, as a starting point of developing multi-cloud infrastructure of bioinformatics, we have also installed an automatic file transfer system that allows users to prevent data lock-in and to achieve cost/performance balance by exploiting the most suitable environment from among the supercomputer and public clouds for different workloads.
Affiliation(s)
- Osamu Ogasawara
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
- Yuichi Kodama
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
- Jun Mashima
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
- Takehide Kosuge
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
- Takatomo Fujisawa
- The Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, 411-8540, Japan
23
Rajaram S, Roth MA, Malato J, VandenBerg S, Hann B, Atreya CE, Altschuler SJ, Wu LF. A multi-modal data resource for investigating topographic heterogeneity in patient-derived xenograft tumors. Sci Data 2019; 6:253. [PMID: 31672976 PMCID: PMC6823477 DOI: 10.1038/s41597-019-0225-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/19/2019] [Accepted: 09/11/2019] [Indexed: 12/20/2022]
Abstract
Patient-derived xenografts (PDXs) are an essential pre-clinical resource for investigating tumor biology. However, cellular heterogeneity within and across PDX tumors can strongly impact the interpretation of PDX studies. Here, we generated a multi-modal, large-scale dataset to investigate PDX heterogeneity in metastatic colorectal cancer (CRC) across tumor models, spatial scales and genomic, transcriptomic, proteomic and imaging assay modalities. To showcase this dataset, we present analysis to assess sources of PDX variation, including anatomical orientation within the implanted tumor, mouse contribution, and differences between replicate PDX tumors. A unique aspect of our dataset is deep characterization of intra-tumor heterogeneity via immunofluorescence imaging, which enables investigation of variation across multiple spatial scales, from subcellular to whole tumor levels. Our study provides a benchmark data resource to investigate PDX models of metastatic CRC and serves as a template for future, quantitative investigations of spatial heterogeneity within and across PDX tumor models.
Affiliation(s)
- Satwik Rajaram
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA.
- Lyda Hill Department of Bioinformatics and Department of Pathology, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
- Maike A Roth
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Julia Malato
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA
- Scott VandenBerg
- Department of Pathology, University of California San Francisco, San Francisco, CA, USA
- Biorepository and Tissue Biomarker Technology Core, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA
- Byron Hann
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA
- Chloe E Atreya
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA
- Division of Hematology/Oncology, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Steven J Altschuler
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
- Lani F Wu
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA
24
Staley J, Mazloom R, Lowe P, Newsum CT, Jaberi-Douraki M, Riviere J, Wyckoff GJ. Novel Data Sharing Agreement to Accelerate Big Data Translational Research Projects in the One Health Sphere. Top Companion Anim Med 2019; 37:100367. [PMID: 31837758 DOI: 10.1016/j.tcam.2019.100367] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/21/2019] [Revised: 09/26/2019] [Accepted: 09/26/2019] [Indexed: 01/16/2023]
Abstract
When conducting translational research, the ability to share data generated by researchers and clinicians working with for-profit companies is essential, particularly in cases that involve "one health" data (i.e., data that could come from human, animal, or environmental sources). The 1DATA Project, a collaboration between Kansas State University and the University of Missouri, has examined and overcome some of the barriers to sharing this information for "big data" projects. This article discusses some of the obstacles we encountered, and the ways those obstacles can be surmounted via a novel form of Master Sharing Agreement. Developed in collaboration with industry partners, it is presented here as a template for expediting future one health work.
Affiliation(s)
- Joshua Staley
- Veterinary Biomedical Sciences, Kansas State University, Olathe 22201 West Innovation Drive, Olathe, KS, USA
| | - Reza Mazloom
- Institute of Computational Comparative Medicine, Department of Anatomy and Physiology, Kansas State University, Manhattan, KS, USA
| | - Paul Lowe
- Office of the Vice President for Research, Kansas State University, Manhattan, KS, USA
| | - C T Newsum
- Aratana Therapeutics, Inc., 11400 Tomahawk Creek Pkwy #340, Leawood, KS, USA
| | - Majid Jaberi-Douraki
- Institute of Computational Comparative Medicine, Department of Anatomy and Physiology, Kansas State University, Manhattan, KS, USA
| | - Jim Riviere
- Veterinary Biomedical Sciences, Kansas State University, Olathe 22201 West Innovation Drive, Olathe, KS, USA
| | - Gerald J Wyckoff
- Molecular Biology and Biochemistry, University of Missouri - Kansas City School of Biological and Chemical Sciences, Kansas City, MO, USA; Division of Pharmacology and Pharmaceutical Sciences, University of Missouri - Kansas City School of Pharmacy, Kansas City, MO, USA.
| |
|
25
|
Conte F, Fiscon G, Licursi V, Bizzarri D, D'Antò T, Farina L, Paci P. A paradigm shift in medicine: A comprehensive review of network-based approaches. Biochim Biophys Acta Gene Regul Mech 2019; 1863:194416. [PMID: 31382052 DOI: 10.1016/j.bbagrm.2019.194416] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 06/07/2019] [Revised: 07/19/2019] [Accepted: 07/28/2019] [Indexed: 02/01/2023]
Abstract
Network medicine is a rapidly evolving field of medical research that combines principles and approaches of systems biology and network science, holding the promise of uncovering the causes of human diseases and revolutionizing their diagnosis and treatment. This new paradigm reflects the fact that human diseases are not caused by single molecular defects, but are driven by complex interactions among a variety of molecular mediators. The complexity of these interactions embraces different types of information: from the cellular-molecular level of protein-protein interactions, to correlational studies of gene expression and regulation, to metabolic and disease pathways, up to drug-disease relationships. The analysis of these complex networks can reveal new disease genes and/or disease pathways and identify possible targets for new drug development, as well as new uses for existing drugs. In this review, we offer a comprehensive overview of network types and algorithms used in the framework of network medicine. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Federica Conte
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
| | - Giulia Fiscon
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy.
| | - Valerio Licursi
- Biology and Biotechnology Department "Charles Darwin" (BBCD), Sapienza University of Rome, Rome, Italy
| | - Daniele Bizzarri
- Department of Internal Medicine and Medical Specialties, Sapienza University of Rome, Rome, Italy
| | - Tommaso D'Antò
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Lorenzo Farina
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Paola Paci
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
| |
|