1
Zhao J, Qian F, Li X, Yu Z, Zhu J, Yu R, Zhao Y, Ding K, Li Y, Yang Y, Pan Q, Chen J, Song C, Wang Q, Zhang J, Wang G, Li C. CanMethdb: a database for genome-wide DNA methylation annotation in cancers. Bioinformatics 2022; 39:6881077. [PMID: 36477791 PMCID: PMC9825769 DOI: 10.1093/bioinformatics/btac783] [Received: 08/11/2022] [Revised: 10/30/2022] [Accepted: 12/06/2022] [Indexed: 12/12/2022]
Abstract
MOTIVATION: DNA methylation within gene bodies and promoters in cancer cells is well documented. An increasing number of studies have shown that cytosine-phosphate-guanine (CpG) sites falling within other regulatory elements can also regulate target gene activation in human cancers, mainly by affecting transcription factor (TF) binding. This creates an urgent need to comprehensively and effectively collect distinct cis-regulatory elements and TF-binding sites (TFBS) to annotate DNA methylation regulation.
RESULTS: We developed a database (CanMethdb, http://meth.liclab.net/CanMethdb/) that focuses on the upstream and downstream annotations for CpG-genes in cancers. These include upstream cis-regulatory elements, especially those in regions distal to genes, and TFBS annotations for the CpGs, as well as downstream functional annotations for the target genes, computed by integrating abundant DNA methylation and gene expression profiles in diverse cancers. Users can query CpG-target gene pairs for a cancer type by entering a genomic region, a CpG, or a gene name, or by selecting hypo/hypermethylated CpG sets. The current version of CanMethdb documents a total of 38 986 060 CpG-target gene pairs (with 6 769 130 unique pairs), involving 385 217 CpGs and 18 044 target genes, along with abundant cis-regulatory elements and TFs for 33 TCGA cancer types. CanMethdb may help biologists perform in-depth studies of target gene regulation based on DNA methylation in cancer.
AVAILABILITY AND IMPLEMENTATION: The main program is available at https://github.com/chunquanlipathway/CanMethdb.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhengmin Yu
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang 421001, China; School of Computer, University of South China, Hengyang 421001, China; The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang 421001, China
- Jiang Zhu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163711, China; College of Information and Computer Engineering, Northeast Forestry University, Harbin 150038, China
- Rui Yu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150088, China
- Yue Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150088, China
- Ke Ding
- College of Information and Computer Engineering, Northeast Forestry University, Harbin 150038, China
- Yanyu Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163711, China
- Yongsan Yang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163711, China
- Qi Pan
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Department of Rheumatology, Zhongshan Hospital, Fudan University, Shanghai 200433, China
- Jiaxin Chen
- Shenzhen Bay Laboratory, Pingshan Translational Medicine Center, Shenzhen 518118, China
- Chao Song
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang 421001, China; School of Computer, University of South China, Hengyang 421001, China; The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang 421001, China
- Qiuyu Wang
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang 421001, China; School of Computer, University of South China, Hengyang 421001, China; The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang 421001, China
- Jian Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163711, China
- Guohua Wang
- To whom correspondence should be addressed.
- Chunquan Li
- To whom correspondence should be addressed.
2
Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:ijms232012272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022]
Abstract
Medical discoveries depend largely on the ability to process and analyze biological datasets, which inundate the scientific community and keep expanding as the cost of next-generation sequencing technologies decreases. Deep learning (DL) is a viable method to exploit this massive data stream, since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: applying DL to biology is difficult because both fields are evolving at a breakneck pace, making it hard for an individual to stay at the front lines of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. Alternatively, this study could also be useful to researchers in biology who want to understand and utilize the power of DL to gain better insights into, and extract important information from, omics data.
3
Park I, Jung J, Lee S, Park K, Ryu JW, Son MY, Cho HS, Kim DS. Characterization of terminal-ileal and colonic Crohn's disease in treatment-naïve paediatric patients based on transcriptomic profile using logistic regression. J Transl Med 2021; 19:250. [PMID: 34098982 PMCID: PMC8185924 DOI: 10.1186/s12967-021-02909-z] [Received: 03/22/2021] [Accepted: 05/24/2021] [Indexed: 12/18/2022]
Abstract
BACKGROUND: Inflammatory bowel disease (IBD) is a chronic and idiopathic inflammatory disorder of the gastrointestinal tract and comprises ulcerative colitis (UC) and Crohn's disease (CD). Crohn's disease can affect any part of the gastrointestinal tract, but mainly the terminal ileum and colon. In the present study, we aimed to characterize terminal-ileal CD (ICD) and colonic CD (CCD) at the molecular level, which might enable a more optimized approach for the clinical care and scientific research of CD.
METHODS: We analyzed differentially expressed genes in samples from 23 treatment-naïve paediatric patients with CD and 25 non-IBD controls, and compared the data with previously published RNA-Seq data using multi-statistical tests and confidence intervals. We implemented functional profiling and proposed statistical methods for feature selection using a logistic regression model to identify genes that are highly associated in ICD or CCD. We also validated our final candidate genes in independent paediatric and adult cohorts.
RESULTS: We identified 550 genes specifically expressed in patients with CD compared with those in healthy controls (p < 0.05). Among these DEGs, 240 from patients with CCD were mainly involved in mitochondrial dysfunction, whereas 310 from patients with ICD were enriched in the ileum functions such as digestion, absorption, and metabolism. To choose the most effective gene set, we selected the most powerful genes (p-value ≤ 0.05, accuracy ≥ 0.8, and AUC ≥ 0.8) using logistic regression. Consequently, 33 genes were identified as useful for discriminating CD location; the accuracy and AUC were 0.86 and 0.83, respectively. We then validated the 33 genes with data from another independent paediatric cohort (accuracy = 0.93, AUC = 0.92) and adult cohort (accuracy = 0.88, AUC = 0.72).
CONCLUSIONS: In summary, we identified DEGs that are specifically expressed in CCD and ICD compared with those in healthy controls and patients with UC. Based on the feature selection analysis, 33 genes were identified as useful for discriminating CCD and ICD with high accuracy and AUC, for not only paediatric patients but also independent cohorts. We propose that our approach and the final gene set are useful for the molecular classification of patients with CD, and it could be beneficial in treatments based on disease location.
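The threshold-based selection step described in this abstract (p-value ≤ 0.05, accuracy ≥ 0.8, AUC ≥ 0.8) can be illustrated with a minimal sketch. This is not the authors' code: the `GeneResult` record, the `passes_thresholds` helper, the placeholder gene names, and every number in `results` are hypothetical and only show how such a filter might be applied to per-gene statistics.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene summary from a logistic regression fit."""
    gene: str
    p_value: float
    accuracy: float
    auc: float


def passes_thresholds(result, max_p=0.05, min_accuracy=0.8, min_auc=0.8):
    """Return True if the gene meets all three cut-offs quoted in the abstract."""
    return (
        result.p_value <= max_p
        and result.accuracy >= min_accuracy
        and result.auc >= min_auc
    )


# Made-up values for illustration only; these are not results from the paper.
results = [
    GeneResult("GENE_A", 0.010, 0.86, 0.83),  # passes all three cut-offs
    GeneResult("GENE_B", 0.040, 0.79, 0.90),  # accuracy below 0.8
    GeneResult("GENE_C", 0.200, 0.90, 0.95),  # p-value above 0.05
    GeneResult("GENE_D", 0.050, 0.80, 0.80),  # exactly on the boundaries, kept
]

selected = [r.gene for r in results if passes_thresholds(r)]
print(selected)
```

Because the cut-offs are inclusive, a gene sitting exactly on all three boundaries is retained.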
Affiliation(s)
- Ilkyu Park
- Department of Bioinformatics, KRIBB School of Bioscience, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, Korea; Department of Environmental Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Korea
- Jaeeun Jung
- Department of Environmental Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Korea
- Sugi Lee
- Department of Bioinformatics, KRIBB School of Bioscience, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, Korea; Department of Environmental Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Korea
- Kunhyang Park
- Department of Core Facility Management Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, Korea
- Jea-Woon Ryu
- Department of Rare Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, Korea
- Mi-Young Son
- Department of Stem Cell Convergence Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, Korea
- Hyun-Soo Cho
- Department of Stem Cell Convergence Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, Korea
- Dae-Soo Kim
- Department of Bioinformatics, KRIBB School of Bioscience, Korea University of Science and Technology (UST), 217 Gajeong-ro, Yuseong-gu, Daejeon, Korea; Department of Environmental Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), 125 Gwahak-ro, Yuseong-gu, Daejeon, 34141, Korea
4
Jiang H, Tang S, Liu W, Zhang Y. Deep learning for COVID-19 chest CT (computed tomography) image analysis: A lesson from lung cancer. Comput Struct Biotechnol J 2021; 19:1391-1399. [PMID: 33680351 PMCID: PMC7923948 DOI: 10.1016/j.csbj.2021.02.016] [Received: 11/07/2020] [Revised: 02/17/2021] [Accepted: 02/20/2021] [Indexed: 12/31/2022]
Abstract
As a recent global health emergency, COVID-19 urgently requires quick and reliable diagnosis. Thus, many artificial intelligence (AI)-based methods have been proposed for COVID-19 chest CT (computed tomography) image analysis. However, very few COVID-19 chest CT images are publicly available to evaluate those deep neural networks. On the other hand, a large number of CT images from lung cancer are publicly available. To build a reliable deep learning model trained and tested with a larger-scale dataset, this work builds a public COVID-19 CT dataset containing 1186 CT images synthesized from lung cancer CT images using CycleGAN. Additionally, various deep learning models are tested with synthesized or real chest CT images for COVID-19 and non-COVID-19 classification. In comparison, all models achieve excellent results in accuracy, precision, recall and F1 score for both synthesized and real COVID-19 CT images, demonstrating the reliability of the synthesized dataset. The public dataset and deep learning models can facilitate the development of accurate and efficient diagnostic testing for COVID-19.
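The four evaluation metrics named in this abstract (accuracy, precision, recall and F1 score) can be computed from binary predictions as in the minimal sketch below. The function name `classification_metrics` and the label vectors are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration: 1 = COVID-19, 0 = non-COVID-19.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are three true positives, one false positive, one false negative and three true negatives, so all four metrics come out equal; on real data they generally differ, which is why they are reported together.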
Affiliation(s)
- Hao Jiang
- College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
- Shiming Tang
- School of Computing and Engineering, University of Missouri-Kansas City, MO, United States
- Weihuang Liu
- College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Department of Computer and Information Science, University of Macau, Macau, China
- Yang Zhang
- College of Science, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China