1
Biswas B, Kumar N, Sugimoto M, Hoque MA. scHD4E: Novel ensemble learning-based differential expression analysis method for single-cell RNA-sequencing data. Comput Biol Med 2024; 178:108769. [PMID: 38897145 DOI: 10.1016/j.compbiomed.2024.108769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/14/2024] [Accepted: 06/15/2024] [Indexed: 06/21/2024]
Abstract
Differential expression (DE) analysis between cell types in scRNA-seq data is crucial, and it must capture the complicated features of such data. Recently, different methods have been developed for scRNA-seq data analysis based on different modeling frameworks, assumptions, strategies, and test statistics that account for various data features. scDEA is a recently developed ensemble learning-based DE analysis method that uses Lancaster's combination to combine the p-values generated by 12 individual DE analysis methods, producing more accurate and stable results than the individual methods. The objective of our study is to propose a new ensemble learning-based DE analysis method, scHD4E, that uses only the 4 top-performing individual methods. These 4 methods were selected through an evaluation process using six real scRNA-seq data sets. We conducted comprehensive experiments on five experimental data sets to evaluate our proposed method in terms of sample size effects, batch effects, type I error control, gene ontology enrichment analysis, runtime, matched identified DE genes, and semantic similarity between methods. We also performed similar analyses (except the last 3 items) and computed performance measures such as accuracy, F1 score, and the Matthews correlation coefficient for a simulated data set. The results show that scHD4E performs better than all the individual methods and scDEA in all of the above respects. We expect that scHD4E will serve modern data scientists in detecting DEGs in scRNA-seq data analysis. To implement our proposed method, a GitHub R package, scHD4E, and its Shiny application have been developed and are available at the following links: https://github.com/bbiswas1989/scHD4E and https://github.com/bbiswas1989/scHD4E-Shiny.
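The abstract above attributes Lancaster's combination to scDEA. As a rough illustration only, the sketch below combines per-method p-values for one gene with Lancaster's weighted rule: each p-value is mapped to a chi-square quantile with its weight as degrees of freedom, and the sum is referred to a chi-square distribution with the summed degrees of freedom. The function names, the restriction to even integer weights (chosen so that no external library is needed), and the example p-values are assumptions made for this sketch; they are not taken from the scDEA or scHD4E packages.

```python
import math


def chi2_sf_even(x, df):
    """Chi-square survival function, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    # sf(x; 2k) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def chi2_isf_even(p, df, tol=1e-12):
    """Inverse survival function for even df, found by bisection."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p == 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the tail probability drops to p or below.
    while chi2_sf_even(hi, df) > p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > p:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0


def lancaster_combine(p_values, weights):
    """Combine p-values with Lancaster's rule (weights act as chi-square df)."""
    if not p_values or len(p_values) != len(weights):
        raise ValueError("p_values and weights must be non-empty and equal length")
    statistic = sum(chi2_isf_even(p, w) for p, w in zip(p_values, weights))
    return chi2_sf_even(statistic, sum(weights))


if __name__ == "__main__":
    # Hypothetical p-values for one gene from four individual DE methods.
    per_method_p = [0.01, 0.04, 0.20, 0.03]
    # Hypothetical weights; equal weights of 2 reduce to Fisher's method.
    method_weights = [2, 2, 2, 2]
    print(lancaster_combine(per_method_p, method_weights))
```

With every weight set to 2 the rule reduces to Fisher's method, which gives a convenient check of the implementation. How the weights are actually chosen in scDEA or scHD4E is not stated in the abstract.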
Affiliation(s)
- Biplab Biswas
  - Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh; Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
- Nishith Kumar
  - Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh.
- Masahiro Sugimoto
  - Institute for Advanced Biosciences, Keio University 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan.
- Md Aminul Hoque
  - Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
2
Devall M, Eaton S, Yoshida C, Powell SM, Casey G, Li L. Assessment of Colorectal Cancer Risk Factors through the Application of Network-Based Approaches in a Racially Diverse Cohort of Colon Organoid Stem Cells. Cancers (Basel) 2023; 15:3550. [PMID: 37509213 PMCID: PMC10377524 DOI: 10.3390/cancers15143550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/20/2023] [Revised: 07/03/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023] Open
Abstract
Numerous demographic factors have been associated with colorectal cancer (CRC) risk. To better define biological mechanisms underlying these associations, we performed RNA sequencing of stem-cell-enriched organoids derived from the healthy colons of seven European Americans and eight African Americans. A weighted gene co-expression network analysis was performed following RNA sequencing. Module-trait relationships were determined through the association testing of each module and five CRC risk factors (age, body mass index, sex, smoking history, and race). Only modules that displayed a significantly positive correlation for gene significance and module membership were considered for further investigation. In total, 16 modules were associated with known CRC risk factors (p < 0.05). To contextualize the role of risk modules in CRC, publicly available RNA-sequencing data from TCGA-COAD were downloaded and re-analyzed. Differentially expressed genes identified between tumors and matched normal-adjacent tissue were overlaid across each module. Loci derived from CRC genome-wide association studies were additionally overlaid across modules to identify robust putative targets of risk. Among them, MYBL2 and RXRA represented strong plausible drivers through which cigarette smoking and BMI potentially modulated CRC risk, respectively. In summary, our findings highlight the potential of the colon organoid system in identifying novel CRC risk mechanisms in an ancestrally diverse and cellularly relevant population.
Affiliation(s)
- Matthew Devall
  - Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Stephen Eaton
  - Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Cynthia Yoshida
  - Digestive Health Center, University of Virginia, Charlottesville, VA 22903, USA
- Steven M. Powell
  - Digestive Health Center, University of Virginia, Charlottesville, VA 22903, USA
- Graham Casey
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
- Li Li
  - Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - University of Virginia Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22908, USA
3
Devall MA, Eaton S, Ali MW, Powell SM, Li L, Casey G. Insights into Early Onset Colorectal Cancer through Analysis of Normal Colon Organoids of Familial Adenomatous Polyposis Patients. Cancers (Basel) 2022; 14:4138. [PMID: 36077675 PMCID: PMC9454756 DOI: 10.3390/cancers14174138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 08/22/2022] [Accepted: 08/25/2022] [Indexed: 01/07/2023] Open
Abstract
Early onset colorectal cancer (EOCRC) rates have increased in recent decades. While lowering the recommended age for routine colonoscopies to 45 may reduce this burden, such measures do not address those who develop CRC before that age. Additional measures are needed to identify individuals at risk for CRC. To better define transcriptomic events that precede the development of CRC, we performed RNA-sequencing analysis in colon organoids derived from seven healthy and six familial adenomatous polyposis (FAP) patients. This led to the identification of 2635 significant differentially expressed genes (FDR < 0.05). Through secondary analysis of publicly available datasets, we found that these genes were enriched for significant genes also present in FAP CRC and non-hereditary CRC datasets, including a subset that was unique to EOCRC. By exposing FAP colon organoids to a three-day ethanol treatment, we found that two EOCRC-relevant genes were also targets of CRC-related lifestyle factors. Our data provide unique insight into the potential early mechanisms of CRC development in colon epithelial cells, which may provide biomarkers for patient monitoring. We also show how modifiable lifestyle factors may further alter genes relevant to EOCRC, adding weight to the hypothesis that such factors represent an important contributor to increased EOCRC incidence.
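The abstract above reports genes as significant at FDR < 0.05 without naming the adjustment procedure. As an illustration only, the sketch below shows the Benjamini-Hochberg adjustment, one common way to obtain FDR-adjusted p-values; whether the authors used this procedure is an assumption, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values for four genes.
    raw = [0.01, 0.04, 0.03, 0.005]
    adjusted = benjamini_hochberg(raw)
    # Genes whose adjusted p-value falls below the 0.05 FDR threshold.
    significant = [i for i, q in enumerate(adjusted) if q < 0.05]
    print(adjusted, significant)
```

A gene is then called significant when its adjusted value is below the chosen threshold, here 0.05 to match the cutoff quoted in the abstract.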
Affiliation(s)
- Matthew A. Devall
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Stephen Eaton
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
  - Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
- Mourad W. Ali
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Steven M. Powell
  - Digestive Health Center, University of Virginia, Charlottesville, VA 22903, USA
- Li Li
  - Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
  - Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22911, USA
- Graham Casey
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
  - Comprehensive Cancer Center, University of Virginia, Charlottesville, VA 22911, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
4
Devall MA, Eaton S, Ali MW, Dampier CH, Weisenberger D, Powell SM, Li L, Casey G. DNA methylation analysis of normal colon organoids from familial adenomatous polyposis patients reveals novel insight into colon cancer development. Clin Epigenetics 2022; 14:104. [PMID: 35999641 PMCID: PMC9396789 DOI: 10.1186/s13148-022-01324-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/16/2022] [Accepted: 08/05/2022] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND: Familial adenomatous polyposis (FAP) is an inherited colorectal cancer (CRC) syndrome resulting from germline mutations in the adenomatous polyposis coli (APC) gene. While FAP accounts for less than 1% of all CRC cases, loss of APC expression is seen in > 80% of non-hereditary CRCs. To better understand molecular mechanisms underlying APC-driven CRC, we performed an epigenome-wide analysis of colon organoids derived from normal-appearing colons of FAP patients versus healthy subjects to identify differentially methylated regions (DMRs) that may precede the onset of CRC.
RESULTS: We identified 358 DMRs when comparing colon organoids of FAP patients to those of healthy subjects (FDR < 0.05, |mean beta difference| = 5%). Of these, nearly 50% were also differentially methylated in at least one of three CRC tumor and normal adjacent tissue (NAT) cohorts (TCGA-COAD, GSE193535 and ColoCare). Moreover, 27 of the DMRs mapped to CRC genome-wide association study (GWAS) loci. Using quantitative PCR, we provide evidence suggesting that some of these DMRs led to significant differences in the expression of adjacent genes. For example, we identified significantly greater expression of five genes: Kazal-type serine peptidase inhibitor domain 1 (KAZALD1, P = 0.032), F-box and leucine-rich repeat protein 8 (FBXL8, P = 0.036), TRIM31 antisense RNA 1 (TRIM31-AS1, P = 0.036), Fas apoptotic inhibitory molecule 2 (FAIM2, P = 0.049) and collagen beta(1-O)galactosyltransferase 2 (COLGALT2, P = 0.049). Importantly, both FBXL8 and TRIM31-AS1 were also significantly differentially expressed in TCGA-COAD tumor versus matched NAT, supporting a role for these genes in CRC tumor development.
CONCLUSIONS: We performed the first DNA methylome-wide analysis of normal colon organoids derived from FAP patients compared to those of healthy subjects. Our results reveal that normal colon organoids from FAP patients exhibit extensive epigenetic differences compared to those of healthy subjects, and that these differences appear similar to those exhibited in CRC tumors. Our analyses therefore identify DMRs and candidate target genes that are potentially important in CRC tumor development in FAP, with potential implications for non-hereditary CRC.
Affiliation(s)
- Matthew A. Devall
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Stephen Eaton
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Mourad Wagdy Ali
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Christopher H. Dampier
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Daniel Weisenberger
  - Department of Biochemistry and Molecular Medicine, University of Southern California, Los Angeles, CA, USA
- Steven M. Powell
  - Digestive Health Center, University of Virginia, Charlottesville, VA, USA
- Li Li
  - Department of Family Medicine, University of Virginia, Charlottesville, VA, USA
- Graham Casey
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA