1
Muthamilselvan S, Palaniappan A. BrcaDx: precise identification of breast cancer from expression data using a minimal set of features. Frontiers in Bioinformatics 2023; 3:1103493. [PMID: 37287543 PMCID: PMC10242386 DOI: 10.3389/fbinf.2023.1103493] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/20/2022] [Accepted: 05/15/2023] [Indexed: 06/09/2023]
Abstract
Background: Breast cancer is the foremost cancer in worldwide incidence, surpassing lung cancer notwithstanding the gender bias. One in four cancer cases among women are attributable to cancers of the breast, which are also the leading cause of cancer death in women. Reliable options for early detection of breast cancer are needed. Methods: Using public-domain datasets, we screened transcriptomic profiles of breast cancer samples, and identified progression-significant linear and ordinal model genes using stage-informed models. We then applied a sequence of machine learning techniques, namely, feature selection, principal components analysis, and k-means clustering, to train a learner to discriminate "cancer" from "normal" based on expression levels of identified biomarkers. Results: Our computational pipeline yielded an optimal set of nine biomarker features for training the learner, namely, NEK2, PKMYT1, MMP11, CPA1, COL10A1, HSD17B13, CA4, MYOC, and LYVE1. Validation of the learned model on an independent test dataset yielded a performance of 99.5% accuracy. Blind validation on an out-of-domain external dataset yielded a balanced accuracy of 95.5%, demonstrating that the model has effectively reduced the dimensionality of the problem, and learnt the solution. The model was rebuilt using the full dataset, and then deployed as a web app for non-profit purposes at: https://apalania.shinyapps.io/brcadx/. To our knowledge, this is the best-performing freely available tool for the high-confidence diagnosis of breast cancer, and represents a promising aid to medical diagnosis.
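The abstract reports plain accuracy on the test set but balanced accuracy on the external set. The sketch below shows how the two can differ when classes are imbalanced. It assumes the usual binary definition of balanced accuracy (the mean of sensitivity and specificity); the function name and the toy labels are hypothetical and are not taken from the BrcaDx code or data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = cancer, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of cancer samples called cancer
    specificity = tn / (tn + fp)  # fraction of normal samples called normal
    return (sensitivity + specificity) / 2


# Hypothetical, imbalanced toy labels: 8 cancer samples, 2 normal samples.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # one normal sample misclassified

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

With one of two normal samples misclassified, plain accuracy stays at 0.9 while balanced accuracy drops to 0.75, which is why the balanced figure is the more informative one for imbalanced tumour/normal cohorts.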
2
Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species. Genes (Basel) 2022; 13:1982. [DOI: 10.3390/genes13111982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Revised: 09/20/2022] [Accepted: 09/23/2022] [Indexed: 11/16/2022]
Abstract
Although several biclustering algorithms have been studied, few are used for cross-pattern identification across species using multi-omics data mining. A fast empirical Bayesian biclustering (Bi-EB) algorithm is developed to detect patterns shared both across integrated omics data and between species. The Bi-EB algorithm addresses a clinically critical translational question with a bioinformatics strategy: how modules of genotype variation associated with phenotype in cancer cell screening data can be identified, and how these findings can be directly translated to a cancer patient subpopulation. An empirical Bayesian probabilistic interpretation and a ratio strategy are proposed in Bi-EB for the first time to detect the pairwise regulation patterns among species and variations in multiple omics on a gene level, such as proteins and mRNA. An expectation–maximization (EM) algorithm is used to extract the foreground co-current variations from the background noise by adjusting two parameters: the bicluster membership probability threshold Ac and the bicluster average probability p. Three simulation experiments and two real mRNA and protein data analyses conducted on The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE) verify that the proposed Bi-EB algorithm can significantly improve the clustering recovery and relevance accuracy, outperforming seven other biclustering methods (Cheng and Church (CC), xMOTIFs, BiMax, Plaid, Spectral, FABIA, and QUBIC) with a recovery score of 0.98 and a relevance score of 0.99. At the same time, the Bi-EB algorithm is used to determine the shared causality patterns of mRNA to protein between patients and cancer cells in TCGA and CCLE breast cancer. The clinically well-known treatment-target protein module of estrogen receptor (ER), ER (p118), AR, BCL2, cyclin E1, and IGFBP2 is identified in accordance with the mRNA expression variations of its members in the luminal-like subtype. Ten genes, including CCNB1, CDH1, KDR, RAB25, and PRKCA, are found for the first time to maintain high mRNA–protein accordance in both breast cancer patients and cell lines in basal-like subtypes. Bi-EB provides a useful biclustering analysis tool to discover the cross patterns hidden both in multiple data matrices (omics) and across species. The implementation of the Bi-EB method in the clinical setting will have a direct impact on conducting translational research guided by cancer cell screening.
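The abstract quotes recovery and relevance scores without defining them. A minimal sketch of one common definition from the biclustering benchmarking literature follows: recovery averages, over the true biclusters, the best Jaccard overlap with any found bicluster, and relevance does the same in the other direction. Whether Bi-EB uses exactly this variant is an assumption; the function names and toy biclusters are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple

# A bicluster is a (row indices, column indices) pair.
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]


def cells(b: Bicluster) -> Set[Tuple[int, int]]:
    """All (row, column) matrix cells covered by a bicluster."""
    return set(product(b[0], b[1]))


def jaccard(a: Bicluster, b: Bicluster) -> float:
    """Jaccard overlap of the cells covered by two biclusters."""
    ca, cb = cells(a), cells(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def match_score(src: List[Bicluster], dst: List[Bicluster]) -> float:
    """Average, over src, of the best Jaccard match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(jaccard(s, d) for d in dst) for s in src) / len(src)


def recovery(truth: List[Bicluster], found: List[Bicluster]) -> float:
    """How well the planted (true) biclusters were recovered."""
    return match_score(truth, found)


def relevance(truth: List[Bicluster], found: List[Bicluster]) -> float:
    """How much of what was found corresponds to a true bicluster."""
    return match_score(found, truth)


# Hypothetical toy case: one planted bicluster, two reported biclusters.
truth = [(frozenset({0, 1}), frozenset({0, 1}))]
found = [
    (frozenset({0, 1}), frozenset({0, 1})),  # exact match
    (frozenset({2, 3}), frozenset({2})),     # spurious bicluster
]
print(recovery(truth, found))   # 1.0
print(relevance(truth, found))  # 0.5
```

In this toy case the planted bicluster is found exactly, so recovery is 1.0, but the spurious second bicluster halves relevance; reporting both scores exposes that trade-off.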
3
Lovero D, D’Oronzo S, Palmirotta R, Cafforio P, Brown J, Wood S, Porta C, Lauricella E, Coleman R, Silvestris F. Correlation between targeted RNAseq signature of breast cancer CTCs and onset of bone-only metastases. Br J Cancer 2022; 126:419-429. [PMID: 34272498 PMCID: PMC8810805 DOI: 10.1038/s41416-021-01481-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/06/2021] [Revised: 06/04/2021] [Accepted: 06/30/2021] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Bone is the most frequent site of metastases from breast cancer (BC), but no biomarkers are yet available to predict skeletal dissemination. METHODS: We attempted to identify a gene signature correlated with bone metastasis (BM) onset in circulating tumour cells (CTCs), isolated by a DEPArray-based protocol from 40 metastatic BC patients and grouped according to metastasis sites, namely "BM" (bone-only), "ES" (extra-skeletal) or "BM + ES" (bone + extra-skeletal). RESULTS: A 134-gene panel was first validated through targeted RNA sequencing (RNAseq) on sub-clones of the MDA-MB-231 BC cell line with variable organotropism, which successfully shaped their clustering. The panel was then applied to CTC groups and, in particular, the "BM" vs "ES" CTC comparison revealed 31 differentially expressed genes, including MAF, CAPG, GIPC1 and IL1B, playing key prognostic roles in BC. CONCLUSION: Such evidence confirms that CTCs are suitable biological sources for organotropism investigation through targeted RNAseq and might deserve future applications in wide-scale prospective studies.
Affiliation(s)
- Domenica Lovero: Department of Biomedical Sciences and Human Oncology, Section of Internal Medicine and Clinical Oncology, University of Bari Aldo Moro, Bari, Italy (grid.7644.1, ISNI 0000 0001 0120 3326)
- Stella D’Oronzo: Department of Biomedical Sciences and Human Oncology, Section of Internal Medicine and Clinical Oncology, University of Bari Aldo Moro, Bari, Italy (grid.7644.1, ISNI 0000 0001 0120 3326)
- Raffaele Palmirotta: Department of Biomedical Sciences and Human Oncology, Section of Internal Medicine and Clinical Oncology, University of Bari Aldo Moro, Bari, Italy (grid.7644.1, ISNI 0000 0001 0120 3326)
- Paola Cafforio: Department of Biomedical Sciences and Human Oncology, Section of Internal Medicine and Clinical Oncology, University of Bari Aldo Moro, Bari, Italy (grid.7644.1, ISNI 0000 0001 0120 3326)
- Janet Brown: Department of Oncology and Metabolism, University of Sheffield, Weston Park Hospital, Sheffield, UK (grid.417079.c, ISNI 0000 0004 0391 9207)
- Steven Wood: Department of Oncology and Metabolism, University of Sheffield, Weston Park Hospital, Sheffield, UK (grid.417079.c, ISNI 0000 0004 0391 9207)
- Camillo Porta: Department of Biomedical Sciences and Human Oncology, Section of Internal Medicine and Clinical Oncology, University of Bari Aldo Moro, Bari, Italy (grid.7644.1, ISNI 0000 0001 0120 3326)
- Eleonora Lauricella: Department of Biomedical Sciences and Human Oncology, Section of Internal Medicine and Clinical Oncology, University of Bari Aldo Moro, Bari, Italy (grid.7644.1, ISNI 0000 0001 0120 3326)
- Robert Coleman: Department of Oncology and Metabolism, University of Sheffield, Weston Park Hospital, Sheffield, UK (grid.417079.c, ISNI 0000 0004 0391 9207)
- Franco Silvestris: Department of Biomedical Sciences and Human Oncology, Section of Internal Medicine and Clinical Oncology, University of Bari Aldo Moro, Bari, Italy (grid.7644.1, ISNI 0000 0001 0120 3326)
4
Ren L, Li J, Wang C, Lou Z, Gao S, Zhao L, Wang S, Chaulagain A, Zhang M, Li X, Tang J. Single cell RNA sequencing for breast cancer: present and future. Cell Death Discov 2021; 7:104. [PMID: 33990550 PMCID: PMC8121804 DOI: 10.1038/s41420-021-00485-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/12/2020] [Revised: 03/03/2021] [Accepted: 04/15/2021] [Indexed: 01/01/2023]
Abstract
Breast cancer is one of the most common malignant tumors in women. It is a heterogeneous disease related to genetic and environmental factors. Presently, the treatment of breast cancer still faces challenges due to recurrence and metastasis. The emergence of single-cell RNA sequencing (scRNA-seq) technology has brought new strategies to deeply understand the biological behaviors of breast cancer. By analyzing cell phenotypes and transcriptome differences at the single-cell level, scRNA-seq reveals the heterogeneity, dynamic growth and differentiation process of cells. This review summarizes the application of scRNA-seq technology in breast cancer research, such as in studies on cell heterogeneity, cancer cell metastasis, drug resistance, and prognosis. scRNA-seq technology is of great significance to deeply analyze the mechanism of breast cancer occurrence and development, identify new therapeutic targets and develop new therapeutic approaches for breast cancer.
Affiliation(s)
- Lili Ren: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Junyi Li: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Chuhan Wang: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Zheqi Lou: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Shuangshu Gao: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Lingyu Zhao: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Shuoshuo Wang: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Anita Chaulagain: Department of Microbiology, Harbin Medical University, Harbin, 150081, China
- Minghui Zhang: Department of Oncology, Chifeng City Hospital, Chifeng, 024000, China
- Xiaobo Li: Department of Pathology, Harbin Medical University, Harbin, 150081, China
- Jing Tang: Department of Pathology, Harbin Medical University, Harbin, 150081, China
5
Li X, Cooper NGF, O'Toole TE, Rouchka EC. Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies. BMC Genomics 2020; 21:75. [PMID: 31992223 PMCID: PMC6986029 DOI: 10.1186/s12864-020-6502-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/14/2019] [Accepted: 01/16/2020] [Indexed: 12/20/2022]
Abstract
Background: High-throughput RNA sequencing (RNA-seq) has evolved as an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus regarding which normalization and statistical methods are the most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainties in data interpretation and study reproducibility, especially with studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log expression), TMM (trimmed mean of M values) and UQ (upper quartile normalization), in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis by considering three factors: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths. Results: Using the MAQC RNA-seq datasets with small sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large and that a QL F-test performs the best given sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly given a desired sample size. Conclusion: We found the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice in order to control false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. We observed that read depths have a minimal impact on differential gene expression analysis based on the simulated data.
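To make one of the compared normalizations concrete, the sketch below computes RLE-style size factors with the median-of-ratios approach popularized by DESeq. It is a simplified illustration under stated assumptions (genes with a zero count in any sample are skipped, and no further rescaling is applied), not the DESeq2 or edgeR implementation and not the UQ-pgQ2 method; the function name and toy counts are hypothetical.

```python
import math
from statistics import median
from typing import List


def rle_size_factors(counts: List[List[float]]) -> List[float]:
    """Median-of-ratios size factors; counts[g][s] is the count of gene g in sample s."""
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene in counts:
        # Assumption: genes with a zero in any sample are skipped, because
        # their geometric mean across samples would be zero.
        if any(c <= 0 for c in gene):
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for s, c in enumerate(gene):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # A sample's size factor is the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical toy counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 3],  # skipped when estimating size factors
]
factors = rle_size_factors(counts)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in counts]
print(factors)     # approximately [0.707, 1.414]
print(normalized)  # the first three genes now agree across samples
```

Because the second sample is an exact two-fold scaling of the first, the two size factors differ by a factor of two and the normalized counts of the first three genes coincide across samples.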
Affiliation(s)
- Xiaohong Li: Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, USA
- Nigel G F Cooper: Department of Anatomical Sciences and Neurobiology, University of Louisville, Louisville, KY, USA
- Eric C Rouchka: Department of Computer Science and Engineering, University of Louisville, Louisville, KY, USA