1
Uncovering Oncogenic Mechanisms of Tumor Suppressor Genes in Breast Cancer Multi-Omics Data. Int J Mol Sci 2022; 23:ijms23179624. [PMID: 36077026 PMCID: PMC9455665 DOI: 10.3390/ijms23179624] [Received: 07/03/2022] [Revised: 08/16/2022] [Accepted: 08/19/2022]
Abstract
Tumor suppressor genes (TSGs) are essential genes in the development of cancer. While they have many roles in normal cells, mutation and dysregulation of TSGs result in aberrant molecular processes in cancer cells. Therefore, understanding TSGs and their roles in the oncogenic process is crucial for the prevention and treatment of cancer. In this research, multi-omics breast cancer data were used to identify the molecular mechanisms of TSGs in breast cancer. Differentially expressed genes and differentially coexpressed genes were identified in four large-scale transcriptomic datasets from public repositories, and multi-omics analyses of copy number, methylation, and gene expression were performed. The results of these analyses were integrated using enrichment analysis and a meta-analysis based on a p-value summation method. The integrative analysis revealed that TSGs have a significant relationship with genes in Gene Ontology terms related to the cell cycle, genome stability, RNA processing, and metastasis, indicating regulatory mechanisms by which TSGs act on cancer cells. The analysis framework and results will provide valuable information for the further identification of TSGs in different types of cancer.
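The abstract integrates results with "a p-value summation method". A common method of that kind is Edgington's method, which sums k independent p-values and refers the sum to the Irwin–Hall distribution (the distribution of a sum of k Uniform(0, 1) variables). The sketch below assumes Edgington's method; the study's exact procedure may differ, and the function name `edgington_combined_p` is hypothetical.

```python
import math


def edgington_combined_p(p_values):
    """Combine independent p-values with Edgington's sum method (assumed here).

    Under the joint null hypothesis each p-value is Uniform(0, 1), so their sum S
    follows the Irwin-Hall distribution; the combined p-value is P(sum <= S).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    s = sum(p_values)
    # Irwin-Hall CDF: (1/k!) * sum_{j=0}^{floor(s)} (-1)^j * C(k, j) * (s - j)^k
    total = sum(
        (-1) ** j * math.comb(k, j) * (s - j) ** k
        for j in range(math.floor(s) + 1)
    )
    # Clamp tiny floating-point excursions outside [0, 1].
    return min(1.0, max(0.0, total / math.factorial(k)))
```

For example, two p-values of 0.1 and 0.2 give 0.3² / 2 = 0.045. The alternating sum loses precision as k grows, so a version meant for many studies would likely switch to a normal approximation.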
2
Park S, Yi G. Development of Gene Expression-Based Random Forest Model for Predicting Neoadjuvant Chemotherapy Response in Triple-Negative Breast Cancer. Cancers (Basel) 2022; 14:cancers14040881. [PMID: 35205629 PMCID: PMC8870575 DOI: 10.3390/cancers14040881] [Received: 12/03/2021] [Revised: 01/28/2022] [Accepted: 02/02/2022]
Simple Summary
Only 20–50% of patients with triple-negative breast cancer achieve a pathological complete response to neoadjuvant chemotherapy, a strong indicator of patient survival. Therefore, there is an urgent need for a reliable model that predicts a patient's pathological complete response before treatment. The purpose of this study was to develop such a model based on random forest recursive feature elimination and to benchmark its performance against existing predictive models. Our study suggests that an 86-gene random forest model associated with DNA repair and cell cycle mechanisms can provide reliable predictions of neoadjuvant chemotherapy response in patients with triple-negative breast cancer.
Abstract
Neoadjuvant chemotherapy (NAC) response is an important indicator of patient survival in triple-negative breast cancer (TNBC), but predicting chemosensitivity remains a challenge in clinical practice. We developed an 86-gene random forest (RF) classifier that predicts neoadjuvant chemotherapy response (pathological complete response (pCR) or residual disease (RD)) in TNBC patients. The model's pCR classification performance was evaluated with the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve. The AUROC and AUPRC of the proposed model on the test set were 0.891 and 0.829, respectively. At a predefined specificity (>90%), the proposed model showed higher sensitivity than the best-performing previously reported NAC response prediction model (69.2% vs. 36.9%). Moreover, the pCR status predicted by the model explains the distant recurrence-free survival (DRFS) of TNBC patients well. In addition, the pCR probabilities the model assigns to the expression profiles of CCLE TNBC cell lines show a high Spearman rank correlation with cyclophosphamide sensitivity in those cell lines (SRCC = 0.697, p-value = 0.031). Function enrichment analysis linked the 86 genes to DNA repair and cell cycle mechanisms. Our study suggests that the random forest-based model provides a reliable prediction of the clinical response to neoadjuvant chemotherapy and may explain chemosensitivity in TNBC.
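The reported comparison (69.2% vs. 36.9% sensitivity at >90% specificity) reads a single operating point off each model's ROC curve. A minimal sketch of that read-out follows, assuming higher scores mean a higher predicted pCR probability; the function name and the use of ">=" for the specificity constraint are our assumptions, not the authors'.

```python
def sensitivity_at_specificity(scores, labels, min_specificity=0.9):
    """Return (threshold, sensitivity, specificity) for the rule
    "predict positive if score >= threshold" that maximizes sensitivity
    subject to specificity >= min_specificity.

    labels: 1 = positive class (e.g. pCR), 0 = negative class (e.g. RD).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need samples from both classes")
    best = None
    # Candidate thresholds: every observed score, plus +inf (predict nothing
    # positive, specificity 1), so a feasible threshold always exists.
    for t in sorted(set(scores)) + [float("inf")]:
        sens = sum(s >= t for s in pos) / len(pos)
        spec = sum(s < t for s in neg) / len(neg)
        # Sensitivity only falls as t rises, so the first feasible t is kept on ties.
        if spec >= min_specificity and (best is None or sens > best[1]):
            best = (t, sens, spec)
    return best
```

Sweeping only the observed scores is enough because sensitivity and specificity can change only at those values.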
3
Yao S, Rava B, Tong X, James G. Asymmetric Error Control Under Imperfect Supervision: A Label-Noise-Adjusted Neyman–Pearson Umbrella Algorithm. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2021.2016423]
Affiliation(s)
- Shunan Yao
- Department of Mathematics, Dana and David Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, CA
- Bradley Rava
- Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA
- Xin Tong
- Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA
- Gareth James
- Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA
4
Introduction and Analysis of a Method for the Investigation of QCD-like Tree Data. Entropy 2022; 24:e24010104. [PMID: 35052130 PMCID: PMC8774677 DOI: 10.3390/e24010104] [Received: 11/30/2021] [Revised: 01/04/2022] [Accepted: 01/06/2022]
Abstract
The properties of decays that take place during jet formation cannot be easily deduced from the final distribution of particles in a detector. In this work, we first simulate a system of particles with well-defined masses, decay channels, and decay probabilities. This represents the “true system” whose decay probability distributions we want to reproduce. Assuming we only have the data that this system produces in the detector, we employ an iterative method that uses a neural network as a classifier between events produced in the detector by the “true system” and some arbitrary “test system”. Finally, we compare the distributions obtained with the iterative method to the “true” distributions.
5
Jung SY, Sobel EM, Pellegrini M, Yu H, Papp JC. Synergistic Effects of Genetic Variants of Glucose Homeostasis and Lifelong Exposures to Cigarette Smoking, Female Hormones, and Dietary Fat Intake on Primary Colorectal Cancer Development in African and Hispanic/Latino American Women. Front Oncol 2021; 11:760243. [PMID: 34692549 PMCID: PMC8529283 DOI: 10.3389/fonc.2021.760243] [Received: 08/17/2021] [Accepted: 09/22/2021]
Abstract
BACKGROUND Disparities in cancer genomic science exist among racial/ethnic minorities. In particular, African American (AA) and Hispanic/Latino American (HA) women, the 2 largest minorities, are underrepresented in genetic/genome-wide studies of cancers and their risk factors. We conducted a genomic study in AA and HA postmenopausal women of insulin resistance (IR), the main biologic mechanism underlying obesity-related colorectal cancer (CRC) carcinogenesis. METHODS Using 780 genome-wide IR-specific single-nucleotide polymorphisms (SNPs) among 4,692 AA and 1,986 HA women, we constructed a CRC-risk prediction model. Along with these SNPs, we incorporated CRC-associated lifestyle factors into the model for each group and detected the topmost influential genetic and lifestyle factors. Further, we estimated the attributable risk of the topmost risk factors shared by the groups to explore potential factors that differentiate CRC risk between them. RESULTS Across the two groups, we detected IR-SNPs in PCSK1 (in AA) and in IFT172, GCKR, and NRBP1 (in HA), together with risk lifestyles including long lifetime exposure to cigarette smoking and endogenous female hormones and daily intake of polyunsaturated fatty acids (PFA), as the topmost predictive variables for CRC risk. Combinations of these top genetic and lifestyle markers synergistically increased CRC risk. Of these risk factors, dietary PFA intake and long lifetime exposure to female hormones may play a key role in mediating the racial disparity in CRC incidence between AA and HA women. CONCLUSIONS Our results may improve CRC risk prediction performance in these medically/scientifically underrepresented groups and lead to genetically informed interventions for cancer prevention and therapy, thus contributing to reduced cancer disparities in these minority subpopulations.
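The abstract does not say how attributable risk was computed. One standard choice is the population attributable fraction from Levin's formula, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is the exposure prevalence and RR the relative risk. The sketch below assumes that formula; `levin_paf` is a hypothetical name and the numbers in the usage note are illustrative, not study data.

```python
def levin_paf(exposure_prevalence, relative_risk):
    """Population attributable fraction via Levin's formula (assumed here):

        PAF = p * (RR - 1) / (1 + p * (RR - 1))

    p:  prevalence of the exposure in the population, in [0, 1]
    RR: relative risk of disease for exposed vs. unexposed, > 0
    """
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)
```

With the same RR of 2.0, an exposure prevalence of 0.3 gives a PAF of about 0.23, while a prevalence of 0.1 gives about 0.09. This is one way a risk factor shared by two groups can still contribute differently to their disease burden.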
Affiliation(s)
- Su Yon Jung
- Translational Sciences Section, Jonsson Comprehensive Cancer Center, School of Nursing, University of California, Los Angeles, Los Angeles, CA, United States
- Eric M. Sobel
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
- Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, Life Sciences Division, University of California, Los Angeles, Los Angeles, CA, United States
- Herbert Yu
- Cancer Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States
- Jeanette C. Papp
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, United States
6
Jung SY. Genetic Signatures of Glucose Homeostasis: Synergistic Interplay With Long-Term Exposure to Cigarette Smoking in Development of Primary Colorectal Cancer Among African American Women. Clin Transl Gastroenterol 2021; 12:e00412. [PMID: 34608882 PMCID: PMC8500576 DOI: 10.14309/ctg.0000000000000412] [Received: 06/25/2021] [Accepted: 08/22/2021]
Abstract
INTRODUCTION Insulin resistance (IR)/glucose intolerance is a critical biologic mechanism for the development of colorectal cancer (CRC) in postmenopausal women. Although IR and excessive adiposity are more prevalent in African American (AA) women than in White women, AA women are underrepresented in genome-wide studies of the systemic regulation of IR and its association with CRC risk. METHODS Using 780 genome-wide IR single-nucleotide polymorphisms (SNPs) among 4,692 AA women, we tested for a causal relationship between genetically elevated IR and CRC risk. Furthermore, by incorporating CRC-associated lifestyle factors, we built a prediction model based on gene-environment interactions to generate CRC risk profiles from the most influential genetic and lifestyle factors. RESULTS In the pooled Mendelian randomization analysis, genetically elevated IR was associated with a 9-fold increased risk of CRC, although the analysis lacked power. By addressing the variation of individual SNPs in CRC in the prediction model, we detected 4 fasting glucose-specific SNPs in GCK, PCSK1, and MTNR1B and 4 lifestyle factors (smoking, aging, prolonged lifetime exposure to endogenous estrogen, and high fat intake) as the most predictive markers of CRC risk. Our joint test of these risk genotypes and lifestyles with smoking revealed synergistically increased CRC risk, more substantially in women with longer-term exposure to cigarette smoking. DISCUSSION Our findings may improve CRC prediction among medically underrepresented AA women and highlight genetically informed preventive interventions (e.g., smoking cessation; CRC screening for longer-term smokers) for women at high risk owing to risk genotypes and behavioral patterns.
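A "pooled" Mendelian randomization estimate over many SNPs is often computed with the fixed-effect inverse-variance weighted (IVW) estimator, β = Σ βX·βY/σY² / Σ βX²/σY². The abstract does not name its estimator, so the sketch assumes IVW with outcome effects given as log odds ratios; `ivw_mr` is a hypothetical name.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization.

    beta_exposure: per-SNP effects on the exposure (e.g. an IR trait)
    beta_outcome:  per-SNP effects on the outcome, as log odds ratios
    se_outcome:    standard errors of beta_outcome
    Returns (causal estimate on the log-odds scale, its standard error).
    """
    if not len(beta_exposure) == len(beta_outcome) == len(se_outcome) > 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each SNP's ratio estimate by/bx is weighted by bx^2 / se^2, which
    # simplifies to sum(bx * by / se^2) / sum(bx^2 / se^2).
    denom = sum(bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome))
    numer = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    return numer / denom, math.sqrt(1.0 / denom)
```

Under the log-odds assumption, `math.exp(estimate)` is the odds ratio per unit increase in the exposure. IVW also assumes every SNP is a valid instrument (no horizontal pleiotropy), which is why it is often paired with sensitivity analyses.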
Affiliation(s)
- Su Yon Jung
- Translational Sciences Section, School of Nursing, University of California, Los Angeles, Los Angeles, California, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, California, USA
7
Affiliation(s)
- Yang Feng
- Department of Biostatistics, School of Global Public Health, New York University, New York, New York, USA
- Min Zhou
- Division of Science and Technology, Beijing Normal University‐Hong Kong Baptist University United International College, Zhuhai, China
- Xin Tong
- Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, California, USA
8
Bokhari W, Bansal A. AEC Classifier: A Tree-Based Classifier with Error Control for Medical Disease Diagnosis and Other Applications. International Journal of Semantic Computing 2021. [DOI: 10.1142/s1793351x21400055]
Abstract
In medical disease diagnosis, the cost of a false negative can greatly outweigh the cost of a false positive: the former could cost a life, whereas the latter may only cause medical costs and stress to the patient. This asymmetry highlights the need for asymmetric error control in binary classification applications. In this domain, traditional machine learning classifiers may not be ideal because they provide no way to keep the number of false negatives below a given threshold. This paper proposes a novel tree-based binary classification algorithm, based on the Neyman–Pearson (NP) Lemma, that controls the number of false negatives with a mathematical guarantee. Evaluated on data from different heart studies, the classifier predicts the risk of cardiac disease with comparable accuracy and AUC-ROC score while keeping full control over the number of false negatives. The methodology used to construct this classifier can be extended to many more use cases, both in medical disease diagnosis and beyond, as shown by analyses of diverse datasets.
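The NP umbrella algorithm (the unadjusted procedure named in the title of item 3) turns a scoring classifier into an NP classifier: it picks an order statistic of held-out scores from the class whose error is prioritized, so that P(error on that class > α) ≤ δ. The abstract does not say whether AEC uses exactly this rule, so treat the sketch as an illustration of NP-style error control, not the AEC implementation. In the diagnosis setting above, "class 0" would be the diseased class (its misclassification is a false negative), with scores oriented so that higher means "more likely healthy". The function name is hypothetical.

```python
import math


def np_umbrella_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Return a threshold t such that the rule "predict class 1 if score > t"
    has type I error (class-0 misclassification) above alpha with probability
    at most delta.

    class0_scores: held-out scores of class-0 samples (not used for training);
    higher scores mean "more like class 1".
    """
    n = len(class0_scores)
    # With k = n the violation probability is (1 - alpha)^n, so n must be large
    # enough for that to fall below delta (n >= 59 when alpha = delta = 0.05).
    if n == 0 or (1 - alpha) ** n > delta:
        raise ValueError("too few class-0 samples for this (alpha, delta)")
    ordered = sorted(class0_scores)
    for k in range(1, n + 1):
        # P(type I error > alpha) when the k-th smallest score is the threshold:
        # sum_{j=k}^{n} C(n, j) (1 - alpha)^j alpha^(n - j).
        # Plain floats are fine for a few hundred samples; binomial coefficients
        # beyond the float range would need a log-space computation instead.
        violation = sum(
            math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
            for j in range(k, n + 1)
        )
        if violation <= delta:
            return ordered[k - 1]
    return ordered[-1]  # not reached: k = n always satisfies the bound
```

With α = δ = 0.05 and exactly 59 held-out class-0 scores, only the largest score qualifies, so class 1 is predicted only above the maximum. Larger held-out sets allow lower thresholds and therefore fewer errors on the other class.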
Affiliation(s)
- Ajay Bansal
- Arizona State University at Tempe, Arizona, USA
9
Esposito C, Landrum GA, Schneider N, Stiefl N, Riniker S. GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning. J Chem Inf Model 2021; 61:2623-2640. [PMID: 34100609 DOI: 10.1021/acs.jcim.1c00160]
Abstract
Machine learning classifiers trained on class-imbalanced data are prone to overpredicting the majority class. This leads to a higher misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set to 0.5 by default, which is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy for dealing with the class imbalance problem. In this work, we present two automated procedures for selecting the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they require neither retraining of the machine learning models nor resampling of the training data. The first approach is specific to random forest (RF), while the second, named GHOST, can potentially be applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure-activity data for a variety of pharmaceutical targets. We show that both thresholding methods significantly improve the performance of RF. We tested GHOST with four different classifiers in combination with two molecular descriptors and found that most classifiers benefit from threshold optimization. GHOST also outperformed other strategies, including random undersampling and conformal prediction. Finally, we show that our thresholding procedures can be effectively applied to real-world drug discovery projects, where the imbalance and characteristics of the data vary greatly between the training and test sets.
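A simplified sketch of GHOST-style threshold shifting: score each candidate threshold on many random stratified subsets of the training predictions and keep the threshold with the best median score, without retraining the model. Several choices here are assumptions rather than details from the abstract: Cohen's kappa as the metric, a 0.05 to 0.50 threshold grid, 50 subsets of 20%, and the names `shift_threshold` and `cohen_kappa`. The probabilities should come from out-of-bag or cross-validated predictions so that they are not overfit.

```python
import random
import statistics


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary 0/1 labels and predictions."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_pos = sum(y_true) / n
    pred_pos = sum(y_pred) / n
    # Agreement expected by chance from the marginal class rates.
    expected = true_pos * pred_pos + (1 - true_pos) * (1 - pred_pos)
    return 0.0 if expected == 1.0 else (observed - expected) / (1 - expected)


def shift_threshold(probs, labels, thresholds=None, n_subsets=50,
                    fraction=0.2, seed=0):
    """Pick the threshold with the best median kappa over stratified subsets.

    probs:  predicted probabilities of the minority class (label 1)
    labels: true 0/1 labels
    Returns (best threshold, its median kappa).
    """
    if thresholds is None:
        thresholds = [round(0.05 * i, 2) for i in range(1, 11)]  # 0.05 .. 0.50
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Stratified subsets: draw the same fraction from each class.
    subsets = [
        rng.sample(pos, max(1, int(fraction * len(pos))))
        + rng.sample(neg, max(1, int(fraction * len(neg))))
        for _ in range(n_subsets)
    ]
    best_t, best_score = None, float("-inf")
    for t in thresholds:
        kappas = [
            cohen_kappa([labels[i] for i in idx],
                        [1 if probs[i] >= t else 0 for i in idx])
            for idx in subsets
        ]
        median_kappa = statistics.median(kappas)
        if median_kappa > best_score:  # ties keep the lower threshold
            best_t, best_score = t, median_kappa
    return best_t, best_score
```

The median over subsets makes the chosen threshold less sensitive to any single lucky split; the final classifier then predicts the minority class whenever its probability reaches the returned threshold.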
Affiliation(s)
- Carmen Esposito
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Gregory A Landrum
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- T5 Informatics GmbH, Spalenring 11, 4055 Basel, Switzerland
- Nadine Schneider
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
10
Li JJ, Chen YE, Tong X. A flexible model-free prediction-based framework for feature ranking. Journal of Machine Learning Research 2021; 22:124. [PMID: 35321091 PMCID: PMC8939838]
Abstract
Despite the availability of numerous statistical and machine learning tools for joint feature modeling, many scientists investigate features marginally, i.e., one feature at a time. This is partly due to training and convention but is also rooted in scientists' strong interest in simple visualization and interpretability. As such, marginal feature ranking for some predictive tasks, e.g., prediction of cancer driver genes, is widely practiced in scientific discovery. In this work, we focus on marginal ranking for binary classification, one of the most common predictive tasks. We argue that the most widely used marginal ranking criteria, including the Pearson correlation, the two-sample t test, and the two-sample Wilcoxon rank-sum test, do not fully take feature distributions and prediction objectives into account. To address this gap in practice, we propose two ranking criteria corresponding to two prediction objectives: the classical criterion (CC) and the Neyman-Pearson criterion (NPC), both of which use model-free nonparametric implementations to accommodate diverse feature distributions. Theoretically, we show that under regularity conditions, both criteria achieve sample-level rankings that are consistent with their population-level counterparts with high probability. Moreover, NPC is robust to sampling bias when the two class proportions in a sample deviate from those in the population. This property gives NPC good potential in biomedical research, where sampling biases are ubiquitous. We demonstrate the use and relative advantages of CC and NPC in simulation and real data studies. Our model-free, objective-based ranking idea is extendable to ranking feature subsets and generalizable to other prediction tasks and learning objectives.
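The classical criterion ranks each feature by how well a classifier built on that feature alone predicts held-out labels. The paper uses model-free nonparametric implementations; the sketch below swaps in a one-feature decision stump and a single even/odd train/test split to keep the idea visible, so it is a simplified stand-in rather than the authors' CC. An NPC-style variant would instead report the held-out type II error of a rule whose type I error is held at level α. All names here are hypothetical.

```python
def _predict(x, threshold, direction):
    # direction +1: predict 1 when x >= threshold; direction -1: predict 1 when x < threshold.
    return int((x >= threshold) == (direction == 1))


def stump_error(x_train, y_train, x_test, y_test):
    """Fit a one-feature threshold rule (decision stump) on the training split
    and return its misclassification rate on the test split."""
    best_errors, best_t, best_dir = float("inf"), None, None
    for t in sorted(set(x_train)):
        for direction in (1, -1):
            errors = sum(_predict(x, t, direction) != y
                         for x, y in zip(x_train, y_train))
            if errors < best_errors:
                best_errors, best_t, best_dir = errors, t, direction
    wrong = sum(_predict(x, best_t, best_dir) != y for x, y in zip(x_test, y_test))
    return wrong / len(y_test)


def rank_features(features, labels):
    """Rank features by held-out error of a one-feature classifier (lowest first).

    features: dict mapping feature name -> list of values, one per sample
    labels:   list of 0/1 class labels
    Even-indexed samples are used for training, odd-indexed ones for testing.
    """
    train = range(0, len(labels), 2)
    test = range(1, len(labels), 2)
    errors = {
        name: stump_error(
            [values[i] for i in train], [labels[i] for i in train],
            [values[i] for i in test], [labels[i] for i in test],
        )
        for name, values in features.items()
    }
    return sorted(errors, key=errors.get)
```

Unlike a t test, this score is tied directly to the prediction objective. A more faithful version would average the held-out error over many random splits, since a single split is noisy.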
Affiliation(s)
- Xin Tong
- Department of Data Sciences and Operations, Marshall Business School, University of Southern California
11
Xia L, Zhao R, Wu Y, Tong X. Intentional Control of Type I Error Over Unconscious Data Distortion: A Neyman–Pearson Approach to Text Classification. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2020.1740711]
Affiliation(s)
- Lucy Xia
- Department of ISOM, School of Business and Management, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Richard Zhao
- Department of Computer Science and Software Engineering, The Behrend College, The Pennsylvania State University, Erie, PA
- Yanhui Wu
- Faculty of Business and Economics, University of Hong Kong, Pokfulam, Hong Kong
- Department of Economics and Finance, University of Southern California, Los Angeles, CA
- Xin Tong
- Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA
12
Lyu J, Li JJ, Su J, Peng F, Chen YE, Ge X, Li W. DORGE: Discovery of Oncogenes and tumoR suppressor genes using Genetic and Epigenetic features. Science Advances 2020; 6(46):eaba6784. [PMID: 33177077 PMCID: PMC7673741 DOI: 10.1126/sciadv.aba6784] [Received: 12/23/2019] [Accepted: 09/29/2020]
Abstract
Data-driven discovery of cancer driver genes, including tumor suppressor genes (TSGs) and oncogenes (OGs), is imperative for cancer prevention, diagnosis, and treatment. Although epigenetic alterations are important for tumor initiation and progression, most known driver genes were identified based on genetic alterations alone. Here, we developed an algorithm, DORGE (Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features), to identify TSGs and OGs by integrating comprehensive genetic and epigenetic data. DORGE identified histone modifications as strong predictors for TSGs, and it found missense mutations, super enhancers, and methylation differences as strong predictors for OGs. We extensively validated DORGE-predicted cancer driver genes using independent functional genomics data. We also found that DORGE-predicted dual-functional genes (both TSGs and OGs) are enriched at hubs in protein-protein interaction and drug-gene networks. Overall, our study has deepened the understanding of epigenetic mechanisms in tumorigenesis and revealed previously undetected cancer driver genes.
Affiliation(s)
- Jie Lyu
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA 92697, USA
- Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jianzhong Su
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Fanglue Peng
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Yiling Elaine Chen
- Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Xinzhou Ge
- Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Wei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA 92697, USA
13
Li JJ, Tong X. Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines. Patterns (N Y) 2020; 1:100115. [PMID: 33073257 PMCID: PMC7546185 DOI: 10.1016/j.patter.2020.100115]
Abstract
Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here, we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example.
Affiliation(s)
- Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Xin Tong
- Department of Data Sciences and Operations, Marshall School of Business, University of Southern California, Los Angeles, CA 90089, USA
14
Exploring the Possibility of a Recovery of Physics Process Properties from a Neural Network Model. Entropy 2020; 22:e22090994. [PMID: 33286763 PMCID: PMC7597324 DOI: 10.3390/e22090994] [Received: 07/26/2020] [Revised: 08/26/2020] [Accepted: 09/03/2020]
Abstract
The application of machine learning methods to particle physics often does not provide enough understanding of the underlying physics. An interpretable model that improves our knowledge of the mechanism governing a physical system directly from the data can be very useful. In this paper, we introduce a simple artificial physical generator based on the quantum chromodynamics (QCD) fragmentation process. The data simulated by the generator are then passed to a neural network model that we base only on partial knowledge of the generator. We aimed to see whether interpreting the generated data can recover the probability distributions of the basic processes of such a physical system. In this way, some of the information we deliberately omitted from the network model is recovered. We believe this approach can be beneficial in the analysis of real QCD processes.
15
Rumora L, Majić I, Miler M, Medak D. Spatial video remote sensing for urban vegetation mapping using vegetation indices. Urban Ecosyst 2020. [DOI: 10.1007/s11252-020-01002-5]
16
Li WV, Chen Y, Li JJ. TROM: A Testing-Based Method for Finding Transcriptomic Similarity of Biological Samples. Statistics in Biosciences 2016; 9:105-136. [PMID: 28781712 DOI: 10.1007/s12561-016-9163-y]
Abstract
Comparative transcriptomics has gained increasing popularity in genomic research thanks to high-throughput technologies, including microarrays and next-generation RNA sequencing, that have generated numerous transcriptomic data sets. An important question is to understand the conservation and divergence of biological processes across species. We propose a testing-based method, TROM (Transcriptome Overlap Measure), for comparing transcriptomes within or between species, which offers a perspective on capturing transcriptomic similarity that differs from traditional correlation analyses. Specifically, TROM identifies associated genes that capture the molecular characteristics of biological samples and then compares the samples by testing the overlap of their associated genes. We use simulation and real data studies to demonstrate that TROM is more powerful in identifying similar transcriptomes and more robust to stochastic gene expression noise than Pearson and Spearman correlations. We apply TROM to compare the developmental stages of six Drosophila species, C. elegans, S. purpuratus, D. rerio, and mouse liver, and find interesting correspondence patterns that imply conserved gene expression programs in the development of these species. The TROM method is available as an R package on CRAN (https://cran.r-project.org/package=TROM), with manuals and source code available at http://www.stat.ucla.edu/~jingyi.li/software-and-data/trom.html.
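TROM compares samples by testing the overlap of their associated gene sets. A standard test for whether two gene sets drawn from the same background overlap more than chance would predict is the one-sided hypergeometric test, sketched below; taking −log10 of the p-value gives a similarity score in which larger means more similar. This is an assumption about the core of the comparison: the TROM package's exact scoring and multiple-testing handling may differ, and `overlap_p_value` and `overlap_score` are hypothetical names.

```python
from math import comb, log10


def overlap_p_value(n_genes, set_a, set_b):
    """One-sided hypergeometric p-value P(overlap >= observed) for two gene
    sets drawn from a shared background of n_genes genes.

    set_a, set_b: Python sets of gene identifiers, both subsets of the background.
    """
    a, b = len(set_a), len(set_b)
    observed = len(set_a & set_b)
    # Ways to draw set_b's b genes with exactly k of them inside set_a,
    # summed over k >= observed, divided by all ways to draw b genes.
    tail = sum(
        comb(a, k) * comb(n_genes - a, b - k)
        for k in range(observed, min(a, b) + 1)
    )
    return tail / comb(n_genes, b)


def overlap_score(n_genes, set_a, set_b):
    """Similarity score: -log10 of the overlap p-value (larger = more similar)."""
    return -log10(overlap_p_value(n_genes, set_a, set_b))
```

Python's exact integer arithmetic in `math.comb` keeps the tail sum exact even for genome-sized backgrounds; only the final division rounds to a float.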
Affiliation(s)
- Wei Vivian Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Yiling Chen
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Jingyi Jessica Li
- Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA
- Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA