1. Chekouo T, Stingo FC, Class CA, Yan Y, Bohannan Z, Wei Y, Garcia-Manero G, Hanash S, Do KA. Investigating protein patterns in human leukemia cell line experiments: A Bayesian approach for extremely small sample sizes. Stat Methods Med Res 2019; 29:1181-1196. [PMID: 31172886 DOI: 10.1177/0962280219852721] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Human cancer cell line experiments are valuable for investigating drug sensitivity biomarkers. The number of biomarkers measured in these experiments is typically on the order of several thousand, whereas the number of samples is often limited to one or at most three replicates for each experimental condition. We have developed an innovative Bayesian approach that efficiently identifies clusters of proteins that exhibit similar patterns of expression. Motivated by the availability of ion mobility mass spectrometry data on cell line experiments in myelodysplastic syndrome and acute myeloid leukemia, our methodology can identify proteins that follow biologically meaningful trends of expression. Extensive simulation studies demonstrate good performance of the proposed method even in the presence of relatively small effects and sample sizes.
Affiliation(s)
- Thierry Chekouo
  - Department of Mathematics and Statistics, University of Calgary, Calgary, Canada
- Francesco C Stingo
  - Department of Statistics, Computer Science, Applications "G. Parenti", University of Florence, Florence, Italy
- Caleb A Class
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yuanqing Yan
  - Department of Neurosurgery, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Zachary Bohannan
  - Division of Research, The University of Houston, Houston, TX, USA
- Yue Wei
  - Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Samir Hanash
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Kim-Anh Do
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
2. Li X, Choudhary PK, Biswas S, Wang X. A Bayesian latent variable approach to aggregation of partial and top-ranked lists in genomic studies. Stat Med 2018; 37:4266-4278. [PMID: 30094911 DOI: 10.1002/sim.7920] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/02/2018] [Revised: 06/13/2018] [Accepted: 07/03/2018] [Indexed: 12/30/2022]
Abstract
In genomic research, it is becoming increasingly popular to perform meta-analysis, the practice of combining results from multiple studies that target a common essential biological problem. Rank aggregation, a robust meta-analytic approach, consolidates such studies at the rank level. There exists extensive research on this topic, and various methods have been developed in the past. However, these methods have two major limitations when they are applied in the genomic context. First, they are mainly designed to work with full lists, whereas partial and/or top-ranked lists prevail in genomic studies. Second, the component studies are often clustered, and the existing methods fail to utilize such information. To address the above concerns, a Bayesian latent variable approach, called BiG, is proposed to formally deal with partial and top-ranked lists and incorporate the effect of clustering. Various reasonable prior specifications for variance parameters in hierarchical models are carefully studied and compared. Simulation results demonstrate the superior performance of BiG compared with other popular rank aggregation methods under various practical settings. A non-small-cell lung cancer data example is analyzed for illustration.
Affiliation(s)
- Xue Li
  - Department of Statistical Science, Southern Methodist University, Dallas, Texas
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
- Xinlei Wang
  - Department of Statistical Science, Southern Methodist University, Dallas, Texas
3. Li C, Lee J, Ding J, Sun S. Integrative analysis of gene expression and methylation data for breast cancer cell lines. BioData Min 2018; 11:13. [PMID: 29983747 PMCID: PMC6019806 DOI: 10.1186/s13040-018-0174-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/20/2018] [Accepted: 06/13/2018] [Indexed: 12/11/2022] [Open access]
Abstract
Background: The deadly costs of cancer and necessity for an accurate method of early cancer detection have demanded the identification of genetic and epigenetic factors associated with cancer. DNA methylation, an epigenetic event, plays an important role in cancer susceptibility. In this paper, we use DNA methylation and gene expression data integration and pathway analysis to further explore and understand the complex relationship between methylation and gene expression.
Results: Through linear modeling and analysis of variance, we obtain genes that show a significant correlation between methylation and gene expression. We then examine the functions and relationships of these genes using bioinformatic tools and databases. In particular, using ConsensusPathDB, we analyze the networks of statistically significant genes to identify hub genes, genes with a large number of links to other genes. We identify eight major hub genes, all in strong association with cancer susceptibility. Through further analysis of the function, gene expression level, and methylation level of these hub genes, we conclude that they are novel potential biomarkers for breast cancer.
Conclusions: Our findings have various implications for cancer screening, early detection methods, and potential novel treatments for cancer. Researchers can also use our results to develop more effective methods for cancer study.
Electronic supplementary material: The online version of this article (10.1186/s13040-018-0174-8) contains supplementary material, which is available to authorized users.
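As an illustration only of the hub-gene definition in this abstract (genes with a large number of links to other genes), the following minimal sketch counts distinct neighbors in an edge list. The function name `find_hub_genes`, the `min_links` threshold, and the toy gene names are hypothetical; this is not the ConsensusPathDB workflow the authors used.

```python
def find_hub_genes(edges, min_links):
    """Return genes linked to at least `min_links` distinct other genes.

    `edges` is an iterable of (gene_a, gene_b) pairs from an undirected
    network. The threshold `min_links` is an illustrative assumption,
    not a value taken from the paper.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    degree = {gene: len(linked) for gene, linked in neighbors.items()}
    # Highest-degree genes first; ties broken alphabetically.
    return sorted(
        (gene for gene, d in degree.items() if d >= min_links),
        key=lambda gene: (-degree[gene], gene),
    )


# Hypothetical toy network; the gene names are placeholders.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]
hubs = find_hub_genes(toy_edges, min_links=3)
```

In this toy network only `GENE_A` has three distinct neighbors, so it is the only gene returned at `min_links=3`.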
Affiliation(s)
- Juyon Lee
  - Korea International School Pangyo Campus, Seongnam, South Korea
- Jessica Ding
  - Liberal Arts and Science Academy, Austin, Texas, USA
- Shuying Sun
  - Department of Mathematics, Texas State University, San Marcos, TX, USA
4. Wang T, Xiao G, Chu Y, Zhang MQ, Corey DR, Xie Y. Design and bioinformatics analysis of genome-wide CLIP experiments. Nucleic Acids Res 2015; 43:5263-74. [PMID: 25958398 PMCID: PMC4477666 DOI: 10.1093/nar/gkv439] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 12/16/2014] [Accepted: 04/23/2015] [Indexed: 01/05/2023] [Open access]
Abstract
The past decades have witnessed a surge of discoveries revealing RNA regulation as a central player in cellular processes. RNAs are regulated by RNA-binding proteins (RBPs) at all post-transcriptional stages, including splicing, transportation, stabilization and translation. Defects in the functions of these RBPs underlie a broad spectrum of human pathologies. Systematic identification of RBP functional targets is among the key biomedical research questions and provides a new direction for drug discovery. The advent of cross-linking immunoprecipitation coupled with high-throughput sequencing (genome-wide CLIP) technology has recently enabled the investigation of genome-wide RBP–RNA binding at single base-pair resolution. This technology has evolved through the development of three distinct versions: HITS-CLIP, PAR-CLIP and iCLIP. Meanwhile, numerous bioinformatics pipelines for handling the genome-wide CLIP data have also been developed. In this review, we discuss the genome-wide CLIP technology and focus on bioinformatics analysis. Specifically, we compare the strengths and weaknesses, as well as the scopes, of various bioinformatics tools. To assist readers in choosing optimal procedures for their analysis, we also review experimental design and procedures that affect bioinformatics analyses.
Affiliation(s)
- Tao Wang
  - Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390, USA
- Guanghua Xiao
  - Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390, USA
- Yongjun Chu
  - Departments of Pharmacology and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390, USA
- Michael Q Zhang
  - Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX 75080, USA
  - Bioinformatics Division, Center for Synthetic and System Biology, TNLIST, Tsinghua University, Beijing 100084, China
- David R Corey
  - Departments of Pharmacology and Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390, USA
- Yang Xie
  - Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390, USA
  - Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390, USA
5. Wang T, Xie Y, Xiao G. dCLIP: a computational approach for comparative CLIP-seq analyses. Genome Biol 2014; 15:R11. [PMID: 24398258 PMCID: PMC4054096 DOI: 10.1186/gb-2014-15-1-r11] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 08/24/2013] [Accepted: 01/07/2014] [Indexed: 12/13/2022] [Open access]
Abstract
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important to understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command line tool, dCLIP, for quantitative CLIP-seq comparative analysis. The two-stage method implemented in dCLIP, including a modified MA normalization method and a hidden Markov model, is shown to be able to effectively identify differential binding regions of RBPs in four CLIP-seq datasets, generated by HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/.
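For readers unfamiliar with the MA representation that this abstract builds on, the sketch below computes the standard M (log ratio) and A (mean log intensity) values for paired read counts from two conditions. It shows only the textbook transform, not dCLIP's modified normalization or its hidden Markov model; the function name `ma_transform` and the `pseudocount` parameter are assumptions made for illustration.

```python
import math


def ma_transform(counts_a, counts_b, pseudocount=1.0):
    """Return a list of (M, A) pairs for paired counts from two conditions.

    M = log2(a) - log2(b) is the log ratio and A = (log2(a) + log2(b)) / 2
    is the mean log intensity. The pseudocount (an illustrative assumption)
    avoids taking the log of zero.
    """
    pairs = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        pairs.append((log_a - log_b, 0.5 * (log_a + log_b)))
    return pairs


# Hypothetical read counts for three genomic bins in two conditions.
ma_values = ma_transform([3, 7, 0], [1, 7, 3])
```

Bins with equal counts in both conditions give M = 0, so a normalization step typically works by shifting or rescaling the data until the bulk of bins is centered on that line.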
6. Guanghua X, Xinlei W, Quincey L, Nestler EJ, Xie Y. Detection of epigenetic changes using ANOVA with spatially varying coefficients. Stat Appl Genet Mol Biol 2013; 12:189-205. [PMID: 23502341 DOI: 10.1515/sagmb-2012-0057] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022]
Abstract
Identification of genome-wide epigenetic changes, the stable changes in gene function without a change in DNA sequence, under various conditions plays an important role in biomedical research. High-throughput epigenetic experiments are useful tools to measure genome-wide epigenetic changes, but the measured intensity levels from these high-resolution genome-wide epigenetic profiling data are often spatially correlated with high noise levels. In addition, it is challenging to detect genome-wide epigenetic changes across multiple conditions, so efficient statistical methodology development is needed for this purpose. In this study, we consider ANOVA models with spatially varying coefficients, combined with a hierarchical Bayesian approach, to explicitly model spatial correlation caused by location-dependent biological effects (i.e., epigenetic changes) and borrow strength among neighboring probes to compare epigenetic changes across multiple conditions. Through simulation studies and applications in drug addiction and depression datasets, we find that our approach compares favorably with competing methods; it is more efficient in estimation and more effective in detecting epigenetic changes. In addition, it can provide biologically meaningful results.
Affiliation(s)
- Xiao Guanghua
  - Division of Biostatistics, Department of Clinical Sciences, The University of Texas Southwestern Medical Center at Dallas, TX 75390, USA