1
Schincaglia A, Pasti L, Cavazzini A, Purcaro G, Beccaria M. Optimization of headspace high-capacity tool coupled to two-dimensional gas chromatography-mass spectrometry for mapping the volatile organic compounds of raw pistachios. A proof-of-concept on the classification ability by geographic origin. Food Chem 2024; 460:140702. [PMID: 39116768 DOI: 10.1016/j.foodchem.2024.140702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/04/2024] [Accepted: 07/27/2024] [Indexed: 08/10/2024]
Abstract
An optimized procedure for extracting and analyzing raw pistachio volatiles was developed through headspace sampling with high-capacity tools and subsequent analysis using comprehensive two-dimensional gas chromatography coupled with mass spectrometry. The examination of 18 pistachio samples belonging to different geographic areas led to the identification of a set of 99 volatile organic compounds (VOCs). Molecules were putatively identified using linear retention index, mass spectra similarity, and two-dimensional plot location. The impact of preprocessing and processing techniques on the aligned data matrix from a set of samples of different geographical origins, after removing contaminants, was evaluated. The combination of scaling with log-transformation, normalization with z-score, and data reduction with random forest machine learning algorithm generated a panel of 16 discriminatory VOC molecules. As a proof of concept, raw pistachios' VOC profile was employed for the first time to tentatively classify them based on their geographical origin.
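The preprocessing chain named in the abstract (log-transformation followed by z-score normalization) can be illustrated with a minimal sketch. The function name `log_zscore`, the pseudocount, the base-10 logarithm, and the choice to standardize each compound (column) across samples are assumptions made for illustration; the abstract does not specify them. The random forest data-reduction step is not shown.

```python
import math
import statistics


def log_zscore(matrix, pseudocount=1.0):
    """Log-transform a samples x features matrix, then z-score each feature.

    Assumptions (not stated in the abstract): base-10 log, a pseudocount
    to avoid log(0), and standardization per column (per compound).
    """
    # Step 1: log-transform every peak area / intensity.
    logged = [[math.log10(value + pseudocount) for value in row] for row in matrix]

    # Step 2: z-score each column using the sample standard deviation.
    scaled_columns = []
    for column in zip(*logged):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        if sd == 0:
            # A constant feature carries no information; map it to zeros.
            scaled_columns.append([0.0 for _ in column])
        else:
            scaled_columns.append([(value - mean) / sd for value in column])

    # Transpose back to samples x features.
    return [list(row) for row in zip(*scaled_columns)]
```

After this step each compound has zero mean and unit variance across samples, so abundant and trace volatiles contribute on a comparable scale to any downstream feature selection.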
Affiliation(s)
- Andrea Schincaglia
- Department of Chemical, Pharmaceutical, and Agricultural Sciences, Via Luigi Borsari 46, 44121, University of Ferrara, Ferrara, Italy; Gembloux Agro-Bio Tech, Passage des Déportés 2, 5030, Gembloux, University of Liège, Belgium
- Luisa Pasti
- Department of Environmental and Prevention Sciences, Via L. Borsari 46, 44121, University of Ferrara, Ferrara, Italy
- Alberto Cavazzini
- Department of Chemical, Pharmaceutical, and Agricultural Sciences, Via Luigi Borsari 46, 44121, University of Ferrara, Ferrara, Italy; Council for Agricultural Research and Economics, CREA, via della Navicella 2/4, Rome, 00184, Italy
- Giorgia Purcaro
- Gembloux Agro-Bio Tech, Passage des Déportés 2, 5030, Gembloux, University of Liège, Belgium.
- Marco Beccaria
- Department of Chemical, Pharmaceutical, and Agricultural Sciences, Via Luigi Borsari 46, 44121, University of Ferrara, Ferrara, Italy; Organic and Biological Analytical Chemistry Group, MolSys Research Unit, University of Liège, 4000 Liège, Belgium.
2
Cortes-Guzman MA, Treviño V. CoGTEx: Unscaled system-level coexpression estimation from GTEx data forecast novel functional gene partners. PLoS One 2024; 19:e0309961. [PMID: 39365797 PMCID: PMC11451983 DOI: 10.1371/journal.pone.0309961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 08/21/2024] [Indexed: 10/06/2024] Open
Abstract
MOTIVATION: Coexpression estimations are helpful for analysis of pathways, cofactors, regulators, targets, and human health and disease. Ideally, coexpression estimations should consider as many diverse cell types as possible and consider that available data is not uniform across tissues. Importantly, the coexpression estimations accessible today are performed on a "tissue level", which is based on cell type standardized formulations. Little or no attention is paid to overall gene expression levels. The tissue-level estimation assumes that variance expression levels are more important than mean expression levels. Here, we challenge this assumption by estimating a coexpression calculation at the "system level", which is estimated without standardization by tissue, and show that it provides valuable information. We made available a resource to view, download, and analyze both tissue- and system-level coexpression estimations from GTEx human data.
METHODS: GTEx v8 expression data was globally normalized, batch-processed, and filtered. Then, stringent PCA, clustering, and tSNE procedures were applied to generate 42 distinct and curated tissue clusters. Coexpression was estimated from these 42 tissue clusters by computing the correlation of 33,445 genes, sampling 70 samples per tissue cluster to avoid tissue overrepresentation. This process was repeated 20 times, and the minimum value was taken as a robust estimation. Three metrics were calculated (Pearson, Spearman, and G-statistic) in two data processing modes, at the system level (TPM scale) and tissue level (z-score scale).
RESULTS: We first validate our tissue-level estimations compared with other databases. Then, by specific analyses in several examples and literature validations of predictions, we show that system-level coexpression estimation differs from tissue-level estimations and that both contain valuable information reflected in biological pathways. We also show that coexpression estimations are associated with transcriptional regulation. Finally, we present CoGTEx, a valuable resource for viewing and analyzing coexpressed genes in human adult tissues from GTEx v8 data. We introduce our web resource to list, view and explore the coexpressed genes from GTEx data.
CONCLUSION: We conclude that system-level coexpression is a novel and interesting coexpression metric capable of generating plausible predictions and biological hypotheses; and that CoGTEx is a valuable resource to view, compare, and download system- and tissue-level coexpression estimations from GTEx data.
AVAILABILITY: The web resource is available at http://bioinformatics.mx/cogtex.
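The resampling scheme described in the methods (a fixed number of samples drawn per tissue cluster, repeated several times, keeping the minimum correlation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of Pearson correlation only, the seed, and the decision to take the minimum of the signed (rather than absolute) correlation are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def min_subsampled_correlation(gene_a, gene_b, cluster_labels,
                               per_cluster=70, repeats=20, seed=0):
    """Correlate two genes on balanced subsamples and keep the minimum.

    gene_a, gene_b: expression values, one per sample (e.g. TPM for a
    "system-level" estimate; hypothetical inputs for this sketch).
    cluster_labels: tissue-cluster label of each sample.
    """
    rng = random.Random(seed)

    # Group sample indices by tissue cluster.
    by_cluster = {}
    for index, label in enumerate(cluster_labels):
        by_cluster.setdefault(label, []).append(index)

    estimates = []
    for _ in range(repeats):
        chosen = []
        for members in by_cluster.values():
            # Draw at most `per_cluster` samples so no tissue dominates.
            k = min(per_cluster, len(members))
            chosen.extend(rng.sample(members, k))
        estimates.append(pearson([gene_a[i] for i in chosen],
                                 [gene_b[i] for i in chosen]))

    # The most conservative of the repeated estimates.
    return min(estimates)
```

Under the same assumptions, a tissue-level variant would z-score each gene within its tissue cluster before calling the function, whereas the system-level variant passes the unstandardized values directly.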
Affiliation(s)
- Víctor Treviño
- Tecnologico de Monterrey, Escuela de Medicina, Bioinformática, Monterrey, Nuevo León, México
- Tecnologico de Monterrey, OriGen Project, Monterrey, Nuevo León, México
3
Takahashi M, Chong HB, Zhang S, Yang TY, Lazarov MJ, Harry S, Maynard M, Hilbert B, White RD, Murrey HE, Tsou CC, Vordermark K, Assaad J, Gohar M, Dürr BR, Richter M, Patel H, Kryukov G, Brooijmans N, Alghali ASO, Rubio K, Villanueva A, Zhang J, Ge M, Makram F, Griesshaber H, Harrison D, Koglin AS, Ojeda S, Karakyriakou B, Healy A, Popoola G, Rachmin I, Khandelwal N, Neil JR, Tien PC, Chen N, Hosp T, van den Ouweland S, Hara T, Bussema L, Dong R, Shi L, Rasmussen MQ, Domingues AC, Lawless A, Fang J, Yoda S, Nguyen LP, Reeves SM, Wakefield FN, Acker A, Clark SE, Dubash T, Kastanos J, Oh E, Fisher DE, Maheswaran S, Haber DA, Boland GM, Sade-Feldman M, Jenkins RW, Hata AN, Bardeesy NM, Suvà ML, Martin BR, Liau BB, Ott CJ, Rivera MN, Lawrence MS, Bar-Peled L. DrugMap: A quantitative pan-cancer analysis of cysteine ligandability. Cell 2024; 187:2536-2556.e30. [PMID: 38653237 PMCID: PMC11143475 DOI: 10.1016/j.cell.2024.03.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 01/15/2024] [Accepted: 03/19/2024] [Indexed: 04/25/2024]
Abstract
Cysteine-focused chemical proteomic platforms have accelerated the clinical development of covalent inhibitors for a wide range of targets in cancer. However, how different oncogenic contexts influence cysteine targeting remains unknown. To address this question, we have developed "DrugMap," an atlas of cysteine ligandability compiled across 416 cancer cell lines. We unexpectedly find that cysteine ligandability varies across cancer cell lines, and we attribute this to differences in cellular redox states, protein conformational changes, and genetic mutations. Leveraging these findings, we identify actionable cysteines in NF-κB1 and SOX10 and develop corresponding covalent ligands that block the activity of these transcription factors. We demonstrate that the NF-κB1 probe blocks DNA binding, whereas the SOX10 ligand increases SOX10-SOX10 interactions and disrupts melanoma transcriptional signaling. Our findings reveal heterogeneity in cysteine ligandability across cancers, pinpoint cell-intrinsic features driving cysteine targeting, and illustrate the use of covalent probes to disrupt oncogenic transcription-factor activity.
Affiliation(s)
- Mariko Takahashi
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA.
- Harrison B Chong
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Siwen Zhang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Tzu-Yi Yang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Matthew J Lazarov
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Stefan Harry
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Kira Vordermark
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Jonathan Assaad
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Magdy Gohar
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Benedikt R Dürr
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Marianne Richter
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Himani Patel
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Karla Rubio
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
- Antonio Villanueva
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
- Junbing Zhang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Maolin Ge
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Farah Makram
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Hanna Griesshaber
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Drew Harrison
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Ann-Sophie Koglin
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Samuel Ojeda
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Barbara Karakyriakou
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Alexander Healy
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- George Popoola
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Inbal Rachmin
- Cutaneous Biology Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Neha Khandelwal
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Pei-Chieh Tien
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Nicholas Chen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
- Tobias Hosp
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Sanne van den Ouweland
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Toshiro Hara
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
- Lillian Bussema
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
- Rui Dong
- Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA
- Lei Shi
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Martin Q Rasmussen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Ana Carolina Domingues
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Aleigha Lawless
- Department of Surgery, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jacy Fang
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Satoshi Yoda
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Linh Phuong Nguyen
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Sarah Marie Reeves
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Farrah Nicole Wakefield
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Adam Acker
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Sarah Elizabeth Clark
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Taronish Dubash
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- John Kastanos
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA
- Eugene Oh
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- David E Fisher
- Cutaneous Biology Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Shyamala Maheswaran
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Daniel A Haber
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Genevieve M Boland
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Surgery, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Surgery, Harvard Medical School, Boston, MA 02114, USA
- Moshe Sade-Feldman
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Russell W Jenkins
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Aaron N Hata
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Nabeel M Bardeesy
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Mario L Suvà
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
- Brian B Liau
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, USA
- Christopher J Ott
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA
- Miguel N Rivera
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Pathology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA
- Michael S Lawrence
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pathology, Harvard Medical School, Boston, MA 02114, USA.
- Liron Bar-Peled
- Krantz Family Center for Cancer Research, Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA; Department of Medicine, Harvard Medical School, Boston, MA 02114, USA.
4
Singh V, Kirtipal N, Song B, Lee S. Normalization of RNA-Seq data using adaptive trimmed mean with multi-reference. Brief Bioinform 2024; 25:bbae241. [PMID: 38770720 PMCID: PMC11107385 DOI: 10.1093/bib/bbae241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 04/04/2024] [Accepted: 05/07/2024] [Indexed: 05/22/2024] Open
Abstract
The normalization of RNA sequencing data is a primary step for downstream analysis. The most popular methods used for normalization are the trimmed mean of M values (TMM) and DESeq. TMM trims away extreme log fold changes in the data to normalize the raw read counts based on the remaining non-differentially expressed genes. However, the major problem with TMM is that the values of the trimming factor M are heuristic. This paper estimates an adaptive value of M in TMM based on Jaeckel's estimator, with each sample acting in turn as a reference to find the scale factor of each sample. The presented approach is validated on the SEQC, MAQC2, MAQC3, and PICKRELL datasets and on two simulated datasets with two-group and three-group conditions, varying the percentage of differential expression and the number of replicates. The performance of the present approach is compared with various state-of-the-art methods, and it performs better in terms of area under the receiver operating characteristic curve and differential expression.
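For orientation, the fixed-trim idea that the abstract sets out to improve can be sketched in a few lines. This is a deliberately simplified illustration of a trimmed mean of M values against a single reference, with a hard-coded trim fraction, no trimming on average abundance, and no precision weights; it is not the adaptive, multi-reference method proposed in the paper, and the function name and default trim are assumptions.

```python
import math


def trimmed_m_scale_factor(sample_counts, reference_counts, trim=0.3):
    """Scale factor of a sample relative to a reference library.

    M values are log2 ratios of library-size-scaled counts. The lowest
    and highest `trim` fraction of M values is discarded, and the mean
    of the remainder is converted back from log2 scale.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)

    m_values = []
    for ys, yr in zip(sample_counts, reference_counts):
        # Genes with a zero count in either library have no finite log ratio.
        if ys > 0 and yr > 0:
            m_values.append(math.log2((ys / sample_total) / (yr / reference_total)))

    if not m_values:
        return 1.0  # Nothing to compare; leave the library unscaled.

    m_values.sort()
    k = int(len(m_values) * trim)
    kept = m_values[k:len(m_values) - k] or m_values

    return 2 ** (sum(kept) / len(kept))
```

The fixed `trim` argument is exactly the heuristic the abstract criticizes: the paper's contribution, as described, is to choose that amount from the data and to let every sample serve as the reference in turn.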
Affiliation(s)
- Vikas Singh
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
- Nikhil Kirtipal
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
- Byeongsop Song
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
- Sunjae Lee
- School of Life Sciences, Gwangju Institute of Science and Technology, 123 Cheomdan-gwagiro, 61005, Gwangju, South Korea
5
Shao R, Suzuki T, Suyama M, Tsukada Y. The impact of selective HDAC inhibitors on the transcriptome of early mouse embryos. BMC Genomics 2024; 25:143. [PMID: 38317092 PMCID: PMC10840191 DOI: 10.1186/s12864-024-10029-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 01/18/2024] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND: Histone acetylation, which is regulated by histone acetyltransferases (HATs) and histone deacetylases (HDACs), plays a crucial role in the control of gene expression. HDAC inhibitors (HDACi) have shown potential in cancer therapy; however, the specific roles of HDACs in early embryos remain unclear. Moreover, although some pan-HDACi have been used to maintain cellular undifferentiated states in early embryos, the specific mechanisms underlying their effects remain unknown. Thus, there remains a significant knowledge gap regarding the application of selective HDACi in early embryos.
RESULTS: To address this gap, we treated early embryos with two selective HDACi (MGCD0103 and T247). Subsequently, we collected and analyzed their transcriptome data at different developmental stages. Our findings unveiled a significant effect of HDACi treatment during the crucial 2-cell stage of zygotes, leading to a delay in embryonic development after T247 and an arrest at 2-cell stage after MGCD0103 administration. Furthermore, we elucidated the regulatory targets underlying this arrested embryonic development, which pinpointed the G2/M phase as the potential period of embryonic development arrest caused by MGCD0103. Moreover, our investigation provided a comprehensive profile of the biological processes that are affected by HDACi, with their main effects being predominantly localized in four aspects of zygotic gene activation (ZGA): RNA splicing, cell cycle regulation, autophagy, and transcription factor regulation. By exploring the transcriptional regulation and epigenetic features of the genes affected by HDACi, we made inferences regarding the potential main pathways via which HDACs affect gene expression in early embryos. Notably, Hdac7 exhibited a distinct response, highlighting its potential as a key player in early embryonic development.
CONCLUSIONS: Our study conducted a comprehensive analysis of the effects of HDACi on early embryonic development at the transcriptional level. The results demonstrated that HDACi significantly affected ZGA in embryos, elucidated the distinct actions of various selective HDACi, and identified specific biological pathways and mechanisms via which these inhibitors modulated early embryonic development.
Affiliation(s)
- Ruiqi Shao
- Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, 812-8582, Fukuoka, Japan
- Takayoshi Suzuki
- SANKEN, Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
- Mikita Suyama
- Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, 812-8582, Fukuoka, Japan.
- Yuichi Tsukada
- Advanced Biological Information Research Division, INAMORI Frontier Research Center, Kyushu University, 744 Motooka, Nishi-ku, 819-0395, Fukuoka, Japan.
6
Herrera-Uribe J, Lim KS, Byrne KA, Daharsh L, Liu H, Corbett RJ, Marco G, Schroyen M, Koltes JE, Loving CL, Tuggle CK. Integrative profiling of gene expression and chromatin accessibility elucidates specific transcriptional networks in porcine neutrophils. Front Genet 2023; 14:1107462. [PMID: 37287538 PMCID: PMC10242145 DOI: 10.3389/fgene.2023.1107462] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/24/2022] [Accepted: 04/27/2023] [Indexed: 06/09/2023] Open
Abstract
Neutrophils are vital components of the immune system for limiting the invasion and proliferation of pathogens in the body. Surprisingly, the functional annotation of porcine neutrophils is still limited. The transcriptomic and epigenetic assessment of porcine neutrophils from healthy pigs was performed by bulk RNA sequencing and transposase accessible chromatin sequencing (ATAC-seq). First, we sequenced and compared the transcriptome of porcine neutrophils with eight other immune cell transcriptomes to identify a neutrophil-enriched gene list within a detected neutrophil co-expression module. Second, we used ATAC-seq analysis to report for the first time the genome-wide chromatin accessible regions of porcine neutrophils. A combined analysis using both transcriptomic and chromatin accessibility data further defined the neutrophil co-expression network controlled by transcription factors likely important for neutrophil lineage commitment and function. We identified chromatin accessible regions around promoters of neutrophil-specific genes that were predicted to be bound by neutrophil-specific transcription factors. Additionally, published DNA methylation data from porcine immune cells including neutrophils were used to link low DNA methylation patterns to accessible chromatin regions and genes with highly enriched expression in porcine neutrophils. In summary, our data provides the first integrative analysis of the accessible chromatin regions and transcriptional status of porcine neutrophils, contributing to the Functional Annotation of Animal Genomes (FAANG) project, and demonstrates the utility of chromatin accessible regions to identify and enrich our understanding of transcriptional networks in a cell type such as neutrophils.
Affiliation(s)
- Juber Herrera-Uribe
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Kyu-Sang Lim
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Department of Animal Resource Science, Kongju National University, Yesan, Republic of Korea
- Kristen A. Byrne
- USDA-Agriculture Research Service, National Animal Disease Center, Food Safety and Enteric Pathogens Research Unit, Ames, IA, United States
- Lance Daharsh
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Haibo Liu
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Ryan J. Corbett
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Gianna Marco
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Martine Schroyen
- Department of Animal Science, Iowa State University, Ames, IA, United States
- James E. Koltes
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Crystal L. Loving
- USDA-Agriculture Research Service, National Animal Disease Center, Food Safety and Enteric Pathogens Research Unit, Ames, IA, United States
7
Scheepbouwer C, Hackenberg M, van Eijndhoven MAJ, Gerber A, Pegtel M, Gómez-Martín C. NORMSEQ: a tool for evaluation, selection and visualization of RNA-Seq normalization methods. Nucleic Acids Res 2023:7175338. [PMID: 37216599 DOI: 10.1093/nar/gkad429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 04/24/2023] [Accepted: 05/09/2023] [Indexed: 05/24/2023] Open
Abstract
RNA-sequencing has become one of the most used high-throughput approaches to gain knowledge about the expression of all different RNA subpopulations. However, technical artifacts introduced during library preparation and/or data analysis can influence the detected RNA expression levels. A critical step, especially in large and low-input datasets or studies, is data normalization, which aims at eliminating the variability in data that is not related to biology. Many normalization methods have been developed, each of them relying on different assumptions, making the selection of the appropriate normalization strategy key to preserving biological information. To address this, we developed NormSeq, a free web-server tool to systematically assess the performance of normalization methods in a given dataset. A key feature of NormSeq is the implementation of information gain to guide the selection of the best normalization method, which is crucial to eliminate or at least reduce non-biological variability. Altogether, NormSeq provides an easy-to-use platform to explore different aspects of gene expression data with a special focus on data normalization to help researchers, even without bioinformatics expertise, to obtain reliable biological inference from their data. NormSeq is freely available at: https://arn.ugr.es/normSeq.
Affiliation(s)
- Chantal Scheepbouwer
- Department of Neurosurgery, Cancer Center Amsterdam, Amsterdam University Medical Center (UMC) location Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Cancer Center Amsterdam, Cancer Biology, Amsterdam, The Netherlands
- Department of Pathology, Cancer Center Amsterdam, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Michael Hackenberg
- Genetics Department, Faculty of Science, Universidad de Granada, Campus de Fuentenueva s/n, 18071, Granada, Spain
- Bioinformatics Laboratory, Biomedical Research Centre (CIBM), Biotechnology Institute, PTS, Avda. del Conocimiento s/n, 18100 Granada, Spain
- Excellence Research Unit "Modeling Nature" (MNat), University of Granada, Spain
- Instituto de Investigación Biosanitaria ibs.GRANADA, University Hospitals of Granada-University of Granada, Spain, Conocimiento s/n, 18100, Granada, Spain
- Monique A J van Eijndhoven
- Department of Pathology, Cancer Center Amsterdam, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Alan Gerber
- Department of Neurosurgery, Cancer Center Amsterdam, Amsterdam University Medical Center (UMC) location Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Cancer Center Amsterdam, Cancer Biology, Amsterdam, The Netherlands
- Michiel Pegtel
- Department of Pathology, Cancer Center Amsterdam, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
- Cristina Gómez-Martín
- Department of Pathology, Cancer Center Amsterdam, Amsterdam UMC location Vrije Universiteit Amsterdam, Amsterdam 1081HV, The Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, The Netherlands
8
Verschoor CP, Vlasschaert C, Rauh MJ, Paré G. A DNA methylation based measure outperforms circulating CRP as a marker of chronic inflammation and partly reflects the monocytic response to long-term inflammatory exposure: A Canadian longitudinal study of aging analysis. Aging Cell 2023:e13863. [PMID: 37139638 PMCID: PMC10352553 DOI: 10.1111/acel.13863] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/28/2023] [Revised: 04/14/2023] [Accepted: 04/21/2023] [Indexed: 05/05/2023] Open
Abstract
A key hallmark in the age-related dysfunction of physiological systems is disruption related to the regulation of inflammation, often resulting in a chronic, low-grade inflammatory state (i.e., inflammaging). In order to understand the causes of overall system decline, methods to quantify the life-long exposure or damage related to chronic inflammation are critical. Here, we characterize a comprehensive epigenetic inflammation score (EIS) based on DNA methylation loci (CpGs) that are associated with circulating levels of C-reactive protein (CRP). In a cohort of 1446 older adults, we show that associations to age and health-related traits such as smoking history, chronic conditions, and established measures of accelerated aging were stronger for EIS than CRP, while the risk of longitudinal outcomes such as outpatient or inpatient visits and increased frailty were relatively similar. To determine whether variation in EIS actually reflects the cellular response to chronic inflammation we exposed THP1 myelo-monocytic cells to low levels of inflammatory mediators for 14 days, finding that EIS increased in response to both CRP (p = 0.011) and TNF (p = 0.068). Interestingly, a refined version of EIS based only on those CpGs that changed in vitro was more strongly associated with many of the aforementioned traits as compared to EIS. In conclusion, our study demonstrates that EIS outperforms circulating CRP with regard to its association to health-traits that are synonymous with chronic inflammation and accelerated aging, and substantiates its potential role as a clinically relevant tool for stratifying patient risk of adverse outcomes prior to treatment or following illness.
Affiliation(s)
- Chris P Verschoor
- Health Sciences North Research Institute, Sudbury, Ontario, Canada
- Northern Ontario School of Medicine, Sudbury, Ontario, Canada
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
- Michael J Rauh
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, Ontario, Canada
- Guillaume Paré
- Department of Health Research Methods, Evidence and Impact, McMaster University, Hamilton, Ontario, Canada
9
Hoskins I, Sun S, Cote A, Roth FP, Cenik C. satmut_utils: a simulation and variant calling package for multiplexed assays of variant effect. Genome Biol 2023; 24:82. [PMID: 37081510 PMCID: PMC10116734 DOI: 10.1186/s13059-023-02922-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2022] [Accepted: 04/04/2023] [Indexed: 04/22/2023]
Abstract
The impact of millions of individual genetic variants on molecular phenotypes in coding sequences remains unknown. Multiplexed assays of variant effect (MAVEs) are scalable methods to annotate relevant variants, but existing software lacks standardization, requires cumbersome configuration, and does not scale to large targets. We present satmut_utils as a flexible solution for simulation and variant quantification. We then benchmark MAVE software using simulated and real MAVE data. We finally determine mRNA abundance for thousands of cystathionine beta-synthase variants using two experimental methods. The satmut_utils package enables high-performance analysis of MAVEs and reveals the capability of variants to alter mRNA abundance.
Affiliation(s)
- Ian Hoskins
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Song Sun
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Atina Cote
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Frederick P Roth
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Can Cenik
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
10
Louise J, Deussen AR, Dodd JM. Data processing choices can affect findings in differential methylation analyses: an investigation using data from the LIMIT RCT. PeerJ 2023; 11:e14786. [PMID: 36755865 PMCID: PMC9901304 DOI: 10.7717/peerj.14786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 01/03/2023] [Indexed: 02/05/2023]
Abstract
Objective: A wide array of methods exist for processing and analysing DNA methylation data. We aimed to perform a systematic comparison of the behaviour of these methods, using cord blood DNAm from the LIMIT RCT, in relation to detecting hypothesised effects of interest (intervention and pre-pregnancy maternal BMI) as well as effects known to be spurious, and known to be present. Methods: DNAm data, from 645 cord blood samples analysed using Illumina 450K BeadChip arrays, were normalised using three different methods (with probe filtering undertaken pre- or post-normalisation). Batch effects were handled with a supervised algorithm, an unsupervised algorithm, or adjustment in the analysis model. Analysis was undertaken with and without adjustment for estimated cell type proportions. The effects estimated included intervention and BMI (effects of interest in the original study), infant sex and randomly assigned groups. Data processing and analysis methods were compared in relation to number and identity of differentially methylated probes, rankings of probes by p value and log-fold-change, and distributions of p values and log-fold-change estimates. Results: There were differences corresponding to each of the processing and analysis choices. Importantly, some combinations of data processing choices resulted in a substantial number of spurious 'significant' findings. We recommend greater emphasis on replication and greater use of sensitivity analyses.
Affiliation(s)
- Jennie Louise
- Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
- Adelaide Health Technology Assessment, The University of Adelaide, Adelaide, Australia
- Andrea R. Deussen
- Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
- Jodie M. Dodd
- Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
- Department of Perinatal Medicine, Women’s and Babies Division, The Women’s and Children’s Hospital, Adelaide, South Australia, Australia
11
Wu CT, Shen M, Du D, Cheng Z, Parker SJ, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Cosbin: cosine score-based iterative normalization of biologically diverse samples. Bioinformatics Advances 2022; 2:vbac076. [PMID: 36330358 PMCID: PMC9614059 DOI: 10.1093/bioadv/vbac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 10/02/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022]
Abstract
Motivation: Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, to normalize biologically diverse samples, the most commonly used reference genes exhibit striking expression variability and size-factor or distribution-based normalization methods can be problematic when the amount of asymmetry in differential expression is significant. Results: We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), to normalize biologically diverse samples. Based on the Cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulation and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias due to significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces the existing methods and will allow biologists to more accurately detect true molecular signals among diverse phenotypic groups. Availability and implementation: The R scripts of Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
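As a rough illustration of the cosine-score idea in the abstract above, the following Python sketch scores each gene by the cosine between its vector of condition means and an all-ones reference vector, so that genes with scores close to 1 look consistently expressed across conditions. The choice of the all-ones reference, the function name `cosine_to_ones`, and the toy values are assumptions made for illustration; this is not the Cosbin implementation, and the iterative elimination and normalization-factor steps are omitted.

```python
import math

def cosine_to_ones(group_means):
    """Cosine between a gene's condition-mean vector and the all-ones vector.

    Assumes at least one nonzero mean, otherwise the norm is zero.
    """
    total = sum(group_means)
    norm = math.sqrt(sum(x * x for x in group_means))
    return total / (math.sqrt(len(group_means)) * norm)

# Hypothetical toy data: one list of condition means per gene.
genes = {
    "flat": [10.0, 10.0, 10.0],      # identical across conditions
    "near_flat": [5.0, 5.5, 4.5],    # small variation
    "asymmetric": [20.0, 1.0, 1.0],  # strongly up in one condition
}
scores = {name: cosine_to_ones(means) for name, means in genes.items()}

# Rank genes from most to least consistently expressed.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In a scheme of this kind, the lowest-ranked genes would be the first candidates for removal as asymmetric differentially expressed genes, while the top-ranked genes would serve as references for rescaling samples.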
Affiliation(s)
- Dongping Du
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Zuolin Cheng
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Yingzhou Lu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
- To whom correspondence should be addressed.
12
Yang R, Stendahl AM, Vigh-Conrad KA, Held M, Lima AC, Conrad DF. SATINN: an automated neural network-based classification of testicular sections allows for high-throughput histopathology of mouse mutants. Bioinformatics 2022; 38:5288-5298. [PMID: 36214638 PMCID: PMC9710558 DOI: 10.1093/bioinformatics/btac673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Revised: 09/26/2022] [Accepted: 10/06/2022] [Indexed: 12/24/2022]
Abstract
Motivation: The mammalian testis is a complex organ with a cellular composition that changes smoothly and cyclically in normal adults. While testis histology is already an invaluable tool for identifying and describing developmental differences in evolution and disease, methods for standardized, digital image analysis of testis are needed to expand the utility of this approach. Results: We developed SATINN (Software for Analysis of Testis Images with Neural Networks), a multi-level framework for automated analysis of multiplexed immunofluorescence images from mouse testis. This approach uses residual learning to train convolutional neural networks (CNNs) to classify nuclei from seminiferous tubules into seven distinct cell types with an accuracy of 81.7%. These cell classifications are then used in a second-level tubule CNN, which places seminiferous tubules into one of 12 distinct tubule stages with 57.3% direct accuracy and 94.9% within ±1 stage. We further describe numerous cell- and tubule-level statistics that can be derived from wild-type testis. Finally, we demonstrate how the classifiers and derived statistics can be used to rapidly and precisely describe pathology by applying our methods to image data from two mutant mouse lines. Our results demonstrate the feasibility and potential of using computer-assisted analysis for testis histology, an area poised to evolve rapidly on the back of emerging, spatially resolved genomic and proteomic technologies. Availability and implementation: The source code to reproduce the results described here and a SATINN standalone application with graphic-user interface are available from http://github.com/conradlab/SATINN. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ran Yang
- To whom correspondence should be addressed.
- Alexandra M Stendahl
- Division of Genetics, Oregon National Primate Research Center, Oregon Health and Science University, Portland, OR 97006, USA
- Katinka A Vigh-Conrad
- Division of Genetics, Oregon National Primate Research Center, Oregon Health and Science University, Portland, OR 97006, USA
- Madison Held
- Division of Genetics, Oregon National Primate Research Center, Oregon Health and Science University, Portland, OR 97006, USA
- Ana C Lima
- To whom correspondence should be addressed.
13
Daunesse M, Legendre R, Varet H, Pain A, Chica C. ePeak: from replicated chromatin profiling data to epigenomic dynamics. NAR Genom Bioinform 2022; 4:lqac041. [PMID: 35664802 PMCID: PMC9154330 DOI: 10.1093/nargab/lqac041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2021] [Revised: 04/05/2022] [Accepted: 05/05/2022] [Indexed: 11/14/2022]
Abstract
We present ePeak, a Snakemake-based pipeline for the identification and quantification of reproducible peaks from raw ChIP-seq, CUT&RUN and CUT&Tag epigenomic profiling techniques. It also includes a statistical module to perform tailored differential marking and binding analysis with state of the art methods. ePeak streamlines critical steps like the quality assessment of the immunoprecipitation, spike-in calibration and the selection of reproducible peaks between replicates for both narrow and broad peaks. It generates complete reports for data quality control assessment and optimal interpretation of the results. We advocate for a differential analysis that accounts for the biological dynamics of each chromatin factor. Thus, ePeak provides linear and nonlinear methods for normalisation as well as conservative and stringent models for variance estimation and significance testing of the observed marking/binding differences. Using a published ChIP-seq dataset, we show that distinct populations of differentially marked/bound peaks can be identified. We study their dynamics in terms of read coverage and summit position, as well as the expression of the neighbouring genes. We propose that ePeak can be used to measure the richness of the epigenomic landscape underlying a biological process by identifying diverse regulatory regimes.
Affiliation(s)
- Maëlle Daunesse
- Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Rachel Legendre
- Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Hugo Varet
- Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Adrien Pain
- Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
- Claudia Chica
- Bioinformatics and Biostatistics Hub, Institut Pasteur, Université de Paris, Paris F-75015, France
14
Ross JP, van Dijk S, Phang M, Skilton MR, Molloy PL, Oytam Y. Batch-effect detection, correction and characterisation in Illumina HumanMethylation450 and MethylationEPIC BeadChip array data. Clin Epigenetics 2022; 14:58. [PMID: 35488315 PMCID: PMC9055778 DOI: 10.1186/s13148-022-01277-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 04/10/2022] [Indexed: 11/20/2022]
Abstract
Background: Genomic technologies can be subject to significant batch-effects which are known to reduce experimental power and to potentially create false positive results. The Illumina Infinium Methylation BeadChip is a popular technology choice for epigenome-wide association studies (EWAS), but presently, little is known about the nature of batch-effects on these designs. Given the subtlety of biological phenotypes in many EWAS, control for batch-effects should be a consideration.
Results: Using the batch-effect removal approaches in the ComBat and Harman software, we examined two in-house datasets and compared results with three large publicly available datasets (1214 HumanMethylation450 and 1094 MethylationEPIC BeadChips in total), and find that despite various forms of preprocessing, some batch-effects persist. This residual batch-effect is associated with the day of processing, the individual glass slide and the position of the array on the slide. Consistently across all datasets, 4649 probes required high amounts of correction. To understand the impact of this set on EWAS, we explored the literature and found three instances where persistently batch-effect prone probes have been reported in abstracts as key sites of differential methylation. As well as batch-effect susceptible probes, we also discover a set of probes which are erroneously corrected. We provide batch-effect workflows for Infinium Methylation data and provide reference matrices of batch-effect prone and erroneously corrected features across the five datasets spanning regionally diverse populations and three commonly collected biosamples (blood, buccal and saliva).
Conclusions: Batch-effects are ever present, even in high-quality data, and a strategy to deal with them should be part of experimental design, particularly for EWAS. Batch-effect removal tools are useful to reduce technical variance in Infinium Methylation data, but they need to be applied with care and make use of post hoc diagnostic measures.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13148-022-01277-9.
Affiliation(s)
- Jason P Ross
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia
- Susan van Dijk
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia
- Melinda Phang
- Charles Perkins Centre, The University of Sydney, Sydney, Australia
- Michael R Skilton
- Charles Perkins Centre, The University of Sydney, Sydney, Australia
- Sydney Medical School, The University of Sydney, Sydney, Australia
- Sydney Institute for Women, Children and Their Families, Sydney Local Health District, Sydney, Australia
- Peter L Molloy
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia
- Yalchin Oytam
- Clinical Insights and Analytics Unit, South Eastern Sydney Local Health District, Sydney, Australia
15
Bogias KJ, Pederson SM, Leemaqz S, Smith MD, McAninch D, Jankovic-Karasoulos T, McCullough D, Wan Q, Bianco-Miotto T, Breen J, Roberts CT. Placental Transcription Profiling in 6-23 Weeks' Gestation Reveals Differential Transcript Usage in Early Development. Int J Mol Sci 2022; 23:ijms23094506. [PMID: 35562897 PMCID: PMC9105363 DOI: 10.3390/ijms23094506] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/17/2022] [Revised: 04/12/2022] [Accepted: 04/13/2022] [Indexed: 12/13/2022]
Abstract
The human placenta is a rapidly developing transient organ that is key to pregnancy success. Early development of the conceptus occurs in a low oxygen environment before oxygenated maternal blood begins to flow into the placenta at ~10-12 weeks' gestation. This process is likely to substantially affect overall placental gene expression. Transcript variability underlying gene expression has yet to be profiled. In this study, accurate transcript expression profiles were identified for 84 human placental chorionic villus tissue samples collected across 6-23 weeks' gestation. Differential gene expression (DGE), differential transcript expression (DTE) and differential transcript usage (DTU) between 6-10 weeks' and 11-23 weeks' gestation groups were assessed. In total, 229 genes had significant DTE yet no significant DGE. Integration of DGE and DTE analyses found that differential expression patterns of individual transcripts were commonly masked upon aggregation to the gene-level. Of the 611 genes that exhibited DTU, 534 had no significant DGE or DTE. The four most significant DTU genes ADAM10, VMP1, GPR126, and ASAH1, were associated with hypoxia-responsive pathways. Transcript usage is a likely regulatory mechanism in early placentation. Identification of functional roles will facilitate new insight in understanding the origins of pregnancy complications.
Affiliation(s)
- Konstantinos J. Bogias
- Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia; (K.J.B.); (S.L.); (D.M.); (T.J.-K.)
- Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia;
- Stephen M. Pederson
- Dame Roma Mitchell Cancer Research Laboratories, Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia;
- Shalem Leemaqz
- Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia; (K.J.B.); (S.L.); (D.M.); (T.J.-K.)
- Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia;
- Flinders Health and Medical Research Institute, Flinders University, Bedford Park, SA 5042, Australia; (M.D.S.); (D.M.); (Q.W.)
- Melanie D. Smith
- Flinders Health and Medical Research Institute, Flinders University, Bedford Park, SA 5042, Australia; (M.D.S.); (D.M.); (Q.W.)
- Dale McAninch
- Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia; (K.J.B.); (S.L.); (D.M.); (T.J.-K.)
- Tanja Jankovic-Karasoulos
- Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia; (K.J.B.); (S.L.); (D.M.); (T.J.-K.)
- Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia;
- Flinders Health and Medical Research Institute, Flinders University, Bedford Park, SA 5042, Australia; (M.D.S.); (D.M.); (Q.W.)
- Dylan McCullough
- Flinders Health and Medical Research Institute, Flinders University, Bedford Park, SA 5042, Australia; (M.D.S.); (D.M.); (Q.W.)
- Qianhui Wan
- Flinders Health and Medical Research Institute, Flinders University, Bedford Park, SA 5042, Australia; (M.D.S.); (D.M.); (Q.W.)
- Tina Bianco-Miotto
- Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia;
- School of Agriculture, Food and Wine, Waite Research Institute, University of Adelaide, Adelaide, SA 5005, Australia
- James Breen
- Indigenous Genomics, Telethon Kids Institute (Adelaide Office), Adelaide, SA 5000, Australia;
- College of Health & Medicine, Australian National University, Canberra, ACT 2600, Australia
- Claire T. Roberts
- Adelaide Medical School, University of Adelaide, Adelaide, SA 5005, Australia; (K.J.B.); (S.L.); (D.M.); (T.J.-K.)
- Robinson Research Institute, University of Adelaide, Adelaide, SA 5005, Australia;
- Flinders Health and Medical Research Institute, Flinders University, Bedford Park, SA 5042, Australia; (M.D.S.); (D.M.); (Q.W.)
- Correspondence:
16
Kroes MM, Miranda-Bedate A, Jacobi RHJ, van Woudenbergh E, den Hartog G, van Putten JPM, de Wit J, Pinelli E. Bordetella pertussis-infected innate immune cells drive the anti-pertussis response of human airway epithelium. Sci Rep 2022; 12:3622. [PMID: 35256671 PMCID: PMC8901624 DOI: 10.1038/s41598-022-07603-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/14/2021] [Accepted: 02/21/2022] [Indexed: 12/13/2022]
Abstract
Pertussis is a severe respiratory tract infection caused by Bordetella pertussis. This bacterium infects the ciliated epithelium of the human airways. We investigated the epithelial cell response to B. pertussis infection in primary human airway epithelium (HAE) differentiated at air-liquid interface. Infection of the HAE cells mimicked several hallmarks of B. pertussis infection such as reduced epithelial barrier integrity and abrogation of mucociliary transport. Our data suggests mild immunological activation of HAE by B. pertussis indicated by secretion of IL-6 and CXCL8 and the enrichment of genes involved in bacterial recognition and innate immune processes. We identified IL-1β and IFNγ, present in conditioned media derived from B. pertussis-infected macrophage and NK cells, as essential immunological factors for inducing robust chemokine secretion by HAE in response to B. pertussis. In transwell migration assays, the chemokine-containing supernatants derived from this HAE induced monocyte migration. Our data suggests that the airway epithelium on its own has a limited immunological response to B. pertussis and that for a broad immune response communication with local innate immune cells is necessary. This highlights the importance of intercellular communication in the defense against B. pertussis infection and may assist in the rational design of improved pertussis vaccines.
Affiliation(s)
- M M Kroes
- Center for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- A Miranda-Bedate
- Center for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- R H J Jacobi
- Center for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- E van Woudenbergh
- Center for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- Section Paediatric Infectious Diseases, Laboratory of Medical Immunology, Radboud Institute for Molecular Life Sciences, Nijmegen, The Netherlands
- G den Hartog
- Center for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- J P M van Putten
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, The Netherlands
- J de Wit
- Center for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
- E Pinelli
- Center for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
17
Han S, Huang J, Foppiano F, Prehn C, Adamski J, Suhre K, Li Y, Matullo G, Schliess F, Gieger C, Peters A, Wang-Sattler R. TIGER: technical variation elimination for metabolomics data using ensemble learning architecture. Brief Bioinform 2022; 23:6492643. [PMID: 34981111 PMCID: PMC8921617 DOI: 10.1093/bib/bbab535] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/08/2021] [Revised: 11/01/2021] [Accepted: 11/18/2021] [Indexed: 12/24/2022]
Abstract
Large metabolomics datasets inevitably contain unwanted technical variations which can obscure meaningful biological signals and affect how this information is applied to personalized healthcare. Many methods have been developed to handle unwanted variations. However, the underlying assumptions of many existing methods only hold for a few specific scenarios. Some tools remove technical variations with models trained on quality control (QC) samples which may not generalize well on subject samples. Additionally, almost none of the existing methods supports datasets with multiple types of QC samples, which greatly limits their performance and flexibility. To address these issues, a non-parametric method TIGER (Technical variation elImination with ensemble learninG architEctuRe) is developed in this study and released as an R package (https://CRAN.R-project.org/package=TIGERr). TIGER integrates the random forest algorithm into an adaptable ensemble learning architecture. Evaluation results show that TIGER outperforms four popular methods with respect to robustness and reliability on three human cohort datasets constructed with targeted or untargeted metabolomics data. Additionally, a case study aiming to identify age-associated metabolites is performed to illustrate how TIGER can be used for cross-kit adjustment in a longitudinal analysis with experimental data of three time-points generated by different analytical kits. A dynamic website is developed to help evaluate the performance of TIGER and examine the patterns revealed in our longitudinal analysis (https://han-siyu.github.io/TIGER_web/). Overall, TIGER is expected to be a powerful tool for metabolomics data analysis.
Affiliation(s)
- Siyu Han
- School of Medicine, Technical University of Munich, Germany
- Cornelia Prehn
- Head of Metabolomics Lab at Metabolomics and Proteomics Core Facility, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH)
- Jerzy Adamski
- National University of Singapore, University of Ljubljana, Slovenia and Technical University of Munich, Germany
- Karsten Suhre
- Weill Cornell Medicine and director of the Bioinformatics Core, Qatar
- Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Giuseppe Matullo
- Human Genetics and group leader of the Genomics Variation, Complex Diseases and Population Medicine Unit at the Turin University, Italy
- Freimut Schliess
- Director Science & Innovation at Profil Institut für Stoffwechselforschung (GmbH)
- Christian Gieger
- Research Unit of Molecular Epidemiology at the Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH)
- Rui Wang-Sattler
- Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH)
18
Johnson KA, Krishnan A. Robust normalization and transformation techniques for constructing gene coexpression networks from RNA-seq data. Genome Biol 2022; 23:1. [PMID: 34980209 PMCID: PMC8721966 DOI: 10.1186/s13059-021-02568-9] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 09/28/2020] [Accepted: 12/06/2021] [Indexed: 12/13/2022]
Abstract
Background: Constructing gene coexpression networks is a powerful approach for analyzing high-throughput gene expression data towards module identification, gene function prediction, and disease-gene prioritization. While optimal workflows for constructing coexpression networks, including good choices for data pre-processing, normalization, and network transformation, have been developed for microarray-based expression data, such well-tested choices do not exist for RNA-seq data. Almost all studies that compare data processing and normalization methods for RNA-seq focus on the end goal of determining differential gene expression. Results: Here, we present a comprehensive benchmarking and analysis of 36 different workflows, each with a unique set of normalization and network transformation methods, for constructing coexpression networks from RNA-seq datasets. We test these workflows on both large, homogenous datasets and small, heterogeneous datasets from various labs. We analyze the workflows in terms of aggregate performance, individual method choices, and the impact of multiple dataset experimental factors. Our results demonstrate that between-sample normalization has the biggest impact, with counts adjusted by size factors producing networks that most accurately recapitulate known tissue-naive and tissue-aware gene functional relationships. Conclusions: Based on this work, we provide concrete recommendations on robust procedures for building an accurate coexpression network from an RNA-seq dataset. In addition, researchers can examine all the results in great detail at https://krishnanlab.github.io/RNAseq_coexpression to make appropriate choices for coexpression analysis based on the experimental factors of their RNA-seq dataset.
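To make "counts adjusted by size factors" concrete, the sketch below computes per-sample size factors with the median-of-ratios approach and divides each sample's counts by its factor. The abstract does not say which size-factor method was benchmarked, so median-of-ratios is an assumption here, and the function names `size_factors` and `normalize` and the toy counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of genes, each a list of per-sample counts."""
    n_samples = len(counts[0])
    # Only genes with nonzero counts in every sample have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    """Divide every count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Toy example: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(counts)
normalized = normalize(counts, factors)
```

After this adjustment the first three genes have equal values in both samples, which is the between-sample comparability that a coexpression analysis relies on; the gene with a zero count is excluded from factor estimation but still rescaled.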
Affiliation(s)
- Kayla A Johnson
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Arjun Krishnan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
19
Xu Z, Chen H, Sun J, Mao W, Chen S, Chen M. Multi-Omics analysis identifies a lncRNA-related prognostic signature to predict bladder cancer recurrence. Bioengineered 2021; 12:11108-11125. [PMID: 34738881 PMCID: PMC8810060 DOI: 10.1080/21655979.2021.2000122] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
Bladder cancer (BLCA) is one of the most common cancers worldwide, with a high recurrence rate. Hence, we intended to establish a recurrence-related long non-coding RNA (lncRNA) model of BLCA as a potential biomarker based on multi-omics analysis. Multi-omics data including copy number variation (CNV) data, mutation annotation files, RNA expression profiles and clinical data of The Cancer Genome Atlas (TCGA) BLCA cohort (303 cases) and GSE31684 (93 cases) were downloaded from public databases. With multi-omics analysis, twenty lncRNAs were identified as candidates related to BLCA recurrence, CNVs and mutations in the training set. A ten-lncRNA signature was established using least absolute shrinkage and selection operator (LASSO) and Cox regression. Then, various survival analyses were used to assess the power of the lncRNA model in predicting BLCA recurrence. The results showed that the recurrence-free survival time of the high-risk group was significantly shorter than that of the low-risk group in the training and testing sets, and the predictive value of the ten-lncRNA signature was robust and independent of other clinical variables. Gene Set Enrichment Analysis (GSEA) showed that this signature was associated with immune disorders, indicating that it may be involved in tumor immunology. Compared with the other reported lncRNA signatures, the ten-lncRNA signature was validated as a superior prognostic model in predicting the recurrence of BLCA. The effectiveness of the model was also evaluated in bladder cancer samples via qRT-PCR. Thus, the novel ten-lncRNA signature, constructed based on multi-omics data, had robust prognostic power in predicting the recurrence of BLCA and potential clinical implications as a biomarker.
Affiliation(s)
- Zhipeng Xu
- Department of Urology, Affiliated Zhongda Hospital of Southeast University, Nanjing, China
- Hui Chen
- Department of Radiation Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Jin Sun
- Department of Urology, Xuyi People's Hospital, Huaian, China
- Weipu Mao
- Department of Urology, Affiliated Zhongda Hospital of Southeast University, Nanjing, China
- Shuqiu Chen
- Department of Urology, Affiliated Zhongda Hospital of Southeast University, Nanjing, China
- Ming Chen
- Department of Urology, Affiliated Zhongda Hospital of Southeast University, Nanjing, China; Department of Urology, Zhongda Hospital Lishui Branch, Nanjing, China
20
Herrera-Uribe J, Wiarda JE, Sivasankaran SK, Daharsh L, Liu H, Byrne KA, Smith TPL, Lunney JK, Loving CL, Tuggle CK. Reference Transcriptomes of Porcine Peripheral Immune Cells Created Through Bulk and Single-Cell RNA Sequencing. Front Genet 2021; 12:689406. [PMID: 34249103 PMCID: PMC8261551 DOI: 10.3389/fgene.2021.689406] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 03/31/2021] [Accepted: 05/18/2021] [Indexed: 01/03/2023]
Abstract
Pigs are a valuable human biomedical model and an important protein source supporting global food security. The transcriptomes of peripheral blood immune cells in pigs were defined at the bulk cell-type and single cell levels. First, eight cell types were isolated in bulk from peripheral blood mononuclear cells (PBMCs) by cell sorting, representing Myeloid, NK cells and specific populations of T and B-cells. Transcriptomes for each bulk population of cells were generated by RNA-seq with 10,974 expressed genes detected. Pairwise comparisons between cell types revealed specific expression, while enrichment analysis identified 1,885 to 3,591 significantly enriched genes across all 8 cell types. Gene Ontology analysis for the top 25% of significantly enriched genes (SEG) showed high enrichment of biological processes related to the nature of each cell type. Comparison of gene expression indicated highly significant correlations between pig cells and corresponding human PBMC bulk RNA-seq data available in Haemopedia. Second, higher resolution of distinct cell populations was obtained by single-cell RNA-sequencing (scRNA-seq) of PBMC. Seven PBMC samples were partitioned and sequenced that produced 28,810 single cell transcriptomes distributed across 36 clusters and classified into 13 general cell types including plasmacytoid dendritic cells (DC), conventional DCs, monocytes, B-cell, conventional CD4 and CD8 αβ T-cells, NK cells, and γδ T-cells. Signature gene sets from the human Haemopedia data were assessed for relative enrichment in genes expressed in pig cells and integration of pig scRNA-seq with a public human scRNA-seq dataset provided further validation for similarity between human and pig data. The sorted porcine bulk RNAseq dataset informed classification of scRNA-seq PBMC populations; specifically, an integration of the datasets showed that the pig bulk RNAseq data helped define the CD4CD8 double-positive T-cell populations in the scRNA-seq data. 
Overall, the data provides deep and well-validated transcriptomic data from sorted PBMC populations and the first single-cell transcriptomic data for porcine PBMCs. This resource will be invaluable for annotation of pig genes controlling immunogenetic traits as part of the porcine Functional Annotation of Animal Genomes (FAANG) project, as well as further study of, and development of new reagents for, porcine immunology.
Affiliation(s)
- Juber Herrera-Uribe
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Jayne E. Wiarda
- Food Safety and Enteric Pathogens Research Unit, National Animal Disease Center, Agricultural Research Service, United States Department of Agriculture, Ames, IA, United States
- Immunobiology Graduate Program, Iowa State University, Ames, IA, United States
- Oak Ridge Institute for Science and Education, Agricultural Research Service Participation Program, Oak Ridge, TN, United States
- Sathesh K. Sivasankaran
- Food Safety and Enteric Pathogens Research Unit, National Animal Disease Center, Agricultural Research Service, United States Department of Agriculture, Ames, IA, United States
- Genome Informatics Facility, Iowa State University, Ames, IA, United States
- Lance Daharsh
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Haibo Liu
- Department of Animal Science, Iowa State University, Ames, IA, United States
- Kristen A. Byrne
- Food Safety and Enteric Pathogens Research Unit, National Animal Disease Center, Agricultural Research Service, United States Department of Agriculture, Ames, IA, United States
- Joan K. Lunney
- USDA-ARS, Beltsville Agricultural Research Center, Animal Parasitic Diseases Laboratory, Beltsville, MD, United States
- Crystal L. Loving
- Food Safety and Enteric Pathogens Research Unit, National Animal Disease Center, Agricultural Research Service, United States Department of Agriculture, Ames, IA, United States
21
Peterson EJR, Abidi AA, Arrieta-Ortiz ML, Aguilar B, Yurkovich JT, Kaur A, Pan M, Srinivas V, Shmulevich I, Baliga NS. Intricate Genetic Programs Controlling Dormancy in Mycobacterium tuberculosis. Cell Rep 2021; 31:107577. [PMID: 32348771 PMCID: PMC7605849 DOI: 10.1016/j.celrep.2020.107577] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 05/14/2019] [Revised: 12/18/2019] [Accepted: 04/06/2020] [Indexed: 11/24/2022]
Abstract
Mycobacterium tuberculosis (MTB) displays the remarkable ability to transition in and out of dormancy, a hallmark of the pathogen’s capacity to evade the immune system and exploit susceptible individuals. Uncovering the gene regulatory programs that underlie the phenotypic shifts in MTB during disease latency and reactivation has posed a challenge. We develop an experimental system to precisely control dissolved oxygen levels in MTB cultures in order to capture the transcriptional events that unfold as MTB transitions into and out of hypoxia-induced dormancy. Using a comprehensive genome-wide transcription factor binding map and insights from network topology analysis, we identify regulatory circuits that deterministically drive sequential transitions across six transcriptionally and functionally distinct states encompassing more than three-fifths of the MTB genome. The architecture of the genetic programs explains the transcriptional dynamics underlying synchronous entry of cells into a dormant state that is primed to infect the host upon encountering favorable conditions. Mycobacterium tuberculosis (MTB) persists within the host by counteracting disparate stressors including hypoxia. Peterson et al. report a transcriptional program that coordinates sequential state transitions to drive MTB in and out of hypoxia-induced dormancy. Among varied properties, this program encodes advanced preparedness to infect the host in favorable conditions.
Affiliation(s)
- Abrar A Abidi
- Institute for Systems Biology, Seattle, WA 98109, USA
- Boris Aguilar
- Institute for Systems Biology, Seattle, WA 98109, USA
- Amardeep Kaur
- Institute for Systems Biology, Seattle, WA 98109, USA
- Min Pan
- Institute for Systems Biology, Seattle, WA 98109, USA
- Nitin S Baliga
- Institute for Systems Biology, Seattle, WA 98109, USA; Molecular and Cellular Biology Program, Departments of Microbiology and Biology, University of Washington, Seattle, WA; Lawrence Berkeley National Laboratories, Berkeley, CA
22
Differential Methylation in the GSTT1 Regulatory Region in Sudden Unexplained Death and Sudden Unexpected Death in Epilepsy. Int J Mol Sci 2021; 22:ijms22062790. [PMID: 33801838 PMCID: PMC7999472 DOI: 10.3390/ijms22062790] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/30/2021] [Revised: 02/27/2021] [Accepted: 03/04/2021] [Indexed: 12/13/2022]
Abstract
Sudden cardiac death (SCD) is a diagnostic challenge in forensic medicine. In a relatively large proportion of the SCDs, the deaths remain unexplained after autopsy. This challenge is likely caused by unknown disease mechanisms. Changes in DNA methylation have been associated with several heart diseases, but the role of DNA methylation in SCD is unknown. In this study, we investigated DNA methylation in two SCD subtypes, sudden unexplained death (SUD) and sudden unexpected death in epilepsy (SUDEP). We assessed DNA methylation of more than 850,000 positions in cardiac tissue from nine SUD and 14 SUDEP cases using the Illumina Infinium MethylationEPIC BeadChip. In total, six differently methylated regions (DMRs) between the SUD and SUDEP cases were identified. The DMRs were located in proximity to or overlapping genes encoding proteins that are a part of the glutathione S-transferase (GST) superfamily. Whole genome sequencing (WGS) showed that the DNA methylation alterations were not caused by genetic changes, while whole transcriptome sequencing (WTS) showed that DNA methylation was associated with expression levels of the GSTT1 gene. In conclusion, our results indicate that cardiac DNA methylation is similar in SUD and SUDEP, but with regional differential methylation in proximity to GST genes.
23
Chasse MH, Johnson BK, Boguslawski EA, Sorensen KM, Rosien JE, Kang MH, Reynolds CP, Heo L, Madaj ZB, Beddows I, Foxa GE, Kitchen‐Goosen SM, Williams BO, Triche TJ, Grohar PJ. Mithramycin induces promoter reprogramming and differentiation of rhabdoid tumor. EMBO Mol Med 2021; 13:e12640. [PMID: 33332735 PMCID: PMC7863405 DOI: 10.15252/emmm.202012640] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/30/2020] [Revised: 11/18/2020] [Accepted: 11/20/2020] [Indexed: 12/21/2022]
Abstract
Rhabdoid tumor (RT) is a pediatric cancer characterized by the inactivation of SMARCB1, a subunit of the SWI/SNF chromatin remodeling complex. Although this deletion is the known oncogenic driver, there are limited effective therapeutic options for these patients. Here we use unbiased screening of cell line panels to identify a heightened sensitivity of rhabdoid tumor to mithramycin and the second-generation analogue EC8042. The sensitivity of MMA and EC8042 was superior to traditional DNA damaging agents and linked to the causative mutation of the tumor, SMARCB1 deletion. Mithramycin blocks SMARCB1-deficient SWI/SNF activity and displaces the complex from chromatin to cause an increase in H3K27me3. This triggers chromatin remodeling and enrichment of H3K27ac at chromHMM-defined promoters to restore cellular differentiation. These effects occurred at concentrations not associated with DNA damage and were not due to global chromatin remodeling or widespread gene expression changes. Importantly, a single 3-day infusion of EC8042 caused dramatic regressions of RT xenografts, recapitulated the increase in H3K27me3, and cellular differentiation described in vitro to completely cure three out of eight mice.
Affiliation(s)
- Min H Kang
- Texas Tech University Health Sciences Center, Lubbock, TX, USA
- Lyong Heo
- Van Andel Research Institute, Grand Rapids, MI, USA
- Ian Beddows
- Van Andel Research Institute, Grand Rapids, MI, USA
- Patrick J Grohar
- Van Andel Research Institute, Grand Rapids, MI, USA
- The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
24
Low MGMT digital expression is associated with a better outcome of IDH1 wildtype glioblastomas treated with temozolomide. J Neurooncol 2021; 151:135-144. [PMID: 33400009 DOI: 10.1007/s11060-020-03675-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/03/2020] [Accepted: 12/10/2020] [Indexed: 12/24/2022]
Abstract
INTRODUCTION Glioblastoma (GBM) is the deadliest primary brain tumor. The standard treatment consists of surgery, radiotherapy, and temozolomide (TMZ). TMZ response is heterogeneous, and MGMT promoter (MGMTp) methylation has been the major predictive biomarker. We aimed to describe the clinical and molecular data of GBMs treated with TMZ, compare MGMT methylation with MGMT expression, and further associate these with patient outcome. METHODS We evaluated 112 FFPE adult GBM cases. IDH1 and ATRX expression was analyzed by immunohistochemistry, hotspot TERT promoter (TERTp) mutations were evaluated by Sanger or pyrosequencing, MGMTp methylation was assessed by pyrosequencing, and MGMT mRNA expression was measured using the nCounter® Vantage 3D™ DNA damage and repair panel. RESULTS Of the 112 GBMs, 96 were IDH1WT, and 16 were IDH1MUT. Positive ATRX expression was found in 91.6% (88/96) of IDHWT and 43.7% (7/16) of IDHMUT. TERTp mutations were detected in 70.4% (50/71) of IDHWT. MGMTp methylation was found in 55.5% (35/63) of IDHWT and 84.6% (11/13) of IDHMUT, and as expected, MGMTp methylation was significantly associated with a better response to TMZ. MGMT expression was inversely correlated with MGMTp methylation levels (-0.506, p < 0.0001), and low MGMT expression was significantly associated with better patient survival. It was also observed that integrating MGMTp methylation and expression significantly improved the prognostic value. CONCLUSIONS MGMT mRNA levels evaluated by digital expression were associated with the outcome of TMZ-treated GBM patients. The combination of MGMT methylation and mRNA expression may provide a more accurate prediction of TMZ response in GBM patients.
25
Whole blood mRNA expression-based targets to discriminate active tuberculosis from latent infection and other pulmonary diseases. Sci Rep 2020; 10:22072. [PMID: 33328540 PMCID: PMC7745039 DOI: 10.1038/s41598-020-78793-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/20/2020] [Accepted: 11/30/2020] [Indexed: 01/22/2023]
Abstract
Current diagnostic tests for tuberculosis (TB) are not able to predict reactivation disease progression from latent TB infection (LTBI). The main barrier to predicting reactivation disease is the lack of our understanding of host biomarkers associated with progression from latent infection to active disease. Here, we applied an immune-based gene expression profile by NanoString platform to identify whole blood markers that can distinguish active TB from other lung diseases (OPD), and that could be further evaluated as a reactivation TB predictor. Among 23 candidate genes that differentiated patients with active TB from those with OPD, nine genes (CD274, CEACAM1, CR1, FCGR1A/B, IFITM1, IRAK3, LILRA6, MAPK14, PDCD1LG2) demonstrated sensitivity and specificity of 100%. Seven genes (C1QB, C2, CCR2, CCRL2, LILRB4, MAPK14, MSR1) distinguished TB from LTBI with sensitivity and specificity between 82 and 100%. This study identified single gene candidates that distinguished TB from OPD and LTBI with high sensitivity and specificity (both > 82%), which may be further evaluated as diagnostic for disease and as predictive markers for reactivation TB.
26
Souza AGD, Bastos VAF, Fujimura PT, Ferreira ICC, Leal LF, da Silva LS, Laus AC, Reis RM, Martins MM, Santos PS, Corrêa NCR, Marangoni K, Thomé CH, Colli LM, Goulart LR, Goulart VA. Cell-free DNA promotes malignant transformation in non-tumor cells. Sci Rep 2020; 10:21674. [PMID: 33303880 PMCID: PMC7728762 DOI: 10.1038/s41598-020-78766-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/27/2020] [Accepted: 11/30/2020] [Indexed: 12/16/2022]
Abstract
Cell-free DNA is present in different biological fluids and, when released by tumor cells, may contribute to pro-tumor events such as malignant transformation of cells adjacent to the tumor and metastasis. Thus, this study analyzed the effect of tumor cell-free DNA, isolated from the blood of prostate cancer patients, on non-tumor prostate cell lines (RWPE-1 and PNT-2). To achieve this, we performed cell-free DNA quantification and characterization assays, evaluation of gene and miRNA expression profiling focused on cancer progression and EMT, and metabolomics by mass spectrometry and cellular migration. The results showed that tumor cell-free DNA was able to alter the gene expression of MMP9 and CD44, alter the expression profile of nine miRNAs, and increase the tryptophan consumption and cell migration rates in non-tumor cells. Therefore, tumor cell-free DNA was capable of altering the receptor cell phenotype, triggering events related to malignant transformation in these cells, and can thus be considered a potential target for cancer diagnosis and therapy.
Affiliation(s)
- Aline Gomes de Souza
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Department of Medical Imaging, Hematology, and Oncology, Ribeirão Preto Medical School - University of São Paulo, Ribeirão Preto, Brazil
- Victor Alexandre F Bastos
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Patricia Tieme Fujimura
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Izabella Cristina C Ferreira
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Letícia Ferro Leal
- Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, SP, Brazil
- Ana Carolina Laus
- Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, SP, Brazil
- Rui Manuel Reis
- Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, SP, Brazil
- Life and Health Sciences Research Institute (ICVS), Medical School, University of Minho, Braga, Portugal
- ICVS/3B's-PT Government Associate Laboratory, Braga, Portugal
- Mario Machado Martins
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Paula Souza Santos
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Natássia C Resende Corrêa
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Karina Marangoni
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Carolina Hassibe Thomé
- Center for Cell Based Therapy, Hemotherapy Center of Ribeirão Preto, Ribeirão Preto, SP, Brazil
- Leandro Machado Colli
- Department of Medical Imaging, Hematology, and Oncology, Ribeirão Preto Medical School - University of São Paulo, Ribeirão Preto, Brazil
- Luiz Ricardo Goulart
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
- Department of Medical Microbiology and Immunology, University of California-Davis, Davis, USA
- Vivian Alonso Goulart
- Laboratory of Nanobiotechnology, Institute of Biotechnology, Federal University of Uberlândia, Uberlândia, MG, Brazil
27
Tanaka A, Ishitsuka Y, Ohta H, Fujimoto A, Yasunaga JI, Matsuoka M. Systematic clustering algorithm for chromatin accessibility data and its application to hematopoietic cells. PLoS Comput Biol 2020; 16:e1008422. [PMID: 33253153 PMCID: PMC7728210 DOI: 10.1371/journal.pcbi.1008422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2020] [Revised: 12/10/2020] [Accepted: 10/06/2020] [Indexed: 11/18/2022]
Abstract
The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we give a clustering algorithm for genome-wide open chromatin data using a new data reduction method. This method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. This algorithm with the systematically optimized set of peaks enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
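The data reduction described in the abstract above (genome as a string of 1s and 0s, compared by Hamming distance) can be sketched as follows. The fixed-size binning, the function names `peaks_to_bits` and `hamming`, and the toy peak sets are assumptions made for illustration; the paper's own way of defining positions from its optimized peak set may well differ.

```python
import numpy as np

def peaks_to_bits(peaks, genome_length, bin_size):
    """Mark each fixed-size bin 1 if any peak overlaps it, else 0."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    bits = np.zeros(n_bins, dtype=np.uint8)
    for start, end in peaks:  # half-open intervals [start, end)
        bits[start // bin_size : (end - 1) // bin_size + 1] = 1
    return bits

def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    return int(np.count_nonzero(a != b))

# Toy peak sets (assumed) on a 1000 bp "genome" with 100 bp bins.
samples = {
    "A": [(0, 100), (300, 400)],
    "B": [(0, 100), (350, 450)],
    "C": [(600, 700)],
}
M = np.stack([peaks_to_bits(p, 1000, 100) for p in samples.values()])
# Pairwise Hamming distances between all samples via broadcasting.
D = (M[:, None, :] != M[None, :, :]).sum(axis=2)
```

The resulting distance matrix `D` could then be passed to any standard clustering routine; which one the authors use is not stated in the abstract.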
Affiliation(s)
- Azusa Tanaka
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Laboratory of Virus Control, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Yasuhiro Ishitsuka
- Center for Science Adventure and Collaborative Research Advancement, Graduate School of Science, Kyoto University, Kyoto, Japan
- Department of Mathematics, Graduate School of Science, Kyoto University, Kyoto, Japan
- Hiroki Ohta
- Center for Science Adventure and Collaborative Research Advancement, Graduate School of Science, Kyoto University, Kyoto, Japan
- Department of Physics, Graduate School of Science, Kyoto University, Kyoto, Japan
- Akihiro Fujimoto
- Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Jun-ichirou Yasunaga
- Laboratory of Virus Control, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Department of Hematology, Rheumatology and Infectious Disease, Faculty of Life Sciences, Kumamoto University, Kumamoto, Japan
- Masao Matsuoka
- Laboratory of Virus Control, Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Department of Hematology, Rheumatology and Infectious Disease, Faculty of Life Sciences, Kumamoto University, Kumamoto, Japan
28
Bennike TB, Fatou B, Angelidou A, Diray-Arce J, Falsafi R, Ford R, Gill EE, van Haren SD, Idoko OT, Lee AH, Ben-Othman R, Pomat WS, Shannon CP, Smolen KK, Tebbutt SJ, Ozonoff A, Richmond PC, van den Biggelaar AHJ, Hancock REW, Kampmann B, Kollmann TR, Levy O, Steen H. Preparing for Life: Plasma Proteome Changes and Immune System Development During the First Week of Human Life. Front Immunol 2020; 11:578505. [PMID: 33329546 PMCID: PMC7732455 DOI: 10.3389/fimmu.2020.578505] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/30/2020] [Accepted: 09/22/2020] [Indexed: 01/05/2023]
Abstract
Neonates have heightened susceptibility to infections. The biological mechanisms are incompletely understood but thought to be related to age-specific adaptations in immunity due to resource constraints during immune system development and growth. We present here an extended analysis of our proteomics study of peripheral blood-plasma from a study of healthy full-term newborns delivered vaginally, collected at the day of birth and on day of life (DOL) 1, 3, or 7, to cover the first week of life. The plasma proteome was characterized by LC-MS using our established 96-well plate format plasma proteomics platform. We found increasing acute phase proteins and a reduction of respective inhibitors on DOL1. Focusing on the complement system, we found increased plasma concentrations of all major components of the classical complement pathway and the membrane attack complex (MAC) from birth onward, except C7 which seems to have near adult levels at birth. In contrast, components of the lectin and alternative complement pathways mainly decreased. A comparison to whole blood messenger RNA (mRNA) levels enabled characterization of mRNA and protein levels in parallel, and for 23 of the 30 monitored complement proteins, the whole blood transcript information by itself was not reflective of the plasma protein levels or dynamics during the first week of life. Analysis of immunoglobulin (Ig) mRNA and protein levels revealed that IgM levels and synthesis increased, while the plasma concentrations of maternally transferred IgG1-4 decreased in accordance with their in vivo half-lives. The neonatal plasma ratio of IgG1 to IgG2-4 was increased compared to adult values, demonstrating a highly efficient IgG1 transplacental transfer process. Partial compensation for maternal IgG degradation was achieved by endogenous synthesis of the IgG1 subtype which increased with DOL. 
The findings were validated in a geographically distinct cohort, demonstrating a consistent developmental trajectory of the newborn's immune system over the first week of human life across continents. Our findings indicate that the classical complement pathway is central for newborn immunity and our approach to characterize the plasma proteome in parallel with the transcriptome will provide crucial insight in immune ontogeny and inform new approaches to prevent and treat diseases.
Collapse
Affiliation(s)
- Tue Bjerg Bennike
- Department of Pathology, Boston Children’s Hospital, Boston, MA, United States
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
| | - Benoit Fatou
- Department of Pathology, Boston Children’s Hospital, Boston, MA, United States
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Asimenia Angelidou
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Department of Neonatology, Beth Israel Deaconess Medical Center, Boston, MA, United States
| | - Joann Diray-Arce
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Reza Falsafi
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada
| | - Rebecca Ford
- Papua New Guinea Institute of Medical Research, Goroka, Papua New Guinea
| | - Erin E. Gill
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada
| | - Simon D. van Haren
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Olubukola T. Idoko
- Vaccines and Immunity Theme, Medical Research Council Unit, The Gambia at the London School of Hygiene and Tropical Medicine, Banjul, Gambia
| | - Amy H. Lee
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
| | - Rym Ben-Othman
- Department of Pediatrics, University of British Columbia, and BC Children’s Hospital, Vancouver, BC, Canada
| | - William S. Pomat
- Papua New Guinea Institute of Medical Research, Goroka, Papua New Guinea
| | | | - Kinga K. Smolen
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Scott J. Tebbutt
- PROOF Centre of Excellence, Vancouver, BC, Canada
- UBC Centre for Heart Lung Innovation, St. Paul’s Hospital, Vancouver, BC, Canada
- Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Al Ozonoff
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | | | | | - Robert E. W. Hancock
- Department of Microbiology and Immunology, University of British Columbia, Vancouver, BC, Canada
| | - Beate Kampmann
- Vaccines and Immunity Theme, Medical Research Council Unit, The Gambia at the London School of Hygiene and Tropical Medicine, Banjul, Gambia
- Vaccine Centre, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Tobias R. Kollmann
- Department of Pediatrics, University of British Columbia, and BC Children’s Hospital, Vancouver, BC, Canada
- Department of Experimental Medicine, University of British Columbia, Vancouver, BC, Canada
- Ofer Levy
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Broad Institute of MIT & Harvard, Cambridge, MA, United States
- Hanno Steen
- Department of Pathology, Boston Children’s Hospital, Boston, MA, United States
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
29
Zhao Y, Wong L, Goh WWB. How to do quantile normalization correctly for gene expression data analyses. Sci Rep 2020; 10:15534. [PMID: 32968196 PMCID: PMC7511327 DOI: 10.1038/s41598-020-72664-6] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 02/29/2020] [Accepted: 08/03/2020] [Indexed: 02/07/2023] Open
Abstract
Quantile normalization is an important normalization technique commonly used in high-dimensional data analysis. However, it is susceptible to class-effect proportion effects (the proportion of class-correlated variables in a dataset) and batch effects (the presence of potentially confounding technical variation) when applied blindly on whole data sets, resulting in higher false-positive and false-negative rates. We evaluate five strategies for performing quantile normalization, and demonstrate that good performance in terms of batch-effect correction and statistical feature selection can be readily achieved by first splitting data by sample class-labels before performing quantile normalization independently on each split (“Class-specific”). Via simulations with both real and simulated batch effects, we demonstrate that the “Class-specific” strategy (and others relying on similar principles) readily outperforms whole-data quantile normalization and is robust, preserving useful signals even during the combined analysis of separately-normalized datasets. Quantile normalization is a commonly used procedure, but when carelessly applied on whole datasets without first considering class-effect proportion and batch effects, it can result in poor performance. If quantile normalization must be used, then we recommend using the “Class-specific” strategy.
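The "Class-specific" idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the list-of-columns data layout, and the simple tie handling (ties are broken by position rather than averaged) are all assumptions made for illustration only.

```python
from collections import defaultdict


def quantile_normalize(columns):
    """Quantile-normalize equal-length sample columns (one list per sample).

    Assumption: ties are broken by original position, not averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    rank_means = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        # Feature indices ordered from smallest to largest value in this sample.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def class_specific_quantile_normalize(columns, labels):
    """Split samples by class label, then normalize each split independently."""
    groups = defaultdict(list)
    for j, label in enumerate(labels):
        groups[label].append(j)
    result = [None] * len(columns)
    for sample_indices in groups.values():
        normed = quantile_normalize([columns[j] for j in sample_indices])
        for j, col in zip(sample_indices, normed):
            result[j] = col
    return result


# Hypothetical toy data: two samples per class, three features each.
samples = [[5, 2, 3], [4, 1, 6], [50, 20, 30], [40, 10, 60]]
print(class_specific_quantile_normalize(samples, ["A", "A", "B", "B"]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5], [55.0, 15.0, 35.0], [35.0, 15.0, 55.0]]
```

In this toy case, normalizing all four samples together would force classes A and B onto one shared distribution, whereas the per-class split keeps the large between-class difference intact.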
Affiliation(s)
- Yaxing Zhao
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore, Singapore
- Department of Pathology, National University of Singapore, Singapore, Singapore
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
30
Analysis of global gene expression at seven brain regions of patients with schizophrenia. Schizophr Res 2020; 223:119-127. [PMID: 32631700 DOI: 10.1016/j.schres.2020.06.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/11/2019] [Revised: 04/14/2020] [Accepted: 06/27/2020] [Indexed: 12/30/2022]
Abstract
Previous transcriptome analyses of brain samples provided several insights into the pathophysiology of schizophrenia. In this study, we aimed to re-investigate gene expression datasets from seven brain regions of patients with schizophrenia and healthy controls by adopting a unified approach. After adjustment for confounding factors, we detected gene expression changes in 2 out of 7 brain regions - the dorsolateral prefrontal cortex (DLPFC) and parietal cortex (PC). We found relatively small effect sizes, not exceeding absolute log fold changes of 1. Gene-set enrichment analysis revealed the following alterations: 1) down-regulation of GABAergic signaling (in DLPFC and PC); 2) up-regulation of interleukin-23 signaling together with up-regulation of transcription mediated by RUNX1 and RUNX3 as well as down-regulation of RUNX2 signaling (in DLPFC) and 3) up-regulation of genes associated with responses to metal ions and RUNX1 signaling (PC). The number of neurons was significantly lower and the number of astrocytes was significantly higher at both brain regions. In turn, the index of microglia was increased in DLPFC and decreased in PC. Finally, our unsupervised analysis demonstrated that cellular composition of the samples was a major confounding factor in the analysis of gene expression across all datasets. In conclusion, our analysis provides further evidence that small but significant changes in the expression of genes related to GABAergic signaling, brain development, neuroinflammation and responses to metal ions might be involved in the pathophysiology of schizophrenia. Cell sorting techniques need to be used by future studies to dissect the effect of cellular content.
31
Zhu A, Ibrahim JG, Love MI. Heavy-tailed prior distributions for sequence count data: removing the noise and preserving large differences. Bioinformatics 2020; 35:2084-2092. [PMID: 30395178 PMCID: PMC6581436 DOI: 10.1093/bioinformatics/bty895] [Citation(s) in RCA: 914] [Impact Index Per Article: 228.5] [Received: 05/04/2018] [Revised: 09/25/2018] [Accepted: 10/23/2018] [Indexed: 01/08/2023] Open
Abstract
MOTIVATION In RNA-seq differential expression analysis, investigators aim to detect those genes with changes in expression level across conditions, despite technical and biological variability in the observations. A common task is to accurately estimate the effect size, often in terms of a logarithmic fold change (LFC). RESULTS When the read counts are low or highly variable, the maximum likelihood estimates for the LFCs have high variance, leading to large estimates not representative of true differences, and poor ranking of genes by effect size. One approach is to introduce filtering thresholds and pseudocounts to exclude or moderate estimated LFCs. Filtering may result in a loss of genes from the analysis with true differences in expression, while pseudocounts provide a limited solution that must be adapted per dataset. Here, we propose the use of a heavy-tailed Cauchy prior distribution for effect sizes, which avoids the use of filter thresholds or pseudocounts. The proposed method, Approximate Posterior Estimation for generalized linear model, apeglm, has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little information for statistical inference. AVAILABILITY AND IMPLEMENTATION The apeglm package is available as an R/Bioconductor package at https://bioconductor.org/packages/apeglm, and the methods can be called from within the DESeq2 software. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anqi Zhu
- Department of Biostatistics, University of North Carolina-Chapel Hill, NC, USA
- Joseph G Ibrahim
- Department of Biostatistics, University of North Carolina-Chapel Hill, NC, USA
- Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina-Chapel Hill, NC, USA
32
Federico A, Serra A, Ha MK, Kohonen P, Choi JS, Liampa I, Nymark P, Sanabria N, Cattelani L, Fratello M, Kinaret PAS, Jagiello K, Puzyn T, Melagraki G, Gulumian M, Afantitis A, Sarimveis H, Yoon TH, Grafström R, Greco D. Transcriptomics in Toxicogenomics, Part II: Preprocessing and Differential Expression Analysis for High Quality Data. NANOMATERIALS 2020; 10:nano10050903. [PMID: 32397130 PMCID: PMC7279140 DOI: 10.3390/nano10050903] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/10/2020] [Revised: 04/29/2020] [Accepted: 05/04/2020] [Indexed: 12/28/2022]
Abstract
Preprocessing of transcriptomics data plays a pivotal role in the development of toxicogenomics-driven tools for chemical toxicity assessment. The generation and exploitation of large volumes of molecular profiles, following an appropriate experimental design, allows the employment of toxicogenomics (TGx) approaches for a thorough characterisation of the mechanism of action (MOA) of different compounds. To date, a plethora of data preprocessing methodologies have been suggested. However, in most cases, building the optimal analytical workflow is not straightforward. A careful selection of the right tools must be carried out, since it will affect the downstream analyses and modelling approaches. Transcriptomics data preprocessing spans across multiple steps such as quality check, filtering, normalization, batch effect detection and correction. Currently, there is a lack of standard guidelines for data preprocessing in the TGx field. Defining the optimal tools and procedures to be employed in the transcriptomics data preprocessing will lead to the generation of homogeneous and unbiased data, allowing the development of more reliable, robust and accurate predictive models. In this review, we outline methods for the preprocessing of three main transcriptomic technologies including microarray, bulk RNA-Sequencing (RNA-Seq), and single cell RNA-Sequencing (scRNA-Seq). Moreover, we discuss the most common methods for the identification of differentially expressed genes and to perform a functional enrichment analysis. This review is the second part of a three-article series on Transcriptomics in Toxicogenomics.
Affiliation(s)
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- My Kieu Ha
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Pekka Kohonen
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Jang-Sik Choi
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Irene Liampa
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Penny Nymark
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Natasha Sanabria
- National Institute for Occupational Health, Johannesburg 30333, South Africa
- Luca Cattelani
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Michele Fratello
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Pia Anneli Sofia Kinaret
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
- Karolina Jagiello
- QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Tomasz Puzyn
- QSAR Lab Ltd., Aleja Grunwaldzka 190/102, 80-266 Gdansk, Poland
- Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308 Gdansk, Poland
- Georgia Melagraki
- Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Mary Gulumian
- National Institute for Occupational Health, Johannesburg 30333, South Africa
- Haematology and Molecular Medicine Department, School of Pathology, University of the Witwatersrand, Johannesburg 2050, South Africa
- Antreas Afantitis
- Nanoinformatics Department, NovaMechanics Ltd., Nicosia 1065, Cyprus
- Haralambos Sarimveis
- School of Chemical Engineering, National Technical University of Athens, 157 80 Athens, Greece
- Tae-Hyun Yoon
- Center for Next Generation Cytometry, Hanyang University, Seoul 04763, Korea
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Korea
- Roland Grafström
- Institute of Environmental Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Division of Toxicology, Misvik Biology, 20520 Turku, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, FI-33014 Tampere, Finland
- BioMediTech Institute, Tampere University, FI-33014 Tampere, Finland
- Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland
33
|
Zhang Y, Kim MS, Reichenberger ER, Stear B, Taylor DM. Scedar: A scalable Python package for single-cell RNA-seq exploratory data analysis. PLoS Comput Biol 2020; 16:e1007794. [PMID: 32339163 PMCID: PMC7217489 DOI: 10.1371/journal.pcbi.1007794] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/08/2019] [Revised: 05/12/2020] [Accepted: 03/17/2020] [Indexed: 11/25/2022] Open
Abstract
In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires extensive considerations of program efficiency and method selection. In order to reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they also do not assume that the data follow certain statistical distributions. The package is extensible and modular, which would facilitate the further development of functionalities for future requirements with the open-source development community. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.
Affiliation(s)
- Yuanchao Zhang
- Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Man S. Kim
- Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Erin R. Reichenberger
- Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Ben Stear
- Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Deanne M. Taylor
- Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
34
Tao J, Hao Y, Li X, Yin H, Nie X, Zhang J, Xu B, Chen Q, Li B. Systematic Identification of Housekeeping Genes Possibly Used as References in Caenorhabditis elegans by Large-Scale Data Integration. Cells 2020; 9:E786. [PMID: 32213971 PMCID: PMC7140892 DOI: 10.3390/cells9030786] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/24/2020] [Revised: 03/11/2020] [Accepted: 03/11/2020] [Indexed: 12/20/2022] Open
Abstract
For accurate gene expression quantification, normalization of gene expression data against reliable reference genes is required. It is known that the expression levels of commonly used reference genes vary considerably under different experimental conditions, and therefore, their use for data normalization is limited. In this study, an unbiased identification of reference genes in Caenorhabditis elegans was performed based on 145 microarray datasets (2296 gene array samples) covering different developmental stages, different tissues, drug treatments, lifestyle, and various stresses. As a result, thirteen housekeeping genes (rps-23, rps-26, rps-27, rps-16, rps-2, rps-4, rps-17, rpl-24.1, rpl-27, rpl-33, rpl-36, rpl-35, and rpl-15) with enhanced stability were comprehensively identified by using six popular normalization algorithms and RankAggreg method. Functional enrichment analysis revealed that these genes were significantly overrepresented in GO terms or KEGG pathways related to ribosomes. Validation analysis using recently published datasets revealed that the expressions of newly identified candidate reference genes were more stable than the commonly used reference genes. Based on the results, we recommended using rpl-33 and rps-26 as the optimal reference genes for microarray and rps-2 and rps-4 for RNA-sequencing data validation. More importantly, the most stable rps-23 should be a promising reference gene for both data types. This study, for the first time, successfully displays a large-scale microarray data driven genome-wide identification of stable reference genes for normalizing gene expression data and provides a potential guideline on the selection of universal internal reference genes in C. elegans, for quantitative gene expression analysis.
Affiliation(s)
- Jingxin Tao
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Xudong Li
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Huachun Yin
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Xiner Nie
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Jie Zhang
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Boying Xu
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
- Qiao Chen
- Scientific Research Office, Chongqing Normal University, Chongqing 401331, China
- Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China
35
Čuklina J, Pedrioli PGA, Aebersold R. Review of Batch Effects Prevention, Diagnostics, and Correction Approaches. Methods Mol Biol 2020; 2051:373-387. [PMID: 31552638 DOI: 10.1007/978-1-4939-9744-2_16] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 06/10/2023]
Abstract
Systematic technical variation in high-throughput studies consisting of the serial measurement of large sample cohorts is known as batch effects. Batch effects reduce the sensitivity of biological signal extraction and can cause significant artifacts. The systematic bias in the data caused by batch effects is more common in studies in which logistical considerations restrict the number of samples that can be prepared or profiled in a single experiment, thus necessitating the arrangement of subsets of study samples in batches. To mitigate the negative impact of batch effects, statistical approaches for batch correction are used at the stage of experimental design and data processing. Whereas in genomics batch effects and possible remedies have been extensively discussed, they are a relatively new challenge in proteomics because methods with sufficient throughput to systematically measure through large sample cohorts have only recently become available. Here we provide general recommendations to mitigate batch effects: we discuss the design of large-scale proteomic studies, review the most commonly used tools for batch effect correction and overview their application in proteomics.
Affiliation(s)
- Jelena Čuklina
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Ph.D. Program in Systems Biology, University of Zurich and ETH Zurich, Zürich, Switzerland
- Patrick G A Pedrioli
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- ETH Zürich, PHRT-MS, Zürich, Switzerland
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
- Faculty of Science, University of Zürich, Zürich, Switzerland
36
Reyes ALP, Silva TC, Coetzee SG, Plummer JT, Davis BD, Chen S, Hazelett DJ, Lawrenson K, Berman BP, Gayther SA, Jones MR. GENAVi: a shiny web application for gene expression normalization, analysis and visualization. BMC Genomics 2019; 20:745. [PMID: 31619158 PMCID: PMC6796420 DOI: 10.1186/s12864-019-6073-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 01/10/2019] [Accepted: 08/29/2019] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND The development of next generation sequencing (NGS) methods led to a rapid rise in the generation of large genomic datasets, but the development of user-friendly tools to analyze and visualize these datasets has not kept the same pace. This presents a two-fold challenge to biologists: the expertise to select an appropriate data analysis pipeline, and the need for bioinformatics or programming skills to apply this pipeline. The development of graphical user interface (GUI) applications hosted on web-based servers such as Shiny can make complex workflows accessible across operating systems and internet browsers to those without programming knowledge. RESULTS We have developed GENAVi (Gene Expression Normalization Analysis and Visualization) to provide a user-friendly interface for normalization and differential expression analysis (DEA) of human or mouse feature count level RNA-Seq data. GENAVi is a GUI based tool that combines Bioconductor packages in a format for scientists without bioinformatics expertise. We provide a panel of 20 cell lines commonly used for the study of breast and ovarian cancer within GENAVi as a foundation for users to bring their own data to the application. Users can visualize expression across samples, cluster samples based on gene expression or correlation, calculate and plot the results of principal components analysis, perform DEA and gene set enrichment and produce plots for each of these analyses. To allow scalability for large datasets we have provided local install via three methods. We improve on available tools by offering a range of normalization methods and a simple to use interface that provides clear and complete session reporting for reproducible analysis. CONCLUSION The development of tools using a GUI makes them practical and accessible to scientists without bioinformatics expertise, or access to a data analyst with relevant skills. While several GUI based tools are currently available for RNA-Seq analysis we improve on these existing tools. This user-friendly application provides a convenient platform for the normalization, analysis and visualization of gene expression data for scientists without bioinformatics expertise.
Affiliation(s)
- Alberto Luiz P Reyes
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Tiago C Silva
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Simon G Coetzee
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Jasmine T Plummer
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Brian D Davis
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Stephanie Chen
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Dennis J Hazelett
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Kate Lawrenson
- Women's Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Benjamin P Berman
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Simon A Gayther
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Michelle R Jones
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
37
Kirov S, Sasson A, Zhang C, Chasalow S, Dongre A, Steen H, Stensballe A, Andersen V, Birkelund S, Bennike TB. Degradation of the extracellular matrix is part of the pathology of ulcerative colitis. Mol Omics 2019; 15:67-76. [PMID: 30702115 DOI: 10.1039/c8mo00239h] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/11/2022]
Abstract
The scientific value of re-analyzing existing datasets is often proportional to the complexity of the data. Proteomics data are inherently complex and can be analyzed at many levels, including proteins, peptides, and post-translational modifications to verify and/or develop new hypotheses. In this paper, we present our re-analysis of a previously published study comparing colon biopsy samples from ulcerative colitis (UC) patients to non-affected controls. We used a different statistical approach, employing a linear mixed-effects regression model and analyzed the data both on the protein and peptide level. In addition to confirming and reinforcing the original finding of upregulation of neutrophil extracellular traps (NETs), we report novel findings, including that Extracellular Matrix (ECM) degradation and neutrophil maturation are involved in the pathology of UC. The pharmaceutically most relevant differential protein expressions were confirmed using immunohistochemistry as an orthogonal method. As part of this study, we also compared proteomics data to previously published mRNA expression data. These comparisons indicated compensatory regulation at transcription levels of the ECM proteins we identified and open possible new avenues for drug discovery.
Affiliation(s)
- Stefan Kirov
- Translational Bioinformatics, Bristol Myers Squibb, Pennington, NJ, USA
38
Unbiased data analytic strategies to improve biomarker discovery in precision medicine. Drug Discov Today 2019; 24:1735-1748. [PMID: 31158511 DOI: 10.1016/j.drudis.2019.05.018] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/27/2019] [Revised: 04/23/2019] [Accepted: 05/28/2019] [Indexed: 12/25/2022]
Abstract
Omics technologies promised improved biomarker discovery for precision medicine. The foremost problem of discovered biomarkers is irreproducibility between patient cohorts. From a data analytics perspective, the main reason for these failures is bias in statistical approaches and overfitting resulting from batch effects and confounding factors. The keys to reproducible biomarker discovery are: proper study design, unbiased data preprocessing and quality control analyses, and a knowledgeable application of statistics and machine learning algorithms. In this review, we discuss study design and analysis considerations and suggest standards from an expert point-of-view to promote unbiased decision-making in biomarker discovery in precision medicine.
39
Hicks SC, Okrah K, Paulson JN, Quackenbush J, Irizarry RA, Bravo HC. Smooth quantile normalization. Biostatistics 2019; 19:185-198. [PMID: 29036413 DOI: 10.1093/biostatistics/kxx028] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 11/01/2016] [Accepted: 05/07/2017] [Indexed: 11/14/2022] Open
Abstract
Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
Affiliation(s)
- Stephanie C Hicks
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Kwame Okrah
  - Genentech, Product Development Biostatistics, 1 DNA Way, South San Francisco, CA 94080, USA
- Joseph N Paulson
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- John Quackenbush
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Rafael A Irizarry
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Héctor Corrada Bravo
  - Department of Computer Science, University of Maryland, College Park, USA and Center for Bioinformatics and Computational Biology, Institute of Advanced Computer Studies, University of Maryland, 8314 Paint Branch Dr., College Park, MD 20742, USA
40
Integration of DNA methylation patterns and genetic variation in human pediatric tissues help inform EWAS design and interpretation. Epigenetics Chromatin 2019; 12:1. [PMID: 30602389 PMCID: PMC6314079 DOI: 10.1186/s13072-018-0245-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/13/2018] [Accepted: 12/18/2018] [Indexed: 02/06/2023] Open
Abstract
Background: The widespread use of accessible peripheral tissues for epigenetic analyses has prompted increasing interest in the study of tissue-specific DNA methylation (DNAm) variation in human populations. To date, characterizations of inter-individual DNAm variability and DNAm concordance across tissues have been largely performed in adult tissues and therefore are limited in their relevance to DNAm profiles from pediatric samples. Given that DNAm patterns in early life undergo rapid changes and have been linked to a wide range of health outcomes and environmental exposures, direct investigations of tissue-specific DNAm variation in pediatric samples may help inform the design and interpretation of DNAm analyses from early life cohorts. In this study, we present a systematic comparison of genome-wide DNAm patterns between matched pediatric buccal epithelial cells (BECs) and peripheral blood mononuclear cells (PBMCs), two of the most widely used peripheral tissues in human epigenetic studies. Specifically, we assessed DNAm variability, cross-tissue DNAm concordance and genetic determinants of DNAm across two independent early life cohorts encompassing different ages. Results: BECs had greater inter-individual DNAm variability than PBMCs, and the highly variable CpGs were more likely to be positively correlated between the matched tissues than less variable CpGs. These sites were enriched for CpGs under genetic influence, suggesting that a substantial proportion of DNAm covariation between tissues can be attributed to genetic variation. Finally, we demonstrated the relevance of our findings to human epigenetic studies by categorizing CpGs from published DNAm association studies of pediatric BECs and peripheral blood. Conclusions: Taken together, our results highlight a number of important considerations and practical implications in the design and interpretation of EWAS analyses performed in pediatric peripheral tissues.
Electronic supplementary material: The online version of this article (10.1186/s13072-018-0245-6) contains supplementary material, which is available to authorized users.
41
Meng Q, Wang K, Brunetti T, Xia Y, Jiao C, Dai R, Fitzgerald D, Thomas A, Jay L, Eckart H, Grennan K, Imamura-Kawasawa Y, Li M, Sestan N, White KP, Chen C, Liu C. The DGCR5 long noncoding RNA may regulate expression of several schizophrenia-related genes. Sci Transl Med 2018; 10:eaat6912. [PMID: 30545965 PMCID: PMC6487854 DOI: 10.1126/scitranslmed.aat6912] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Received: 03/25/2018] [Accepted: 10/22/2018] [Indexed: 12/12/2022]
Abstract
A number of studies indicate that rare copy number variations (CNVs) contribute to the risk of schizophrenia (SCZ). Most of these studies have focused on protein-coding genes residing in the CNVs. Here, we investigated long noncoding RNAs (lncRNAs) within 10 SCZ risk-associated CNV deletion regions (CNV-lncRNAs) and examined their potential contribution to SCZ risk. We used RNA sequencing transcriptome data derived from postmortem brain tissue from control individuals without psychiatric disease as part of the PsychENCODE BrainGVEX and Developmental Capstone projects. We carried out weighted gene coexpression network analysis to identify protein-coding genes coexpressed with CNV-lncRNAs in the human brain. We identified one neuronal function-related coexpression module shared by both datasets. This module contained a lncRNA called DGCR5 within the 22q11.2 CNV region, which was identified as a hub gene. Protein-coding genes associated with SCZ genome-wide association study signals, de novo mutations, or differential expression were also contained in this neuronal module. Using DGCR5 knockdown and overexpression experiments in human neural progenitor cells derived from human induced pluripotent stem cells, we identified a potential role for DGCR5 in regulating certain SCZ-related genes.
Affiliation(s)
- Qingtuan Meng
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Kangli Wang
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Tonya Brunetti
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Yan Xia
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Chuan Jiao
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Rujia Dai
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Dominic Fitzgerald
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Amber Thomas
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Lindsey Jay
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Heather Eckart
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
- Kay Grennan
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
- Yuka Imamura-Kawasawa
  - Departments of Pharmacology and Biochemistry and Molecular Biology, Institute for Personalized Medicine, Penn State College of Medicine, Hershey, PA 17033, USA
  - Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Mingfeng Li
  - Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Nenad Sestan
  - Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Kevin P White
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
  - Tempus Labs Inc., Chicago, IL 60654, USA
- Chao Chen
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China
- Chunyu Liu
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
  - Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
  - School of Psychology, Shaanxi Normal University, Xian, Shaanxi, China
42
Min JL, Hemani G, Davey Smith G, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics 2018; 34:3983-3989. [PMID: 29931280 PMCID: PMC6247925 DOI: 10.1093/bioinformatics/bty476] [Citation(s) in RCA: 130] [Impact Index Per Article: 21.7] [Received: 01/09/2018] [Accepted: 06/18/2018] [Indexed: 12/11/2022] Open
Abstract
Motivation: DNA methylation datasets are growing ever larger both in sample size and genome coverage. Novel computational solutions are required to efficiently handle these data. Results: We have developed meffil, an R package designed for efficient quality control, normalization and epigenome-wide association studies of large samples of Illumina Methylation BeadChip microarrays. A complete re-implementation of functional normalization minimizes computational memory without increasing running time. Incorporating fixed and random effects within functional normalization, together with automated estimation of functional normalization parameters, reduces technical variation in DNA methylation levels, thus reducing false positive rates and improving power. Support for normalization of datasets distributed across physically different locations without needing to share biologically-based individual-level data means that meffil can be used to reduce heterogeneity in meta-analyses of epigenome-wide association studies. Availability and implementation: https://github.com/perishky/meffil/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- J L Min
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK; Bristol Medical School, University of Bristol, Bristol, UK
- G Hemani
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK; Bristol Medical School, University of Bristol, Bristol, UK
- G Davey Smith
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK; Bristol Medical School, University of Bristol, Bristol, UK
- C Relton
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK; Bristol Medical School, University of Bristol, Bristol, UK
- M Suderman
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK; Bristol Medical School, University of Bristol, Bristol, UK
43
McEwen LM, Jones MJ, Lin DTS, Edgar RD, Husquin LT, MacIsaac JL, Ramadori KE, Morin AM, Rider CF, Carlsten C, Quintana-Murci L, Horvath S, Kobor MS. Systematic evaluation of DNA methylation age estimation with common preprocessing methods and the Infinium MethylationEPIC BeadChip array. Clin Epigenetics 2018; 10:123. [PMID: 30326963 PMCID: PMC6192219 DOI: 10.1186/s13148-018-0556-2] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 05/25/2018] [Accepted: 10/01/2018] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND: The capacity of technologies measuring DNA methylation (DNAm) is rapidly evolving, as are the options for applicable bioinformatics methods. The most commonly used DNAm microarray, the Illumina Infinium HumanMethylation450 (450K array), has recently been replaced by the Illumina Infinium HumanMethylationEPIC (EPIC array), nearly doubling the number of targeted CpG sites. Given that a subset of 450K CpG sites is absent on the EPIC array and that several tools for both data normalization and analyses were developed on the 450K array, it is important to assess their utility when applied to EPIC array data. One of the most commonly used 450K tools is the pan-tissue epigenetic clock, a multivariate predictor of biological age based on DNAm at 353 CpG sites. Of these CpGs, 19 are missing from the EPIC array, thus raising the question of whether EPIC data can be used to accurately estimate DNAm age. We also investigated a 71-CpG epigenetic age predictor, referred to as the Hannum method, which lacks 6 probes on the EPIC array. To evaluate these epigenetic clocks in EPIC data properly, a prior assessment of the effects of data preprocessing methods on DNAm age is also required. METHODS: DNAm was quantified, on both the 450K and EPIC platforms, from human primary monocytes derived from 172 individuals. We calculated DNAm age from raw and three different preprocessed data forms to assess the effects of different processing methods on the DNAm age estimate. Using an additional cohort, we also investigated DNAm age of peripheral blood mononuclear cells, bronchoalveolar lavage, and bronchial brushing samples using the EPIC array. RESULTS: Using monocyte-derived data from subjects on both the 450K and EPIC, we found that DNAm age was highly correlated across both raw and preprocessing methods (r > 0.91). Thus, the correlation between chronological age and the DNAm age estimate is largely unaffected by platform differences and normalization methods. However, we found that the choice of normalization method and measurement platform can lead to a systematic offset in the age estimate, which in turn leads to an increase in the median error. Comparing the 450K and EPIC DNAm age estimates, we observed that the median absolute difference was 1.44-3.10 years across preprocessing methods. CONCLUSIONS: Here, we have provided evidence that the epigenetic clock is resistant to the absence of the 19 CpG sites missing from the EPIC array, and we have highlighted the importance of considering the technical variance of the epigenetic clock when interpreting group differences below the reported error. Furthermore, our study highlights the utility of the epigenetic age acceleration measure, the residuals from a linear regression of DNAm age on chronological age, as the resulting values are robust with respect to normalization methods and measurement platforms.
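The age acceleration measure named in the conclusions (residuals from a linear regression of DNAm age on chronological age) can be illustrated with a short sketch. This is not the authors' pipeline: the function name `age_acceleration` and all numeric values are hypothetical, and an ordinary least-squares fit via `numpy.polyfit` is assumed.

```python
import numpy as np


def age_acceleration(dnam_age, chron_age):
    """Residuals of an ordinary least-squares fit of DNAm age on chronological age.

    A positive value means the sample looks epigenetically older than
    expected for its chronological age within this cohort.
    """
    dnam_age = np.asarray(dnam_age, dtype=float)
    chron_age = np.asarray(chron_age, dtype=float)
    # Degree-1 polyfit returns (slope, intercept) of the regression line.
    slope, intercept = np.polyfit(chron_age, dnam_age, deg=1)
    return dnam_age - (slope * chron_age + intercept)


# Hypothetical cohort: chronological ages and DNAm age estimates (years).
chron = np.array([25.0, 32.0, 41.0, 48.0, 56.0, 63.0])
dnam = np.array([27.1, 30.4, 43.9, 47.2, 59.0, 61.5])

acc = age_acceleration(dnam, chron)
# Mimic a systematic platform or normalization offset of +3 years.
acc_shifted = age_acceleration(dnam + 3.0, chron)
```

Because a constant offset is absorbed by the regression intercept, `acc` and `acc_shifted` agree up to floating-point error, which is one way to see why the residual-based measure is insensitive to the systematic offsets described in the results.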
Affiliation(s)
- Lisa M McEwen
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
- Meaghan J Jones
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
- David Tse Shen Lin
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
- Rachel D Edgar
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
- Lucas T Husquin
  - Unit of Human Evolutionary Genetics, Institut Pasteur, 75015 Paris, France
  - Centre National de la Recherche Scientifique (CNRS) UMR2000, 75015 Paris, France
  - Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, 75015 Paris, France
- Julia L MacIsaac
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
- Katia E Ramadori
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
- Alexander M Morin
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
- Christopher F Rider
  - Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC Canada
- Chris Carlsten
  - Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC Canada
- Lluís Quintana-Murci
  - Unit of Human Evolutionary Genetics, Institut Pasteur, 75015 Paris, France
  - Centre National de la Recherche Scientifique (CNRS) UMR2000, 75015 Paris, France
  - Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, 75015 Paris, France
- Steve Horvath
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA USA
- Michael S Kobor
  - BC Children’s Hospital Research Institute, Department of Medical Genetics, University of British Columbia, 950 West 28th Avenue, TRB A5-151, Vancouver, BC V5Z 4H4 Canada
44
Affiliation(s)
- Meng Pan
  - Department of Optoelectronic Engineering, College of Science and Engineering, Jinan University, Guangzhou, Guangdong, PR China
- Jie Zhang
  - Department of Physics, College of Science and Engineering, Jinan University, Guangzhou, Guangdong, PR China
45
Tosti L, Ashmore J, Tan BSN, Carbone B, Mistri TK, Wilson V, Tomlinson SR, Kaji K. Mapping transcription factor occupancy using minimal numbers of cells in vitro and in vivo. Genome Res 2018; 28:592-605. [PMID: 29572359 PMCID: PMC5880248 DOI: 10.1101/gr.227124.117] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 07/03/2017] [Accepted: 03/05/2018] [Indexed: 01/11/2023]
Abstract
The identification of transcription factor (TF) binding sites in the genome is critical to understanding gene regulatory networks (GRNs). While ChIP-seq is commonly used to identify TF targets, it requires specific ChIP-grade antibodies and high cell numbers, often limiting its applicability. DNA adenine methyltransferase identification (DamID), developed and widely used in Drosophila, is a distinct technology to investigate protein–DNA interactions. Unlike ChIP-seq, it does not require antibodies, precipitation steps, or chemical protein–DNA crosslinking, but to date it has been seldom used in mammalian cells due to technical limitations. Here we describe an optimized DamID method coupled with next-generation sequencing (DamID-seq) in mouse cells and demonstrate the identification of the binding sites of two TFs, POU5F1 (also known as OCT4) and SOX2, in as few as 1000 embryonic stem cells (ESCs) and neural stem cells (NSCs), respectively. Furthermore, we have applied this technique in vivo for the first time in mammals. POU5F1 DamID-seq in the gastrulating mouse embryo at 7.5 d post coitum (dpc) successfully identified multiple POU5F1 binding sites proximal to genes involved in embryo development, neural tube formation, and mesoderm-cardiac tissue development, consistent with the pivotal role of this TF in post-implantation embryo. This technology paves the way to unprecedented investigation of TF–DNA interactions and GRNs in specific cell types of limited availability in mammals, including in vivo samples.
Affiliation(s)
- Luca Tosti
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
- James Ashmore
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
- Boon Siang Nicholas Tan
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
- Benedetta Carbone
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
- Tapan K Mistri
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
- Valerie Wilson
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
- Simon R Tomlinson
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
- Keisuke Kaji
  - MRC Centre for Regenerative Medicine, University of Edinburgh, Edinburgh BioQuarter, Edinburgh, EH16 4UU, Scotland, United Kingdom
46
Lazar-Stefanita L, Scolari VF, Mercy G, Muller H, Guérin TM, Thierry A, Mozziconacci J, Koszul R. Cohesins and condensins orchestrate the 4D dynamics of yeast chromosomes during the cell cycle. EMBO J 2017; 36:2684-2697. [PMID: 28729434 PMCID: PMC5599795 DOI: 10.15252/embj.201797342] [Citation(s) in RCA: 86] [Impact Index Per Article: 12.3] [Received: 05/21/2017] [Revised: 06/29/2017] [Accepted: 07/04/2017] [Indexed: 11/09/2022] Open
Abstract
Duplication and segregation of chromosomes involves dynamic reorganization of their internal structure by conserved architectural proteins, including the structural maintenance of chromosomes (SMC) complexes cohesin and condensin. Despite active investigation of the roles of these factors, a genome-wide view of dynamic chromosome architecture at both small and large scale during cell division is still missing. Here, we report the first comprehensive 4D analysis of the higher-order organization of the Saccharomyces cerevisiae genome throughout the cell cycle and investigate the roles of SMC complexes in controlling structural transitions. During replication, cohesion establishment promotes numerous long-range intra-chromosomal contacts and correlates with the individualization of chromosomes, which culminates at metaphase. In anaphase, mitotic chromosomes are abruptly reorganized depending on mechanical forces exerted by the mitotic spindle. Formation of a condensin-dependent loop bridging the centromere cluster with the rDNA loci suggests that condensin-mediated forces may also directly facilitate segregation. This work therefore comprehensively recapitulates cell cycle-dependent chromosome dynamics in a unicellular eukaryote, but also unveils new features of chromosome structural reorganization during highly conserved stages of cell division.
Affiliation(s)
- Luciana Lazar-Stefanita
  - Institut Pasteur, Department Genomes and Genetics, Unité Régulation Spatiale des Génomes, Paris, France
  - CNRS, UMR 3525, Paris, France
  - Institut Pasteur, CNRS Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), USR 3756, Paris, France
  - Sorbonne Universités, UPMC Université Paris 6, Complexité du Vivant, Paris, France
- Vittore F Scolari
  - Institut Pasteur, Department Genomes and Genetics, Unité Régulation Spatiale des Génomes, Paris, France
  - CNRS, UMR 3525, Paris, France
  - Institut Pasteur, CNRS Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), USR 3756, Paris, France
- Guillaume Mercy
  - Institut Pasteur, Department Genomes and Genetics, Unité Régulation Spatiale des Génomes, Paris, France
  - CNRS, UMR 3525, Paris, France
  - Institut Pasteur, CNRS Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), USR 3756, Paris, France
  - Sorbonne Universités, UPMC Université Paris 6, Complexité du Vivant, Paris, France
- Héloise Muller
  - Institut Pasteur, Department Genomes and Genetics, Unité Régulation Spatiale des Génomes, Paris, France
  - CNRS, UMR 3525, Paris, France
  - Institut Pasteur, CNRS Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), USR 3756, Paris, France
- Thomas M Guérin
  - Laboratoire Télomères et Réparation du Chromosome, CEA, INSERM, UMR 967, IRCM, Université Paris-Saclay, Fontenay-aux-Roses, France
- Agnès Thierry
  - Institut Pasteur, Department Genomes and Genetics, Unité Régulation Spatiale des Génomes, Paris, France
  - CNRS, UMR 3525, Paris, France
  - Institut Pasteur, CNRS Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), USR 3756, Paris, France
- Julien Mozziconacci
  - Sorbonne Universités, Theoretical Physics for Condensed Matter Lab, UPMC Université Paris 06, Paris, France
  - CNRS, UMR 7600, Paris, France
- Romain Koszul
  - Institut Pasteur, Department Genomes and Genetics, Unité Régulation Spatiale des Génomes, Paris, France
  - CNRS, UMR 3525, Paris, France
  - Institut Pasteur, CNRS Center of Bioinformatics, Biostatistics and Integrative Biology (C3BI), USR 3756, Paris, France
47
von der Haar M, Lindner P, Scheper T, Stahl F. Array Analysis Manager-An automated DNA microarray analysis tool simplifying microarray data filtering, bias recognition, normalization, and expression analysis. Eng Life Sci 2017; 17:841-846. [PMID: 32624831 PMCID: PMC6999572 DOI: 10.1002/elsc.201700046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/28/2017] [Revised: 04/18/2017] [Accepted: 05/16/2017] [Indexed: 11/11/2022] Open
Abstract
Deoxyribonucleic acid (DNA) microarray experiments generate big datasets. To successfully harness the potential information within, multiple filtering, normalization, and analysis methods need to be applied. An in-depth knowledge of the underlying physical, chemical, and statistical processes is crucial to the success of this analysis. However, due to the interdisciplinarity of DNA microarray applications and experimenter backgrounds, the published analyses differ greatly, for example, in methodology. This severely limits the comprehensibility and comparability among studies and research fields. In this work, we present a novel end-user software, developed to automatically filter, normalize, and analyze two-channel microarray experiment data. It enables the user to analyze single chip, dye-swap, and loop experiments with an extended dynamic intensity range using a multiscan approach. Furthermore, to our knowledge, this is the first analysis software solution that can account for photobleaching, which is automatically detected by an artificial neural network. The user gets feedback on the effectiveness of each applied normalization regarding bias minimization. Standardized methods for expression analysis are included, as well as the possibility to export the results in the Gene Expression Omnibus (GEO) format. This software was designed to simplify the microarray analysis process and help the experimenter to make educated decisions about the analysis process to contribute to reproducibility and comparability.
Affiliation(s)
- Patrick Lindner
  - Institut für Technische Chemie, Leibniz Universität Hannover, Hannover, Germany
- Thomas Scheper
  - Institut für Technische Chemie, Leibniz Universität Hannover, Hannover, Germany
- Frank Stahl
  - Institut für Technische Chemie, Leibniz Universität Hannover, Hannover, Germany
48
Wang H, Yan W, Zhang S, Gu Y, Wang Y, Wei Y, Liu H, Wang F, Wu Q, Zhang Y. Survival differences of CIMP subtypes integrated with CNA information in human breast cancer. Oncotarget 2017; 8:48807-48819. [PMID: 28415743 PMCID: PMC5564726 DOI: 10.18632/oncotarget.16178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/29/2016] [Accepted: 03/01/2017] [Indexed: 12/31/2022] Open
Abstract
CpG island methylator phenotype of breast cancer is associated with widespread aberrant methylation at specified CpG islands and distinct patient outcomes. However, the influence of copy number on the prognosis of tumors with different CpG island methylator phenotypes is still unclear. We analyzed both genetic (copy number) and epigenetic alterations in 765 breast cancers from The Cancer Genome Atlas data portal and obtained a panel of 15 biomarkers for copy number and methylation status evaluation. The gene panel identified two groups corresponding to distinct copy number profiles. Among samples showing only copy number loss, patients faced a greater risk if they presented a higher CpG island methylation pattern in the biomarker panel. But for samples showing only copy number gain, a higher methylation level of CpG islands was associated with improved viability. In all, the integration of copy number alteration and methylation information enhanced the classification power on prognosis. Moreover, we found that the molecular subtypes of breast cancer presented different distributions in the two CpG island methylation phenotypes. Generated from the same set of human methylation 450K data, additional copy number information could provide insights into survival prediction of cancers with less heterogeneity and might help to determine the biomarkers for diagnosis and treatment for breast cancer patients in a more personalized approach.
Affiliation(s)
- Huihan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Weili Yan
  - School of Life Science and Technology, State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology, Harbin, 150001, China
- Shumei Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yue Gu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yihan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yanjun Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Hongbo Liu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Fang Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Qiong Wu
  - School of Life Science and Technology, State Key Laboratory of Urban Water Resource and Environment, Harbin Institute of Technology, Harbin, 150001, China
- Yan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
49
Du J, Tian J, Ding L, Trac C, Xia B, Sun S, Schones DE, Huang W. Vertical sleeve gastrectomy reverses diet-induced gene-regulatory changes impacting lipid metabolism. Sci Rep 2017; 7:5274. [PMID: 28706189 PMCID: PMC5509746 DOI: 10.1038/s41598-017-05349-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/10/2017] [Accepted: 05/26/2017] [Indexed: 02/08/2023] Open
Abstract
Vertical sleeve gastrectomy (VSG) produces sustainable weight loss, remission of type 2 diabetes (T2D), and improvement of nonalcoholic fatty liver disease (NAFLD). However, the molecular mechanisms underlying the metabolic benefits of VSG have remained elusive. According to our previous results, diet-induced obesity induces epigenetic modifications to chromatin in mouse liver. We demonstrate here that VSG in C57BL/6J wild-type male mice can reverse these chromatin modifications and thereby impact the expression of key metabolic genes. Genes involved in lipid metabolism, especially omega-6 fatty acid metabolism, are up-regulated in livers of mice after VSG while genes in inflammatory pathways are down-regulated after VSG. Consistent with gene expression changes, regulatory regions near genes involved in inflammatory response displayed decreased chromatin accessibility after VSG. Our results indicate that VSG induces global regulatory changes that impact hepatic inflammatory and lipid metabolic pathways, providing new insight into the mechanisms underlying the beneficial metabolic effects induced by VSG.
Affiliation(s)
- Juan Du
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA; Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA
- Jingyan Tian
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA; Shanghai Clinical Center for Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Department of Endocrinology and Metabolism, China National Research Center for Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Lili Ding
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
- Candi Trac
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
- Brian Xia
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
- Siming Sun
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA
- Dustin E Schones
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA; Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA
- Wendong Huang
- Department of Diabetes Complications and Metabolism, Beckman Research Institute, City of Hope, Duarte, CA, USA; Irell & Manella Graduate School of Biological Sciences, City of Hope, Duarte, CA, USA
|
50
|
Variation-preserving normalization unveils blind spots in gene expression profiling. Sci Rep 2017; 7:42460. [PMID: 28276435 PMCID: PMC5343588 DOI: 10.1038/srep42460] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/11/2016] [Accepted: 01/11/2017] [Indexed: 11/17/2022]
Abstract
RNA-Seq and gene expression microarrays provide comprehensive profiles of gene activity, but lack of reproducibility has hindered their application. A key challenge in the data analysis is the normalization of gene expression levels, which is currently performed following the implicit assumption that most genes are not differentially expressed. Here, we present a mathematical approach to normalization that makes no assumption of this sort. We have found that variation in gene expression is much larger than currently believed, and that it can be measured with available assays. Our results also explain, at least partially, the reproducibility problems encountered in transcriptomics studies. We expect that this improvement in detection will help efforts to realize the full potential of gene expression profiling, especially in analyses of cellular processes involving complex modulations of gene expression.
|