1
Cao S, Wang L, Feng Y, Peng XD, Li LM. A data integration approach unveils a transcriptional signature of type 2 diabetes progression in rat and human islets. PLoS One 2023; 18:e0292579. [PMID: 37816033 PMCID: PMC10564241 DOI: 10.1371/journal.pone.0292579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Accepted: 09/22/2023] [Indexed: 10/12/2023]
Abstract
Pancreatic islet failure is a key characteristic of type 2 diabetes besides insulin resistance. To gain molecular insights into the pathology of islets in type 2 diabetes, we developed a computational approach to integrating expression profiles of Goto-Kakizaki and Wistar rat islets from a designed experiment with those of human islets from an observational study. A principal gene-eigenvector in the expression profiles, characterized by up-regulated angiogenesis and down-regulated oxidative phosphorylation, was identified as conserved across the two species. In the case of Goto-Kakizaki versus Wistar islets, this alteration in gene expression can be verified directly by treatment-control tests over time, and it corresponds to the alteration of α/β-cell distribution obtained by quantifying the islet micrographs. Furthermore, the correspondence between the dual sample- and gene-eigenvectors unveils more delicate structures. In rats, the upward and downward trends of insulin mRNA levels before and after week 8 correspond respectively to the top two principal eigenvectors. In humans, the top two principal eigenvectors correspond respectively to the late and early stages of diabetes. According to the aggregated expression signature, a large portion of the genes involved in the hypoxia-inducible factor signaling pathway, which transcriptionally activates angiogenesis, were significantly up-regulated. In addition, the top-ranked anti-angiogenic genes THBS1 and PEDF indicate the existence of a counteractive mechanism, in line with the thickened and fragmented capillaries found in deteriorated islets. Overall, the integrative analysis unravels the principal transcriptional alterations underlying the deterioration of islet morphology and insulin secretion along type 2 diabetes progression.
Affiliation(s)
- Shenghao Cao
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
- Linting Wang
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
- Yance Feng
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
- Xiao-ding Peng
- Department of Biochemistry and Molecular Genetics, The University of Illinois at Chicago, Chicago, Illinois, United States of America
- Lei M. Li
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
2
Feng Y, Li LM. MUREN: a robust and multi-reference approach of RNA-seq transcript normalization. BMC Bioinformatics 2021; 22:386. [PMID: 34320923 PMCID: PMC8317383 DOI: 10.1186/s12859-021-04288-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Accepted: 07/08/2021] [Indexed: 09/03/2023]
Abstract
Background Normalization of RNA-seq data aims to identify biological expression differences between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references selected from representative samples. The pairwise intermediates are then integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on several standard data sets. The evaluation of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by single-cell data on the cell cycle. MUREN is implemented as an R package. The code, under license GPL-3, is available on GitHub (github.com/hippo-yf/MUREN) and on Anaconda (anaconda.org/hippo-yf/r-muren). Conclusions MUREN performs RNA-seq normalization using a two-step statistical regression derived from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04288-0.
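The pairwise step can be illustrated with a minimal Python sketch. It assumes a genes x samples count matrix, takes the exact one-dimensional least trimmed squares (LTS) location estimate of the per-gene log2 ratios as the pairwise scale offset, and then combines references with a plain least-squares additive model `d[i, j] = a[i] - b[j]`. The function names `lts_shift` and `multi_reference_log_factors` and the default coverage of 0.5 are hypothetical choices for illustration; this is not the MUREN package's API, and MUREN's own integration step is robust rather than ordinary least squares.

```python
import numpy as np


def lts_shift(values, coverage=0.5):
    """Exact 1-D least trimmed squares location estimate.

    The LTS solution in one dimension is the mean of the h contiguous
    sorted values with the smallest sum of squared deviations.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    h = max(2, int(np.ceil(coverage * n)))
    # Prefix sums give each window's sum and sum of squares in O(1).
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))
    s1 = c1[h:] - c1[:-h]           # window sums, length n - h + 1
    s2 = c2[h:] - c2[:-h]           # window sums of squares
    sse = s2 - s1 * s1 / h          # squared deviations within each window
    best = int(np.argmin(sse))
    return s1[best] / h


def multi_reference_log_factors(counts, ref_idx, coverage=0.5):
    """Hypothetical multi-reference log2 scale factors, centered to mean 0.

    counts: genes x samples array; ref_idx: columns used as references.
    """
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[1]
    d = np.empty((n_samples, len(ref_idx)))
    for j, r in enumerate(ref_idx):
        for i in range(n_samples):
            # Only genes observed in both samples give finite log ratios.
            keep = (counts[:, i] > 0) & (counts[:, r] > 0)
            ratios = np.log2(counts[keep, i] / counts[keep, r])
            d[i, j] = lts_shift(ratios, coverage)
    # Additive model d[i, j] = a[i] - b[j]: with a full grid, the
    # least-squares sample effects are the row means, centered to mean 0.
    a = d.mean(axis=1)
    return a - a.mean()
```

Because the LTS estimate only has to fit the best-agreeing fraction of genes, a minority of strongly differentiated genes (for example, a block up-regulated in one sample) does not drag the offset, which mirrors the housekeeping-gene intuition in the abstract.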
Affiliation(s)
- Yance Feng
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Lei M Li
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
3
Feng Y, Zhang S, Li L, Li LM. The cis-trans binding strength defined by motif frequencies facilitates statistical inference of transcriptional regulation. BMC Bioinformatics 2019; 20:201. [PMID: 31074378 PMCID: PMC6509875 DOI: 10.1186/s12859-019-2732-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/22/2022]
Abstract
BACKGROUND A key problem in systems biology is determining the regulatory mechanism corresponding to a phenotype. An empirical approach is to compare the expression profiles of cells under two conditions, or of tissues from two phenotypes, and to unravel the underlying transcriptional regulation. We have proposed the method BASE to statistically infer the effective regulatory factors responsible for gene expression differentiation with the help of binding data between factors and genes. Protein-DNA binding data are usually obtained by ChIP-seq experiments, which can be costly and are condition-specific. RESULTS Here we report a definition of binding strength based on a probability model. With this condition-free definition, the BASE method needs only the frequencies of cis-motifs in regulatory regions, so the inferences can be carried out in silico. Directional regulation can be inferred by considering down- and up-regulation separately. We demonstrated the effectiveness of the approach in a case study. In the study of the effects of polyunsaturated fatty acid (PUFA) diets, namely docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA), on mouse small intestine cells, the inferred regulations are consistent with those reported in the literature, including PPARα and NFκB, corresponding respectively to enhanced adipogenesis and reduced inflammation. Moreover, we discovered enhanced RORA regulation of circadian rhythm and reduced ETS1 regulation of angiogenesis. CONCLUSIONS With the probabilistic definition of cis-trans binding affinity, the BASE method can assess the significance of TF regulation changes corresponding to a gene expression differentiation profile between treatment and control samples. The landscape of inferred cis-trans regulations helps reveal the underlying molecular mechanisms. In particular, we report a more comprehensive picture of the regulation induced by the EPA&DHA diet.
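The probabilistic binding-strength definition itself is not given in the abstract, but its input, the frequency of each cis-motif in each gene's regulatory region, can be sketched. The Python below is a hypothetical illustration: it assumes exact (non-degenerate) DNA motifs, counts overlapping matches on both strands, and counts a reverse-complement-palindromic motif only once per position. Real pipelines often use position weight matrices instead, and all helper names here are invented for this sketch rather than taken from BASE.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _occurrences(seq: str, motif: str) -> int:
    """Number of (possibly overlapping) exact matches of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def count_motif(seq: str, motif: str) -> int:
    """Count motif matches on the forward and reverse strands."""
    seq, motif = seq.upper(), motif.upper()
    n = _occurrences(seq, motif)
    rc = reverse_complement(motif)
    if rc != motif:  # palindromic motifs would otherwise be double-counted
        n += _occurrences(seq, rc)
    return n


def motif_frequency_matrix(promoters: dict, motifs: list) -> dict:
    """Gene -> {motif: count} table from regulatory-region sequences."""
    return {gene: {m: count_motif(s, m) for m in motifs}
            for gene, s in promoters.items()}
```

For example, `motif_frequency_matrix({"g1": "AAAGGGTTT"}, ["AAA"])` gives `{"g1": {"AAA": 2}}`, because the `TTT` run is an `AAA` site on the opposite strand.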
Affiliation(s)
- Yance Feng
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Sheng Zhang
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Liang Li
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Lei M Li
- National Center of Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
4
A Novel Dual Eigen-Analysis of Mouse Multi-Tissues' Expression Profiles Unveils New Perspectives into Type 2 Diabetes. Sci Rep 2017; 7:5044. [PMID: 28698587 PMCID: PMC5506042 DOI: 10.1038/s41598-017-05405-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/10/2017] [Accepted: 05/26/2017] [Indexed: 12/30/2022]
Abstract
Type 2 diabetes (T2D) is a complex, polygenic disease whose developmental mechanisms still lack a complete picture. To better understand these mechanisms, we examined gene expression profiles of multiple tissues from outbred mice fed a high-fat diet (HFD) or regular chow at weeks 1, 9, and 18. To analyze such complex data, we proposed a novel dual eigen-analysis, in which the sample- and gene-eigenvectors correspond respectively to macro- and micro-biological information. The dual eigen-analysis identified the HFD eigenvectors as well as the endogenous eigenvectors for each tissue. The results imply that HFD influences hepatic function or pancreatic development as an exogenous factor, while in adipose tissue HFD's impact roughly coincides with the endogenous eigenvector driven by aging. Enrichment analysis of the eigenvectors revealed diverse HFD impacts on the three tissues over time, including inflammation, degradation of branched-chain amino acids (BCAA), and regulation of peroxisome proliferator-activated receptor gamma (PPARγ). We report that in the pancreas, marked up-regulation of angiogenesis downstream of the HIF signaling pathway precedes hyperinsulinemia. The dual eigen-analysis and these discoveries provide new evaluation and guidance for T2D prevention and therapy, and will also promote new thinking in biology and medicine.
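One common way to obtain paired sample- and gene-eigenvectors is the singular value decomposition (SVD) of a centered genes x samples matrix X: the left singular vectors u_k are eigenvectors of X Xᵀ (gene space), the right singular vectors v_k are eigenvectors of Xᵀ X (sample space), and the two are linked by X v_k = s_k u_k. The Python sketch below assumes log-scale input with each gene centered across samples; whether the paper uses exactly this centering, scaling, or decomposition is an assumption, and `dual_eigen` is a hypothetical name.

```python
import numpy as np


def dual_eigen(expr, k=2):
    """Top-k paired gene- and sample-eigenvectors via SVD.

    expr: genes x samples matrix of log-expression values.
    Returns (gene_vecs, singular_values, sample_vecs), where gene_vecs is
    genes x k and sample_vecs is samples x k.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)   # center each gene across samples
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Columns of u: gene-eigenvectors; rows of vt: sample-eigenvectors.
    return u[:, :k], s[:k], vt[:k].T
```

Singular vectors are determined only up to sign, so a gene-eigenvector and its paired sample-eigenvector should be flipped together when interpreting up- versus down-regulation along a sample pattern.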
5
Kremsky I, Morgan TE, Hou X, Li L, Finch CE. Age-changes in gene expression in primary mixed glia cultures from young vs. old rat cerebral cortex are modified by interactions with neurons. Brain Behav Immun 2012; 26:797-802. [PMID: 22226781 PMCID: PMC3703782 DOI: 10.1016/j.bbi.2011.12.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/27/2011] [Revised: 12/08/2011] [Accepted: 12/19/2011] [Indexed: 11/25/2022]
Abstract
Astrocytic GFAP expression increases during normal aging in many brain regions and in primary astrocyte cultures derived from aging rodent brains. Unexpectedly, we found that the age-related increase of GFAP expression was suppressed in mixed glia (astrocytes+microglia). However, the age-related increase of GFAP was observed when E18 neurons were co-cultured with mixed glia. Thus, the presence of microglia can suppress the age-related increase of GFAP in primary cultures of astrocytes. To characterize more broadly how aging and co-culture with neurons alter glial gene expression, we profiled gene expression in mixed glia from young (3 mo) and old (24 mo) male rat cerebral cortex by Affymetrix microarray (Rat230 2.0). The majority of age changes were independent of the presence of neurons. Overall, twice as many genes increased in expression with age as decreased. The minority of age changes that were either suppressed or revealed by the presence of neurons may be useful for analyzing glial-neuron interactions during aging. Some in vitro changes are shared with those of the aging rat hippocampus in studies from the Landfield group (Rowe et al., 2007; Kadish et al., 2009).
6
Hulsman M, Mentink A, van Someren EP, Dechering KJ, de Boer J, Reinders MJ. Delineation of amplification, hybridization and location effects in microarray data yields better-quality normalization. BMC Bioinformatics 2010; 11:156. [PMID: 20346103 PMCID: PMC2857856 DOI: 10.1186/1471-2105-11-156] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/08/2009] [Accepted: 03/26/2010] [Indexed: 11/17/2022]
Abstract
Background Oligonucleotide arrays have become one of the most widely used high-throughput tools in biology. Because of their sensitivity to experimental conditions, normalization is a crucial step when comparing measurements from these arrays. Normalization is, however, far from a solved problem. We frequently encounter datasets with significant technical effects that currently available methods cannot correct. Results We show that by carefully decomposing probe-specific amplification, hybridization and array location effects, a normalization can be performed that allows a much improved analysis of these data. Identifying the technical sources of variation between arrays has allowed us to build statistical models that estimate how the signal of individual probes is affected, based on their properties. This enables a model-based normalization that is probe-specific, in contrast with the signal intensity distribution normalization performed by many current methods. In addition, we propose a novel way of handling background correction, enabling the use of background information to weight probes during summarization. Testing of the proposed method shows much improved detection of differentially expressed genes compared with earlier methods, even on (experimentally tightly controlled and replicated) spike-in datasets. Conclusions When a limited number of arrays are available, or when arrays are run in different batches, technical effects have a large influence on the measured expression of genes. We show that detailed modelling and correction of these technical effects allows an improved analysis in these situations.
Affiliation(s)
- Marc Hulsman
- Delft Bioinformatics Lab, Delft University of Technology, Mekelweg 4, Delft 2628 CD, The Netherlands.
7
Ge H, Wei M, Fabrizio P, Hu J, Cheng C, Longo VD, Li LM. Comparative analyses of time-course gene expression profiles of the long-lived sch9Δ mutant. Nucleic Acids Res 2009; 38:143-58. [PMID: 19880387 PMCID: PMC2800218 DOI: 10.1093/nar/gkp849] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/05/2022]
Abstract
In an attempt to elucidate the longevity-promoting mechanisms of mutants lacking SCH9, which live three times as long as wild type chronologically, we measured their time-course gene expression profiles. We interpreted their time-dependent expression differences by statistical inference based on prior biological knowledge, and identified the following significant changes: (i) between 12 and 24 h, stress response genes were up-regulated by larger fold changes and ribosomal RNA (rRNA) processing genes were down-regulated more dramatically; (ii) mitochondrial ribosomal protein genes were not up-regulated between 12 and 60 h as they were in wild type; (iii) electron transport, oxidative phosphorylation and TCA genes were down-regulated early; (iv) the up-regulation of TCA and electron transport was accompanied by deep down-regulation of rRNA processing over time; and (v) rRNA processing genes were more volatile over time, and three associated cis-regulatory elements [rRNA processing element (rRPE), polymerase A and C (PAC) and glucose response element (GRE)] were identified. Deletion of AZF1, which encodes the transcription factor that binds to the GRE element, reversed the lifespan extension of sch9Δ. The significant alterations in these time-dependent expression profiles imply that the lack of SCH9 turns on a longevity programme that extends lifespan through changes in metabolic pathways and protection mechanisms, particularly the regulation of aerobic respiration and rRNA processing.
Affiliation(s)
- Huanying Ge
- Andrus Gerontology Center, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
8
Kadota K, Nakai Y, Shimizu K. Ranking differentially expressed genes from Affymetrix gene expression data: methods with reproducibility, sensitivity, and specificity. Algorithms Mol Biol 2009; 4:7. [PMID: 19386098 PMCID: PMC2679019 DOI: 10.1186/1748-7188-4-7] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Received: 11/14/2008] [Accepted: 04/22/2009] [Indexed: 12/20/2022]
Abstract
Background To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility. Results We compared eight conventional methods for ranking genes: weighted average difference (WAD), average difference (AD), fold change (FC), rank products (RP), moderated t statistic (modT), significance analysis of microarrays (samT), shrinkage t statistic (shrinkT), and intensity-based moderated t statistic (ibmT), each combined with six preprocessing algorithms (PLIER, VSN, FARMS, multi-mgMOS (mmgMOS), MBEI, and GCRMA). A total of 36 real experimental datasets were evaluated using the area under the receiver operating characteristic curve (AUC) as a measure of both sensitivity and specificity. We found that the RP method performed well for VSN-, FARMS-, MBEI-, and GCRMA-preprocessed data, and the WAD method performed well for mmgMOS-preprocessed data. Our analysis of the MicroArray Quality Control (MAQC) project's datasets showed that the FC-based gene ranking methods (WAD, AD, FC, and RP) had a higher level of reproducibility: the percentages of overlapping genes (POGs) across different sites for the FC-based methods were higher overall than those for the t-statistic-based methods (modT, samT, shrinkT, and ibmT). In particular, POG values for WAD were the highest overall among the FC-based methods irrespective of the choice of preprocessing algorithm.
Conclusion Our results demonstrate that to increase sensitivity, specificity, and reproducibility in microarray analyses, we need to select suitable combinations of preprocessing algorithms and gene ranking methods. We recommend the use of FC-based methods, in particular RP or WAD.
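Two of the quantities in this comparison are simple enough to sketch. The Python below computes the weighted average difference (WAD) as we understand its definition from Kadota and colleagues' earlier work, namely the average difference (AD) of log2 group means multiplied by a min-max scaled average log2 intensity, and the percentage of overlapping genes (POG) between two top-N lists. The function names and toy inputs are hypothetical, and the weight definition should be checked against the original WAD paper before relying on it.

```python
import numpy as np


def wad(log_treat, log_ctrl):
    """Weighted average difference for each gene (row).

    log_treat, log_ctrl: genes x replicates arrays of log2 expression.
    Assumes the average intensities are not all equal (else max == min).
    """
    mean_t = np.mean(log_treat, axis=1)
    mean_c = np.mean(log_ctrl, axis=1)
    ad = mean_t - mean_c                    # average difference (AD)
    avg = (mean_t + mean_c) / 2             # average log2 intensity
    weight = (avg - avg.min()) / (avg.max() - avg.min())  # scaled to [0, 1]
    return ad * weight


def pog(ranked_a, ranked_b, top_n):
    """Percentage of genes shared by the top_n entries of two ranked lists."""
    shared = set(ranked_a[:top_n]) & set(ranked_b[:top_n])
    return 100.0 * len(shared) / top_n
```

Ranking candidates for up-regulation would then be `np.argsort(-wad(t, c))`. The weight shrinks differences among low-intensity genes, where log ratios tend to be noisiest, which is the usual motivation given for WAD.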