1. Ziemann M, Abeysooriya M, Bora A, Lamon S, Kasu MS, Norris MW, Wong YT, Craig JM. Direction-aware functional class scoring enrichment analysis of infinium DNA methylation data. Epigenetics 2024; 19:2375022. [PMID: 38967555 PMCID: PMC11229754 DOI: 10.1080/15592294.2024.2375022] [Received: 02/22/2024] [Accepted: 06/26/2024] [Indexed: 07/06/2024]
Abstract
Infinium Methylation BeadChip arrays remain one of the most popular platforms for epigenome-wide association studies, but tools for downstream pathway analysis have their limitations. Functional class scoring (FCS) is a group of pathway enrichment techniques that involve the ranking of genes and evaluation of their collective regulation in biological systems, but the implementations described for Infinium methylation array data do not retain direction information, which is important for mechanistic understanding of genomic regulation. Here, we evaluate several candidate FCS methods that retain directional information. According to simulation results, the best-performing method involves the mean aggregation of probe limma t-statistics by gene followed by a rank-ANOVA enrichment test using the mitch package. This method, which we call 'LAM,' outperformed an existing over-representation analysis method in simulations, and showed higher sensitivity and robustness in an analysis of real lung tumour-normal paired datasets. Using matched RNA-seq data, we examine the relationship of methylation differences at promoters and gene bodies with RNA expression at the level of pathways in lung cancer. To demonstrate the utility of our approach, we apply it to three other contexts where public data were available. First, we examine the differential pathway methylation associated with chronological age. Second, we investigate pathway methylation differences in infants conceived with in vitro fertilization. Lastly, we analyse differential pathway methylation in 19 disease states, identifying hundreds of novel associations. These results show LAM is a powerful method for the detection of differential pathway methylation complementing existing methods. A reproducible vignette is provided to illustrate how to implement this method.
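The LAM workflow summarized above has two steps: average the limma probe t-statistics per gene, then test each gene set with a rank-based ANOVA. The Python below is a minimal sketch of that idea under stated assumptions. It is not the mitch package (an R/Bioconductor tool) or the authors' vignette. The probe-to-gene mapping, the function names, the scaled effect size `s`, and the normal approximation used for the p-value are all choices made here for illustration.

```python
from collections import defaultdict
from statistics import NormalDist, mean


def aggregate_by_gene(probe_t, probe_to_gene):
    """Mean-aggregate probe-level t-statistics into one score per gene."""
    buckets = defaultdict(list)
    for probe, t in probe_t.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # drop probes with no gene annotation
            buckets[gene].append(t)
    return {gene: mean(ts) for gene, ts in buckets.items()}


def average_ranks(scores):
    """Rank genes by score (1 = most negative); tied scores share their mean rank."""
    ordered = sorted(scores, key=scores.get)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for gene in ordered[i : j + 1]:
            ranks[gene] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_anova(gene_scores, gene_set):
    """One-way ANOVA on ranks, set members vs. non-members (F equals t squared).

    Returns (s, F, p). s = 2 * (mean member rank - mean other rank) / n lies in
    [-1, 1]; its sign gives the direction (positive means the set ranks high,
    e.g. hypermethylated if scores are methylation t-statistics). p uses a
    normal approximation to the t distribution, an assumption suited only to
    large gene lists.
    """
    ranks = average_ranks(gene_scores)
    inside = [r for g, r in ranks.items() if g in gene_set]
    outside = [r for g, r in ranks.items() if g not in gene_set]
    n1, n2, n = len(inside), len(outside), len(ranks)
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need members, non-members and at least 3 genes")
    m1, m2 = mean(inside), mean(outside)
    ss_within = sum((r - m1) ** 2 for r in inside) + sum((r - m2) ** 2 for r in outside)
    pooled_var = ss_within / (n - 2)
    t = (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return 2 * (m1 - m2) / n, t * t, p
```

Keeping the sign of `s` is what preserves direction: a set concentrated at the top of the gene ranking scores positive and one concentrated at the bottom scores negative, a distinction an unsigned test cannot make. For example, with five genes where the two highest-scoring genes form the set, `s` reaches its maximum of 1.0.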
Affiliation(s)
- Mark Ziemann
  - Bioinformatics Working Group, Burnet Institute, Melbourne, Australia
  - School of Life and Environmental Sciences, Deakin University, Geelong, Australia
- Mandhri Abeysooriya
  - School of Life and Environmental Sciences, Deakin University, Geelong, Australia
  - School of Exercise and Nutrition Sciences, Institute for Physical Activity and Nutrition, Deakin University, Geelong, Australia
- Anusuiya Bora
  - Bioinformatics Working Group, Burnet Institute, Melbourne, Australia
  - School of Life and Environmental Sciences, Deakin University, Geelong, Australia
- Séverine Lamon
  - School of Exercise and Nutrition Sciences, Institute for Physical Activity and Nutrition, Deakin University, Geelong, Australia
- Mary Sravya Kasu
  - School of Life and Environmental Sciences, Deakin University, Geelong, Australia
- Mitchell W. Norris
  - School of Life and Environmental Sciences, Deakin University, Geelong, Australia
- Yen Ting Wong
  - School of Medicine, Deakin University, Geelong, Australia
  - Murdoch Children’s Research Institute, Melbourne, Australia
- Jeffrey M. Craig
  - School of Medicine, Deakin University, Geelong, Australia
  - Murdoch Children’s Research Institute, Melbourne, Australia
2. Sayed IM, Vo DT, Alcantara J, Inouye KM, Pranadinata RF, Luo L, Boland CR, Goyal NP, Kuo DJ, Huang SC, Sahoo D, Ghosh P, Das S. Molecular Signatures for Microbe-Associated Colorectal Cancers. bioRxiv: The Preprint Server for Biology 2024:2024.05.26.595902. [PMID: 38853996 PMCID: PMC11160670 DOI: 10.1101/2024.05.26.595902] [Indexed: 06/11/2024]
Abstract
Background: Genetic factors and microbial imbalances play crucial roles in colorectal cancers (CRCs), yet the impact of infections on cancer initiation remains poorly understood. While bioinformatic approaches offer valuable insights, the rising incidence of CRCs creates a pressing need to precisely identify early CRC events. We constructed a network model to identify continuum states during CRC initiation, spanning normal colonic tissue to pre-cancer lesions (adenomatous polyps), and examined the influence of microbes and host genetics.
Methods: A Boolean network was built using a publicly available transcriptomic dataset from healthy and adenoma-affected patients to identify an invariant Microbe-Associated Colorectal Cancer Signature (MACS). We focused on Fusobacterium nucleatum (Fn), a CRC-associated microbe, as a model bacterium. MACS-associated genes and proteins were validated by RT-qPCR, RNA-seq, ELISA, immunofluorescence (IF), and immunohistochemistry (IHC) in tissues and colon-derived organoids from genetically predisposed mice (CPC-APC Min+/-) and patients (FAP, Lynch syndrome, PJS, and JPS).
Results: The MACS that is upregulated in adenomas consists of four core genes/proteins: CLDN2/Claudin-2 (leakiness), LGR5/leucine-rich repeat-containing receptor (stemness), CEMIP/cell migration-inducing and hyaluronan-binding protein (epithelial-mesenchymal transition), and IL8/Interleukin-8 (inflammation). MACS was induced upon Fn infection, but not in response to infection with other enteric bacteria or probiotics. MACS induction upon Fn infection was higher in CPC-APC Min+/- organoids than in WT controls. The degree of MACS expression in the patient-derived organoids (PDOs) generally corresponded with the known lifetime risk of CRCs.
Conclusions: Computational prediction followed by validation in the organoid-based disease model identified the early events in CRC initiation. MACS reveals that CRC-associated microbes induce a greater risk in genetically predisposed hosts, suggesting its potential use for risk prediction and targeted cancer prevention.
3. Chen Y, Mao R, Xu J, Huang Y, Xu J, Cui S, Zhu Z, Ji X, Huang S, Huang Y, Huang HY, Yen SC, Lin YCD, Huang HD. A Causal Regulation Modeling Algorithm for Temporal Events with Application to Escherichia coli's Aerobic to Anaerobic Transition. Int J Mol Sci 2024; 25:5654. [PMID: 38891842 PMCID: PMC11171773 DOI: 10.3390/ijms25115654] [Received: 04/13/2024] [Revised: 05/10/2024] [Accepted: 05/21/2024] [Indexed: 06/21/2024]
Abstract
Time-series experiments are crucial for understanding the transient and dynamic nature of biological phenomena. These experiments, leveraging advanced classification and clustering algorithms, allow for a deep dive into cellular processes. However, while these approaches effectively identify patterns and trends within data, they often fall short in elucidating the causal mechanisms behind these changes. Building on this foundation, our study introduces a novel algorithm for temporal causal signaling modeling, integrating established knowledge networks with sequential gene expression data to elucidate signal transduction pathways over time. Focusing on the aerobic to anaerobic transition (AAT) of Escherichia coli (E. coli), this research marks a significant leap in understanding the organism's metabolic shifts. By applying our algorithm to a comprehensive E. coli regulatory network and a time-series microarray dataset, we constructed the cross-time-point core signaling and regulatory processes of E. coli's AAT. Through gene expression analysis, we validated the primary regulatory interactions governing this process. We identified a novel regulatory scheme wherein the environmentally responsive genes soxR and oxyR activate fur, modulating the nitrogen metabolism regulators fnr and nac. This regulatory cascade controls the stress regulators ompR and lrhA, ultimately affecting the cell motility gene flhD and unveiling a novel regulatory axis that elucidates the complex regulatory dynamics during the AAT process. Our approach, merging empirical data with prior knowledge, represents a significant advance in modeling cellular signaling processes, offering a deeper understanding of microbial physiology and its applications in biotechnology.
Affiliation(s)
- Yigang Chen
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Runbo Mao
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Jiatong Xu
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yixian Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Jingyi Xu
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Shidong Cui
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Zihao Zhu
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Xiang Ji
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Shenghan Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yanzhe Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Hsi-Yuan Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Shih-Chung Yen
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Yang-Chi-Duang Lin
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
- Hsien-Da Huang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
4. Candia J, Ferrucci L. Assessment of Gene Set Enrichment Analysis using curated RNA-seq-based benchmarks. PLoS One 2024; 19:e0302696. [PMID: 38753612 PMCID: PMC11098418 DOI: 10.1371/journal.pone.0302696] [Received: 10/31/2023] [Accepted: 04/09/2024] [Indexed: 05/18/2024]
Abstract
Pathway enrichment analysis is a ubiquitous computational biology method to interpret a list of genes (typically derived from the association of large-scale omics data with phenotypes of interest) in terms of higher-level, predefined gene sets that share biological function, chromosomal location, or other common features. Among the many tools developed so far, Gene Set Enrichment Analysis (GSEA) stands out as one of the pioneering and most widely used methods. Although originally developed for microarray data, GSEA is nowadays extensively used for RNA-seq data analysis. Here, we quantitatively assess the performance of a variety of GSEA modalities and provide guidance on the practical use of GSEA in RNA-seq experiments. We leveraged harmonized RNA-seq datasets available from The Cancer Genome Atlas (TCGA) in combination with large, curated pathway collections from the Molecular Signatures Database to obtain cancer-type-specific target pathway lists across multiple cancer types. We carried out a detailed analysis of GSEA performance using both gene-set and phenotype permutations combined with four different choices for the Kolmogorov-Smirnov enrichment statistic. Based on our benchmarks, we conclude that the classic/unweighted gene-set permutation approach offered comparable or better sensitivity-vs-specificity tradeoffs across cancer types compared with other, more complex and computationally intensive permutation methods. Finally, we analyzed other large cohorts for thyroid cancer and hepatocellular carcinoma. We used a new consensus metric, the Enrichment Evidence Score (EES), which showed remarkable agreement between pathways identified in TCGA and those from other sources, despite differences in cancer etiology. This finding suggests an EES-based strategy to identify a core set of pathways that may be complemented by an expanded set of pathways for downstream exploratory analysis. This work fills a gap in current guidelines and benchmarks for the use of GSEA with RNA-seq data and provides a framework to enable detailed benchmarking of other RNA-seq-based pathway analysis tools.
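The classic/unweighted GSEA statistic and the gene-set permutation scheme that this benchmark found competitive can be sketched in a few lines. The Python below is an illustrative sketch, not the Broad Institute GSEA software: the running sum steps up by 1/N_H at each gene-set member and down by 1/(N - N_H) at every other gene, the enrichment score (ES) is its largest deviation from zero, and the null distribution comes from random gene sets of the same size drawn from the ranking. The function names, the +1 smoothing in the p-value, and the default of 1,000 permutations are assumptions made here.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Classic (unweighted) Kolmogorov-Smirnov-like running-sum statistic.

    ranked_genes: gene IDs sorted from most up- to most down-regulated.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    n = len(ranked_genes)
    hits = sum(1 for g in ranked_genes if g in gene_set)
    if hits == 0 or hits == n:
        raise ValueError("gene set must overlap the ranking but not cover it")
    step_up, step_down = 1 / hits, 1 / (n - hits)
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += step_up if gene in gene_set else -step_down
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_p(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Nominal p-value from gene-set (not phenotype) permutation.

    Random sets of the same effective size are drawn from the ranking; only
    null ES values with the same sign as the observed ES form the reference
    distribution.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, gene_set)
    size = sum(1 for g in ranked_genes if g in gene_set)
    null = [
        enrichment_score(ranked_genes, set(rng.sample(ranked_genes, size)))
        for _ in range(n_perm)
    ]
    same_sign = [es for es in null if (es > 0) == (observed > 0)]
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    return observed, (extreme + 1) / (len(same_sign) + 1)
```

Gene-set permutation only reshuffles gene labels on a fixed ranking, so it needs no re-analysis of samples; phenotype permutation instead recomputes the ranking for each relabelled set of samples, which is the more computationally intensive alternative the benchmark compared against.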
Affiliation(s)
- Julián Candia
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
- Luigi Ferrucci
  - Longitudinal Studies Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, United States of America
5. Zheng X, Lim PK, Mutwil M, Wang Y. A method for mining condition-specific co-expressed genes in Camellia sinensis based on k-means clustering. BMC Plant Biology 2024; 24:373. [PMID: 38714965 PMCID: PMC11077725 DOI: 10.1186/s12870-024-05086-5] [Received: 01/27/2024] [Accepted: 04/30/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND: As one of the world's most important beverage crops, tea plants (Camellia sinensis) are renowned for their unique flavors and numerous beneficial secondary metabolites, attracting researchers to investigate the formation of tea quality. With the increasing availability of transcriptome data on tea plants in public databases, large-scale co-expression analyses have become feasible, meeting the demand for functional characterization of tea plant genes. However, as multidimensional noise increases, larger-scale co-expression analyses are not always effective. Analyzing a subset of samples generated by effectively downsampling and reorganizing the global sample set often leads to more accurate co-expression results. Meanwhile, global co-expression analyses are more likely to overlook condition-specific gene interactions, which may be more important and worthy of further research.
RESULTS: Here, we employed k-means clustering to organize and classify the global samples of tea plants, resulting in clustered samples. Metadata annotations were then performed on these clustered samples to determine the "conditions" represented by each cluster. Subsequently, we conducted weighted gene co-expression network analysis (WGCNA) separately on the global samples and the clustered samples, resulting in global modules and cluster-specific modules. Comparative analyses demonstrated that cluster-specific modules exhibit higher accuracy in co-expression analysis than global modules. To measure the degree of condition specificity of genes within condition-specific clusters, we introduced the correlation difference value (CDV). By incorporating the CDV into co-expression analyses, we can assess the condition specificity of genes. This approach proved instrumental in identifying a series of high-CDV transcription factor-encoding genes upregulated during sustained cold treatment in Camellia sinensis leaves and buds, and in pinpointing a pair of genes that participate in the antioxidant defense system of tea plants under sustained cold stress.
CONCLUSIONS: To summarize, downsampling and reorganizing the sample set improved the accuracy of co-expression analysis. Cluster-specific modules were more accurate in capturing condition-specific gene interactions. The introduction of CDV allowed for the assessment of condition specificity in gene co-expression analyses. Using this approach, we identified a series of high-CDV transcription factor-encoding genes related to sustained cold stress in Camellia sinensis. This study highlights the importance of considering condition specificity in co-expression analysis and provides insights into the regulation of the cold stress response in Camellia sinensis.
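The sample-clustering step described above can be illustrated with plain k-means (Lloyd's algorithm). The abstract does not give the CDV formula, so the `cdv` function below is HYPOTHETICAL: it takes the Pearson correlation of a gene pair within one sample cluster minus its correlation over all samples, only to show the kind of condition-specific signal such a score could capture. It is not the authors' definition, the WGCNA step is omitted, and all names are assumptions. Rows of `samples` are samples (e.g., expression profiles), while `gene_a` and `gene_b` are single genes' values across those same samples.

```python
import random
from statistics import mean


def nearest(sample, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - m) ** 2 for x, m in zip(sample, centroids[c])),
    )


def kmeans(samples, k, max_iter=100, seed=0):
    """Lloyd's k-means over samples (one row per sample); returns labels."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(s, centroids) for s in samples]  # assignment step
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        for c in range(k):  # update step: centroid = mean of its members
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cdv(gene_a, gene_b, labels, cluster):
    """HYPOTHETICAL correlation difference value for one gene pair: Pearson r
    within one sample cluster minus Pearson r over all samples. Illustrative
    only; the published CDV definition may differ."""
    idx = [i for i, lab in enumerate(labels) if lab == cluster]
    local = pearson([gene_a[i] for i in idx], [gene_b[i] for i in idx])
    return local - pearson(gene_a, gene_b)
```

Under this assumed definition, a large positive value flags a pair that is tightly co-expressed under one condition but only loosely correlated across the pooled samples, which a global co-expression network would tend to underweight.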
Affiliation(s)
- Xinghai Zheng
  - Tea Research Institute, Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Peng Ken Lim
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Yuefei Wang
  - Tea Research Institute, Zhejiang University, Hangzhou, 310058, Zhejiang, China
6. Baker BH, Freije S, MacDonald JW, Bammler TK, Benson C, Carroll KN, Enquobahrie DA, Karr CJ, LeWinn KZ, Zhao Q, Bush NR, Sathyanarayana S, Paquette AG. Placental transcriptomic signatures of prenatal and preconceptional maternal stress. Mol Psychiatry 2024; 29:1179-1191. [PMID: 38212375 PMCID: PMC11176062 DOI: 10.1038/s41380-023-02403-6] [Received: 07/21/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/13/2024]
Abstract
Prenatal exposure to maternal psychological stress is associated with increased risk for adverse birth and child health outcomes. Accumulating evidence suggests that preconceptional maternal stress may also be transmitted intergenerationally to negatively impact offspring. However, understanding of the mechanisms linking these exposures to offspring outcomes, particularly those related to the placenta, is limited. Using RNA sequencing, we identified placental transcriptomic signatures associated with maternal prenatal stressful life events (SLEs) and childhood traumatic events (CTEs) in 1,029 mother-child pairs in two birth cohorts from Washington State and Memphis, Tennessee. We evaluated individual gene-SLE/CTE associations and performed an ensemble of gene set enrichment analyses combining 11 popular enrichment methods. A higher number of prenatal SLEs was significantly (FDR < 0.05) associated with increased expression of ADGRG6, a placental tissue-specific gene critical in placental remodeling, and decreased expression of RAB11FIP3, an endocytosis and endocytic recycling gene, and SMYD5, a histone methyltransferase. Prenatal SLEs and maternal CTEs were associated with gene sets related to several biological pathways, including upregulation of protein processing in the endoplasmic reticulum, protein secretion, and ubiquitin-mediated proteolysis, and downregulation of ribosome, epithelial-mesenchymal transition, DNA repair, MYC targets, and amino acid-related pathways. The directional associations in these pathways corroborate prior non-transcriptomic mechanistic studies of psychological stress and mental health disorders, and these pathways have previously been implicated in pregnancy complications and adverse birth outcomes. Accordingly, our findings suggest that maternal exposure to psychosocial stressors during pregnancy, as well as during the mother's childhood, may disrupt placental function, which may ultimately contribute to adverse pregnancy, birth, and child health outcomes.
Affiliation(s)
- Brennan H Baker
  - University of Washington, Seattle, WA, USA
  - Seattle Children's Research Institute, Seattle, WA, USA
- Ciara Benson
  - Global Alliance to Prevent Preterm Birth and Stillbirth (GAPPS), Lynnwood, WA, USA
- Kaja Z LeWinn
  - University of California San Francisco, San Francisco, CA, USA
- Qi Zhao
  - University of Tennessee Health Sciences Center, Memphis, TN, USA
- Nicole R Bush
  - University of California San Francisco, San Francisco, CA, USA
- Sheela Sathyanarayana
  - University of Washington, Seattle, WA, USA
  - Seattle Children's Research Institute, Seattle, WA, USA
- Alison G Paquette
  - University of Washington, Seattle, WA, USA
  - Seattle Children's Research Institute, Seattle, WA, USA
7. Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RPJ, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. PLoS Comput Biol 2024; 20:e1011814. [PMID: 38527092 PMCID: PMC10994553 DOI: 10.1371/journal.pcbi.1011814] [Received: 01/10/2024] [Revised: 04/04/2024] [Accepted: 03/11/2024] [Indexed: 03/27/2024]
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular level to the pathway level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data, we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data, we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. PathIntegrate is available as an open-source Python package.
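The molecule-to-pathway transformation that PathIntegrate performs before modelling can be illustrated with a very simple single-sample score. The Python below is a hedged sketch of that transformation step only: it uses a mean-z-score score, which is one of several possible single-sample pathway analysis methods and not necessarily the one PathIntegrate uses. It is not the PathIntegrate API, the predictive single-view or multi-view model that follows is omitted, and the input layout and function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(data):
    """Standardise each molecule across samples (zero mean, unit variance)."""
    out = {}
    for molecule, values in data.items():
        mu, sd = mean(values), pstdev(values)
        out[molecule] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def single_sample_pathway_scores(data, pathways):
    """Turn a molecule-level matrix into a pathway-level matrix.

    data: {molecule: [value per sample]}, where molecules may come from
    different omics layers; pathways: {name: set of molecule IDs}.
    Each pathway score is the mean z-score of its measured members per sample.
    """
    z = zscore_rows(data)
    n_samples = len(next(iter(data.values())))
    scores = {}
    for name, members in pathways.items():
        present = [m for m in members if m in z]
        if not present:
            continue  # pathway not covered by any measured molecule
        scores[name] = [mean(z[m][i] for m in present) for i in range(n_samples)]
    return scores
```

Because every layer is standardised before averaging, a transcript, a protein, and a metabolite in the same pathway contribute on a common scale, and the resulting pathway-by-sample matrix can then be passed to any predictive model.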
Affiliation(s)
- Cecilia Wieder
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Juliette Cooke
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Clement Frainay
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Russell Bowler
  - National Jewish Health, Denver, Colorado, United States of America
- Fabien Jourdan
  - MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Katerina J. Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Rachel PJ Lai
  - Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Timothy Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
8. Chen H, King FJ, Zhou B, Wang Y, Canedy CJ, Hayashi J, Zhong Y, Chang MW, Pache L, Wong JL, Jia Y, Joslin J, Jiang T, Benner C, Chanda SK, Zhou Y. Drug target prediction through deep learning functional representation of gene signatures. Nat Commun 2024; 15:1853. [PMID: 38424040 PMCID: PMC10904399 DOI: 10.1038/s41467-024-46089-y] [Received: 09/20/2023] [Accepted: 02/14/2024] [Indexed: 03/02/2024]
Abstract
Many machine learning applications in bioinformatics currently rely on matching gene identities when analyzing input gene signatures and fail to take advantage of preexisting knowledge about gene functions. To further enable comparative analysis of OMICS datasets, including target deconvolution and mechanism of action studies, we develop an approach that represents gene signatures projected onto their biological functions, instead of their identities, similar to how the word2vec technique works in natural language processing. We develop the Functional Representation of Gene Signatures (FRoGS) approach by training a deep learning model and demonstrate that its application to the Broad Institute's L1000 datasets results in more effective compound-target predictions than models based on gene identities alone. By integrating additional pharmacological activity data sources, FRoGS significantly increases the number of high-quality compound-target predictions relative to existing approaches, many of which are supported by in silico and/or experimental evidence. These results underscore the general utility of FRoGS in machine learning-based bioinformatics applications. Prediction networks pre-equipped with the knowledge of gene functions may help uncover new relationships among gene signatures acquired by large-scale OMICs studies on compounds, cell types, disease models, and patient cohorts.
Affiliation(s)
- Hao Chen
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
  - Department of Computer Science and Engineering, University of California, Riverside, 900 University Avenue, Riverside, CA, 92521, USA
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Frederick J King
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Bin Zhou
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Yu Wang
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Carter J Canedy
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Joel Hayashi
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Yang Zhong
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Max W Chang
  - Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Lars Pache
  - NCI Designated Cancer Center, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA
- Julian L Wong
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Yong Jia
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- John Joslin
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
- Tao Jiang
  - Department of Computer Science and Engineering, University of California, Riverside, 900 University Avenue, Riverside, CA, 92521, USA
- Christopher Benner
  - Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Sumit K Chanda
  - Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, 92037, USA
- Yingyao Zhou
  - Novartis Biomedical Research, 10675 John Jay Hopkins Drive, San Diego, CA, 92121, USA
9. Kauffmann J, Esders M, Ruff L, Montavon G, Samek W, Muller KR. From Clustering to Cluster Explanations via Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:1926-1940. [PMID: 35797317 DOI: 10.1109/tnnls.2022.3185901] [Indexed: 06/15/2023]
Abstract
A recent trend in machine learning has been to enrich learned models with the ability to explain their own predictions. The emerging field of explainable AI (XAI) has so far mainly focused on supervised learning, in particular, deep neural network classifiers. In many practical problems, however, the label information is not given and the goal is instead to discover the underlying structure of the data, for example, its clusters. While powerful methods exist for extracting the cluster structure in data, they typically do not answer the question of why a certain data point has been assigned to a given cluster. We propose a new framework that can, for the first time, explain cluster assignments in terms of input features in an efficient and reliable manner. It is based on the novel insight that clustering models can be rewritten as neural networks, or "neuralized." Cluster predictions of the obtained networks can then be quickly and accurately attributed to the input features. Several showcases demonstrate the ability of our method to assess the quality of learned clusters and to extract novel insights from the analyzed data and representations.
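The abstract does not say which clustering models are covered. As a hedged illustration only, assume k-means: the margin by which a point x prefers centroid mu_c over a competitor mu_k, ||x - mu_k||^2 - ||x - mu_c||^2, simplifies to the linear function 2(mu_c - mu_k)·x + ||mu_k||^2 - ||mu_c||^2. The cluster score can therefore be computed as a layer of linear units followed by min-pooling. The sketch below uses hypothetical function names and plain Python; it is not the authors' code and omits the feature-attribution step that the paper actually contributes.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def neuralized_kmeans_score(x, centroids, c):
    """Score for assigning x to cluster c, written as a tiny network:
    one linear unit per competing centroid, then min-pooling.

    Equals min over k != c of ||x - mu_k||^2 - ||x - mu_c||^2, so it is
    positive exactly when x is strictly closer to mu_c than to every other centroid.
    """
    mu_c = centroids[c]
    unit_outputs = []
    for k, mu_k in enumerate(centroids):
        if k == c:
            continue
        weights = [2.0 * (a - b) for a, b in zip(mu_c, mu_k)]  # w = 2(mu_c - mu_k)
        bias = dot(mu_k, mu_k) - dot(mu_c, mu_c)                # b = |mu_k|^2 - |mu_c|^2
        unit_outputs.append(dot(weights, x) + bias)             # linear layer
    return min(unit_outputs)                                    # min-pooling layer


centroids = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
print(neuralized_kmeans_score([1.0, 0.0], centroids, 0))  # 8.0
print(neuralized_kmeans_score([1.0, 0.0], centroids, 1))  # -8.0
```

Because the score is built only from linear units and a min, explanation techniques designed for neural networks can in principle be applied to it, which is the motivation the abstract gives for neuralization.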

10
Venn B, Leifeld T, Zhang P, Mühlhaus T. Temporal classification of short time series data. BMC Bioinformatics 2024; 25:30. [PMID: 38233793 PMCID: PMC10792935 DOI: 10.1186/s12859-024-05636-6] [Received: 10/26/2023] [Accepted: 01/03/2024]
Abstract
MOTIVATION Within the frame of their genetic capacity, organisms are able to modify their molecular state to cope with changing environmental conditions or induced genetic disposition. As high-throughput methods are becoming increasingly affordable, time series analysis techniques are frequently applied to study the complex dynamic interplay between genes, proteins, and metabolites at the physiological and molecular level. Common analysis approaches fail to simultaneously include (i) information about the replicate variance and (ii) the limited number of responses/shapes that a biological system is typically able to take. RESULTS We present a novel approach to model and classify short time series signals, conceptually based on a classical time series analysis, where the dependency of the consecutive time points is exploited. Constrained spline regression with automated model selection separates noise from signal under the assumption that high-frequency changes are less likely to occur, while preserving information about the detected variance. This enables a more precise representation of the measured information and improves temporal classification in order to identify biologically interpretable correlations among the data. AVAILABILITY AND IMPLEMENTATION An open source F# implementation of the presented method and documentation of its usage are freely available in the TempClass repository, https://github.com/CSBiology/TempClass [58].
Affiliation(s)
- Benedikt Venn
- Computational Systems Biology, RPTU Kaiserslautern, 67663, Kaiserslautern, Germany
- Thomas Leifeld
- Institute of Automatic Control, RPTU Kaiserslautern, 67663, Kaiserslautern, Germany
- Ping Zhang
- Institute of Automatic Control, RPTU Kaiserslautern, 67663, Kaiserslautern, Germany
- Timo Mühlhaus
- Computational Systems Biology, RPTU Kaiserslautern, 67663, Kaiserslautern, Germany.

11
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RP, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. bioRxiv: The Preprint Server for Biology 2024:2024.01.09.574780. [PMID: 38260498 PMCID: PMC10802464 DOI: 10.1101/2024.01.09.574780]
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular level to the pathway level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. The PathIntegrate Python package is available at https://github.com/cwieder/PathIntegrate.
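Single-sample pathway analysis (ssPA), the first step described above, turns a molecule-by-sample matrix into a pathway-by-sample matrix. The abstract does not say which ssPA scoring PathIntegrate uses; the sketch below assumes the simplest option, the mean z-score of a pathway's measured members in each sample. The function names `standardise` and `pathway_scores` are hypothetical and are not part of the PathIntegrate package.

```python
import statistics


def standardise(values):
    """Z-score one molecule's values across samples (assumes a non-constant profile)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def pathway_scores(data, pathways):
    """Mean z-score of each pathway's measured members, per sample.

    data:     {molecule: [value in sample 0, value in sample 1, ...]}
    pathways: {pathway name: [member molecules]}
    Pathways with no measured member are skipped.
    """
    z = {m: standardise(v) for m, v in data.items()}
    n_samples = len(next(iter(data.values())))
    scores = {}
    for name, members in pathways.items():
        present = [m for m in members if m in z]
        if not present:
            continue
        scores[name] = [statistics.fmean(z[m][s] for m in present)
                        for s in range(n_samples)]
    return scores


data = {"m1": [1.0, 2.0, 3.0], "m2": [3.0, 2.0, 1.0], "m3": [2.0, 4.0, 6.0]}
pathways = {"P1": ["m1", "m3"], "P2": ["m1", "m2"], "P3": ["not_measured"]}
print(sorted(pathway_scores(data, pathways)))  # ['P1', 'P2']
```

The resulting pathway-level matrix is what a downstream predictive model would then consume; opposing members (as in P2 here) cancel out under a plain mean, which is one reason richer ssPA methods exist.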
Affiliation(s)
- Cecilia Wieder
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Juliette Cooke
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Clement Frainay
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Russell Bowler
- National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
- Fabien Jourdan
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Katerina J Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Rachel Pj Lai
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Timothy Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom

12
Li Z, Song C, Yang J, Jia Z, Chen D, Yan C, Tian L, Wu X. Clustering algorithm based on DINNSM and its application in gene expression data analysis. Technol Health Care 2024; 32:229-239. [PMID: 38759052 PMCID: PMC11191479 DOI: 10.3233/thc-248020]
Abstract
BACKGROUND Selecting an appropriate similarity measurement method is crucial for obtaining biologically meaningful clustering modules. Commonly used measurement methods are insufficient in capturing the complexity of biological systems and fail to accurately represent their intricate interactions. OBJECTIVE This study aimed to obtain biologically meaningful gene modules by using a clustering algorithm based on a similarity measurement method. METHODS A new algorithm called the Dual-Index Nearest Neighbor Similarity Measure (DINNSM) was proposed. The algorithm first calculated the similarity matrix between genes using Pearson's or Spearman's correlation and then used this matrix to construct a nearest-neighbor table. The final similarity matrix was reconstructed from the positions of shared genes in the nearest-neighbor table and the number of shared genes. RESULTS Experiments were conducted on five different gene expression datasets, comparing DINNSM with five widely used similarity measurement techniques for gene expression data. The findings demonstrate that clustering with DINNSM as the similarity measure produced better results than clustering with the alternative measurement techniques. CONCLUSIONS DINNSM provided more accurate insights into the intricate biological connections among genes, facilitating the identification of more accurate and biologically meaningful gene co-expression modules.
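The METHODS description maps onto a general shared-nearest-neighbour (SNN) pattern: correlations, then a nearest-neighbour table, then a similarity rebuilt from which neighbours two genes share and where those neighbours sit in each list. The sketch below implements a generic position-weighted SNN score under that reading. The weighting (k minus the 0-based rank, summed over both lists and scaled to [0, 1]) and the choice to list each gene as its own first neighbour are assumptions for illustration, not the published DINNSM formula.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def neighbour_table(expr, k):
    """Row i lists gene i itself, then its k - 1 most correlated genes (assumes k <= n)."""
    n = len(expr)
    table = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: pearson(expr[i], expr[j]), reverse=True)
        table.append([i] + others[:k - 1])
    return table


def snn_similarity(expr, k):
    """Position-weighted shared-neighbour similarity in [0, 1] (hypothetical weighting)."""
    table = neighbour_table(expr, k)
    pos = [{g: p for p, g in enumerate(row)} for row in table]  # gene -> rank in each list
    norm = k * (k + 1)  # score of two identical, identically ordered lists
    n = len(expr)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = pos[i].keys() & pos[j].keys()
            score = sum((k - pos[i][g]) + (k - pos[j][g]) for g in shared)
            sim[i][j] = score / norm
    return sim


expr = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [5, 3, 2, 1]]
sim = snn_similarity(expr, k=2)
print(sim[0][1], sim[0][2])  # 1.0 0.0
```

Genes 0 and 1 rise together and share both neighbours, while genes 0 and 2 are anti-correlated and share none. Rank-weighting means a neighbour near the top of both lists counts more than one that only just made the cut.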
Affiliation(s)
- Zongjin Li
- Department of Computer, Qinghai Normal University, Xining, China
- Changxin Song
- Department of Mechanical Engineering and Information, Shanghai Urban Construction Vocational College, Shanghai, China
- Jiyu Yang
- Department of Cardiovascular Medicine, Xining First People’s Hospital, Xining, China
- Zeyu Jia
- Department of Computer, Qinghai Normal University, Xining, China
- Dongzhen Chen
- School of Materials Science and Engineering, Xi’an Polytechnic University, Xi’an, China
- Chengying Yan
- Department of Cardiovascular Medicine, Xining First People’s Hospital, Xining, China
- Liqin Tian
- Department of Computer, Qinghai Normal University, Xining, China
- School of Computer, North China Institute of Science and Technology, Langfang, China
- Xiaoming Wu
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China

13
Roper B, Mathews JC, Nadeem S, Park JH. Vis-SPLIT: Interactive Hierarchical Modeling for mRNA Expression Classification. IEEE Visualization Conference: VIS. IEEE Conference on Visualization 2023; 2023:106-110. [PMID: 38881685 PMCID: PMC11179685 DOI: 10.1109/vis54172.2023.00030]
Abstract
We propose an interactive visual analytics tool, Vis-SPLIT, for partitioning a population of individuals into groups with similar gene signatures. Vis-SPLIT allows users to interactively explore a dataset and exploit visual separations to build a classification model for specific cancers. The visualization components reveal gene expression and correlation to assist specific partitioning decisions, while also providing overviews for the decision model and clustered genetic signatures. We demonstrate the effectiveness of our framework through a case study and evaluate its usability with domain experts. Our results show that Vis-SPLIT can classify patients based on their genetic signatures to effectively gain insights into RNA sequencing data, as compared to an existing classification system.

14
Xue L, Wu Y, Lin Y. Dissecting and improving gene regulatory network inference using single-cell transcriptome data. Genome Res 2023; 33:1609-1621. [PMID: 37580132 PMCID: PMC10620053 DOI: 10.1101/gr.277488.122] [Received: 11/09/2022] [Accepted: 08/07/2023]
Abstract
Single-cell transcriptome data have been widely used to reconstruct gene regulatory networks (GRNs) controlling critical biological processes such as development and differentiation. Although a growing list of algorithms has been developed to infer GRNs using such data, achieving an inference accuracy consistently higher than random guessing has remained challenging. To address this, it is essential to delineate how the accuracy of regulatory inference is limited. Here, we systematically characterized factors limiting the accuracy of inferred GRNs and demonstrated that using pre-mRNA information can help improve regulatory inference compared to the typically used information (i.e., mature mRNA). Using kinetic modeling and simulated single-cell data sets, we showed that target genes' mature mRNA levels often fail to accurately report upstream regulatory activities because of gene-level and network-level factors, which can be improved by using pre-mRNA levels. We tested this finding on public single-cell RNA-seq data sets using intronic reads as proxies of pre-mRNA levels and indeed achieved a higher inference accuracy than with exonic reads (corresponding to mature mRNAs). Using experimental data sets, we further validated findings from the simulated data sets and identified factors such as transcription factor activity dynamics influencing the accuracy of pre-mRNA-based inference. This work delineates the fundamental limitations of gene regulatory inference and helps improve GRN inference using single-cell RNA-seq data.
Affiliation(s)
- Lingfeng Xue
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- Yan Wu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
- Yihan Lin
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871
- The MOE Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing, China, 100871
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China, 100871

15
Kunke M, Knöfler H, Dahlke E, Zanon Rodriguez L, Böttner M, Larionov A, Saudenova M, Ohrenschall GM, Westermann M, Porubsky S, Bernardes JP, Häsler R, Magnin JL, Koepsell H, Jouret F, Theilig F. Targeted deletion of von-Hippel-Lindau in the proximal tubule conditions the kidney against early diabetic kidney disease. Cell Death Dis 2023; 14:562. [PMID: 37626062 PMCID: PMC10457389 DOI: 10.1038/s41419-023-06074-7] [Received: 04/19/2023] [Revised: 08/01/2023] [Accepted: 08/15/2023]
Abstract
Diabetic kidney disease (DKD) is the leading cause of end-stage renal disease. Glomerular hyperfiltration and albuminuria subject the proximal tubule (PT) to a subsequent elevation of workload, growth, and hypoxia. Hypoxia plays an ambiguous role in the development and progression of DKD, which our study aimed to clarify. A PT-von-Hippel-Lindau (Vhl)-deleted mouse model in combination with streptozotocin (STZ)-induced type 1 diabetes mellitus (DM) was phenotyped. In contrast to PT-Vhl-deleted STZ-induced type 1 DM mice, diabetic control mice developed proteinuria and glomerular hyperfiltration, the latter due to higher nitric oxide synthase 1 and sodium and glucose transporter expression. PT Vhl deletion and DKD share common alterations in gene expression profiles, including glomerular and tubular morphology, and tubular transport and metabolism. Compared to diabetic control mice, the most significantly altered genes in PT Vhl-deleted STZ-induced type 1 DM mice were Ldc-1, regulating cellular oxygen consumption rate, and Zbtb16, inhibiting autophagy. Alignment of altered genes in heat maps uncovered that Vhl deletion prior to STZ-induced DM preconditioned the kidney against DKD. HIF-1α stabilization, leading to histone modification and chromatin remodeling, resets most genes altered upon DKD towards the control level. These data demonstrate that PT HIF-1α stabilization is a hallmark of early DKD and that targeting hypoxia prior to the onset of type 1 DM normalizes renal cell homeostasis and prevents DKD development.
Affiliation(s)
- Madlen Kunke
- Institute of Anatomy, Christian Albrechts-University Kiel, Kiel, Germany
- Hannah Knöfler
- Institute of Anatomy, Christian Albrechts-University Kiel, Kiel, Germany
- Eileen Dahlke
- Institute of Anatomy, Christian Albrechts-University Kiel, Kiel, Germany
- Martina Böttner
- Institute of Anatomy, Christian Albrechts-University Kiel, Kiel, Germany
- Alexey Larionov
- Institute of Anatomy, Department of Medicine, University of Fribourg, Fribourg, Switzerland
- Joana P Bernardes
- Department of Dermatology and Allergy, University Hospital Schleswig-Holstein, Kiel, Germany
- Robert Häsler
- Department of Dermatology and Allergy, University Hospital Schleswig-Holstein, Kiel, Germany
- Hermann Koepsell
- Institute of Anatomy and Cell Biology, Julius-Maximilians-University of Würzburg, Würzburg, Germany
- François Jouret
- Groupe Interdisciplinaire de Génoprotéomique Appliquée (GIGA), Cardiovascular Sciences, University of Liège (ULiège), Liège, Belgium
- Division of Nephrology, CHU of Liège, University of Liège (CHU ULiège), Liège, Belgium
- Franziska Theilig
- Institute of Anatomy, Christian Albrechts-University Kiel, Kiel, Germany.
- Institute of Anatomy, Department of Medicine, University of Fribourg, Fribourg, Switzerland.

16
Zage PE, Huo Y, Subramonian D, Le Clorennec C, Ghosh P, Sahoo D. Identification of a novel gene signature for neuroblastoma differentiation using a Boolean implication network. Genes Chromosomes Cancer 2023; 62:313-331. [PMID: 36680522 PMCID: PMC10257350 DOI: 10.1002/gcc.23124] [Received: 04/01/2022] [Revised: 01/13/2023] [Accepted: 01/16/2023]
Abstract
Although induction of differentiation represents an effective strategy for neuroblastoma treatment, the mechanisms underlying neuroblastoma differentiation are poorly understood. We generated a computational model of neuroblastoma differentiation consisting of interconnected gene clusters identified based on symmetric and asymmetric gene expression relationships. We identified a differentiation signature consisting of a series of gene clusters comprising 1251 independent genes that predicted neuroblastoma differentiation in independent datasets and in neuroblastoma cell lines treated with agents known to induce differentiation. This differentiation signature was associated with patient outcomes in multiple independent patient cohorts and validated the role of MYCN expression as a marker of neuroblastoma differentiation. Our results further identified novel genes associated with MYCN via asymmetric Boolean implication relationships that would not have been identified using symmetric computational approaches and that were associated with both neuroblastoma differentiation and patient outcomes. Our differentiation signature included a cluster of genes involved in intracellular signaling and growth factor receptor trafficking pathways that is strongly associated with neuroblastoma differentiation, and we validated the associations of UBE4B, a gene within this cluster, with neuroblastoma cell and tumor differentiation. Our findings demonstrate that Boolean network analyses of symmetric and asymmetric gene expression relationships can identify novel genes and pathways relevant for neuroblastoma tumor differentiation that could represent potential therapeutic targets.
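An asymmetric Boolean implication such as "A high implies B high" holds when almost no samples fall in the (A high, B low) quadrant, even if the other quadrants are populated. A symmetric correlation cannot express that direction. The sketch below checks for such a sparse quadrant using fixed, user-supplied thresholds and a simple observed/expected cutoff. Both the thresholds and the `max_ratio` parameter are illustrative assumptions; the published Boolean implication network method chooses thresholds and tests sparsity with its own statistics.

```python
def quadrant_counts(a, b, ta, tb):
    """Count samples per quadrant; keys are 'h'/'l' for gene A then gene B."""
    counts = {"hh": 0, "hl": 0, "lh": 0, "ll": 0}
    for x, y in zip(a, b):
        key = ("h" if x > ta else "l") + ("h" if y > tb else "l")
        counts[key] += 1
    return counts


def implies_high_high(a, b, ta, tb, max_ratio=0.1):
    """Test 'A high => B high': the (A high, B low) quadrant must be sparse.

    Sparse here means observed count / count expected under independence
    is at most max_ratio (an illustrative cutoff, not a published one).
    """
    c = quadrant_counts(a, b, ta, tb)
    n = len(a)
    a_high = c["hh"] + c["hl"]
    b_low = c["hl"] + c["ll"]
    expected = a_high * b_low / n
    if expected == 0:
        return False
    return c["hl"] / expected <= max_ratio


gene_a = [1, 1, 1, 1, 9, 9, 9, 9]
gene_b = [1, 1, 9, 9, 9, 9, 9, 9]
print(implies_high_high(gene_a, gene_b, 5, 5))  # True
print(implies_high_high(gene_b, gene_a, 5, 5))  # False
```

The two calls show the asymmetry: every sample with A high also has B high, but B is high in samples where A is low, so the reverse implication fails even though the relationship looks like a single positive association to a correlation coefficient.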
Affiliation(s)
- Peter E. Zage
- Department of Pediatrics, Division of Hematology-Oncology, University of California San Diego (UCSD), La Jolla, CA
- Yuchen Huo
- Department of Pediatrics, Division of Hematology-Oncology, University of California San Diego (UCSD), La Jolla, CA
- Divya Subramonian
- Department of Pediatrics, Division of Hematology-Oncology, University of California San Diego (UCSD), La Jolla, CA
- Christophe Le Clorennec
- Department of Pediatrics, Division of Hematology-Oncology, University of California San Diego (UCSD), La Jolla, CA
- Pradipta Ghosh
- Department of Medicine, UCSD, La Jolla, CA
- Department of Cellular and Molecular Medicine, UCSD, La Jolla, CA
- Veterans Affairs Medical Center, La Jolla, CA
- Debashis Sahoo
- Department of Pediatrics, Division of Hematology-Oncology, University of California San Diego (UCSD), La Jolla, CA
- Department of Computer Science and Engineering, Jacobs School of Engineering, UCSD, La Jolla, CA

17
Xiang G, Giardine B, An L, Sun C, Keller CA, Heuston EF, Anderson SM, Kirby M, Bodine D, Zhang Y, Hardison RC. Snapshot: a package for clustering and visualizing epigenetic history during cell differentiation. BMC Bioinformatics 2023; 24:102. [PMID: 36941541 PMCID: PMC10026520 DOI: 10.1186/s12859-023-05223-1] [Received: 04/11/2022] [Accepted: 03/07/2023]
Abstract
BACKGROUND Epigenetic modification of chromatin plays a pivotal role in regulating gene expression during cell differentiation. The scale and complexity of epigenetic data pose significant challenges for biologists trying to identify the regulatory events controlling cell differentiation. RESULTS To reduce the complexity, we developed a package, called Snapshot, for clustering and visualizing candidate cis-regulatory elements (cCREs) based on their epigenetic signals during cell differentiation. This package first introduces a binarized indexing strategy for clustering the cCREs. It then provides a series of easily interpretable figures for visualizing the signal and epigenetic state patterns of the cCRE clusters during cell differentiation. It can also use different hierarchies of cell types to highlight the epigenetic history specific to any particular cell lineage. We demonstrate the utility of Snapshot using data from a consortium project for ValIdated Systematic IntegratiON (VISION) of epigenomic data in hematopoiesis. CONCLUSION The Snapshot package can identify all distinct clusters of genomic locations with unique epigenetic signal patterns during cell differentiation. It outperforms other methods in terms of interpreting and reproducing the identified cCRE clusters. Snapshot is available on GitHub: https://github.com/guanjue/Snapshot.
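One way to picture a binarized indexing strategy: each cCRE's signal in every cell type is reduced to 1 (above a threshold) or 0, and the resulting bit string serves as the cluster label, so cCREs with the same on/off pattern across cell types fall into the same cluster. This is a minimal sketch assuming a single global threshold and hypothetical cCRE identifiers; Snapshot's own thresholding, handling of small index sets, and visualization steps are not reproduced.

```python
from collections import defaultdict


def binarized_index(signals, threshold):
    """Bit string with one character per cell type: '1' if signal > threshold."""
    return "".join("1" if s > threshold else "0" for s in signals)


def cluster_by_index(ccre_signals, threshold=1.0):
    """Group cCREs that share the same binarized index across cell types."""
    clusters = defaultdict(list)
    for ccre_id, signals in ccre_signals.items():
        clusters[binarized_index(signals, threshold)].append(ccre_id)
    return dict(clusters)


# Signals ordered by cell type (e.g. progenitor, intermediate, mature); values are made up.
signals = {
    "cCRE1": [0.2, 3.1, 2.5],
    "cCRE2": [0.1, 2.0, 4.0],
    "cCRE3": [5.0, 0.3, 0.2],
}
print(cluster_by_index(signals))  # {'011': ['cCRE1', 'cCRE2'], '100': ['cCRE3']}
```

Because the index is a discrete label rather than a distance, the number of clusters is determined by the observed on/off patterns instead of being fixed in advance, which is what makes the clusters straightforward to interpret along a differentiation path.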
Affiliation(s)
- Guanjue Xiang
- The Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA.
- Belinda Giardine
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Lin An
- The Bioinformatics and Genomics Program, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Chen Sun
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA
- Cheryl A Keller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- David Bodine
- NHGRI Hematopoiesis Section, GMBB, Bethesda, MD, USA
- Yu Zhang
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA.

18
Dillard LR, Glass EM, Lewis AL, Thomas-White K, Papin JA. Metabolic Network Models of the Gardnerella Pangenome Identify Key Interactions with the Vaginal Environment. mSystems 2023; 8:e0068922. [PMID: 36511689 PMCID: PMC9948698 DOI: 10.1128/msystems.00689-22] [Received: 07/26/2022] [Accepted: 11/13/2022]
Abstract
Gardnerella is the primary pathogenic bacterial genus present in the polymicrobial condition known as bacterial vaginosis (BV). Despite BV's high prevalence and associated chronic and acute women's health impacts, the Gardnerella pangenome is largely uncharacterized at both the genetic and functional metabolic levels. Here, we used genome-scale metabolic models to characterize in silico the Gardnerella pangenome metabolic content. We also assessed the metabolic functional capacity in a BV-positive cervicovaginal fluid context. The metabolic capacity varied widely across the pangenome, with 38.15% of all reactions being core to the genus, compared to 49.60% of reactions identified as being unique to a smaller subset of species. We identified 57 essential genes across the pangenome via in silico gene essentiality screens within two simulated vaginal metabolic environments. Four genes, gpsA, fas, suhB, and psd, were identified as core essential genes critical for the metabolic function of all analyzed bacterial species of the Gardnerella genus. Further understanding these core essential metabolic functions could inform novel therapeutic strategies to treat BV. Machine learning applied to simulated metabolic network flux distributions showed limited clustering based on the sample isolation source, which further supports the presence of extensive core metabolic functionality across this genus. These data represent the first metabolic modeling of the Gardnerella pangenome and illustrate strain-specific interactions with the vaginal metabolic environment across the pangenome. IMPORTANCE Bacterial vaginosis (BV) is the most common vaginal infection among reproductive-age women. Despite its prevalence and associated chronic and acute women's health impacts, the diverse bacteria involved in BV infection remain poorly characterized. Gardnerella is the genus of bacteria most commonly and most abundantly represented during BV. 
In this paper, we use metabolic models, which are a computational representation of the possible functional metabolism of an organism, to investigate metabolic conservation, gene essentiality, and pathway utilization across 110 Gardnerella strains. These models allow us to investigate in silico how strains may differ with respect to their metabolic interactions with the vaginal-host environment.
Affiliation(s)
- Lillian R. Dillard
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, USA
- Emma M. Glass
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, USA
- Amanda L. Lewis
- Department of Obstetrics and Gynecology, University of California, San Diego, La Jolla, California, USA
- Jason A. Papin
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, USA

19
Le Provost G, Lalanne C, Lesur I, Louvet JM, Delzon S, Kremer A, Labadie K, Aury JM, Da Silva C, Moritz T, Plomion C. Oak stands along an elevation gradient have different molecular strategies for regulating bud phenology. BMC Plant Biology 2023; 23:108. [PMID: 36814198 PMCID: PMC9948485 DOI: 10.1186/s12870-023-04069-2] [Received: 09/29/2022] [Accepted: 01/16/2023]
Abstract
BACKGROUND Global warming raises serious concerns about the persistence of species and populations locally adapted to their environment, simply because of the shift it produces in their adaptive landscape. For instance, the phenological cycle of tree species may be strongly affected by higher winter temperatures and late frost in spring. Given the variety of ecosystem services they provide, the question of forest tree adaptation has received increasing attention in the scientific community and catalyzed research efforts in ecology, evolutionary biology and functional genomics to study their adaptive capacity to respond to such perturbations. RESULTS In the present study, we used an elevation gradient in the Pyrenees Mountains to explore the gene expression network underlying dormancy regulation in natural populations of sessile oak stands sampled along an elevation cline and potentially adapted to different climatic conditions mainly driven by temperature. By performing analyses of gene expression in terminal buds, we identified genes displaying significant dormancy, elevation or dormancy-by-elevation interaction effects. Our results highlighted that low- and high-altitude populations have evolved different molecular strategies for minimizing late frost damage and maximizing the growth period, potentially increasing their respective fitness in these contrasting environmental conditions. More particularly, the high-elevation population overexpressed genes involved in inhibiting cell elongation and delaying flowering time, whereas genes involved in cell division and flowering, enabling buds to flush earlier, were identified in the low-elevation population. CONCLUSION Our study made it possible to identify key dormancy-by-elevation responsive genes, revealing that the stands analyzed in this study have evolved distinct molecular strategies to adapt their bud phenology in response to temperature.
Affiliation(s)
- Isabelle Lesur
- INRAE, Univ. Bordeaux, BIOGECO, F-33610, Cestas, France
- Helix Venture, F-33700, Mérignac, France
- Karine Labadie
- Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
- Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Corinne Da Silva
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Thomas Moritz
- Department of Forest Genetics and Plant Physiology, Umeå Plant Science Centre, Swedish University of Agricultural Sciences, 901 87, Umeå, Sweden

20
Uncovering the complex genetic architecture of human plasma lipidome using machine learning methods. Sci Rep 2023; 13:3078. [PMID: 36813803 PMCID: PMC9947228 DOI: 10.1038/s41598-023-30168-z] [Received: 09/23/2022] [Accepted: 02/16/2023]
Abstract
The genetic architecture of the plasma lipidome provides insights into the regulation of lipid metabolism and related diseases. We applied an unsupervised machine learning method, PGMRA, to discover phenotype-genotype many-to-many relations between genotype and plasma lipidome (phenotype) in order to identify the genetic architecture of plasma lipidome profiled from 1,426 Finnish individuals aged 30-45 years. PGMRA involves biclustering genotype and lipidome data independently followed by their inter-domain integration based on hypergeometric tests of the number of shared individuals. Pathway enrichment analysis was performed on the SNP sets to identify their associated biological processes. We identified 93 statistically significant (hypergeometric p-value < 0.01) lipidome-genotype relations. Genotype biclusters in these 93 relations contained 5977 SNPs across 3164 genes. Twenty-nine of the 93 relations contained genotype biclusters with more than 50% unique SNPs and participants, thus representing the most distinct subgroups. We identified 30 significantly enriched biological processes among the SNPs involved in 21 of these 29 most distinct genotype-lipidome subgroups, through which the identified genetic variants can influence and regulate plasma lipid-related metabolism and profiles. This study identified 29 distinct genotype-lipidome subgroups in the studied Finnish population that may have distinct disease trajectories and therefore could be useful in precision medicine research.
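The hypergeometric test named above asks how surprising the overlap between a genotype bicluster and a lipidome bicluster is. Out of N individuals, one bicluster contains K and the other contains n; the p-value is the probability of sharing at least x individuals by chance. The sketch below computes that upper tail exactly with integer binomial coefficients from the Python standard library. The function name is hypothetical, and how PGMRA sets up, filters, or corrects its tests is not reproduced here.

```python
from math import comb


def hypergeom_overlap_p(total, n_a, n_b, shared):
    """Upper-tail hypergeometric p-value P(X >= shared).

    total:  number of individuals in the study
    n_a:    individuals in the first bicluster
    n_b:    individuals in the second bicluster (the "draw")
    shared: individuals observed in both biclusters
    """
    denom = comb(total, n_b)
    upper = min(n_a, n_b)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(n_a, x) * comb(total - n_a, n_b - x)
               for x in range(shared, upper + 1))
    return tail / denom


# Toy sizes: 10 individuals, two biclusters of 5 that coincide exactly.
print(hypergeom_overlap_p(10, 5, 5, 5))  # equals 1/252
```

Working in exact integers until the final division avoids the underflow and rounding issues that summing floating-point probabilities can introduce for large cohorts.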
21
Naldurtiker A, Batchu P, Kouakou B, Terrill TH, McCommon GW, Kannan G. Differential gene expression analysis using RNA-seq in the blood of goats exposed to transportation stress. Sci Rep 2023; 13:1984. [PMID: 36737466 PMCID: PMC9898539 DOI: 10.1038/s41598-023-29224-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Accepted: 01/31/2023] [Indexed: 02/05/2023]
Abstract
Transportation stress causes significant changes in the physiological responses of goats; however, studies exploring the stress transcriptome are very limited. The objective of this study was to determine differential gene expression and related pathways in blood samples using RNA-seq in Spanish goats subjected to different durations of transportation stress. Fifty-four male Spanish goats (8 mo old; BW = 29.7 ± 2.03 kg) were randomly assigned to one of three treatments (TRT; n = 18 goats/treatment): (1) transported for 180 min, (2) transported for 30 min, or (3) held in pens (control). Blood samples were collected before and after treatment for stress hormone, metabolite, and transcriptomic analysis. RNA-seq technology was used to obtain the transcriptome profiles of blood. Analysis of physiological data using SAS showed that plasma cortisol concentrations were higher (P < 0.01) in the 180 min and 30 min groups than in the control group. Enrichment analysis of the differentially expressed genes (DEGs) related to transportation stress using the Gene Ontology and KEGG databases revealed that DEGs related to inflammatory pathways, caspases, and apoptosis, such as IL1R2, CASP14, CD14, TLR4, and MAPK14, were highly enriched in the transported goats compared with non-transported goats. Stress in goats leads to a sequence of events at the cellular and molecular levels that causes inflammation and apoptosis.
Affiliation(s)
- Aditya Naldurtiker
- Agricultural Research Station, Fort Valley State University, 1005 State University Drive, Fort Valley, GA, 31030, USA
- Phaneendra Batchu
- Agricultural Research Station, Fort Valley State University, 1005 State University Drive, Fort Valley, GA, 31030, USA
- Brou Kouakou
- Agricultural Research Station, Fort Valley State University, 1005 State University Drive, Fort Valley, GA, 31030, USA
- Thomas H Terrill
- Agricultural Research Station, Fort Valley State University, 1005 State University Drive, Fort Valley, GA, 31030, USA
- George W McCommon
- Agricultural Research Station, Fort Valley State University, 1005 State University Drive, Fort Valley, GA, 31030, USA
- Govind Kannan
- Agricultural Research Station, Fort Valley State University, 1005 State University Drive, Fort Valley, GA, 31030, USA
22
Zhu W, Ding M, Chang J, Liao H, Xiao G, Wang Q. A 9-gene prognostic signature for kidney renal clear cell carcinoma overall survival based on co-expression and regression analyses. Chem Biol Drug Des 2023; 101:422-437. [PMID: 36053927 DOI: 10.1111/cbdd.14141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 08/10/2022] [Accepted: 08/30/2022] [Indexed: 01/18/2023]
Abstract
This research aimed to screen potential signatures associated with kidney renal clear cell carcinoma (KIRC) progression and overall survival using weighted gene co-expression network analysis (WGCNA) and Cox regression. KIRC-associated mRNA expression and clinical data were obtained from The Cancer Genome Atlas (TCGA) database. Differentially expressed genes (DEGs) were screened by differential analysis. A co-expression network was constructed using WGCNA, and GO and KEGG analyses were performed on the resulting modules. A protein-protein interaction (PPI) network was constructed. Prognostic signatures were screened by Lasso-Cox regression. The prognostic model was evaluated using receiver operating characteristic (ROC) and Kaplan-Meier (K-M) curves. Multivariate Cox regression and a nomogram were used to examine whether the risk score could serve as an independent marker. qRT-PCR was used to determine the expression of the 9 hub genes in KIRC tumor tissues and adjacent tissues. Genes in the green module were highly associated with clinical status and were significantly enriched in mitotic nuclear division, cell cycle, and the p53 signaling pathway. Twenty-six candidates were subsequently screened out from the green module. Next, a 9-gene prognostic model (DLGAP5, NUF2, TOP2A, RRM2, HJURP, PLK1, AURKB, KIF18A, CCNB2) was constructed, which showed strong predictive ability. Some cancer-related signaling pathways were differentially activated between the two risk score groups. Additionally, lower expression of some signature genes (AURKB, CCNB2, PLK1, RRM2, TOP2A) was associated with better survival in KIRC patients. Meanwhile, all 9 hub genes were substantially overexpressed in KIRC patients. This study identified a KIRC prognostic signature, contributing valuable findings to KIRC biomarker development.
Affiliation(s)
- Wenwen Zhu
- Department of Oncology, the Affiliated Hospital of Shaoxing University, Zhejiang, China
- Mengyu Ding
- Department of Oncology, the Affiliated Hospital of Shaoxing University, Zhejiang, China
- Jian Chang
- Department of Oncology, the Affiliated Hospital of Shaoxing University, Zhejiang, China
- Hui Liao
- Department of Oncology, the Affiliated Hospital of Shaoxing University, Zhejiang, China
- Geqiong Xiao
- Department of Oncology, the Affiliated Hospital of Shaoxing University, Zhejiang, China
- Qiong Wang
- Department of Oncology, the Affiliated Hospital of Shaoxing University, Zhejiang, China
23
Secher T, Couturier A, Huot L, Bouscayrol H, Grandjean T, Boulard O, Hot D, Ryffel B, Chamaillard M. A Protective Role of NOD2 on Oxazolone-induced Intestinal Inflammation Through IL-1β-mediated Signalling Pathway. J Crohns Colitis 2023; 17:111-122. [PMID: 35917251 DOI: 10.1093/ecco-jcc/jjac106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Indexed: 01/31/2023]
Abstract
BACKGROUND AND AIMS NOD2 has emerged as a critical player in the induction of both Th1 and Th2 responses for the potentiation and polarisation of antigen-dependent immunity. Loss-of-function mutations in the NOD2-encoding gene and deregulation of its downstream signalling pathway have been linked to Crohn's disease. Although it is well documented that NOD2 senses bacterial muramyl dipeptide, it remains counter-intuitive to link the development of overt intestinal inflammation to a loss of the bacteria-induced inflammatory response. We hypothesised that a T helper bias could also contribute to an autoimmune-like colitis distinct from inflammation driven by fully fledged Th1-type cells. METHODS An oedematous bowel wall with a mixed Th1/Th2 response was induced in mice by intrarectal instillation of the haptenating agent oxazolone. Survival and clinical scores were evaluated. At several time points after instillation, colonic damage was assessed macroscopically and microscopically. To evaluate the involvement of NOD2 in immunochemical phenomena, quantitative polymerase chain reaction [PCR] and flow cytometry analyses were performed. Bone marrow chimera experiments allowed us to evaluate the role of haematopoietic and non-haematopoietic NOD2-expressing cells. RESULTS Herein, we identified a key regulatory circuit whereby NOD2-mediated sensing of muramyl dipeptide [MDP] by radio-resistant cells ameliorates the mixed Th1/Th2 colitis induced by oxazolone. Genetic ablation of either Nod2 or Ripk2 precipitated oxazolone colitis that is predominantly linked to a lack of interferon-gamma. Bone marrow chimera experiments revealed that inactivation of Nod2 signalling in non-haematopoietic cells causes a biased M1-M2 polarisation of macrophages and a decreased frequency of splenic regulatory T cells, which correlates with impaired activation of CD4+ T cells within mesenteric lymph nodes.
Mechanistically, administration of MDP protected mice from oxazolone-induced colitis in an interleukin-1- and interleukin-23-dependent manner. CONCLUSIONS These findings indicate that Nod2 signalling may prevent the pathological conversion of T helper cells, thereby maintaining tissue homeostasis.
Affiliation(s)
- Thomas Secher
- INEM, Orléans University, CNRS UMR 7355, F-45071, Orléans, France; CEPR, Tours University, INSERM U1100, F-37000, Tours, France
- Ludovic Huot
- Centre d'Infection et d'Immunité de Lille, Université de Lille, CNRS, Inserm, CHRU Lille, Institut Pasteur de Lille, U1019-UMR 9017, F-59000, Lille, France
- Helene Bouscayrol
- Service d'oncologie-radiothérapie, CHR d'Orléans-La Source, Orléans, France
- Teddy Grandjean
- Centre d'Infection et d'Immunité de Lille, Université de Lille, CNRS, Inserm, CHRU Lille, Institut Pasteur de Lille, U1019-UMR 9017, F-59000, Lille, France
- Olivier Boulard
- Laboratory of Cell Physiology, Inserm U1003, University of Lille, Lille, France
- David Hot
- CEPR, Tours University, INSERM U1100, F-37000, Tours, France; University of Lille, CNRS, Inserm, CHU Lille, Institut Pasteur de Lille, US 41-UAR 2014-PLBS, F-59000 Lille, France
- Bernhard Ryffel
- INEM, Orléans University, CNRS UMR 7355, F-45071, Orléans, France
- Mathias Chamaillard
- Laboratory of Cell Physiology, Inserm U1003, University of Lille, Lille, France
24
Avcu E, Newman O, Ahlfors SP, Gow DW. Neural evidence suggests phonological acceptability judgments reflect similarity, not constraint evaluation. Cognition 2023; 230:105322. [PMID: 36370613 PMCID: PMC9712273 DOI: 10.1016/j.cognition.2022.105322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 10/24/2022] [Accepted: 11/01/2022] [Indexed: 11/11/2022]
Abstract
Acceptability judgments are a primary source of evidence in formal linguistic research. Within the generative linguistic tradition, these judgments are attributed to the evaluation of novel forms based on implicit knowledge of rules or constraints governing well-formedness. In the domain of phonological acceptability judgments, other factors, including ease of articulation and similarity to known forms, have been hypothesized to influence evaluation. We used data-driven neural techniques to identify the relative contributions of these factors. Granger causality analysis of magnetic resonance imaging (MRI)-constrained magnetoencephalography (MEG) and electroencephalography (EEG) data revealed patterns of interaction between brain regions that support explicit judgments of the phonological acceptability of spoken nonwords. Comparisons of data obtained with nonwords that varied in onset consonant cluster attestation and acceptability revealed different cortical regions and effective connectivity patterns associated with phonological acceptability judgments. Attested forms produced stronger influences of brain regions implicated in lexical representation and sensorimotor simulation on acoustic-phonetic regions, whereas unattested forms produced a stronger influence of phonological control mechanisms on acoustic-phonetic processing. Unacceptable forms produced widespread patterns of interaction consistent with attempted search or repair. Together, these results suggest that speakers' phonological acceptability judgments reflect lexical and sensorimotor factors.
Affiliation(s)
- Enes Avcu
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States of America
- Olivia Newman
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States of America
- Seppo P Ahlfors
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States of America; Department of Radiology, Harvard Medical School, Boston, MA, United States of America
- David W Gow
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States of America; Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, United States of America; Department of Psychology, Salem State University, Salem, MA, United States of America; Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, United States of America
25
Gow DW, Avcu E, Schoenhaut A, Sorensen DO, Ahlfors SP. Abstract representations in temporal cortex support generative linguistic processing. Lang Cogn Neurosci 2022; 38:765-778. [PMID: 37332658 PMCID: PMC10270390 DOI: 10.1080/23273798.2022.2157029] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/08/2022] [Accepted: 11/21/2022] [Indexed: 06/20/2023]
Abstract
Generativity, the ability to create and evaluate novel constructions, is a fundamental property of human language and cognition. The productivity of generative processes is determined by the scope of the representations they engage. Here we examine the neural representation of reduplication, a productive phonological process that can create novel forms through patterned syllable copying (e.g. ba-mih → ba-ba-mih, ba-mih-mih, or ba-mih-ba). Using MRI-constrained source estimates of combined MEG/EEG data collected during an auditory artificial grammar task, we identified localized cortical activity associated with syllable reduplication pattern contrasts in novel trisyllabic nonwords. Neural decoding analyses identified a set of predominantly right hemisphere temporal lobe regions whose activity reliably discriminated reduplication patterns evoked by untrained, novel stimuli. Effective connectivity analyses suggested that sensitivity to abstracted reduplication patterns was propagated between these temporal regions. These results suggest that localized temporal lobe activity patterns function as abstract representations that support linguistic generativity.
Affiliation(s)
- David W. Gow
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School; Boston, MA 02114
- Department of Psychology, Salem State University; Salem, MA 01970
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital; Charlestown, MA 02129
- Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School; Boston, MA 02115
- Enes Avcu
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School; Boston, MA 02114
- Adriana Schoenhaut
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School; Boston, MA 02114
- David O. Sorensen
- Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School; Boston, MA 02115
- Seppo P. Ahlfors
- Program in Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School; Boston, MA 02115
- Department of Radiology, Massachusetts General Hospital and Harvard Medical School; Boston, MA 02114
26
Abstract
Novel single-cell RNA sequencing (scRNA-seq) pipelines isolate single cells through microfluidic approaches and generate sequencing libraries in which transcripts are tagged to track their cell of origin. Modern scRNA-seq platforms can analyze many thousands of cells in each run. Combined with massive high-throughput sequencing producing billions of reads, scRNA-seq allows fundamental biological properties of cell populations and biological systems to be assessed at unprecedented resolution. In this chapter, we describe how cell subpopulation discovery algorithms integrated into rCASC can be executed efficiently on cloud-HPC infrastructure. To this end, we focus on the StreamFlow framework, which provides container-native runtime support for scientific workflows in cloud/HPC environments.
27
Xiao H, Ma Y, Zhou Z, Li X, Ding K, Wu Y, Wu T, Chen D. Disease patterns of coronary heart disease and type 2 diabetes harbored distinct and shared genetic architecture. Cardiovasc Diabetol 2022; 21:276. [PMID: 36494812 PMCID: PMC9738029 DOI: 10.1186/s12933-022-01715-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/27/2022] [Accepted: 12/02/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND Coronary heart disease (CHD) and type 2 diabetes (T2D) are two complex diseases with intricate interrelationships. However, the genetic architecture of the two diseases is often studied independently using an individual single-nucleotide polymorphism (SNP) approach. Here, we present a genotypic-phenotypic framework for deciphering the genetic architecture underlying the disease patterns of CHD and T2D. METHOD A data-driven SNP-set approach was applied to a genome-wide association study comprising subpopulations with different disease patterns of CHD and T2D (comorbidity, CHD without T2D, T2D without CHD, and neither). We applied nonsmooth nonnegative matrix factorization (nsNMF) clustering to generate SNP sets that integrate information on both SNPs and subjects. Relationships between SNP sets and phenotype sets harboring different disease patterns were then assessed, and we further co-clustered the SNP sets into a genetic network to topologically elucidate the genetic architecture composed of SNP sets. RESULTS We identified 23 non-identical SNP sets significantly associated with CHD or T2D (SNP-set based association test, P < 3.70 × [Formula: see text]). Among them, disease patterns involving CHD and T2D were related to distinct SNP sets (hypergeometric test, P < 2.17 × [Formula: see text]). Accordingly, numerous genes (e.g., KLKs, GRM8, SHANK2) and pathways (e.g., fatty acid metabolism) were diversely implicated in different subtypes and related pathophysiological processes. Finally, we showed that the genetic architecture underlying the disease patterns of CHD and T2D was composed of disjoint genetic networks (heterogeneity), with common genes contributing to it (pleiotropy). CONCLUSION The SNP-set approach deciphered the complexity of both genotype and phenotype as well as their complex relationships.
Different disease patterns of CHD and T2D harbor both distinct and shared genetic architecture, in which lipid metabolism related to fibrosis may be an atherogenic pathway that is specifically activated by diabetes. Our findings provide new insights for the exploration of biological pathways.
Affiliation(s)
- Han Xiao
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
- Yujia Ma
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
- Zechen Zhou
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
- Xiaoyi Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
- Kexin Ding
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
- Yiqun Wu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
- Tao Wu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
- Dafang Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, 100191 China
28
Wieder C, Lai RPJ, Ebbels TMD. Single sample pathway analysis in metabolomics: performance evaluation and application. BMC Bioinformatics 2022; 23:481. [PMID: 36376837 PMCID: PMC9664704 DOI: 10.1186/s12859-022-05005-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/22/2022] [Accepted: 10/25/2022] [Indexed: 11/15/2022]
Abstract
BACKGROUND Single sample pathway analysis (ssPA) transforms molecular-level omics data to the pathway level, enabling the discovery of patient-specific pathway signatures. ssPA overcomes some limitations of conventional pathway analysis by enabling multi-group comparisons and by facilitating numerous downstream analyses, such as pathway-based machine learning. Although ssPA is widely used in transcriptomics, there is little literature evaluating its suitability for metabolomics. Here we provide a benchmark of established ssPA methods (ssGSEA, GSVA, SVD (PLAGE), and z-score) alongside an evaluation of two novel methods we propose, ssClustPA and kPCA, using semi-synthetic metabolomics data. We then demonstrate how ssPA can facilitate pathway-based interpretation of metabolomics data through a case study on inflammatory bowel disease mass spectrometry data, using clustering to determine subtype-specific pathway signatures. RESULTS While GSEA-based and z-score methods outperformed the others in terms of recall, clustering/dimensionality reduction-based methods provided higher precision at moderate-to-high effect sizes. The case study applying ssPA to inflammatory bowel disease data demonstrates how these methods yield a much richer depth of interpretation than conventional approaches, for example by clustering pathway scores to visualise a pathway-based, patient subtype-specific correlation network. We also developed the sspa Python package (freely available at https://pypi.org/project/sspa/), providing implementations of all the methods benchmarked in this study. CONCLUSION This work underscores the value that ssPA methods can add to metabolomic studies and provides a useful reference for those wishing to apply ssPA methods to metabolomics data.
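Of the benchmarked ssPA methods, the z-score approach is the simplest to state. The sketch below assumes the common combined z-score formulation: each metabolite is standardised across samples, then the member z-scores are summed per sample and divided by the square root of the number of members. It is an illustration under that assumption, not the sspa package's implementation, and the function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_pathway_scores(data, pathway):
    """Per-sample pathway scores by a combined z-score (hypothetical helper).

    data: dict mapping metabolite name -> list of values, one per sample.
    pathway: list of metabolite names; members absent from data are skipped.
    Each member needs at least two samples and non-zero variance.
    """
    members = [m for m in pathway if m in data]
    if not members:
        raise ValueError("no pathway members found in data")
    n_samples = len(data[members[0]])
    scores = [0.0] * n_samples
    for m in members:
        values = data[m]
        mu, sd = mean(values), stdev(values)  # sample standard deviation
        # Add this metabolite's z-score to each sample's running total.
        for i, v in enumerate(values):
            scores[i] += (v - mu) / sd
    # Scale by sqrt(pathway size) so scores are comparable across pathways.
    return [s / sqrt(len(members)) for s in scores]


# Toy data: three samples, a two-member pathway plus one unmatched name.
toy = {"M1": [1.0, 2.0, 3.0], "M2": [2.0, 4.0, 6.0]}
print(zscore_pathway_scores(toy, ["M1", "M2", "M9"]))  # ≈ [-1.414, 0.0, 1.414]
```

Because every member is centred across samples, the scores for one pathway always sum to roughly zero over the cohort, so they describe each sample relative to the others rather than on an absolute scale.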
Affiliation(s)
- Cecilia Wieder
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK
- Rachel P J Lai
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, UK
- Timothy M D Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, UK
29
Bhandari N, Walambe R, Kotecha K, Khare SP. A comprehensive survey on computational learning methods for analysis of gene expression data. Front Mol Biosci 2022; 9:907150. [PMID: 36458095 PMCID: PMC9706412 DOI: 10.3389/fmolb.2022.907150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Accepted: 09/28/2022] [Indexed: 09/19/2023]
Abstract
Computational analysis methods, including machine learning, have had a significant impact on the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for the comparative analysis of gene expression data. However, more complex analyses, such as classifying sample observations or discovering feature genes, require sophisticated computational approaches. In this review, we compile various statistical and computational tools used in the analysis of expression microarray data. Although the methods are discussed in the context of expression microarrays, they can also be applied to the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values and the methods and approaches usually employed for their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery, along with their evaluation parameters, are described in detail. We believe that this detailed review will help users select appropriate methods for preprocessing and analyzing their data based on the expected outcome.
Affiliation(s)
- Nikita Bhandari
- Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Rahee Walambe
- Electronics and Telecommunication Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Ketan Kotecha
- Computer Science Department, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
- Symbiosis Center for Applied AI (SCAAI), Symbiosis International (Deemed University), Pune, India
- Satyajeet P. Khare
- Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
30
Molecular diversity and phenotypic pleiotropy of ancient genomic regulatory loci derived from human endogenous retrovirus type H (HERVH) promoter LTR7 and HERVK promoter LTR5_Hs and their contemporary impacts on pathophysiology of Modern Humans. Mol Genet Genomics 2022; 297:1711-1740. [PMID: 36121513 PMCID: PMC9483895 DOI: 10.1007/s00438-022-01954-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2022] [Accepted: 09/09/2022] [Indexed: 11/21/2022]
Abstract
Timelines of population-level effects of viruses on humans range from the evolutionary scale of millions of years to the contemporary spread of viral infections. Correspondingly, these events are exemplified by: (i) the emergence of human endogenous retroviruses (HERVs) from ancient germline infections, leading to stable integration of viral genomes into human chromosomes; and (ii) widespread viral infections reaching a global pandemic state, such as the COVID-19 pandemic. Despite significant efforts, understanding of the roles of HERVs in governing genomic regulatory networks, their impacts on primate evolution, and their contribution to the development of human-specific physiological and pathological phenotypic traits remains limited. Remarkably, the present analyses revealed that the expression of a dominant majority of genes (1696 of 1944 genes; 87%) constituting high-confidence downstream regulatory targets of defined HERV loci was significantly altered in cells infected with SARS-CoV-2, the coronavirus causing the global COVID-19 pandemic. This study focused on defined subsets of HERV-derived DNA sequences that are expressed at specific stages of human preimplantation embryogenesis and exert regulatory actions essential for self-renewal and pluripotency. Evolutionary histories of LTR7/HERVH and LTR5_Hs/HERVK were charted based on evidence of the earliest presence and expansion of highly conserved (HC) LTR sequences. Sequence conservation analyses of the most recent genome releases of 17 primate species revealed that LTR7/HERVH entered the germlines of primates in Africa after the separation of the New World Monkey lineage, whereas LTR5_Hs/HERVK successfully colonized primate germlines after the segregation of gibbon species. Subsequently, both LTR7 and LTR5_Hs underwent a marked ~4- to 5-fold expansion in the genomes of Great Apes.
Timelines of the quantitative expansion of both LTR7 and LTR5_Hs loci during the evolution of Great Apes appear to replicate the consensus evolutionary sequence of increasing cognitive and behavioral complexity of non-human primates, which seems particularly striking for LTR7 loci and 11 distinct LTR7 subfamilies. Consistent with previous reports, 351 human-specific (HS) insertions of LTR7 (175 loci) and LTR5_Hs (176 loci) regulatory sequences identified in this study have been linked to genes implicated in the establishment and maintenance of naïve and primed pluripotent states and preimplantation embryogenesis phenotypes. Unexpectedly, HS-LTRs manifest regulatory connectivity to genes encoding markers of 12 distinct cell populations of fetal gonads, as well as genes implicated in the physiology and pathology of human spermatogenesis, including Y-linked spermatogenic failure, oligo- and azoospermia. Granular interrogation of genes linked with 11 distinct LTR7 subfamilies revealed that mammalian offspring survival (MOS) genes appear to remain consistent regulatory targets throughout ~30 million years of divergent evolution of LTR7 loci. Differential GSEA of MOS versus non-MOS genes identified clearly discernible dominant enrichment patterns of phenotypic traits affected by MOS genes linked with LTR7 (562 MOS genes) and LTR5_Hs (126 MOS genes) regulatory loci across a large panel of genomics and proteomics databases reflecting a broad spectrum of human physiological and pathological traits. GSEA of LTR7-linked MOS genes identified more than 2200 significantly enriched records of human common and rare diseases and gene signatures of 466 significantly enriched records of Human Phenotype Ontology traits, including Autosomal Dominant (92 genes) and Autosomal Recessive (93 genes) Inheritance.
LTR7 regulatory elements appear linked with genes implicated in functional and morphological features of the central nervous system, including synaptic transmission and protein–protein interactions at synapses, as well as gene signatures differentially regulated in cells of distinct neurodevelopmental stages and in morphologically diverse cell types residing and functioning in the human brain. These include neural stem/precursor cells, radial glia cells, Bergmann glia cells, pyramidal cells, tanycytes, immature neurons, interneurons, trigeminal neurons, GABAergic neurons, and glutamatergic neurons. GSEA of LTR7-linked genes identified significantly enriched gene sets encoding markers of more than 80 specialized types of neurons and markers of 521 human brain regions, most prominently the subiculum and dentate gyrus. Identification and characterization of 1944 genes comprising high-confidence downstream regulatory targets of LTR7 and/or LTR5_Hs loci validated and extended these observations by documenting marked enrichment for genes implicated in neoplasm metastasis, intellectual disability, autism, multiple cancer types, Alzheimer’s disease, schizophrenia, and other brain disorders. Overall, genes representing downstream regulatory targets of ancient retroviral LTRs appear to exert cooperative and exceedingly broad phenotypic impacts on human physiology and pathology. This is exemplified by the altered expression of 93% of high-confidence LTR targets in cells infected by contemporary viruses, revealing a convergence of virus-inflicted aberrations on the genomic regulatory circuitry governed by ancient retroviral LTR elements and interference with human cells’ differentiation programs.
31
Ogasahara T, Kouzai Y, Watanabe M, Takahashi A, Takahagi K, Kim JS, Matsui H, Yamamoto M, Toyoda K, Ichinose Y, Mochida K, Noutoshi Y. Time-series transcriptome of Brachypodium distachyon during bacterial flagellin-induced pattern-triggered immunity. FRONTIERS IN PLANT SCIENCE 2022; 13:1004184. [PMID: 36186055 PMCID: PMC9521188 DOI: 10.3389/fpls.2022.1004184] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/27/2022] [Accepted: 09/01/2022] [Indexed: 05/30/2023]
Abstract
Plants protect themselves from microorganisms by inducing pattern-triggered immunity (PTI) via recognizing microbe-associated molecular patterns (MAMPs), which are conserved across many microbes. Although the MAMP perception mechanism and initial events during PTI have been well characterized, knowledge of the transcriptomic changes in plants, especially monocots, is limited during the intermediate and terminal stages of PTI. Here, we report a time-series high-resolution RNA-sequencing (RNA-seq) analysis during PTI in the leaf disks of Brachypodium distachyon. We identified 6,039 differentially expressed genes (DEGs) in leaves sampled at 0, 0.5, 1, 3, 6, and 12 hours after treatment (hat) with the bacterial flagellin peptide flg22. The k-means clustering method classified these DEGs into 10 clusters (6 upregulated and 4 downregulated). Based on the results, we selected 10 PTI marker genes in B. distachyon. Gene ontology (GO) analysis suggested a tradeoff between defense responses and photosynthesis during PTI. The data indicated that the recovery of photosynthesis had begun by 12 hat. Over-representation analysis of transcription factor genes and cis-regulatory elements in DEG promoters implicated 12 WRKY transcription factors in plant defense at the early stage of PTI induction.
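The abstract states only that k-means grouped the DEGs into 10 clusters. As a rough illustration of what that step does, here is a minimal sketch of Lloyd's k-means with k-means++ seeding on a genes × time-points matrix (for example, one value per sampled time point). The function name `kmeans`, the squared Euclidean distance, the seeding scheme, and the absence of any scaling are assumptions for illustration; the authors' preprocessing and settings are not described here.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: Lloyd's k-means with k-means++ seeding.

    X: (n_genes, n_timepoints) array of expression profiles.
    Returns (labels, centers).
    """
    rng = np.random.default_rng(seed)
    # k-means++ seeding: each new center is drawn with probability
    # proportional to its squared distance from the nearest chosen center.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

With a hypothetical `profiles` array of shape (6039, 6), `kmeans(profiles, k=10)` would return one cluster label per gene; whether a cluster counts as "upregulated" or "downregulated" would then be read off its center's trajectory.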
Affiliation(s)
- Tsubasa Ogasahara
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Yusuke Kouzai
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Megumi Watanabe
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Akihiro Takahashi
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Kotaro Takahagi
- Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan
- June-Sik Kim
- Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Hidenori Matsui
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Mikihiro Yamamoto
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Kazuhiro Toyoda
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Yuki Ichinose
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Keiichi Mochida
- Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan
- School of Information and Data Sciences, Nagasaki University, Nagasaki, Japan
- Yoshiteru Noutoshi
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
32
Gillard J, O’Riordan E, Zhigljavsky A. Polynomial whitening for high-dimensional data. Comput Stat 2022. [DOI: 10.1007/s00180-022-01277-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
The inverse square root of a covariance matrix is often desirable for performing data whitening in the process of applying many common multivariate data analysis methods. Direct calculation of the inverse square root is not available when the covariance matrix is either singular or nearly singular, as often occurs in high dimensions. We develop new methods, which we broadly call polynomial whitening, to construct a low-degree polynomial in the empirical covariance matrix which has similar properties to the true inverse square root of the covariance matrix (should it exist). Our method does not suffer in singular or near-singular settings, and is computationally tractable in high dimensions. We demonstrate that our construction of low-degree polynomials provides a good substitute for high-dimensional inverse square root covariance matrices, in both $d < N$ and $d \ge N$ cases. We offer examples on data whitening, outlier detection and principal component analysis to demonstrate the performance of the proposed method.
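To make the baseline concrete, the sketch below shows the exact route that the abstract says breaks down: whitening with the inverse square root Σ^{-1/2} of the empirical covariance, computed from its eigendecomposition. It is not the authors' polynomial construction, which is not reproduced here; it only illustrates what that construction approximates. The function name and the relative eigenvalue tolerance `rel_tol` are illustrative assumptions.

```python
import numpy as np


def inverse_sqrt_whiten(X, rel_tol=1e-12):
    """Hypothetical helper: whiten the rows of X (N samples x d features)
    with the exact inverse square root of the empirical covariance.

    Raises ValueError when the covariance is singular or nearly singular,
    the regime that polynomial whitening is designed to handle.
    """
    Xc = X - X.mean(axis=0)                 # centre each feature
    S = np.cov(Xc, rowvar=False)            # d x d empirical covariance
    eigvals, eigvecs = np.linalg.eigh(S)    # S is symmetric, so use eigh
    if eigvals.min() <= rel_tol * eigvals.max():
        raise ValueError("covariance is singular or nearly singular")
    # S^{-1/2} = V diag(lambda^{-1/2}) V^T; scaling the columns of V
    # by lambda^{-1/2} is the same as multiplying by the diagonal matrix.
    S_inv_sqrt = (eigvecs * eigvals ** -0.5) @ eigvecs.T
    return Xc @ S_inv_sqrt, S_inv_sqrt
```

When d < N and the covariance is well conditioned, the whitened data have an identity covariance. When d ≥ N, the centred data matrix has rank at most N − 1, so the smallest eigenvalues are (numerically) zero and the check above fires.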
33
Comparative biomarker analysis of PALOMA-2/3 trials for palbociclib. NPJ Precis Oncol 2022; 6:56. [PMID: 35974168 PMCID: PMC9381541 DOI: 10.1038/s41698-022-00297-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2021] [Accepted: 06/20/2022] [Indexed: 11/08/2022]
Abstract
While cyclin-dependent kinase 4/6 (CDK4/6) inhibitors, including palbociclib, combined with endocrine therapy (ET), are becoming the standard of care for hormone receptor-positive/human epidermal growth factor receptor 2‒negative metastatic breast cancer, further mechanistic insights are needed to maximize benefit from the treatment regimen. Herein, we conducted a systematic comparative analysis of the relationship between gene expression and progression-free survival in two phase 3 trials (PALOMA-2 [first-line] and PALOMA-3 [≥second-line]). In the ET-only arm, there was no inter-therapy line correlation. However, adding palbociclib resulted in concordant biomarkers independent of initial ET responsiveness, with shared sensitivity genes enriched in estrogen response and resistance genes over-represented in mTORC1 signaling and the G2/M checkpoint. Biomarker patterns from the combination arm resembled those observed with ET in treatment-naive patients with advanced disease, especially patients likely to be endocrine-responsive. Our findings suggest palbociclib may recondition endocrine-resistant tumors to ET, and may guide optimal therapeutic sequencing by partnering CDK4/6 inhibitors with different ETs. Pfizer (NCT01740427; NCT01942135).
34
Gupta SR. Prediction time of breast cancer tumor recurrence using Machine Learning. Cancer Treat Res Commun 2022; 32:100602. [PMID: 35797887 DOI: 10.1016/j.ctarc.2022.100602] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/01/2022] [Revised: 06/01/2022] [Accepted: 06/28/2022] [Indexed: 06/15/2023]
Abstract
An in-depth study using databases from GLOBOCAN, the CDC, and the WHO health repository highlights the lethality of breast cancer, which takes thousands of lives each year. However, a timely prediction of cancer can help patients consult a doctor in time. In the past, various studies have successfully predicted whether a tumor is benign or malignant and whether a breast cancer tumor will recur, but no time-based models have been studied. Using machine learning, this study presents various prediction models that can predict tumor recurrence time to within 1 year. Among the 198 patients analyzed, 40% were predicted to have breast cancer tumors recurring within the first year of diagnosis. The proposed machine learning approach uses clustering methods such as spectral clustering, DBSCAN, and k-means along with prediction models such as Support Vector Machines (SVM), decision trees, and random forests. The results demonstrate the ability of the model to predict the time taken by the tumor to recur, or the time taken by the patient to fully recover, with a best accuracy of 78.7% using SVM. This population-based study, performed on multivariate data with real-valued attributes, can therefore provide patients with a reasonable estimate of their recovery time or of the time before which they should consult a doctor.
Affiliation(s)
- Siddharth Raj Gupta
- Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA; Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA.
35
Sheng W, Wang X, Wang Z, Li Q, Zheng Y, Chen S. A Differential Evolution Algorithm With Adaptive Niching and K-Means Operation for Data Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:6181-6195. [PMID: 33284774 DOI: 10.1109/tcyb.2020.3035887] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Clustering, as an important part of data mining, is inherently a challenging problem. This article proposes a differential evolution algorithm with adaptive niching and k-means operation (denoted as DE_ANS_AKO) for partitional data clustering. Within the proposed algorithm, an adaptive niching scheme, which can dynamically adjust the size of each niche in the population, is devised and integrated to prevent premature convergence of the evolutionary search, so that the search space is explored appropriately to identify the optimal or a near-optimal solution. Furthermore, to improve search efficiency, an adaptive k-means operation has been designed and employed at the niche level of the population. The performance of the proposed algorithm has been evaluated on synthetic as well as real datasets and compared with related methods. The experimental results reveal that the proposed algorithm is able to reliably and efficiently deliver high-quality clustering solutions and generally outperforms the related methods used for comparison.
36
The genomic landscape of cholangiocarcinoma reveals the disruption of post-transcriptional modifiers. Nat Commun 2022; 13:3061. [PMID: 35650238 PMCID: PMC9160072 DOI: 10.1038/s41467-022-30708-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 12/03/2020] [Accepted: 05/12/2022] [Indexed: 11/09/2022]
Abstract
Molecular variation between geographical populations and subtypes indicates potential genomic heterogeneity and novel genomic features within cholangiocarcinoma (CCA). Here, we analyze exome-sequencing data of 87 perihilar cholangiocarcinoma (pCCA) and 261 intrahepatic cholangiocarcinoma (iCCA) cases from 3 Asian centers (including 43 pCCAs and 24 iCCAs from our center). iCCA tumours demonstrate a higher tumor mutation burden and copy number alteration burden (CNAB) than pCCA tumours, and high CNAB indicates a poorer pCCA prognosis. We identify 12 significantly mutated genes and 5 focal CNA regions, and demonstrate common mutations in the post-transcriptional modification-related potential driver genes METTL14 and RBM10 in pCCA tumours. Finally, we demonstrate the tumour-suppressive role of METTL14, a major RNA N6-methyladenosine (m6A) methyltransferase, and illustrate that its loss-of-function mutation R298H may act through m6A modification of the potential driver gene MACF1. Our results may be valuable for better understanding how post-transcriptional modification can affect CCA development, and highlight both similarities and differences between pCCA and iCCA. Cholangiocarcinoma is a heterogeneous group of cancers, with large genetic variation seen within subtypes. Here, the authors found 12 significantly mutated genes and 5 focal CNA regions in perihilar cholangiocarcinoma, and identified a potential tumour-suppressive role for METTL14.
37
Akçay S, Güven E, Afzal M, Kazmi I. Non-negative matrix factorization and differential expression analyses identify hub genes linked to progression and prognosis of glioblastoma multiforme. Gene 2022; 824:146395. [PMID: 35283227 DOI: 10.1016/j.gene.2022.146395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/23/2021] [Revised: 02/10/2022] [Accepted: 03/04/2022] [Indexed: 12/25/2022]
Abstract
Glioblastoma multiforme (GBM), which is characterized by rapid cellular growth, is one of the most prevalent primary brain tumors in adult males. Even with combination therapy comprising surgery, chemotherapy, and adjuvant therapies, average survival is 14.6 months. Glioma stem cells (GSCs) have key roles in tumorigenesis, progression, and resistance to chemotherapy and radiotherapy. In our study, the gene expression dataset GSE124145 was first retrieved; the non-negative matrix factorization (NMF) method was applied to the GBM dataset, and differential expression analysis was performed to identify differentially expressed genes (DEGs). Next, overlapping genes between metagenes and DEGs were detected to examine the Gene Ontology (GO) biological process (BP) categories involved in the stemness of GBM. The common hub genes were used to construct a protein-protein interaction (PPI) network and for further GO analysis, while Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was used to pinpoint the real hub genes. Analysis of hub genes specific to the same GO categories demonstrated that specific hub genes triggered distinct features of the same biological processes. After using GSE124145 and The Cancer Genome Atlas (TCGA) dataset for survival analysis, we screened five real hub genes: GUCA1A, RFC2, GNG11, MMP19, and NRG1, which are strongly associated with the progression and prognosis of GBM. The DEG analysis revealed that all real hub genes were overexpressed in the GBM and TCGA datasets, which further validates our results. PPI, GO, and KEGG pathway analyses were performed on the common hub genes. Finally, KEGG pathway analysis of the top 15 candidate hub genes (including six real hub genes) of the PPI network in the GBM gene expression dataset found the mitogen-activated protein kinase (MAPK) signaling pathway to be the most significant pathway.
The remaining hub genes reviewed throughout the analysis might be favorable targets for diagnosing and treating GBM and lower-grade gliomas.
Affiliation(s)
- Sevinç Akçay
- Department of Molecular Biology and Genetics, Kırşehir Ahi Evran University, Kırşehir, Turkey
- Emine Güven
- Department of Biomedical Engineering, Düzce University, Düzce, Turkey
- Muhammad Afzal
- Department of Pharmacology, College of Pharmacy, Jouf University, Sakaka, AlJouf 72341, Saudi Arabia.
- Imran Kazmi
- Department of Biochemistry, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
38
Nouraei H, Nouraei H, Rabkin SW. Comparison of Unsupervised Machine Learning Approaches for Cluster Analysis to Define Subgroups of Heart Failure with Preserved Ejection Fraction with Different Outcomes. Bioengineering (Basel) 2022; 9:bioengineering9040175. [PMID: 35447735 PMCID: PMC9033031 DOI: 10.3390/bioengineering9040175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Revised: 04/01/2022] [Accepted: 04/13/2022] [Indexed: 11/30/2022]
Abstract
Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous condition affecting nearly half of all patients with heart failure (HF). Artificial intelligence methodologies can be useful for identifying patient subclassifications with important clinical implications. We sought to compare different machine learning (ML) techniques and their clustering capabilities in defining meaningful subsets of patients with HFpEF. Three unsupervised clustering strategies, hierarchical clustering, K-prototype, and partitioning around medoids (PAM), were used to identify distinct clusters in patients with HFpEF, based on a wide range of demographic, laboratory, and clinical parameters. The study population had a median age of 77 years, with a female majority, and moderate diastolic dysfunction. Hierarchical clustering produced six groups, but two were too small (two and seven cases) to be clinically meaningful. The K-prototype method produced clusters in which several clinical and biochemical features did not show statistically significant differences, and there was significant overlap between the clusters. The PAM methodology provided the best group separation and identified six mutually exclusive groups (HFpEF1-6) with statistically significant differences in patient characteristics and outcomes. Comparison of three different unsupervised ML clustering strategies, hierarchical clustering, K-prototype, and partitioning around medoids (PAM), was performed on a mixed dataset of patients with HFpEF containing clinical and numerical data. The PAM method identified six distinct subsets of patients with HFpEF with different long-term outcomes or mortality. By comparison, the two other clustering algorithms, hierarchical clustering and K-prototype, performed less well.
39
Patowary P, Bhattacharyya DK, Barah P. SNMRS: An advanced measure for Co-expression network analysis. Comput Biol Med 2022; 143:105222. [PMID: 35121360 DOI: 10.1016/j.compbiomed.2022.105222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Revised: 01/03/2022] [Accepted: 01/03/2022] [Indexed: 11/17/2022]
Abstract
Identifying modules in a gene interaction network is important for a better understanding of the overall network architecture. In this work, we develop a novel similarity measure called Scaling-and-Shifting Normalized Mean Residue Similarity (SNMRS), based on the existing NMRS technique [1]. SNMRS yields correlation values in the range of 0 to +1 corresponding to negative and positive dependency. To study the performance of our measure, internal validation of extracted clusters resulting from different methods is carried out. Based on this performance, we choose hierarchical clustering, apply it to the corresponding dissimilarity (distance) values of SNMRS scores, and use a dynamic tree cut method to extract dense modules. The modules are validated using a literature search, KEGG pathway analysis, and gene ontology analyses of the genes that make up the modules. Moreover, our measure can handle absolute, shifting, scaling, and shifting-and-scaling correlations and performs better than several other measures in terms of cluster-validity indices. The SNMRS-based module detection method also reveals interesting, biologically relevant patterns in gene microarray and RNA-seq datasets. A set of crucial genes highly relevant to esophageal squamous cell carcinoma (ESCC) is also identified.
Affiliation(s)
- Pallabi Patowary
- Department of Computer Science and Engineering, Tezpur University, Assam, India.
- Pankaj Barah
- Department of Molecular Biology and Biotechnology, Tezpur University, Assam, India.
40
Wellinger RE, Aguilar-Ruiz JS. A new challenge for data analytics: transposons. BioData Min 2022; 15:9. [PMID: 35337342 PMCID: PMC8957154 DOI: 10.1186/s13040-022-00294-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Ralf E Wellinger
- Centro Andaluz de Biología Molecular y Medicina Regenerativa-CABIMER, Universidad de Sevilla - CSIC - Universidad Pablo de Olavide, Seville, 41092, Spain; Department of Genetics, University of Seville, Seville, 41012, Spain
41
Jeuken GS, Tobin NP, Käll L. Survival analysis of pathway activity as a prognostic determinant in breast cancer. PLoS Comput Biol 2022; 18:e1010020. [PMID: 35344554 PMCID: PMC8989354 DOI: 10.1371/journal.pcbi.1010020] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/05/2021] [Revised: 04/07/2022] [Accepted: 03/15/2022] [Indexed: 12/17/2022]
Abstract
High-throughput biology enables the measurement of relative concentrations of thousands of biomolecules from, e.g., tissue samples. The process leaves the investigator with the problem of how best to interpret the potentially large number of differences between samples. Many activities in a cell depend on ordered reactions involving multiple biomolecules, often referred to as pathways. It hence makes sense to study differences between samples in terms of altered pathway activity, using so-called pathway analysis. Traditional pathway analysis assigns significance to differences in the concentrations of pathway components between sample groups; however, less frequently used methods for estimating individual samples' pathway activities have also been suggested. Here we demonstrate that such a method can be used for pathway-based survival analysis. Specifically, we investigate the association of pathway activities with patients' survival time based on the transcription profiles of the METABRIC dataset. Our implementation shows that pathway activities are better prognostic markers for survival time in METABRIC than individual transcripts. We also demonstrate that we can regress out the effect of individual pathways on other pathways, which allows us to estimate the effect of the other pathways' residual activity on survival. Furthermore, we illustrate how one can visualize the often interdependent measures over hierarchical pathway databases using sunburst plots.
Affiliation(s)
- Gustavo S. Jeuken
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH – Royal Institute of Technology, Solna, Sweden
- Nicholas P. Tobin
- Department of Oncology and Pathology, Karolinska Institutet and University Hospital, Stockholm, Sweden
- Lukas Käll
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH – Royal Institute of Technology, Solna, Sweden
42
Göcz B, Takács S, Skrapits K, Rumpler É, Solymosi N, Póliska S, Colledge WH, Hrabovszky E, Sárvári M. Estrogen differentially regulates transcriptional landscapes of preoptic and arcuate kisspeptin neuron populations. Front Endocrinol (Lausanne) 2022; 13:960769. [PMID: 36093104 PMCID: PMC9454256 DOI: 10.3389/fendo.2022.960769] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/03/2022] [Accepted: 07/28/2022] [Indexed: 11/17/2022]
Abstract
Kisspeptin neurons residing in the rostral periventricular area of the third ventricle (KPRP3V) and the arcuate nucleus (KPARC) mediate positive and negative estrogen feedback, respectively. Here, we aim to compare the transcriptional responses of KPRP3V and KPARC neurons to estrogen. Transgenic mice were ovariectomized and supplemented with either 17β-estradiol (E2) or vehicle. Fluorescently tagged KPRP3V neurons collected by laser-capture microdissection were subjected to RNA-seq. Bioinformatic analysis identified 222 E2-dependent genes. Four genes encoding neuropeptide precursors (Nmb, Kiss1, Nts, Penk) were robustly upregulated and Cartpt was subsignificantly upregulated, suggesting a putative contribution of multiple neuropeptides to estrogen feedback mechanisms. Overrepresentation analysis identified neuroactive ligand-receptor interaction and dopaminergic synapse as the most affected KEGG pathways. Next, we re-analyzed our previously obtained KPARC neuron RNA-seq data from the same animals using identical bioinformatic criteria. The 1583 E2-induced changes identified included suppression of many neuropeptide precursors, granins, protein processing enzymes, and other genes related to the secretory pathway. In addition to distinct regulatory responses, KPRP3V and KPARC neurons exhibited sixty-two common changes in genes encoding three hormone receptors (Ghsr, Pgr, Npr2), GAD-65 (Gad2), and calmodulin and its regulator (Calm1, Pcp4), among others. Thirty-four oppositely regulated genes (including Kiss1, Vgf, Chrna7, and Tmem35a) were also identified. The strikingly different transcriptional responses of the two neuron populations prompted us to explore the transcriptional mechanism further. We identified ten E2-dependent transcription factors in KPRP3V and seventy in KPARC neurons. While none of the ten transcription factors interacted with estrogen receptor-α, eight of the seventy did.
We propose that an intricate, multi-layered transcriptional mechanism exists in KPARC neurons and a less complex one in KPRP3V neurons. These results shed new light on the complexity of estrogen-dependent regulatory mechanisms acting in the two functionally distinct kisspeptin neuron populations and implicate additional neuropeptides and mechanisms in estrogen feedback.
Affiliation(s)
- Balázs Göcz
- Laboratory of Reproductive Neurobiology, Institute of Experimental Medicine, Budapest, Hungary
- János Szentágothai Doctoral School of Neurosciences, Semmelweis University, Budapest, Hungary
- *Correspondence: Erik Hrabovszky, Miklós Sárvári, Balázs Göcz
- Szabolcs Takács
- Laboratory of Reproductive Neurobiology, Institute of Experimental Medicine, Budapest, Hungary
- Katalin Skrapits
- Laboratory of Reproductive Neurobiology, Institute of Experimental Medicine, Budapest, Hungary
- Éva Rumpler
- Laboratory of Reproductive Neurobiology, Institute of Experimental Medicine, Budapest, Hungary
- Norbert Solymosi
- Centre for Bioinformatics, University of Veterinary Medicine, Budapest, Hungary
- Szilárd Póliska
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
- William H. Colledge
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom
- Erik Hrabovszky
- Laboratory of Reproductive Neurobiology, Institute of Experimental Medicine, Budapest, Hungary
- Miklós Sárvári
- Laboratory of Reproductive Neurobiology, Institute of Experimental Medicine, Budapest, Hungary
43
Marczyk M, Macioszek A, Tobiasz J, Polanska J, Zyla J. Importance of SNP Dependency Correction and Association Integration for Gene Set Analysis in Genome-Wide Association Studies. Front Genet 2021; 12:767358. [PMID: 34956320 PMCID: PMC8696167 DOI: 10.3389/fgene.2021.767358] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/30/2021] [Accepted: 11/10/2021] [Indexed: 11/13/2022]
Abstract
A typical genome-wide association study (GWAS) analyzes millions of single-nucleotide polymorphisms (SNPs), several of which may lie within the same gene. To conduct gene set analysis (GSA), information from SNPs needs to be unified at the gene level. A widely used practice is to use only the most relevant SNP per gene; however, other integration methods could be applied here. Also, the nonrandom association of alleles at two or more loci, known as linkage disequilibrium (LD), is often neglected. Here, we tested the impact of different integration methods and LD correction on the performance of several GSA methods. Matched normal and breast cancer samples from The Cancer Genome Atlas database were used to evaluate the performance of six GSA algorithms: Coincident Extreme Ranks in Numerical Observations (CERNO), Gene Set Enrichment Analysis (GSEA), GSEA-SNP, improved GSEA for GWAS (i-GSEA4GWAS), Meta-Analysis Gene-set Enrichment of variaNT Associations (MAGENTA), and Over-Representation Analysis (ORA). Association of SNPs with phenotype was calculated using a modified McNemar's test. Results for SNPs mapped to the same gene were integrated using the Fisher and Stouffer methods and compared with the minimum p-value method. Four common measures were used to quantify the performance of all combinations of methods. Results of GSA on GWAS data were compared with those obtained on gene expression data. Comparing all evaluation metrics across different GSA algorithms, integrations, and LD correction, we highlighted CERNO and MAGENTA with Stouffer integration as the most efficient. Applying LD correction increased prioritization and specificity of enrichment outcomes for all tested algorithms. When Fisher or Stouffer integration was used with LD correction, sensitivity and reproducibility were also better. In specific combinations, any integration method was beneficial compared with the minimum p-value method.
The correlation between GSA results at the genomic and transcriptomic levels was highest when Stouffer integration was combined with LD correction. We thoroughly evaluated different approaches to GSA in GWAS in terms of performance to guide others in selecting the most effective combinations. We showed that LD correction and Stouffer integration could increase the performance of enrichment analysis, and we encourage the use of these techniques.
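The two gene-level integration schemes named in the abstract can be written in a few lines. Below is a minimal sketch of their textbook forms: Fisher's method, where X = −2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom under the null, and unweighted Stouffer's method, where Z = Σ zᵢ / √k with zᵢ = Φ⁻¹(1 − pᵢ). Both assume the per-SNP p-values are independent, which is why correcting for LD matters when SNPs in the same gene are combined. The sketch does not perform any LD correction and is not the authors' implementation; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Closed-form chi-square survival function for even df (2k):
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def stouffer_combine(pvalues):
    """Unweighted Stouffer's method: Z = sum(z_i) / sqrt(k), z_i = Phi^-1(1 - p_i)."""
    if not pvalues or any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1)")
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)  # one-sided upper-tail p-value
```

For two SNPs in one gene, each with p = 0.01, Fisher's method gives about 0.00102 and Stouffer's about 0.0005. The minimum p-value approach would keep 0.01 before any multiplicity adjustment. If the two SNPs were in strong LD, both combined values would overstate the evidence, because the independence assumption would not hold.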
Affiliation(s)
- Michal Marczyk
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland; Yale Cancer Center, Yale School of Medicine, New Haven, CT, United States
- Agnieszka Macioszek
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Tobiasz
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Polanska
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
- Joanna Zyla
- Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland
44
Ozgul OF, Bardak B, Tan M. A Convolutional Deep Clustering Framework for Gene Expression Time Series. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2198-2207. [PMID: 32324563 DOI: 10.1109/tcbb.2020.2988985] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
The functional or regulatory processes within the cell are explicitly governed by the expression levels of a subset of its genes. Gene expression time series capture the activities of individual genes over time and aid in revealing underlying cellular dynamics. An important step in high-throughput gene expression time series experiments is clustering genes based on their temporal expression patterns, which is conventionally achieved with unsupervised machine learning techniques. However, most clustering techniques either suffer from the short length of gene expression time series or ignore the temporal structure of the data. In this work, we propose DeepTrust, a novel deep learning-based framework for gene expression time series clustering that can overcome these issues. DeepTrust first transforms time series data into images to obtain richer data representations. Afterwards, a deep convolutional clustering algorithm is applied to the constructed images. Analyses on both simulated and biological data sets demonstrate the efficiency of this new framework compared with widely used clustering techniques. We also use enrichment analyses to illustrate the biological plausibility of the clusters detected by DeepTrust. Our code and data are available from http://github.com/tanlab/DeepTrust.
45
Yu SJ, Cong L, Pan Q, Ding LL, Lei S, Cheng LY, Fang YH, Wei ZT, Liu HQ, Ran C. Whole genome sequencing and bulked segregant analysis suggest a new mechanism of amitraz resistance in the citrus red mite, Panonychus citri (Acari: Tetranychidae). Pest Management Science 2021; 77:5032-5048. [PMID: 34223705 DOI: 10.1002/ps.6544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/16/2020] [Revised: 06/17/2021] [Accepted: 07/05/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Amitraz is a broad-spectrum insecticide/acaricide for the control of aphids, psyllids, ticks and mites. Current evidence suggests that ticks and phytophagous mites have developed strong resistance to amitraz. Previous studies have shown that multiple mechanisms are associated with amitraz resistance in ticks, but very few reports have involved Panonychus citri. We therefore used whole genome sequencing and bulked segregant analysis (BSA) to identify the mechanism underlying P. citri's resistance to amitraz. RESULTS A high-quality assembly of the whole P. citri genome was completed, yielding a genome of approximately 83.97 Mb with a contig N50 of approximately 1.81 Mb. Gene structure prediction revealed 11 577 genes, of which 10 940 were annotated. Trait-associated regions in the genome were mapped with BSA, yielding 38 candidate SNPs. Of these, T752C, located in the 5' untranslated region (UTR) of the β-2R adrenergic-like octopamine receptor gene, had the strongest correlation with the resistant trait. The mutation resulted in the formation of a short hairpin loop structure in the mRNA, and gene expression was down-regulated by more than 50% in the amitraz-resistant strain. Validation of the T752C mutation in field populations of P. citri found that the correlation between the resistance ratio and the base mutation was 94.40%. CONCLUSION Our results suggest that this 5' UTR mutation of the β-2R octopamine receptor gene confers amitraz resistance in P. citri. This discovery provides a new explanation for the mechanism of pest resistance: base mutations in the 5' untranslated region of a target gene may regulate the susceptibility of pests to pesticides.
Affiliation(s)
- Shi-Jiang Yu
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Lin Cong
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Qi Pan
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Li-Li Ding
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Shuang Lei
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Lu-Yan Cheng
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Yun-Hong Fang
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Zhi-Tang Wei
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Hao-Qiang Liu
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
- Chun Ran
- Citrus Research Institute, Southwest University/Chinese Academy of Agricultural Sciences, National Engineering Research Center for Citrus, Chongqing, China
46
Lima LADO, Miranda GHN, Aragão WAB, Bittencourt LO, Dos Santos SM, de Souza MPC, Nogueira LS, de Oliveira EHC, Monteiro MC, Dionizio A, Leite AL, Pessan JP, Buzalaf MAR, Lima RR. Effects of Fluoride on Submandibular Glands of Mice: Changes in Oxidative Biochemistry, Proteomic Profile, and Genotoxicity. Front Pharmacol 2021; 12:715394. [PMID: 34646132 PMCID: PMC8503261 DOI: 10.3389/fphar.2021.715394] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/26/2021] [Accepted: 08/04/2021] [Indexed: 01/21/2023] Open
Abstract
Although fluoride (F) is well known to prevent dental caries, excessive exposure to it has been associated with changes in cellular processes in different tissues. Thus, this study aimed to evaluate the effects of F exposure on biochemical, proteomic, and genotoxic parameters of the submandibular glands. Twenty one old rats (n = 30) were allocated into three groups: 60 days of administration of drinking water containing 10 mgF/L, 50 mgF/L, or only deionized water (control). The submandibular glands were collected for oxidative biochemistry, protein expression profile, and genotoxic potential analyses. The results showed that both F concentrations increased the levels of thiobarbituric acid–reactive substances (TBARS) and reduced glutathione (GSH), and changed the proteomic profile, mainly regarding the cytoskeleton and cellular activity. Only exposure to 50 mgF/L induced significant changes in DNA integrity. These findings reinforce the importance of continuous monitoring of F concentration in drinking water and the need for strategies to minimize F intake from other sources, in order to obtain maximum preventive/therapeutic effects and avoid potential adverse effects.
Affiliation(s)
- Giza Hellen Nonato Miranda
- Laboratory of Functional and Structural Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Brazil
- Walessa Alana Bragança Aragão
- Laboratory of Functional and Structural Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Brazil
- Leonardo Oliveira Bittencourt
- Laboratory of Functional and Structural Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Brazil
- Sávio Monteiro Dos Santos
- Laboratory of Clinical Immunology and Oxidative Stress, Faculty of Pharmacy, Institute of Health Sciences, Federal University of Pará, Belém, Brazil
- Lygia S Nogueira
- Laboratory of Cell Culture and Cytogenetics, Environment Section, Evandro Chagas Institute, Ananindeua, Brazil
- Marta Chagas Monteiro
- Laboratory of Clinical Immunology and Oxidative Stress, Faculty of Pharmacy, Institute of Health Sciences, Federal University of Pará, Belém, Brazil
- Aline Dionizio
- Department of Biological Sciences, Bauru School of Dentistry, University of São Paulo, Bauru, Brazil
- Aline Lima Leite
- Department of Chemistry, University of Nebraska-Lincoln, Lincoln, NE, United States
- Juliano Pelim Pessan
- Department of Preventive and Restorative Dentistry, School of Dentistry, São Paulo State University (UNESP), Araçatuba, Brazil
- Rafael Rodrigues Lima
- Laboratory of Functional and Structural Biology, Institute of Biological Sciences, Federal University of Pará, Belém, Brazil
47
Wieder C, Frainay C, Poupin N, Rodríguez-Mier P, Vinson F, Cooke J, Lai RPJ, Bundy JG, Jourdan F, Ebbels T. Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis. PLoS Comput Biol 2021; 17:e1009105. [PMID: 34492007 PMCID: PMC8448349 DOI: 10.1371/journal.pcbi.1009105] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 05/14/2021] [Revised: 09/17/2021] [Accepted: 08/23/2021] [Indexed: 11/18/2022] Open
Abstract
Over-representation analysis (ORA) is one of the commonest pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention. Using five publicly available datasets, we demonstrated that changes in parameters such as the background set, differential metabolite selection methods, and pathway database used can result in profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors that are specific to metabolomics data, such as the reliability of compound identification and the chemical bias of different analytical platforms also impacted ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both gain of false-positive pathways and loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.
Affiliation(s)
- Cecilia Wieder
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Clément Frainay
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Pablo Rodríguez-Mier
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Florence Vinson
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Juliette Cooke
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Rachel PJ Lai
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Jacob G. Bundy
- Section of Biomolecular Medicine, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Fabien Jourdan
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- MetaToul-MetaboHUB, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Timothy Ebbels
- Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
48
Identification of Three Key Genes Associated with Hepatocellular Carcinoma Progression Based on Co-expression Analysis. Cell Biochem Biophys 2021; 80:301-309. [PMID: 34406599 DOI: 10.1007/s12013-021-01028-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Hepatocellular carcinoma (HCC) is the fifth most common cancer and one of the leading causes of cancer-related death in the world. Because HCC frequently recurs, its survival rate remains low. Therefore, it is vital to identify prognostic biomarkers for HCC. In this study, differential analysis was conducted on gene expression data from The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) cohort, and 4482 genes differentially expressed in tumor tissue were selected. Weighted gene co-expression network analysis was then used to analyze the co-expression of these differentially expressed genes. Module-trait correlation analysis identified the turquoise gene module as significantly related to tumor grade, pathologic T stage, and clinical stage. Enrichment analysis of the genes in this module showed that they were mainly enriched in the spliceosome and cell cycle pathways. Correlation analysis then selected 18 hub genes highly correlated with tumor grade, clinical stage, pathologic T stage, and the turquoise module. In parallel, a protein-protein interaction (PPI) network was constructed from the genes in the module. Finally, three key genes, heterogeneous nuclear ribonucleoprotein L, serrate RNA effector molecule, and cyclin B2, were identified by intersecting the 30 genes with the highest connectivity in the PPI network with the 18 hub genes previously obtained from the turquoise module. Further survival analysis revealed that high expression of the three key genes predicted poor prognosis in HCC. These results point to directions for further research on clinical diagnosis and prognostic biomarkers of HCC.
49
Lodhi N, Singh R, Rajput SP, Saquib Q. SARS-CoV-2: Understanding the Transcriptional Regulation of ACE2 and TMPRSS2 and the Role of Single Nucleotide Polymorphism (SNP) at Codon 72 of p53 in the Innate Immune Response against Virus Infection. Int J Mol Sci 2021; 22:8660. [PMID: 34445373 PMCID: PMC8395432 DOI: 10.3390/ijms22168660] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/11/2021] [Revised: 07/28/2021] [Accepted: 08/02/2021] [Indexed: 12/15/2022] Open
Abstract
Human ACE2 and the serine protease TMPRSS2 are the primary host-cell entry receptors for the novel SARS-CoV-2. The transcriptional regulation of these genes has not been discussed in much detail. The ISRE elements of the ACE2 promoter are a binding site for the ISGF3 complex of the JAK/STAT signaling pathway. TMPRSS2, like IFNβ, STAT1, and STAT2, has a PARP1 binding site near the transcription start site (TSS), either upstream or downstream in the promoter region. It is well documented that PARP1 regulates gene expression at the transcriptional level. Therefore, to curb virus infection, both promoting type I IFN signaling to boost innate immunity and preventing virus entry by inhibiting PARP1, ACE2, or TMPRSS2 are safe options. Most importantly, our aim is to draw the attention of the global scientific community to the codon 72 Single Nucleotide Polymorphism (SNP) of p53 and its underlying role in the innate immune response against SARS-CoV-2. Here, we discuss the role of the human p53 codon 72 SNP in the differing innate immune responses that restrict virus-mediated mortality only in specific parts of the world. In addition, we discuss potential targets and emerging therapies using bioengineered bacteriophage, antisense, or CRISPR strategies.
Affiliation(s)
- Niraj Lodhi
- Clinical Research (Research and Development Division) miRNA Analytics LLC, Harlem Bio-Space, New York, NY 10027, USA
- Rubi Singh
- Department of Pharmacology, Weill Cornell Medicine, New York, NY 10065, USA
- Quaiser Saquib
- Department of Zoology, College of Sciences, King Saud University, Riyadh 12372, Saudi Arabia
50
Ferreira CER, Campos GS, Schmidt PI, Sollero BP, Goularte KL, Corcini CD, Gasperin BG, Lucia T, Boligon AA, Cardoso FF. Genome-wide association and genomic prediction for scrotal circumference in Hereford and Braford bulls. Theriogenology 2021; 172:268-280. [PMID: 34303226 DOI: 10.1016/j.theriogenology.2021.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Revised: 07/12/2021] [Accepted: 07/14/2021] [Indexed: 11/19/2022]
Abstract
Scrotal circumference (SC) is widely used as a selection criterion for bulls in breeding programs, since it is easily assessed and correlated with several desirable reproductive traits. The objectives of this study were: to perform a genome-wide association study (GWAS) to identify genomic regions associated with SC adjusted for age (SCa) and for both age and weight (SCaw); to select Tag SNPs from the GWAS to construct a low-density panel for genomic prediction; and to compare the accuracy of SC prediction across different methods for Braford and Hereford bulls from the same genetic breeding program. SC data from 18,172 bulls (30.4 ± 3.7 cm) and genotypes from 131 sires and 3,545 animals were used. In the GWAS, the top 1% of 1-Mb windows were observed on chromosomes (BTA) 2, 20, 7, 8, 15, 3, 16, 27, 6 and 8 for SCa and on BTA 8, 15, 16, 21, 19, 2, 6, 5 and 10 for SCaw, representing 17.4% and 18.8% of the additive genetic variance of SCa and SCaw, respectively. The MeSH analysis translated the genomic information into biological meaning, identifying more specific gene functions related to SCa and SCaw. The genomic enhancement methods, especially single-step GBLUP, which combined phenotype and pedigree data with direct genomic values, improved accuracy relative to pedigree BLUP, suggesting that genomic predictions should be applied to improve genetic gain and to shorten the generation interval compared with traditional methods. The proposed Tag-SNP panels may be useful for lower-cost commercial genomic prediction applications in the future, once the number of bulls in the SC reference population increases for the Hereford and Braford breeds.
Affiliation(s)
- Carlos E R Ferreira
- ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Gabriel S Campos
- Departamento de Zootecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Patricia I Schmidt
- Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual de São Paulo, Jaboticabal, SP, Brazil
- Karina L Goularte
- ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Carine D Corcini
- ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Bernardo G Gasperin
- ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Thomaz Lucia
- ReproPel, Faculdade de Veterinária, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Arione A Boligon
- Departamento de Zootecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas, Pelotas, RS, Brazil
- Fernando F Cardoso
- Departamento de Zootecnia, Faculdade de Agronomia Eliseu Maciel, Universidade Federal de Pelotas, Pelotas, RS, Brazil; Embrapa Pecuária Sul, Bagé, RS, Brazil