1
Disner GR, Fernandes TADM, Nishiyama-Jr MY, Lima C, Wincent E, Lopes-Ferreira M. TnP and AHR-CYP1A1 Signaling Crosstalk in an Injury-Induced Zebrafish Inflammation Model. Pharmaceuticals (Basel) 2024; 17:1155. [PMID: 39338318 PMCID: PMC11435205 DOI: 10.3390/ph17091155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 08/21/2024] [Accepted: 08/29/2024] [Indexed: 09/30/2024] Open
Abstract
Aryl Hydrocarbon Receptor (AHR) signaling is crucial for regulating the biotransformation of xenobiotics and physiological processes like inflammation and immunity. Meanwhile, Thalassophryne nattereri Peptide (TnP), a promising anti-inflammatory candidate from toadfish venom, demonstrates therapeutic effects through immunomodulation. However, its influence on AHR signaling remains unexplored. This study aimed to elucidate TnP's molecular mechanisms on the AHR-cytochrome P450, family 1 (CYP1) pathway upon injury-induced inflammation in wild-type (WT) and Ahr2-knockdown (KD) zebrafish larvae through transcriptomic analysis and Cyp1a reporters. TnP, while unable to directly activate AHR, potentiated AHR activation by the high-affinity ligand 6-formylindolo[3,2-b]carbazole (FICZ), implying a role as a CYP1A inhibitor, confirmed by in vitro studies. This interplay suggests TnP's ability to modulate the AHR-CYP1 complex, prompting investigations into its influence on biotransformation pathways and injury-induced inflammation. Here, the inflammation model alone resulted in a significant response in the transcriptome, with most differentially expressed genes (DEGs) being upregulated across the groups. Ahr2-KD resulted in an overall greater number of DEGs, as did treatment with the higher dose of TnP in both WT and KD embryos. Genes related to oxidative stress and inflammatory response were the most apparent under inflamed conditions for both WT and KD groups, e.g., Tnfrsf1a, Irf1b, and Mmp9. TnP, specifically, induces the expression of Hspa5, Hsp90aa1.2, Cxcr3.3, and Mpeg1.2. Overall, this study suggests an interplay between TnP and the AHR-CYP1 pathway, stressing the inflammatory modulation through AHR-dependent mechanisms. Altogether, these results may offer new avenues in novel therapeutic strategies, such as those based on natural bioactive molecules, harnessing AHR modulation for targeted and sustained drug effects in inflammatory conditions.
Affiliation(s)
- Geonildo Rodrigo Disner
  - Immunoregulation Unit, Laboratory of Applied Toxinology (CeTICS/FAPESP), Butantan Institute, São Paulo 05585-000, Brazil
  - Unit of System Toxicology, Institute of Environmental Medicine, Karolinska Institutet, 171 77 Solna, Sweden
- Thales Alves de Melo Fernandes
  - Nucleus of Bioinformatics and Computational Biology, Laboratory of Applied Toxinology, Butantan Institute, São Paulo 05585-000, Brazil
- Milton Yutaka Nishiyama-Jr
  - Nucleus of Bioinformatics and Computational Biology, Laboratory of Applied Toxinology, Butantan Institute, São Paulo 05585-000, Brazil
- Carla Lima
  - Immunoregulation Unit, Laboratory of Applied Toxinology (CeTICS/FAPESP), Butantan Institute, São Paulo 05585-000, Brazil
- Emma Wincent
  - Unit of System Toxicology, Institute of Environmental Medicine, Karolinska Institutet, 171 77 Solna, Sweden
- Monica Lopes-Ferreira
  - Immunoregulation Unit, Laboratory of Applied Toxinology (CeTICS/FAPESP), Butantan Institute, São Paulo 05585-000, Brazil
2
Qin H, Shi X, Zhou H. scSwinFormer: A Transformer-Based Cell-Type Annotation Method for scRNA-Seq Data Using Smooth Gene Embedding and Global Features. J Chem Inf Model 2024; 64:6316-6323. [PMID: 39101690 DOI: 10.1021/acs.jcim.4c00616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/06/2024]
Abstract
Single-cell omics techniques have made it possible to analyze individual cells in biological samples, providing us with a more detailed understanding of cellular heterogeneity and biological systems. Accurate identification of cell types is critical for single-cell RNA sequencing (scRNA-seq) analysis. However, scRNA-seq data are usually high dimensional and sparse, which makes them challenging to analyze. Existing cell-type annotation methods are either constrained in modeling scRNA-seq data or lack consideration of long-term dependencies of characterized genes. In this work, we developed a Transformer-based deep learning method, scSwinFormer, for the cell-type annotation of large-scale scRNA-seq data. Sequence modeling of scRNA-seq data is performed using the smooth gene embedding module, and then the potential dependencies of genes are captured by the self-attention module. Subsequently, the global information inherent in scRNA-seq data is synthesized using the Cell Token, thereby facilitating accurate cell-type annotation. We evaluated the performance of our model against current state-of-the-art scRNA-seq cell-type annotation methods on multiple real data sets. scSwinFormer outperforms the current state-of-the-art scRNA-seq cell-type annotation methods in both external and benchmark data set experiments.
Affiliation(s)
- Hengyu Qin
  - School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
- Xiumin Shi
  - School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
- Han Zhou
  - School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
3
Feng S, Wang Z, Jin Y, Xu S. TabDEG: Classifying differentially expressed genes from RNA-seq data based on feature extraction and deep learning framework. PLoS One 2024; 19:e0305857. [PMID: 39037985 PMCID: PMC11262683 DOI: 10.1371/journal.pone.0305857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 06/05/2024] [Indexed: 07/24/2024] Open
Abstract
Traditional models for identifying differentially expressed genes (DEGs) have limitations on small-sample datasets because they require distribution assumptions to be met; otherwise, sample variation results in high false positive/negative rates. In contrast, tabular data models based on deep learning (DL) frameworks do not need to consider the data distribution types and sample variation. However, applying DL to RNA-Seq data is still a challenge due to the lack of proper labeling and the small sample size compared to the number of genes. Data augmentation (DA) extracts data features using different methods and procedures, which can significantly increase complementary pseudo-values from limited data without significant additional cost. Based on this, we combine DA with a DL framework-based tabular data model and propose a model, TabDEG, to predict DEGs and their up-regulation/down-regulation directions from gene expression data obtained from The Cancer Genome Atlas database. Compared to five counterpart methods, TabDEG has high sensitivity and low misclassification rates. Experiments show that TabDEG is robust and effective in enhancing data features to facilitate classification of high-dimensional small sample size datasets, and validate that TabDEG-predicted DEGs are mapped to important gene ontology terms and pathways associated with cancer.
Affiliation(s)
- Sifan Feng
  - School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
- Zhenyou Wang
  - School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
- Yinghua Jin
  - School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
- Shengbin Xu
  - School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, Guangdong, China
4
Latif‐Hernandez A, Yang T, Butler RR, Losada PM, Minhas PS, White H, Tran KC, Liu H, Simmons DA, Langness V, Andreasson KI, Wyss‐Coray T, Longo FM. A TrkB and TrkC partial agonist restores deficits in synaptic function and promotes activity-dependent synaptic and microglial transcriptomic changes in a late-stage Alzheimer's mouse model. Alzheimers Dement 2024; 20:4434-4460. [PMID: 38779814 PMCID: PMC11247716 DOI: 10.1002/alz.13857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 03/12/2024] [Accepted: 04/02/2024] [Indexed: 05/25/2024]
Abstract
INTRODUCTION: Tropomyosin related kinase B (TrkB) and C (TrkC) receptor signaling promotes synaptic plasticity and interacts with pathways affected by amyloid beta (Aβ) toxicity. Upregulating TrkB/C signaling could reduce Alzheimer's disease (AD)-related degenerative signaling, memory loss, and synaptic dysfunction.
METHODS: PTX-BD10-2 (BD10-2), a small molecule TrkB/C receptor partial agonist, was orally administered to aged London/Swedish-APP mutant mice (APPL/S) and wild-type controls. Effects on memory and hippocampal long-term potentiation (LTP) were assessed using electrophysiology, behavioral studies, immunoblotting, immunofluorescence staining, and RNA sequencing.
RESULTS: In APPL/S mice, BD10-2 treatment improved memory and LTP deficits. This was accompanied by normalized phosphorylation of protein kinase B (Akt), calcium-calmodulin-dependent kinase II (CaMKII), and AMPA-type glutamate receptors containing the subunit GluA1; enhanced activity-dependent recruitment of synaptic proteins; and increased excitatory synapse number. BD10-2 also had potentially favorable effects on LTP-dependent complement pathway and synaptic gene transcription.
DISCUSSION: BD10-2 prevented APPL/S/Aβ-associated memory and LTP deficits, reduced abnormalities in synapse-related signaling and activity-dependent transcription of synaptic genes, and bolstered transcriptional changes associated with microglial immune response.
HIGHLIGHTS:
- Small molecule modulation of tropomyosin related kinase B (TrkB) and C (TrkC) restores long-term potentiation (LTP) and behavior in an Alzheimer's disease (AD) model.
- Modulation of TrkB and TrkC regulates synaptic activity-dependent transcription.
- TrkB and TrkC receptors are candidate targets for translational therapeutics.
- Electrophysiology combined with transcriptomics elucidates synaptic restoration.
- LTP identifies neuron and microglia AD-relevant human-mouse co-expression modules.
Affiliation(s)
- Amira Latif‐Hernandez
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Tao Yang
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Robert R. Butler
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Patricia Moran Losada
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, USA
- Paras S. Minhas
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Halle White
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Kevin C. Tran
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Harry Liu
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Danielle A. Simmons
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Vanessa Langness
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
- Katrin I. Andreasson
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, USA
  - Chan Zuckerberg Biohub, San Francisco, California, USA
- Tony Wyss‐Coray
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, USA
  - The Phil and Penny Knight Initiative for Brain Resilience, Stanford University, Stanford, California, USA
- Frank M. Longo
  - Department of Neurology & Neurological Sciences, Stanford University School of Medicine, Palo Alto, California, USA
  - Wu Tsai Neurosciences Institute, Stanford University, Stanford, California, USA
5
Mylarshchikov D, Nikolskaya A, Bogomaz O, Zharikova A, Mironov A. BaRDIC: robust peak calling for RNA-DNA interaction data. NAR Genom Bioinform 2024; 6:lqae054. [PMID: 38774512 PMCID: PMC11106031 DOI: 10.1093/nargab/lqae054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 04/29/2024] [Accepted: 05/09/2024] [Indexed: 05/24/2024] Open
Abstract
Chromatin-associated non-coding RNAs play important roles in various cellular processes by targeting genomic loci. Two types of genome-wide NGS experiments exist to detect such targets: 'one-to-all', which focuses on targets of a single RNA, and 'all-to-all', which captures targets of all RNAs in a sample. As with many NGS experiments, they are prone to biases and noise, so it becomes essential to detect 'peaks', i.e., specific interactions of an RNA with genomic targets. Here, we present BaRDIC (Binomial RNA-DNA Interaction Caller), a tailored method to detect peaks in both types of RNA-DNA interaction data. BaRDIC is the first tool to simultaneously take into account the two most prominent biases in the data: chromatin heterogeneity and distance-dependent decay of interaction frequency. Since RNAs differ in their interaction preferences, BaRDIC adapts peak sizes according to the abundances and contact patterns of individual RNAs. These features enable BaRDIC to make more robust predictions than currently applied peak-calling algorithms and better handle the characteristic sparsity of all-to-all data. The BaRDIC package is freely available at https://github.com/dmitrymyl/BaRDIC.
Affiliation(s)
- Dmitry E Mylarshchikov
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory, Moscow 119234, Russia
- Arina I Nikolskaya
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory, Moscow 119234, Russia
- Olesja D Bogomaz
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory, Moscow 119234, Russia
- Anastasia A Zharikova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory, Moscow 119234, Russia
  - Kharkevich Institute for Information Transmission Problems RAS, Bolshoy Karetny per., Moscow 127051, Russia
- Andrey A Mironov
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory, Moscow 119234, Russia
  - Kharkevich Institute for Information Transmission Problems RAS, Bolshoy Karetny per., Moscow 127051, Russia
6
Bhattacharyya N, Chai N, Hafford-Tear NJ, Sadan AN, Szabo A, Zarouchlioti C, Jedlickova J, Leung SK, Liao T, Dudakova L, Skalicka P, Parekh M, Moghul I, Jeffries AR, Cheetham ME, Muthusamy K, Hardcastle AJ, Pontikos N, Liskova P, Tuft SJ, Davidson AE. Deciphering novel TCF4-driven mechanisms underlying a common triplet repeat expansion-mediated disease. PLoS Genet 2024; 20:e1011230. [PMID: 38713708 PMCID: PMC11101122 DOI: 10.1371/journal.pgen.1011230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 05/17/2024] [Accepted: 03/19/2024] [Indexed: 05/09/2024] Open
Abstract
Fuchs endothelial corneal dystrophy (FECD) is an age-related cause of vision loss, and the most common repeat expansion-mediated disease in humans characterised to date. Up to 80% of European FECD cases have been attributed to expansion of a non-coding CTG repeat element (termed CTG18.1) located within the ubiquitously expressed transcription factor encoding gene, TCF4. The non-coding nature of the repeat and the transcriptomic complexity of TCF4 have made it extremely challenging to experimentally decipher the molecular mechanisms underlying this disease. Here we comprehensively describe CTG18.1 expansion-driven molecular components of disease within primary patient-derived corneal endothelial cells (CECs), generated from a large cohort of individuals with CTG18.1-expanded (Exp+) and CTG18.1-independent (Exp-) FECD. We employ long-read, short-read, and spatial transcriptomic techniques to interrogate expansion-specific transcriptomic biomarkers. Interrogation of long-read sequencing and alternative splicing analysis of short-read transcriptomic data together reveals the global extent of altered splicing occurring within Exp+ FECD, and unique transcripts associated with CTG18.1-expansions. Similarly, differential gene expression analysis highlights the total transcriptomic consequences of Exp+ FECD within CECs. Furthermore, differential exon usage, pathway enrichment and spatial transcriptomics reveal TCF4 isoform ratio skewing solely in Exp+ FECD with potential downstream functional consequences. Lastly, exome data from 134 Exp- FECD cases identified rare (minor allele frequency <0.005) and potentially deleterious (CADD>15) TCF4 variants in 7/134 FECD Exp- cases, suggesting that TCF4 variants independent of CTG18.1 may increase FECD risk. In summary, our study supports the hypothesis that at least two distinct pathogenic mechanisms, RNA toxicity and TCF4 isoform-specific dysregulation, both underpin the pathophysiology of FECD. We anticipate these data will inform and guide the development of translational interventions for this common triplet-repeat mediated disease.
Affiliation(s)
- Nihar Bhattacharyya
  - University College London Institute of Ophthalmology, London, United Kingdom
- Niuzheng Chai
  - University College London Institute of Ophthalmology, London, United Kingdom
- Amanda N. Sadan
  - University College London Institute of Ophthalmology, London, United Kingdom
- Anita Szabo
  - University College London Institute of Ophthalmology, London, United Kingdom
- Jana Jedlickova
  - Department of Paediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Szi Kay Leung
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
- Tianyi Liao
  - University College London Institute of Ophthalmology, London, United Kingdom
- Lubica Dudakova
  - Department of Paediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Pavlina Skalicka
  - Department of Paediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
  - Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Mohit Parekh
  - University College London Institute of Ophthalmology, London, United Kingdom
- Ismail Moghul
  - University College London Institute of Ophthalmology, London, United Kingdom
  - Moorfields Eye Hospital, London, United Kingdom
- Aaron R. Jeffries
  - Faculty of Health and Life Sciences, University of Exeter, Exeter, United Kingdom
- Michael E. Cheetham
  - University College London Institute of Ophthalmology, London, United Kingdom
- Alison J. Hardcastle
  - University College London Institute of Ophthalmology, London, United Kingdom
  - Moorfields Eye Hospital, London, United Kingdom
- Nikolas Pontikos
  - University College London Institute of Ophthalmology, London, United Kingdom
  - Moorfields Eye Hospital, London, United Kingdom
- Petra Liskova
  - Department of Paediatrics and Inherited Metabolic Disorders, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
  - Department of Ophthalmology, First Faculty of Medicine, Charles University and General University Hospital in Prague, Prague, Czech Republic
- Stephen J. Tuft
  - University College London Institute of Ophthalmology, London, United Kingdom
  - Moorfields Eye Hospital, London, United Kingdom
- Alice E. Davidson
  - University College London Institute of Ophthalmology, London, United Kingdom
  - Moorfields Eye Hospital, London, United Kingdom
7
Peng M, Lin B, Zhang J, Zhou Y, Lin B. scFSNN: a feature selection method based on neural network for single-cell RNA-seq data. BMC Genomics 2024; 25:264. [PMID: 38459442 PMCID: PMC10924397 DOI: 10.1186/s12864-024-10160-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 02/25/2024] [Indexed: 03/10/2024] Open
Abstract
While single-cell RNA sequencing (scRNA-seq) allows researchers to analyze gene expression in individual cells, its unique characteristics, such as over-dispersion, zero-inflation, high gene-gene correlation, and large data volume with many features, pose challenges for most existing feature selection methods. In this paper, we present a feature selection method based on a neural network (scFSNN) to solve classification problems for scRNA-seq data. scFSNN is an embedded method that can automatically select features (genes) during model training, control the false discovery rate of selected features, and adaptively determine the number of features to be eliminated. Extensive simulation and real data studies demonstrate its excellent feature selection ability and predictive performance.
Affiliation(s)
- Minjiao Peng
  - School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China
  - School of Mathematics and Statistics and KLAS, Northeast Normal University, Renmin Street, Changchun, 130000, Jilin, China
- Baoqin Lin
  - Experimental Center, The First Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong, 510405, China
- Jun Zhang
  - School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China
- Yan Zhou
  - School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China
- Bingqing Lin
  - School of Mathematical Sciences, Shenzhen University, Nanshan, Shenzhen, 518060, Guangdong, China.
8
Joodaki M, Shaigan M, Parra V, Bülow RD, Kuppe C, Hölscher DL, Cheng M, Nagai JS, Goedertier M, Bouteldja N, Tesar V, Barratt J, Roberts IS, Coppo R, Kramann R, Boor P, Costa IG. Detection of PatIent-Level distances from single cell genomics and pathomics data with Optimal Transport (PILOT). Mol Syst Biol 2024; 20:57-74. [PMID: 38177382 PMCID: PMC10883279 DOI: 10.1038/s44320-023-00003-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/24/2023] [Revised: 11/20/2023] [Accepted: 11/24/2023] [Indexed: 01/06/2024] Open
Abstract
Although clinical applications represent the next challenge in single-cell genomics and digital pathology, we still lack computational methods to analyze single-cell or pathomics data to find sample-level trajectories or clusters associated with diseases. This remains challenging as single-cell/pathomics data are multi-scale, i.e., a sample is represented by clusters of cells/structures, and samples cannot be easily compared with each other. Here we propose PatIent Level analysis with Optimal Transport (PILOT). PILOT uses optimal transport to compute the Wasserstein distance between two individual single-cell samples. This allows us to perform unsupervised analysis at the sample level and uncover trajectories or cellular clusters associated with disease progression. We evaluate PILOT and competing approaches in single-cell genomics or pathomics studies involving various human diseases with up to 600 samples/patients and millions of cells or tissue structures. Our results demonstrate that PILOT detects disease-associated samples from large and complex single-cell or pathomics data. Moreover, PILOT provides a statistical approach to find changes in cell populations, gene expression, and tissue structures related to the trajectories or clusters supporting interpretation of predictions.
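The idea of comparing two samples by a Wasserstein distance can be illustrated with a very small sketch. This is not PILOT's implementation: it assumes, purely for illustration, that each sample is summarized as proportions over the same clusters and that those clusters sit on an evenly spaced one-dimensional axis, in which case the distance reduces to the summed absolute difference between cumulative proportions. The function name `wasserstein_1d` and the toy proportions are hypothetical.

```python
from itertools import accumulate


def wasserstein_1d(p, q, spacing=1.0):
    """Wasserstein-1 distance between two histograms on the same 1D grid.

    Assumption for this sketch: bins are evenly spaced by `spacing`, so the
    distance equals spacing * sum(|CDF_p - CDF_q|). This closed form does
    not hold for a general cost between clusters.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    for hist in (p, q):
        if any(x < 0 for x in hist) or abs(sum(hist) - 1.0) > 1e-9:
            raise ValueError("each histogram must be non-negative and sum to 1")
    cdf_p = accumulate(p)
    cdf_q = accumulate(q)
    return spacing * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


# Hypothetical cluster proportions for two samples (three shared clusters).
sample_1 = [0.5, 0.3, 0.2]
sample_2 = [0.2, 0.3, 0.5]
distance = wasserstein_1d(sample_1, sample_2)
print(f"distance between samples: {distance:.3f}")
```

With a general cost between clusters, for example distances between cluster centroids, the closed form above no longer applies and an optimal transport solver would be needed instead.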
Affiliation(s)
- Mehdi Joodaki
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany
- Mina Shaigan
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany
- Victor Parra
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany
- Roman D Bülow
  - Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany
- Christoph Kuppe
  - Institute of Experimental Medicine and Systems Biology, RWTH Aachen University, Aachen, Germany
- David L Hölscher
  - Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany
- Mingbo Cheng
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany
- James S Nagai
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany
- Michaël Goedertier
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany
  - Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany
- Nassim Bouteldja
  - Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany
- Vladimir Tesar
  - Department of Nephrology, 1st Faculty of Medicine and General University Hospital, Charles University, Prague, Czech Republic
- Jonathan Barratt
  - John Walls Renal Unit, University Hospital of Leicester National Health Service Trust, Leicester, UK
  - Department of Cardiovascular Sciences, University of Leicester, Leicester, UK
- Ian Sd Roberts
  - Department of Cellular Pathology, Oxford University Hospitals National Health Services Foundation Trust, Oxford, UK
- Rosanna Coppo
  - Fondazione Ricerca Molinette, Regina Margherita Children's University Hospital, Torino, Italy
- Rafael Kramann
  - Institute of Experimental Medicine and Systems Biology, RWTH Aachen University, Aachen, Germany
  - Department of Internal Medicine, Nephrology and Transplantation, Erasmus Medical Center, Rotterdam, Netherlands
- Peter Boor
  - Institute of Pathology, RWTH Aachen University Medical School, Aachen, Germany.
- Ivan G Costa
  - Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, Germany.
9
Schloss PD. Waste not, want not: revisiting the analysis that called into question the practice of rarefaction. mSphere 2024; 9:e0035523. [PMID: 38054712 PMCID: PMC10826360 DOI: 10.1128/msphere.00355-23] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/27/2023] [Accepted: 10/24/2023] [Indexed: 12/07/2023] Open
Abstract
In 2014, McMurdie and Holmes published the provocatively titled "Waste not, want not: why rarefying microbiome data is inadmissible." The claims of their study have significantly altered how microbiome researchers control for the unavoidable uneven sequencing depths that are inherent in modern 16S rRNA gene sequencing. Confusion over the distinction between the definitions of rarefying and rarefaction continues to cloud the interpretation of their results. More importantly, the authors made a variety of problematic choices when designing and analyzing their simulations. I identified 11 factors that could have compromised the results of the original study. I reproduced the original simulation results and assessed the impact of those factors on the underlying conclusion that rarefying data is inadmissible. Throughout, the design of the original study made choices that caused rarefying and rarefaction to appear to perform worse than they truly did. Most important were the approaches used to assess ecological distances, the removal of samples with low sequencing depth, and not accounting for conditions where sequencing effort is confounded with treatment group. Although the original study criticized rarefying for the arbitrary removal of valid data, repeatedly rarefying data many times (i.e., rarefaction) incorporates all the data. In contrast, it is the removal of rare taxa that would appear to remove valid data. Overall, I show that rarefaction is the most robust approach to control for uneven sequencing effort when considered across a variety of alpha and beta diversity metrics.
IMPORTANCE: Over the past 10 years, the best method for normalizing the sequencing depth of samples characterized by 16S rRNA gene sequencing has been contentious. An often cited article by McMurdie and Holmes forcefully argued that rarefying the number of sequence counts was "inadmissible" and should not be employed. However, I identified a number of problems with the design of their simulations and analysis that compromised their results. In fact, when I reproduced and expanded upon their analysis, it was clear that rarefaction was actually the most robust approach for controlling for uneven sequencing effort across samples. Rarefaction limits the rate of falsely detecting and rejecting differences between treatment groups. Far from being "inadmissible", rarefaction is a valuable tool for analyzing microbiome sequence data.
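The distinction between rarefying (one random subsample to a common depth) and rarefaction (repeating that subsample many times and averaging the resulting metric) can be sketched in a few lines. This is a minimal illustration and not the author's code: the function names, the toy taxon counts, and the choice of observed richness as the metric are all assumptions made for the example.

```python
import random
from collections import Counter


def rarefy_once(counts, depth, rng):
    """Rarefying: draw `depth` reads without replacement from one sample."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the sample's total read count")
    return Counter(rng.sample(pool, depth))


def rarefaction_richness(counts, depth, iterations=100, seed=0):
    """Rarefaction: average observed richness over repeated subsamples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(iterations):
        total += len(rarefy_once(counts, depth, rng))
    return total / iterations


# Hypothetical samples with very different sequencing depths.
sample_a = {"taxon1": 500, "taxon2": 300, "taxon3": 5, "taxon4": 1}
sample_b = {"taxon1": 60, "taxon2": 35, "taxon3": 5}

# Compare both samples at the depth of the shallower one.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
for name, sample in (("a", sample_a), ("b", sample_b)):
    print(name, rarefaction_richness(sample, depth))
```

Because every read has a chance of being drawn in some iteration, the averaged estimate draws on the whole sample, whereas a single call to `rarefy_once` discards the reads it did not select.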
Affiliation(s)
- Patrick D. Schloss
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA
10
Xia Y. Statistical normalization methods in microbiome data with application to microbiome cancer research. Gut Microbes 2023; 15:2244139. [PMID: 37622724 PMCID: PMC10461514 DOI: 10.1080/19490976.2023.2244139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 07/12/2023] [Accepted: 07/31/2023] [Indexed: 08/26/2023] Open
Abstract
Mounting evidence has shown that the gut microbiome is associated with various cancers, including gastrointestinal (GI) tract and non-GI tract cancers. However, microbiome data have unique characteristics that pose major challenges for standard statistical methods, which can yield invalid or misleading results. Thus, analyzing microbiome data requires not only appropriate statistical methods but also normalization of the data prior to statistical analysis. Here, we first describe the unique characteristics of microbiome data and the challenges in analyzing them (Section 2). Then, we provide an overall review of the available normalization methods for 16S rRNA and shotgun metagenomic data along with examples of their applications in microbiome cancer research (Section 3). In Section 4, we comprehensively investigate how the normalization methods for 16S rRNA and shotgun metagenomic data are evaluated. Finally, we summarize and conclude with remarks on statistical normalization methods (Section 5). Altogether, this review aims to provide a broad and comprehensive view of, and remarks on, the promises and challenges of statistical normalization methods for microbiome data, with microbiome cancer research examples.
Affiliation(s)
- Yinglin Xia
- Division of Gastroenterology and Hepatology, Department of Medicine, University of Illinois Chicago, Chicago, USA
11
Chrun T, Maze EA, Roper KJ, Vatzia E, Paudyal B, McNee A, Martini V, Manjegowda T, Freimanis G, Silesian A, Polo N, Clark B, Besell E, Booth G, Carr BV, Edmans M, Nunez A, Koonpaew S, Wanasen N, Graham SP, Tchilian E. Simultaneous co-infection with swine influenza A and porcine reproductive and respiratory syndrome viruses potentiates adaptive immune responses. Front Immunol 2023; 14:1192604. [PMID: 37287962 PMCID: PMC10242126 DOI: 10.3389/fimmu.2023.1192604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 05/09/2023] [Indexed: 06/09/2023] Open
Abstract
Porcine respiratory disease is multifactorial and most commonly involves pathogen co-infections. Major contributors include swine influenza A (swIAV) and porcine reproductive and respiratory syndrome (PRRSV) viruses. Experimental co-infection studies with these two viruses have shown that clinical outcomes can be exacerbated, but how innate and adaptive immune responses contribute to pathogenesis and pathogen control has not been thoroughly evaluated. We investigated immune responses following experimental simultaneous co-infection of pigs with swIAV H3N2 and PRRSV-2. Our results indicated that clinical disease was not significantly exacerbated, and swIAV H3N2 viral load was reduced in the lung of the co-infected animals. PRRSV-2/swIAV H3N2 co-infection did not impair the development of virus-specific adaptive immune responses. swIAV H3N2-specific IgG serum titers and PRRSV-2-specific CD8β+ T-cell responses in blood were enhanced. Higher proportions of polyfunctional CD8β+ T-cell subset in both blood and lung washes were found in PRRSV-2/swIAV H3N2 co-infected animals compared to the single-infected groups. Our findings provide evidence that systemic and local host immune responses are not negatively affected by simultaneous swIAV H3N2/PRRSV-2 co-infection, raising questions as to the mechanisms involved in disease modulation.
Affiliation(s)
- Adam McNee
- The Pirbright Institute, Woking, United Kingdom
- Noemi Polo
- The Pirbright Institute, Woking, United Kingdom
- Becky Clark
- The Pirbright Institute, Woking, United Kingdom
- Alejandro Nunez
- Pathology and Animal Sciences, Animal and Plant Health Agency, Addlestone, United Kingdom
- Surapong Koonpaew
- Virology and Cell Technology Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathumthani, Thailand
- Nanchaya Wanasen
- Virology and Cell Technology Laboratory, National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency, Pathumthani, Thailand
12
Rahnenführer J, De Bin R, Benner A, Ambrogi F, Lusa L, Boulesteix AL, Migliavacca E, Binder H, Michiels S, Sauerbrei W, McShane L. Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges. BMC Med 2023; 21:182. [PMID: 37189125 DOI: 10.1186/s12916-023-02858-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 12/28/2022] [Accepted: 04/03/2023] [Indexed: 05/17/2023] Open
Abstract
BACKGROUND In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes of complex methods adapted to the respective research questions. METHODS Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. RESULTS The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. 
CONCLUSIONS This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
Affiliation(s)
- Axel Benner
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Federico Ambrogi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- Scientific Directorate, IRCCS Policlinico San Donato, San Donato Milanese, Italy
- Lara Lusa
- Department of Mathematics, Faculty of Mathematics, Natural Sciences and Information Technology, University of Primorska, Koper, Slovenia
- Institute of Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Munich, Germany
- Harald Binder
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Stefan Michiels
- Service de Biostatistique et d'Épidémiologie, Gustave Roussy, Université Paris-Saclay, Villejuif, France
- Oncostat U1018, Inserm, Université Paris-Saclay, Labeled Ligue Contre le Cancer, Villejuif, France
- Willi Sauerbrei
- Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
- Lisa McShane
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, MD, USA
13
Pandey D, Onkara Perumal P. A scoping review on deep learning for next-generation RNA-Seq. data analysis. Funct Integr Genomics 2023; 23:134. [PMID: 37084004 DOI: 10.1007/s10142-023-01064-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/09/2022] [Revised: 03/24/2023] [Accepted: 04/17/2023] [Indexed: 04/22/2023]
Abstract
In the last decade, transcriptome research adopting next-generation sequencing (NGS) technologies has gathered incredible momentum amongst functional genomics scientists, particularly amongst clinical/biomedical research groups. The progressive enfoldment/adoption of NGS technologies has incited an abundance of next-generation transcriptomic data harbouring an opulence of new knowledge in public databases. Nevertheless, knowledge discovery from these next-generation RNA-Seq. data analysis necessitates extensive bioinformatics know-how besides elaborate data analysis software packages consistent with the type and context of data analysis. Several reliability and reproducibility concerns continue to impede RNA-Seq. data analysis. Characteristic challenges comprise of data quality, hardware and networking provisions, selection and prioritisation of data analysis tools, and yet significantly implementing of robust machine learning algorithms for maximised exploitation of these experimental transcriptomic data. Over the years, numerous machine learning algorithms have been implemented for improved transcriptomic data analysis executing predominantly shallow learning approaches. More recently, deep learning algorithms are becoming more mainstream, and enactment for next-generation RNA-Seq. data analysis could be revolutionary in the coming years in the biomedical domain. In this scoping review, we attempt to determine the existing literature's size and potential nature in deep learning and NGS RNA-Seq. data analysis. An analysis of the contemporary topics of next-generation RNA-Seq. data analysis based on deep learning algorithms is critically reviewed, emphasising open-source resources.
Affiliation(s)
- Diksha Pandey
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India
- P Onkara Perumal
- Department of Biotechnology, National Institute of Technology, Warangal, Telangana, 506004, India
14
Systems level analysis of sex-dependent gene expression changes in Parkinson's disease. NPJ Parkinsons Dis 2023; 9:8. [PMID: 36681675 PMCID: PMC9867746 DOI: 10.1038/s41531-023-00446-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/10/2022] [Accepted: 01/03/2023] [Indexed: 01/22/2023] Open
Abstract
Parkinson's disease (PD) is a heterogeneous disorder, and among the factors which influence the symptom profile, biological sex has been reported to play a significant role. While males have a higher age-adjusted disease incidence and are more frequently affected by muscle rigidity, females present more often with disabling tremors. The molecular mechanisms involved in these differences are still largely unknown, and an improved understanding of the relevant factors may open new avenues for pharmacological disease modification. To help address this challenge, we conducted a meta-analysis of disease-associated molecular sex differences in brain transcriptomics data from case/control studies. Both sex-specific (alteration in only one sex) and sex-dimorphic changes (changes in both sexes, but with opposite direction) were identified. Using further systems level pathway and network analyses, coordinated sex-related alterations were studied. These analyses revealed significant disease-associated sex differences in mitochondrial pathways and highlight specific regulatory factors whose activity changes can explain downstream network alterations, propagated through gene regulatory cascades. Single-cell expression data analyses confirmed the main pathway-level changes observed in bulk transcriptomics data. Overall, our analyses revealed significant sex disparities in PD-associated transcriptomic changes, resulting in coordinated modulations of molecular processes. Among the regulatory factors involved, NR4A2 has already been reported to harbor rare mutations in familial PD and its pharmacological activation confers neuroprotective effects in toxin-induced models of Parkinsonism. Our observations suggest that NR4A2 may warrant further research as a potential adjuvant therapeutic target to address a subset of pathological molecular features of PD that display sex-associated profiles.
15
Li Y, Rahman T, Ma T, Tang L, Tseng GC. A sparse negative binomial mixture model for clustering RNA-seq count data. Biostatistics 2022; 24:68-84. [PMID: 34363675 PMCID: PMC9766880 DOI: 10.1093/biostatistics/kxab025] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/25/2020] [Revised: 06/03/2021] [Accepted: 06/06/2021] [Indexed: 12/16/2022] Open
Abstract
Clustering with variable selection is a challenging yet critical task for modern small-n-large-p data. Existing methods based on sparse Gaussian mixture models or sparse $K$-means provide solutions to continuous data. With the prevalence of RNA-seq technology and lack of count data modeling for clustering, the current practice is to normalize count expression data into continuous measures and apply existing models with a Gaussian assumption. In this article, we develop a negative binomial mixture model with lasso or fused lasso gene regularization to cluster samples (small $n$) with high-dimensional gene features (large $p$). A modified EM algorithm and Bayesian information criterion are used for inference and determining tuning parameters. The method is compared with existing methods using extensive simulations and two real transcriptomic applications in rat brain and breast cancer studies. The result shows the superior performance of the proposed count data model in clustering accuracy, feature selection, and biological interpretation in pathways.
Affiliation(s)
- Yujia Li
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Tanbin Rahman
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Tianzhou Ma
- Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, USA
- George C Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
16
Moreira JDR, Quiñones A, Lira BS, Robledo JM, Curtin SJ, Vicente MH, Ribeiro DM, Ryngajllo M, Jiménez-Gómez JM, Peres LEP, Rossi M, Zsögön A. SELF PRUNING 3C is a flowering repressor that modulates seed germination, root architecture, and drought responses. Journal of Experimental Botany 2022; 73:6226-6240. [PMID: 35710302 DOI: 10.1093/jxb/erac265] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/28/2022] [Accepted: 06/14/2022] [Indexed: 06/15/2023]
Abstract
Allelic variation in the CETS (CENTRORADIALIS, TERMINAL FLOWER 1, SELF PRUNING) gene family controls agronomically important traits in many crops. CETS genes encode phosphatidylethanolamine-binding proteins that have a central role in the timing of flowering as florigenic and anti-florigenic signals. The great expansion of CETS genes in many species suggests that the functions of this family go beyond flowering induction and repression. Here, we characterized the tomato SELF PRUNING 3C (SP3C) gene, and show that besides acting as a flowering repressor it also regulates seed germination and modulates root architecture. We show that loss of SP3C function in CRISPR/Cas9-generated mutant lines increases root length and reduces root side branching relative to the wild type. Higher SP3C expression in transgenic lines promotes the opposite effects in roots, represses seed germination, and also improves tolerance to water stress in seedlings. These discoveries provide new insights into the role of SP paralogs in agronomically relevant traits, and support future exploration of the involvement of CETS genes in abiotic stress responses.
Affiliation(s)
- Alejandra Quiñones
- Departamento de Biologia Vegetal, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Jessenia M Robledo
- Departamento de Biologia Vegetal, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Shaun J Curtin
- United States Department of Agriculture, Plant Science Research Unit, St Paul, MN, USA
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA
- Center for Plant Precision Genomics, University of Minnesota, St. Paul, MN, USA
- Center for Genome Engineering, University of Minnesota, St. Paul, MN, USA
- Mateus H Vicente
- Departamento de Ciências Biológicas, Escola Superior de Agricultura 'Luiz de Queiroz', Universidade de São Paulo, Piracicaba, SP, Brazil
- Dimas M Ribeiro
- Departamento de Biologia Vegetal, Universidade Federal de Viçosa, Viçosa, MG, Brazil
- Lázaro Eustáquio Pereira Peres
- Departamento de Ciências Biológicas, Escola Superior de Agricultura 'Luiz de Queiroz', Universidade de São Paulo, Piracicaba, SP, Brazil
- Magdalena Rossi
- Departamento de Botânica, Universidade de São Paulo, São Paulo, SP, Brazil
- Agustin Zsögön
- Departamento de Biologia Vegetal, Universidade Federal de Viçosa, Viçosa, MG, Brazil
17
Zhou Y, Peng M, Yang B, Tong T, Zhang B, Tang N. scDLC: a deep learning framework to classify large sample single-cell RNA-seq data. BMC Genomics 2022; 23:504. [PMID: 35831808 PMCID: PMC9281153 DOI: 10.1186/s12864-022-08715-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/03/2021] [Accepted: 06/21/2022] [Indexed: 11/10/2022] Open
Abstract
Background Using single-cell RNA sequencing (scRNA-seq) data to diagnose disease is an effective technique in medical research. Several statistical methods have been developed for the classification of RNA sequencing (RNA-seq) data, including, for example, Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). Nevertheless, few existing methods perform well for large sample scRNA-seq data, in particular when the distribution assumption is also violated. Results We propose a deep learning classifier (scDLC) for large sample scRNA-seq data, based on the long short-term memory recurrent neural networks (LSTMs). Our new scDLC does not require a prior knowledge on the data distribution, but instead, it takes into account the dependency of the most outstanding feature genes in the LSTMs model. LSTMs is a special recurrent neural network, which can learn long-term dependencies of a sequence. Conclusions Simulation studies show that our new scDLC performs consistently better than the existing methods in a wide range of settings with large sample sizes. Four real scRNA-seq datasets are also analyzed, and they coincide with the simulation results that our new scDLC always performs the best. The code named “scDLC” is publicly available at https://github.com/scDLC-code/code. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-022-08715-1).
Affiliation(s)
- Yan Zhou
- College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, China
- Minjiao Peng
- College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, China
- Bin Yang
- College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Shenzhen University, Shenzhen, China
- Tiejun Tong
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
- Baoxue Zhang
- School of Statistics, Capital University of Economics and Business, Beijing, China
- Niansheng Tang
- Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming, China
18
Nueda MJ, Gandía C, Molina MD. LPDA: A new classification method based on linear programming. PLoS One 2022; 17:e0270403. [PMID: 35797275 PMCID: PMC9262202 DOI: 10.1371/journal.pone.0270403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 06/09/2022] [Indexed: 11/28/2022] Open
Abstract
The search of separation hyperplanes is an efficient way to find rules with classification purposes. This paper presents an alternative mathematical programming formulation to existing methods to find a discriminant hyperplane. The hyperplane H is found by minimizing the sum of all the distances to the area assigned to the group each individual belongs to. It results in a convex optimization problem for which we find an equivalent linear programming problem. We demonstrate that H exists when the centroids of the two groups are not equal. The method is effective dealing with low and high dimensional data where reduction of the dimension is proposed to avoid overfitting problems. We show the performance of this approach with different data sets and comparisons with other classifications methods. The method is called LPDA and it is implemented in a R package available in https://github.com/mjnueda/lpda.
Affiliation(s)
- María J. Nueda
- Mathematics Department, University of Alicante, Alicante, Spain
- Carmen Gandía
- Mathematics Department, University of Alicante, Alicante, Spain
19
Rahman T, Huang HE, Li Y, Tai AS, Hseih WP, McClung CA, Tseng G. A sparse negative binomial classifier with covariate adjustment for RNA-seq data. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Tanbin Rahman
- Department of Biostatistics, University of Pittsburgh
- Hsin-En Huang
- Institute of Statistics, National Tsing Hua University
- Yujia Li
- Department of Biostatistics, University of Pittsburgh
- An-Shun Tai
- Institute of Statistics, National Tsing Hua University
- George Tseng
- Department of Biostatistics, University of Pittsburgh
20
Corsini N, Viroli C. Dealing with overdispersion in multivariate count data. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
21
Lee E, Guan P, Lim AH, Loh JW, Tan GF, Loh T, Ng DYX, Lee JY, Goh S, Liu W, Ng CCY, Teh BT, Chan JY. Multiregion sequencing of sarcomatoid renal cell carcinoma arising from autosomal dominant polycystic kidney disease. Mol Genet Genomic Med 2022; 10:e1853. [PMID: 35122417 PMCID: PMC8922955 DOI: 10.1002/mgg3.1853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Revised: 11/26/2021] [Accepted: 12/14/2021] [Indexed: 11/24/2022] Open
Abstract
Background Autosomal dominant polycystic kidney disease (ADPKD) is an inherited cystic kidney disease associated with a spectrum of various renal and extrarenal manifestations, including increased risk of kidney cancers. Here, we present the initial molecular description of sarcomatoid renal cell carcinoma (sRCC) arising in the setting of ADPKD. Methods Multiregion whole‐exome sequencing and whole transcriptomic sequencing were used to examine intratumoral molecular heterogeneity among histologically‐distinct spindle (sarcomatoid), epithelioid, or biphasic compartments within the tumor and compared with the non‐malignant ADPKD component. Results Spindle and biphasic components harbored several overlapping driver gene mutations, but do not share any with the epithelioid component. Mutations in ATM, CTNNB1, and NF2 were present only in the biphasic and spindle components, while mutations in BID, FLT3, ARID1B, and SMARCA2 were present only in the epithelioid component. We observed dichotomous evolutionary pathways in the development of epithelioid and spindle compartments, involving early mutations in TP53 and ATM/CTNNB1/NF2 respectively. Wnt, PI3K‐mTOR, and MAPK signaling pathways, known key mechanisms involved in ADPKD development, featured prominently in the sarcomatoid component. Conclusion This highlights that common pro‐oncogenic signals are present between ADPKD and sRCC providing insights into their shared pathobiology.
Affiliation(s)
- Elizabeth Lee
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore; Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore, Singapore
- Peiyong Guan
- Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore, Singapore; Laboratory of Biodiversity Genomics, Genome Institute of Singapore, Singapore, Singapore; Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore
- Abner Herbert Lim
- Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore, Singapore; Cancer Discovery Hub, National Cancer Centre Singapore, Singapore, Singapore
- Jui Wan Loh
- Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore, Singapore; Cancer Discovery Hub, National Cancer Centre Singapore, Singapore, Singapore
- Grace Fangmin Tan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore
- Tracy Loh
- Department of Anatomical Pathology, Singapore General Hospital, Singapore, Singapore
- Dave Yong Xiang Ng
- Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore, Singapore; Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore
- Jing Yi Lee
- Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore, Singapore
- Shane Goh
- Cancer Discovery Hub, National Cancer Centre Singapore, Singapore, Singapore
- Wei Liu
- Cancer Discovery Hub, National Cancer Centre Singapore, Singapore, Singapore
- Bin Tean Teh
- Laboratory of Cancer Epigenome, National Cancer Centre Singapore, Singapore, Singapore; Laboratory of Biodiversity Genomics, Genome Institute of Singapore, Singapore, Singapore; Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore; Institute of Molecular and Cellular Biology, ASTAR, Singapore, Singapore; Oncology Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
- Jason Yongsheng Chan
- Division of Medical Oncology, National Cancer Centre Singapore, Singapore, Singapore; Cancer Discovery Hub, National Cancer Centre Singapore, Singapore, Singapore; Oncology Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
22
De Nittis P, Efthymiou S, Sarre A, Guex N, Chrast J, Putoux A, Sultan T, Raza Alvi J, Ur Rahman Z, Zafar F, Rana N, Rahman F, Anwar N, Maqbool S, Zaki MS, Gleeson JG, Murphy D, Galehdari H, Shariati G, Mazaheri N, Sedaghat A, Lesca G, Chatron N, Salpietro V, Christoforou M, Houlden H, Simonds WF, Pedrazzini T, Maroofian R, Reymond A. Inhibition of G-protein signalling in cardiac dysfunction of intellectual developmental disorder with cardiac arrhythmia (IDDCA) syndrome. J Med Genet 2021; 58:815-831. [PMID: 33172956 PMCID: PMC8639930 DOI: 10.1136/jmedgenet-2020-107015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/19/2020] [Revised: 08/30/2020] [Accepted: 09/04/2020] [Indexed: 11/16/2022]
Abstract
BACKGROUND Pathogenic variants of GNB5 encoding the β5 subunit of the guanine nucleotide-binding protein cause IDDCA syndrome, an autosomal recessive neurodevelopmental disorder associated with cognitive disability and cardiac arrhythmia, particularly severe bradycardia. METHODS We used echocardiography and telemetric ECG recordings to investigate consequences of Gnb5 loss in mouse. RESULTS We delineated a key role of Gnb5 in heart sinus conduction and showed that Gnb5-inhibitory signalling is essential for parasympathetic control of heart rate (HR) and maintenance of the sympathovagal balance. Gnb5-/- mice were smaller and had a smaller heart than Gnb5+/+ and Gnb5+/- , but exhibited better cardiac function. Lower autonomic nervous system modulation through diminished parasympathetic control and greater sympathetic regulation resulted in a higher baseline HR in Gnb5-/- mice. In contrast, Gnb5-/- mice exhibited profound bradycardia on treatment with carbachol, while sympathetic modulation of the cardiac stimulation was not altered. Concordantly, transcriptome study pinpointed altered expression of genes involved in cardiac muscle contractility in atria and ventricles of knocked-out mice. Homozygous Gnb5 loss resulted in significantly higher frequencies of sinus arrhythmias. Moreover, we described 13 affected individuals, increasing the IDDCA cohort to 44 patients. CONCLUSIONS Our data demonstrate that loss of negative regulation of the inhibitory G-protein signalling causes HR perturbations in Gnb5-/- mice, an effect mainly driven by impaired parasympathetic activity. We anticipate that unravelling the mechanism of Gnb5 signalling in the autonomic control of the heart will pave the way for future drug screening.
Affiliation(s)
| | - Stephanie Efthymiou
- Department of Neuromuscular Disorders, Queen Square Institute of Neurology, University College London, London, UK
| | - Alexandre Sarre
- Cardiovascular Assessment Facility, University of Lausanne, Lausanne, Switzerland
| | - Nicolas Guex
- Bioinformatics Competence Center, University of Lausanne, Lausanne, Switzerland
| | - Jacqueline Chrast
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Audrey Putoux
- Service de Génétique, Hopital Femme Mere Enfant, Bron, France
| | - Tipu Sultan
- Department of Pediatric Neurology, The Children's Hospital and Institute of Child Health, Lahore, Pakistan
| | - Javeria Raza Alvi
- Department of Pediatric Neurology, The Children's Hospital and Institute of Child Health, Lahore, Pakistan
| | - Zia Ur Rahman
- Department of Pediatric Neurology, The Children's Hospital and Institute of Child Health, Lahore, Pakistan
| | - Faisal Zafar
- Department of Paediatric Neurology, Children's Hospital and Institute of Child Health, Multan, Pakistan
| | - Nuzhat Rana
- Department of Paediatric Neurology, Children's Hospital and Institute of Child Health, Multan, Pakistan
| | - Fatima Rahman
- Department of Developmental-Behavioural Paediatrics, The Children's Hospital and Institute of Child Health, Lahore, Pakistan
| | - Najwa Anwar
- Department of Developmental-Behavioural Paediatrics, The Children's Hospital and Institute of Child Health, Lahore, Pakistan
| | - Shazia Maqbool
- Department of Developmental-Behavioural Paediatrics, The Children's Hospital and Institute of Child Health, Lahore, Pakistan
| | - Maha S Zaki
- Clinical Genetics Department, Human Genetics and Genome Research Division, National Research Centre, Cairo, Egypt
| | - Joseph G Gleeson
- Department of Neuroscience and Pediatrics, Howard Hughes Medical Institute, La Jolla, California, USA
| | - David Murphy
- Department of Neuromuscular Disorders, Queen Square Institute of Neurology, University College London, London, UK
| | - Hamid Galehdari
- Department of Genetics, Faculty of Science, Shahid Chamran University of Ahvaz, Ahwaz, Iran (the Islamic Republic of)
| | - Gholamreza Shariati
- Department of Medical Genetics, Faculty of Medicine, Ahvaz Jondishapour University of Medical Sciences, Ahvaz, Iran (the Islamic Republic of)
| | - Neda Mazaheri
- Department of Genetics, Faculty of Science, Shahid Chamran University of Ahvaz, Ahwaz, Iran (the Islamic Republic of)
| | - Alireza Sedaghat
- Health Research Institute, Diabetes Research Center, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran (the Islamic Republic of)
| | - Gaetan Lesca
- Service de Genetique, Hospices Civils de Lyon, Lyon, France
| | - Nicolas Chatron
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Service de Genetique, Hospices Civils de Lyon, Lyon, France
| | - Vincenzo Salpietro
- Department of Neuromuscular Disorders, Queen Square Institute of Neurology, University College London, London, UK
| | - Marilena Christoforou
- Department of Neuromuscular Disorders, Queen Square Institute of Neurology, University College London, London, UK
| | - Henry Houlden
- Department of Neuromuscular Disorders, Queen Square Institute of Neurology, University College London, London, UK
| | - William F Simonds
- Metabolic Diseases Branch/NIDDK, National Institutes of Health, Bethesda, MD, USA
| | - Thierry Pedrazzini
- Experimental Cardiology Unit, Department of Cardiovascular Medicine, University of Lausanne, Lausanne, Switzerland
| | - Reza Maroofian
- Department of Neuromuscular Disorders, Queen Square Institute of Neurology, University College London, London, UK
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
|
23
|
Luo G, Shen L, Zhao S, Li R, Song Y, Song S, Yu K, Yang W, Li X, Sun J, Wang Y, Gao C, Liu D, Zhang A. Genome-wide identification of seed storage protein gene regulators in wheat through coexpression analysis. The Plant Journal: For Cell and Molecular Biology 2021; 108:1704-1720. [PMID: 34634158 DOI: 10.1111/tpj.15538] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/06/2020] [Accepted: 09/27/2021] [Indexed: 12/31/2022]
Abstract
Only a few transcriptional regulators of seed storage protein (SSP) genes have been identified in common wheat (Triticum aestivum L.). Coexpression analysis could be an efficient approach to characterize novel transcriptional regulators at the genome-scale considering the correlated expression between transcriptional regulators and target genes. As the A genome donor of common wheat, Triticum urartu is more suitable for coexpression analysis than common wheat considering the diploid genome and single gene copy. In this work, the transcriptome dynamics in endosperm of T. urartu throughout grain filling were revealed by RNA-Seq analysis. In the coexpression analysis, a total of 71 transcription factors (TFs) from 23 families were found to be coexpressed with SSP genes. Among these TFs, TuNAC77 enhanced the transcription of SSP genes by binding to cis-elements distributed in promoters. The homolog of TuNAC77 in common wheat, TaNAC77, shared an identical function, and the total SSPs were reduced by about 24% in common wheat when TaNAC77 was knocked down. This is the first genome-wide identification of transcriptional regulators of SSP genes in wheat, and the newly characterized transcriptional regulators will undoubtedly expand our knowledge of the transcriptional regulation of SSP synthesis.
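The coexpression idea in this abstract (a regulator and its targets tend to show correlated expression across samples) can be illustrated with a minimal sketch. It assumes Pearson correlation with a fixed cutoff; the abstract does not state which correlation measure or threshold the authors used, and the function names, gene labels, and expression values below are hypothetical toy data, not results from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / (sx * sy)

def coexpressed_regulators(tf_expr, target_expr, threshold=0.9):
    """Return (name, r) for TFs whose |r| with the target meets the cutoff.

    The 0.9 cutoff is an assumption made for this illustration only.
    """
    hits = []
    for name, profile in tf_expr.items():
        r = pearson(profile, target_expr)
        if abs(r) >= threshold:
            hits.append((name, r))
    return sorted(hits, key=lambda h: -abs(h[1]))

# Hypothetical toy profiles over five grain-filling time points.
toy_ssp = [1.0, 2.0, 4.0, 8.0, 16.0]
toy_tfs = {
    "TF_up": [0.5, 1.0, 2.0, 4.0, 8.0],      # rises with the target
    "TF_anti": [16.0, 15.0, 13.0, 9.0, 1.0],  # mirrors the target
    "TF_flat": [3.0, 3.1, 2.9, 3.0, 3.05],    # essentially unrelated
}

if __name__ == "__main__":
    for name, r in coexpressed_regulators(toy_tfs, toy_ssp):
        print(f"{name}: r = {r:.3f}")
```

A correlation screen like this only nominates candidates; the abstract notes that TuNAC77 was subsequently validated by showing binding to promoter cis-elements and by knockdown.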
Affiliation(s)
- Guangbin Luo
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China
| | - Lisha Shen
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Shancen Zhao
- BGI Institute of Applied Agriculture, BGI-Shenzhen, Shenzhen, 518120, China
| | - Ruidong Li
- Graduate Program in Genetics, Genomics and Bioinformatics, University of California, Riverside, CA, USA
| | - Yanhong Song
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China; College of Agronomy, The Collaborative Innovation Center of Grain Crops in Henan, Henan Agricultural University, 63 Nongye Road, Zhengzhou, 450002, China
| | - Shuyi Song
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China; College of Agronomy, The Collaborative Innovation Center of Grain Crops in Henan, Henan Agricultural University, 63 Nongye Road, Zhengzhou, 450002, China
| | - Kang Yu
- BGI Institute of Applied Agriculture, BGI-Shenzhen, Shenzhen, 518120, China
| | - Wenlong Yang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Xin Li
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China
| | - Jiazhu Sun
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China
| | - Yanpeng Wang
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China
| | - Caixia Gao
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China
| | - Dongcheng Liu
- State Key Laboratory of North China Crop Improvement and Regulation, College of Agronomy, Hebei Agricultural University, Baoding, Hebei, 071000, China
| | - Aimin Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, National Center for Plant Gene Research, Institute of Genetics and Developmental Biology/Innovative Academy of Seed Design, Chinese Academy of Sciences, 1 West Beichen Road, Chaoyang District, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China; State Key Laboratory of North China Crop Improvement and Regulation, College of Agronomy, Hebei Agricultural University, Baoding, Hebei, 071000, China
|
24
|
Affiliation(s)
- Emily M. Goren
- Department of Statistics, Iowa State University, Ames, Iowa, USA
- Now at Seagen, Bothell, WA, USA
| | - Ranjan Maitra
- Department of Statistics, Iowa State University, Ames, Iowa, USA
|
25
|
Li CZ, Kawaguchi ES, Li G. A New ℓ0-Regularized Log-Linear Poisson Graphical Model with Applications to RNA Sequencing Data. J Comput Biol 2021; 28:880-891. [PMID: 34375132 PMCID: PMC8558075 DOI: 10.1089/cmb.2020.0558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023] Open
Abstract
In this article, we develop a new ℓ0-based sparse Poisson graphical model with applications to gene network inference from RNA-seq gene expression count data. Assuming a pair-wise Markov property, we propose to fit a separate broken adaptive ridge-regularized log-linear Poisson regression on each node to evaluate the conditional, instead of marginal, association between two genes in the presence of all other genes. The resulting sparse gene networks are generally more accurate than those generated by the ℓ1-regularized Poisson graphical model, as demonstrated by our empirical studies. A real data illustration is given on kidney renal clear cell carcinoma micro-RNA-seq data from The Cancer Genome Atlas.
Affiliation(s)
- Caesar Z. Li
- Department of Biostatistics, School of Public Health, University of California at Los Angeles, Los Angeles, California, USA
| | - Eric S. Kawaguchi
- Graduate Programs in Biostatistics and Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
| | - Gang Li
- Department of Biostatistics, School of Public Health, University of California at Los Angeles, Los Angeles, California, USA
|
26
|
Baker DN, Dyjack N, Braverman V, Hicks SC, Langmead B. Fast and memory-efficient scRNA-seq k-means clustering with various distances. ACM-BCB: ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2021; 2021:24. [PMID: 34778889 PMCID: PMC8586878 DOI: 10.1145/3459930.3469523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/03/2022]
Abstract
Single-cell RNA-sequencing (scRNA-seq) analyses typically begin by clustering a gene-by-cell expression matrix to empirically define groups of cells with similar expression profiles. We describe new methods and a new open source library, minicore, for efficient k-means++ center finding and k-means clustering of scRNA-seq data. Minicore works with sparse count data, as it emerges from typical scRNA-seq experiments, as well as with dense data from after dimensionality reduction. Minicore's novel vectorized weighted reservoir sampling algorithm allows it to find initial k-means++ centers for a 4-million cell dataset in 1.5 minutes using 20 threads. Minicore can cluster using Euclidean distance, but also supports a wider class of measures like Jensen-Shannon Divergence, Kullback-Leibler Divergence, and the Bhattacharyya distance, which can be directly applied to count data and probability distributions. Further, minicore produces lower-cost centerings more efficiently than scikit-learn for scRNA-seq datasets with millions of cells. With careful handling of priors, minicore implements these distance measures with only minor (<2-fold) speed differences among all distances. We show that a minicore pipeline consisting of k-means++, localsearch++ and mini-batch k-means can cluster a 4-million cell dataset in minutes, using less than 10GiB of RAM. This memory-efficiency enables atlas-scale clustering on laptops and other commodity hardware. Finally, we report findings on which distance measures give clusterings that are most consistent with known cell type labels.
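For readers unfamiliar with k-means++ center finding, the following is a minimal sketch of the textbook seeding rule: each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. It is not minicore's API and does not reproduce its vectorized weighted reservoir sampling; it uses plain squared Euclidean distance, and the function names and toy points are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, seed=0):
    """Return indices of k initial centers chosen by D^2 sampling.

    Assumes at least k distinct points; if every remaining weight is zero
    (all points coincide with a chosen center), random.choices raises
    ValueError.
    """
    rng = random.Random(seed)
    centers = [rng.randrange(len(points))]  # first center: uniform draw
    # d2[i] = squared distance from point i to its nearest chosen center
    d2 = [sq_dist(p, points[centers[0]]) for p in points]
    while len(centers) < k:
        # Sample the next center with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(idx)
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers

# Hypothetical toy data: three tight, well-separated groups.
toy_points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
    (100.0, 0.0), (101.0, 0.0), (100.0, 1.0),
    (0.0, 100.0), (1.0, 100.0), (0.0, 101.0),
]

if __name__ == "__main__":
    print(kmeans_pp_seeds(toy_points, k=3, seed=0))
```

Because an already chosen point has zero weight, it cannot be drawn again, so the returned indices are always distinct. Swapping `sq_dist` for another dissimilarity changes the sampling weights in the same way the abstract describes for non-Euclidean measures.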
Affiliation(s)
- Daniel N Baker
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Nathan Dyjack
- Department of Biostatistics, Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA
| | - Vladimir Braverman
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
|
27
|
Roh H, Kim N, Lee Y, Park J, Kim BS, Lee MK, Park CI, Kim DH. Dual-Organ Transcriptomic Analysis of Rainbow Trout Infected With Ichthyophthirius multifiliis Through Co-Expression and Machine Learning. Front Immunol 2021; 12:677730. [PMID: 34305907 PMCID: PMC8296305 DOI: 10.3389/fimmu.2021.677730] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/08/2021] [Accepted: 05/31/2021] [Indexed: 01/16/2023] Open
Abstract
Ichthyophthirius multifiliis is a major pathogen that causes a high mortality rate in trout farms. However, systemic responses to the pathogen and its interactions with multiple organs during the course of infection have not been well described. In this study, dual-organ transcriptomic responses in the liver and head kidney and hemato-serological indexes were profiled under I. multifiliis infection and recovery to investigate systemic immuno-physiological characteristics. Several strategies for massive transcriptomic interpretation, such as differentially expressed genes (DEGs), Poisson linear discriminant analysis (PLDA), and weighted gene co-expression network analysis (WGCNA) models, were used to investigate the featured genes/pathways while minimizing the disadvantages of individual methods. During the course of infection, 6,097 and 2,931 DEGs were identified in the head kidney and liver, respectively. Markers of protein processing in the endoplasmic reticulum, oxidative phosphorylation, and the proteasome were highly expressed. Likewise, simultaneous ferroptosis and cellular reconstruction were observed, which is strongly linked to multiple organ dysfunction. In contrast, pathways relevant to cellular replication were up-regulated in only the head kidney, while endocytosis- and phagosome-related pathways were notably expressed in the liver. Interestingly, most immune-relevant pathways (e.g., leukocyte trans-endothelial migration, Fc gamma R-mediated phagocytosis) were highly activated in the liver, but the same pathways in the head kidney were down-regulated. These conflicting results from different organs suggest that interpretation of co-expression among organs is crucial for profiling of systemic responses during infection. The dual-organ transcriptomics approaches presented in this study will greatly contribute to our understanding of multi-organ interactions under I. multifiliis infection from a broader perspective.
Affiliation(s)
- HyeongJin Roh
- Department of Aquatic Life Medicine, College of Fisheries Science, Pukyong National University, Busan, South Korea
| | - Nameun Kim
- Department of Aquatic Life Medicine, College of Fisheries Science, Pukyong National University, Busan, South Korea
| | - Yoonhang Lee
- Department of Aquatic Life Medicine, College of Fisheries Science, Pukyong National University, Busan, South Korea
| | - Jiyeon Park
- Department of Aquatic Life Medicine, College of Fisheries Science, Pukyong National University, Busan, South Korea
| | - Bo Seong Kim
- Aquatic Disease Control Division, National Institute of Fisheries Science (NIFS), Busan, South Korea
| | - Mu Kun Lee
- Korean Aquatic Organism Disease Inspector Association, Busan, South Korea
| | - Chan-Il Park
- Department of Marine Biology & Aquaculture, College of Marine Science, Gyeongsang National University, Tongyeong, South Korea
| | - Do-Hyung Kim
- Department of Aquatic Life Medicine, College of Fisheries Science, Pukyong National University, Busan, South Korea
|
28
|
Li Q, Zhang M, Xie Y, Xiao G. Bayesian Modeling of Spatial Molecular Profiling Data via Gaussian Process. Bioinformatics 2021; 37:4129-4136. [PMID: 34146105 PMCID: PMC9502169 DOI: 10.1093/bioinformatics/btab455] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/07/2020] [Revised: 05/29/2021] [Accepted: 06/16/2021] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: The location, timing, and abundance of gene expression (both mRNA and proteins) within a tissue define the molecular mechanisms of cell functions. Recent technology breakthroughs in spatial molecular profiling, including imaging-based technologies and sequencing-based technologies, have enabled the comprehensive molecular characterization of single cells while preserving their spatial and morphological contexts. This new bioinformatics scenario calls for effective and robust computational methods to identify genes with spatial patterns. RESULTS: We present a novel Bayesian hierarchical model to analyze spatial transcriptomics data, with several unique characteristics. It models the zero-inflated and over-dispersed counts by deploying a zero-inflated negative binomial model that greatly increases model stability and robustness. Besides, the Bayesian inference framework allows us to borrow strength in parameter estimation in a de novo fashion. As a result, the proposed model shows competitive performance in accuracy and robustness over existing methods in both simulation studies and two real data applications. AVAILABILITY: The related R/C++ source code is available at https://github.com/Minzhe/BOOST-GP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiwei Li
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, USA
| | - Minzhe Zhang
- Quantitative Biology Research Center, Department of Population and Data Sciences, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Yang Xie
- Quantitative Biology Research Center, Department of Population and Data Sciences, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Guanghua Xiao
- Quantitative Biology Research Center, Department of Population and Data Sciences, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
29
|
Tripp JA, Berrio A, McGraw LA, Matz MV, Davis JK, Inoue K, Thomas JW, Young LJ, Phelps SM. Comparative neurotranscriptomics reveal widespread species differences associated with bonding. BMC Genomics 2021; 22:399. [PMID: 34058981 PMCID: PMC8165761 DOI: 10.1186/s12864-021-07720-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/09/2020] [Accepted: 04/20/2021] [Indexed: 11/28/2022] Open
Abstract
Background: Pair bonding with a reproductive partner is rare among mammals but is an important feature of human social behavior. Decades of research on monogamous prairie voles (Microtus ochrogaster), along with comparative studies using the related non-bonding meadow vole (M. pennsylvanicus), have revealed many of the neural and molecular mechanisms necessary for pair-bond formation in that species. However, these studies have largely focused on just a few neuromodulatory systems. To test the hypothesis that neural gene expression differences underlie differential capacities to bond, we performed RNA-sequencing on tissue from three brain regions important for bonding and other social behaviors across bond-forming prairie voles and non-bonding meadow voles. We examined gene expression in the amygdala, hypothalamus, and combined ventral pallidum/nucleus accumbens in virgins and at three time points after mating to understand species differences in gene expression at baseline, in response to mating, and during bond formation.
Results: We first identified species and brain region as the factors most strongly associated with gene expression in our samples. Next, we found gene categories related to cell structure, translation, and metabolism that differed in expression across species in virgins, as well as categories associated with cell structure, synaptic and neuroendocrine signaling, and transcription and translation that varied among the focal regions in our study. Additionally, we identified genes that were differentially expressed across species after mating in each of our regions of interest. These include genes involved in regulating transcription, neuron structure, and synaptic plasticity. Finally, we identified modules of co-regulated genes that were strongly correlated with brain region in both species, and modules that were correlated with post-mating time points in prairie voles but not meadow voles.
Conclusions: These results reinforce the importance of pre-mating differences that confer the ability to form pair bonds in prairie voles but not promiscuous species such as meadow voles. Gene ontology analysis supports the hypothesis that pair-bond formation involves transcriptional regulation and changes in neuronal structure. Together, our results expand knowledge of the genes involved in the pair bonding process and open new avenues of research in the molecular mechanisms of bond formation.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07720-0.
Affiliation(s)
- Joel A Tripp
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, 78712, USA
| | - Alejandro Berrio
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, 78712, USA; Present Address: Department of Biology, Duke University, Durham, NC, 27708, USA
| | - Lisa A McGraw
- Center for Translational Social Neuroscience, Department of Psychiatry and Behavioral Sciences, Yerkes National Primate Research Center, Emory University, Atlanta, GA, 30329, USA
| | - Mikhail V Matz
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, 78712, USA
| | - Jamie K Davis
- Centers for Disease Control and Prevention, Atlanta, GA, 30333, USA
| | - Kiyoshi Inoue
- Center for Translational Social Neuroscience, Department of Psychiatry and Behavioral Sciences, Yerkes National Primate Research Center, Emory University, Atlanta, GA, 30329, USA
| | - James W Thomas
- National Institutes of Health Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health, Rockville, MD, USA
| | - Larry J Young
- Center for Translational Social Neuroscience, Department of Psychiatry and Behavioral Sciences, Yerkes National Primate Research Center, Emory University, Atlanta, GA, 30329, USA
| | - Steven M Phelps
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, 78712, USA.
|
30
|
Zhou Y, Zhang L, Xu J, Zhang J, Yan X. Category encoding method to select feature genes for the classification of bulk and single-cell RNA-seq data. Stat Med 2021; 40:4077-4089. [PMID: 34028849 DOI: 10.1002/sim.9015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/27/2020] [Revised: 02/26/2021] [Accepted: 04/13/2021] [Indexed: 11/08/2022]
Abstract
Bulk and single-cell RNA-seq (scRNA-seq) data are being used as alternatives to traditional technology in biology and medicine research. These data are used, for example, for the detection of differentially expressed (DE) genes. Several statistical methods have been developed for the classification of bulk and single-cell RNA-seq data. Feature genes are vitally important for the classification of bulk and single-cell RNA-seq data. The majority of genes are not DE and they are thus irrelevant for class distinction. To improve the classification performance and save computation time, removal of irrelevant genes is necessary. Removal will aid the detection of the important feature genes. Widely used schemes in the literature, such as the BSS/WSS (BW) method, assume that data are normally distributed and may not be suitable for bulk and single-cell RNA-seq data. In this article, a category encoding (CAEN) method is proposed to select feature genes for bulk and single-cell RNA-seq data classification. This novel method encodes categories by employing the rank of sequence samples for each gene in each class. Correlation coefficients are considered for gene and class with the rank of sample and a new rank of category. The genes with the highest correlation coefficients are considered feature genes, which are the most effective for classifying bulk and single-cell RNA-seq datasets. The sure screening method was also established for rank consistency properties of the proposed CAEN method. Simulation studies show that the classifier using the proposed CAEN method performs better than, or at least as well as, the existing methods in most settings. Existing real datasets were analyzed, with the results demonstrating superior performance of the proposed method over current competitors. The application has been coded into an R package named "CAEN" to facilitate wide use.
Affiliation(s)
- Yan Zhou
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Institute of Statistical Sciences, College of Mathematics and Statistics, Shenzhen University, Shenzhen, China
| | - Li Zhang
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Institute of Statistical Sciences, College of Mathematics and Statistics, Shenzhen University, Shenzhen, China
| | - Jinfeng Xu
- Department of Mathematics, Hong Kong University, Pokfulam, Hong Kong
| | - Jun Zhang
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, Institute of Statistical Sciences, College of Mathematics and Statistics, Shenzhen University, Shenzhen, China
| | - Xiaodong Yan
- Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, China
|
31
|
Zhu J, Yuan Z, Shu L, Liao W, Zhao M, Zhou Y. Selecting Classification Methods for Small Samples of Next-Generation Sequencing Data. Front Genet 2021; 12:642227. [PMID: 33747051 PMCID: PMC7969809 DOI: 10.3389/fgene.2021.642227] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/15/2020] [Accepted: 02/01/2021] [Indexed: 12/13/2022] Open
Abstract
Next-generation sequencing has emerged as an essential technology for the quantitative analysis of gene expression. In medical research, RNA sequencing (RNA-seq) data are commonly used to identify which type of disease a patient has. Because of the discrete nature of RNA-seq data, the existing statistical methods that have been developed for microarray data cannot be directly applied to RNA-seq data. Existing statistical methods usually model RNA-seq data by a discrete distribution, such as the Poisson, the negative binomial, or the mixture distribution with a point mass at zero and a Poisson distribution to further allow for data with an excess of zeros. Consequently, analytic tools corresponding to the above three discrete distributions have been developed: Poisson linear discriminant analysis (PLDA), negative binomial linear discriminant analysis (NBLDA), and zero-inflated Poisson logistic discriminant analysis (ZIPLDA). However, it is unclear what the real distributions would be for these classifications when applied to a new and real dataset. Considering that count datasets are frequently characterized by excess zeros and overdispersion, this paper extends the existing distribution to a mixture distribution with a point mass at zero and a negative binomial distribution and proposes a zero-inflated negative binomial logistic discriminant analysis (ZINBLDA) for classification. More importantly, we compare the above four classification methods from the perspective of model parameters, as an understanding of parameters is necessary for selecting the optimal method for RNA-seq data. Furthermore, we determine that the above four methods could transform into each other in some cases. Using simulation studies, we compare and evaluate the performance of these classification methods in a wide range of settings, and we also present a decision tree model created to help us select the optimal classifier for a new RNA-seq dataset. 
The results of the two real datasets coincide with the theory and simulation analysis results. The methods used in this work are implemented in open-source R scripts, with source code freely available at https://github.com/FocusPaka/ZINBLDA.
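The mixture described in this abstract, a point mass at zero combined with a negative binomial distribution, can be written down directly. The sketch below is an illustration under an assumed mean/size parameterization (mean `mu`, dispersion `size`, zero-inflation probability `pi0`); these names are hypothetical and the code is not taken from the authors' ZINBLDA scripts.

```python
import math

def nb_log_pmf(y, mu, size):
    """Log-probability of count y under a negative binomial with mean mu > 0
    and dispersion parameter size > 0 (variance = mu + mu**2 / size)."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))

def zinb_pmf(y, mu, size, pi0):
    """Zero-inflated negative binomial probability of count y.

    With probability pi0 the count is a structural zero; otherwise it is
    drawn from the negative binomial, which can itself produce zeros.
    """
    nb = math.exp(nb_log_pmf(y, mu, size))
    if y == 0:
        return pi0 + (1.0 - pi0) * nb
    return (1.0 - pi0) * nb

if __name__ == "__main__":
    # Illustrative parameter values only.
    mu, size, pi0 = 5.0, 2.0, 0.3
    print("P(Y = 0) under NB:  ", math.exp(nb_log_pmf(0, mu, size)))
    print("P(Y = 0) under ZINB:", zinb_pmf(0, mu, size, pi0))
```

Setting `pi0 = 0` recovers the plain negative binomial, and letting `size` grow large approaches the Poisson case, which matches the abstract's remark that the four classifiers can transform into each other in some settings.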
Affiliation(s)
- Jiadi Zhu
- Department of Mathematics and Statistics, Xidian University, Xi'an, China
| | - Ziyang Yuan
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, Shenzhen, China
| | - Lianjie Shu
- Faculty of Business Administration, University of Macau, Macau, China
| | - Wenhui Liao
- GuangDong University of Finance, Guangzhou, China
| | - Mingtao Zhao
- Institute of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu, China
| | - Yan Zhou
- Shenzhen Key Laboratory of Advanced Machine Learning and Applications, College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, Shenzhen, China
|
32
|
Zhao X, Chen Y, Zhang L, Li Z, Wu X, Chen J, Wang F. Molecular cloning and biochemical characterization of a trehalose synthase from Myxococcus sp. strain V11. Protein Expr Purif 2021; 183:105865. [PMID: 33675938 DOI: 10.1016/j.pep.2021.105865] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/01/2020] [Revised: 02/28/2021] [Accepted: 03/01/2021] [Indexed: 11/30/2022]
Abstract
The tresI gene of Myxococcus sp. strain V11 was cloned and found to encode a trehalose synthase comprising 551 amino acids. The deduced molecular weight of the encoded TreS I protein was 64.7 kDa and the isoelectric point (pI) was predicted to be 5.6. The catalytic cleft consists of the Asp202-Glu244-Asp310 catalytic triad and additional conserved residues. The recombinant (His)₆-tag enzyme was expressed in Escherichia coli BL21(DE3) and purified by Ni²⁺-affinity chromatography, resulting in a specific activity of up to 172.7 U/mg. TLC and HPLC results confirmed that rTreS I can convert maltose into trehalose, with a yield of 61%. The KM and Vmax values of recombinant TreS I for maltose were 0.62 mM and 25.5 mM min⁻¹ mg⁻¹ protein, respectively. TreS I was optimally active at 35 °C and stable at temperatures of <25 °C. TreS I was stable within a narrow range of pH values, from 6.0 to 7.0. The enzymatic activity was slightly stimulated by Mg²⁺ and strongly inhibited by Fe³⁺, Co²⁺ and Cu²⁺. TreS I was also strongly inhibited by SDS and weakly by EDTA and Triton X-100.
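To make the reported kinetic constants concrete, the sketch below evaluates the Michaelis-Menten rate law v = Vmax·[S] / (KM + [S]) with the KM and Vmax values quoted in the abstract. It assumes simple Michaelis-Menten kinetics, which the abstract implies by reporting KM and Vmax but does not state explicitly; the function name and the example substrate concentrations are hypothetical.

```python
# Constants as quoted in the abstract for maltose.
KM_MM = 0.62   # Michaelis constant, mM
VMAX = 25.5    # maximal rate, in the units reported (mM per min per mg protein)

def mm_rate(substrate_mM, km=KM_MM, vmax=VMAX):
    """Michaelis-Menten initial rate at a given substrate concentration (mM)."""
    return vmax * substrate_mM / (km + substrate_mM)

if __name__ == "__main__":
    # Illustrative maltose concentrations, chosen for this example only.
    for s in (0.1, KM_MM, 5.0, 50.0):
        print(f"[S] = {s:6.2f} mM -> v = {mm_rate(s):6.2f}")
```

By definition the rate at [S] = KM is half of Vmax, and the rate approaches Vmax only when the maltose concentration is far above 0.62 mM.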
Affiliation(s)
- Xiaoyan Zhao
- College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, PR China
| | - Yunda Chen
- College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, PR China
| | - Lixia Zhang
- College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, PR China
| | - Zhimin Li
- College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, PR China
| | - Xiaoyu Wu
- College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, PR China; Collaborative Innovation Center of Postharvest Key Technology and Quality Safety of Fruits and Vegetables in Jiangxi Province, Nanchang, 330045, PR China
| | - Jinyin Chen
- College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, PR China; Collaborative Innovation Center of Postharvest Key Technology and Quality Safety of Fruits and Vegetables in Jiangxi Province, Nanchang, 330045, PR China
| | - Fei Wang
- College of Bioscience and Bioengineering, Jiangxi Agricultural University, Nanchang, 330045, PR China; Collaborative Innovation Center of Postharvest Key Technology and Quality Safety of Fruits and Vegetables in Jiangxi Province, Nanchang, 330045, PR China.
| |
|
33
|
Schenkel LB, Molina JR, Swinger KK, Abo R, Blackwell DJ, Lu AZ, Cheung AE, Church WD, Kunii K, Kuplast-Barr KG, Majer CR, Minissale E, Mo JR, Niepel M, Reik C, Ren Y, Vasbinder MM, Wigle TJ, Richon VM, Keilhack H, Kuntz KW. A potent and selective PARP14 inhibitor decreases protumor macrophage gene expression and elicits inflammatory responses in tumor explants. Cell Chem Biol 2021; 28:1158-1168.e13. [PMID: 33705687 DOI: 10.1016/j.chembiol.2021.02.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 06/26/2020] [Revised: 12/18/2020] [Accepted: 02/11/2021] [Indexed: 11/28/2022]
Abstract
PARP14 has been implicated by genetic knockout studies to promote protumor macrophage polarization and suppress the antitumor inflammatory response due to its role in modulating interleukin-4 (IL-4) and interferon-γ signaling pathways. Here, we describe structure-based design efforts leading to the discovery of a potent and highly selective PARP14 chemical probe. RBN012759 inhibits PARP14 with a biochemical half-maximal inhibitory concentration of 0.003 μM, exhibits >300-fold selectivity over all PARP family members, and its profile enables further study of PARP14 biology and disease association both in vitro and in vivo. Inhibition of PARP14 with RBN012759 reverses IL-4-driven protumor gene expression in macrophages and induces an inflammatory mRNA signature similar to that induced by immune checkpoint inhibitor therapy in primary human tumor explants. These data support an immune suppressive role of PARP14 in tumors and suggest potential utility of PARP14 inhibitors in the treatment of cancer.
Affiliation(s)
- Laurie B Schenkel
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA; MOMA Therapeutics, Cambridge, MA 02142, USA
| | - Jennifer R Molina
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Kerren K Swinger
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA; Xilio Therapeutics, Waltham, MA 02451, USA
| | - Ryan Abo
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA; Obsidian Therapeutics, Cambridge, MA 02138, USA
| | - Danielle J Blackwell
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Alvin Z Lu
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Anne E Cheung
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA; A2Empowerment, Arlington, MA 02474, USA
| | - W David Church
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Kaiko Kunii
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Kristy G Kuplast-Barr
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Christina R Majer
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Elena Minissale
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Jan-Rung Mo
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Mario Niepel
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Christopher Reik
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA; Bain & Company, Boston, MA 02116, USA
| | - Yue Ren
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Melissa M Vasbinder
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Tim J Wigle
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Victoria M Richon
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA; Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Heike Keilhack
- Department of Biological Sciences, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA
| | - Kevin W Kuntz
- Department of Molecular Discovery, Ribon Therapeutics, Inc., Cambridge, MA 02140, USA.
| |
|
34
|
Li Y, Zeng X, Lin CW, Tseng GC. Simultaneous estimation of cluster number and feature sparsity in high-dimensional cluster analysis. Biometrics 2021; 78:574-585. [PMID: 33621349 DOI: 10.1111/biom.13449] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2020] [Revised: 02/02/2021] [Accepted: 02/03/2021] [Indexed: 11/28/2022]
Abstract
Estimating the number of clusters (K) is a critical and often difficult task in cluster analysis. Many methods have been proposed to estimate K, including some top performers that use a resampling approach. When performing cluster analysis in high-dimensional data, simultaneous clustering and feature selection is needed for improved interpretation and performance. To our knowledge, little work has addressed simultaneous estimation of K and the feature sparsity parameter in high-dimensional exploratory cluster analysis. In this paper, we propose a resampling method to bridge this gap and evaluate its performance under the sparse K-means clustering framework. The proposed target function balances between sensitivity and specificity of clustering evaluation of pairwise subjects from clustering of full and subsampled data. Through extensive simulations, the method performs among the best relative to classical methods in estimating K in low-dimensional data. For high-dimensional simulation data, it also shows superior performance in simultaneously estimating K and the feature sparsity parameter. Finally, we evaluated the methods in four microarray, two RNA-seq, one SNP, and two non-omics datasets. The proposed method achieves better clustering accuracy with fewer selected predictive genes in almost all real applications.
Affiliation(s)
- Yujia Li
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Xiangrui Zeng
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania
| | - Chien-Wei Lin
- Division of Biostatistics, Medical College of Wisconsin, Wauwatosa, Wisconsin
| | - George C Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
| |
|
35
|
Lin KZ, Lei J, Roeder K. Exponential-Family Embedding With Application to Cell Developmental Trajectories for Single-Cell RNA-Seq Data. J Am Stat Assoc 2021; 116:457-470. [PMID: 34354320 PMCID: PMC8336573 DOI: 10.1080/01621459.2021.1886106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/03/2019] [Revised: 09/11/2020] [Accepted: 11/29/2020] [Indexed: 10/22/2022]
Abstract
Scientists often embed cells into a lower-dimensional space when studying single-cell RNA-seq data for improved downstream analyses such as developmental trajectory analyses, but the statistical properties of such nonlinear embedding methods are often not well understood. In this article, we develop the exponential-family SVD (eSVD), a nonlinear embedding method for both cells and genes jointly with respect to a random dot product model using exponential-family distributions. Our estimator uses alternating minimization, which enables us to have a computationally efficient method, prove the identifiability conditions and consistency of our method, and provide statistically principled procedures to tune our method. All these qualities help advance the single-cell embedding literature, and we provide extensive simulations to demonstrate that the eSVD is competitive compared to other embedding methods. We apply the eSVD via Gaussian distributions where the standard deviations are proportional to the means to analyze a single-cell dataset of oligodendrocytes in mouse brains. Using the eSVD estimated embedding, we then investigate the cell developmental trajectories of the oligodendrocytes. While previous results are not able to distinguish the trajectories among the mature oligodendrocyte cell types, our diagnostics and results demonstrate there are two major developmental trajectories that diverge at mature oligodendrocytes. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available online.
Affiliation(s)
- Kevin Z. Lin
- Wharton Statistics Department, University of Pennsylvania, Philadelphia, PA
| | - Jing Lei
- Statistics & Data Science Department, Carnegie Mellon University, Pittsburgh, PA
| | - Kathryn Roeder
- Statistics & Data Science Department, Carnegie Mellon University, Pittsburgh, PA
| |
|
36
|
Jiang Y, Li W, Lindsey-Boltz LA, Yang Y, Li Y, Sancar A. Super hotspots and super coldspots in the repair of UV-induced DNA damage in the human genome. J Biol Chem 2021; 296:100581. [PMID: 33771559 PMCID: PMC8081918 DOI: 10.1016/j.jbc.2021.100581] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/25/2021] [Revised: 03/18/2021] [Accepted: 03/22/2021] [Indexed: 02/07/2023] Open
Abstract
The formation of UV-induced DNA damage and its repair are influenced by many factors that modulate lesion formation and the accessibility of repair machinery. However, it remains unknown which genomic sites are prioritized for immediate repair after UV damage induction, and whether these prioritized sites overlap with hotspots of UV damage. We identified the super hotspots subject to the earliest repair for (6-4) pyrimidine-pyrimidone photoproduct by using the eXcision Repair-sequencing (XR-seq) method. We further identified super coldspots for (6-4) pyrimidine-pyrimidone photoproduct repair and super hotspots for cyclobutane pyrimidine dimer repair by analyzing available XR-seq time-course data. By integrating datasets of XR-seq, Damage-seq, adductSeq, and cyclobutane pyrimidine dimer-seq, we show that neither repair super hotspots nor repair super coldspots overlap hotspots of UV damage. Furthermore, we demonstrate that repair super hotspots are significantly enriched in frequently interacting regions and superenhancers. Finally, we report our discovery of an enrichment of cytosine in repair super hotspots and super coldspots. These findings suggest that local DNA features together with large-scale chromatin features contribute to the orders of magnitude variability in the rates of UV damage repair.
Affiliation(s)
- Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA; Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA.
| | - Wentao Li
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Laura A Lindsey-Boltz
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Yuchen Yang
- Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Yun Li
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Computer Science, College of Arts and Sciences, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Aziz Sancar
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA; Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA.
| |
|
37
|
Tarazona E, Lucas-Lledó JI, Carmona MJ, García-Roger EM. Gene expression in diapausing rotifer eggs in response to divergent environmental predictability regimes. Sci Rep 2020; 10:21366. [PMID: 33288800 PMCID: PMC7721884 DOI: 10.1038/s41598-020-77727-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/28/2020] [Accepted: 11/17/2020] [Indexed: 12/02/2022] Open
Abstract
In unpredictable environments in which reliable cues for predicting environmental variation are lacking, a diversifying bet-hedging strategy for diapause exit is expected to evolve, whereby only a portion of diapausing forms will resume development at the first occurrence of suitable conditions. This study focused on diapause termination in the rotifer Brachionus plicatilis s.s., addressing the transcriptional profile of diapausing eggs from environments differing in the level of predictability and the relationship of such profiles with hatching patterns. RNA-Seq analyses revealed significant differences in gene expression between diapausing eggs produced in the laboratory under combinations of two contrasting selective regimes of environmental fluctuation (predictable vs unpredictable) and two different diapause conditions (passing or not passing through forced diapause). The results showed that the selective regime was more important than the diapause condition in driving differences in the transcriptome profile. Most of the differentially expressed genes were upregulated in the predictable regime and mostly associated with molecular functions involved in embryo morphological development and hatching readiness. This was in concordance with observations of earlier, higher, and more synchronous hatching in diapausing eggs produced under the predictable regime.
Affiliation(s)
- Eva Tarazona
- Institut Cavanilles de Biodiversitat I Biologia Evolutiva, Universitat de València, Valencia, Spain
| | - J Ignacio Lucas-Lledó
- Institut Cavanilles de Biodiversitat I Biologia Evolutiva, Universitat de València, Valencia, Spain
| | - María José Carmona
- Institut Cavanilles de Biodiversitat I Biologia Evolutiva, Universitat de València, Valencia, Spain
| | - Eduardo M García-Roger
- Institut Cavanilles de Biodiversitat I Biologia Evolutiva, Universitat de València, Valencia, Spain.
| |
|
38
|
Yarahmadov T, Robinson S, Hanemian M, Pulver V, Kuhlemeier C. Identification of transcription factors controlling floral morphology in wild Petunia species with contrasting pollination syndromes. The Plant Journal: For Cell and Molecular Biology 2020; 104:289-301. [PMID: 32780443 PMCID: PMC7693086 DOI: 10.1111/tpj.14962] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/04/2019] [Accepted: 07/15/2020] [Indexed: 05/29/2023]
Abstract
Adaptation to different pollinators is an important driver of speciation in the angiosperms. Genetic approaches such as QTL mapping have been successfully used to identify the underlying speciation genes. However, these methods are often limited by widespread suppression of recombination due to divergence between species. While the mutations that caused the interspecific differences in floral color and scent have been elucidated in a variety of plant genera, the genes that are responsible for morphological differences remain mostly unknown. Differences in floral organ length determine the pollination efficiency of hawkmoths and hummingbirds, and therefore the genes that control these differences are potential speciation genes. Identifying such genes is challenging, especially in non-model species and when studying complex traits for which little prior genetic and biochemical knowledge is available. Here we combine transcriptomics with detailed growth analysis to identify candidate transcription factors underlying interspecific variation in the styles of Petunia flowers. Starting from a set of 2284 genes, stepwise filtering for expression in styles, differential expression between species, correlation with growth-related traits, allele-specific expression in interspecific hybrids, and/or high-impact polymorphisms resulted in a set of 43 candidate speciation genes. Validation by virus-induced gene silencing identified two MYB transcription factors, EOBI and EOBII, that were previously shown to regulate floral scent emission, a trait associated with pollination by hawkmoths.
Affiliation(s)
- Tural Yarahmadov
- Institute of Plant Sciences, University of Bern, Altenbergrain 21, Bern, CH-3013, Switzerland
- Department of BioMedical Research, University of Bern, Bern, CH-3008, Switzerland
| | - Sarah Robinson
- Institute of Plant Sciences, University of Bern, Altenbergrain 21, Bern, CH-3013, Switzerland
- Sainsbury Laboratory, University of Cambridge, Cambridge, CB2 1LR, UK
| | - Mathieu Hanemian
- Institute of Plant Sciences, University of Bern, Altenbergrain 21, Bern, CH-3013, Switzerland
- LIPM, Université de Toulouse, INRAE, CNRS, Castanet-Tolosan, France
| | - Valentin Pulver
- Institute of Plant Sciences, University of Bern, Altenbergrain 21, Bern, CH-3013, Switzerland
| | - Cris Kuhlemeier
- Institute of Plant Sciences, University of Bern, Altenbergrain 21, Bern, CH-3013, Switzerland
| |
|
39
|
Denti F, Guindani M, Leisen F, Lijoi A, Wadsworth WD, Vannucci M. Two-group Poisson-Dirichlet mixtures for multiple testing. Biometrics 2020; 77:622-633. [PMID: 32535900 DOI: 10.1111/biom.13314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2019] [Revised: 05/21/2020] [Accepted: 05/22/2020] [Indexed: 11/26/2022]
Abstract
The simultaneous testing of multiple hypotheses is common to the analysis of high-dimensional data sets. The two-group model, first proposed by Efron, identifies significant comparisons by allocating observations to a mixture of an empirical null and an alternative distribution. In the Bayesian nonparametrics literature, many approaches have suggested using mixtures of Dirichlet Processes in the two-group model framework. Here, we investigate employing mixtures of two-parameter Poisson-Dirichlet Processes instead, and show how they provide a more flexible and effective tool for large-scale hypothesis testing. Our model further employs nonlocal prior densities to allow separation between the two mixture components. We obtain a closed-form expression for the exchangeable partition probability function of the two-group model, which leads to a straightforward Markov Chain Monte Carlo implementation. We compare the performance of our method for large-scale inference in a simulation study and illustrate its use on both a prostate cancer data set and a case-control microbiome study of the gastrointestinal tracts in children from underdeveloped countries who have been recently diagnosed with moderate-to-severe diarrhea.
Affiliation(s)
- Francesco Denti
- Department of Statistics, University of California, Irvine, California
| | - Michele Guindani
- Department of Statistics, University of California, Irvine, California
| | - Fabrizio Leisen
- School of Mathematics, Statistics and Actuarial Sciences, University of Kent, Canterbury, UK
| | - Antonio Lijoi
- Department of Decision Sciences, Bocconi University, Milan, Italy; Bocconi Institute of Data Science and Analytics (BIDSA), Milan, Italy
| |
|
40
|
Vera JF, De Rooij M. A Latent Block Distance-Association Model for Profile by Profile Cross-Classified Categorical Data. Multivariate Behavioral Research 2020; 55:329-343. [PMID: 31352798 DOI: 10.1080/00273171.2019.1634995] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Distance association models constitute a useful tool for the analysis and graphical representation of cross-classified data in which distances between points inversely describe the association between two categorical variables. When the number of cells is large and the data counts result in sparse tables, the combination of clustering and representation reduces the number of parameters to be estimated and facilitates interpretation. In this article, a latent block distance-association model is proposed to apply block clustering to the outcomes of two categorical variables while the cluster centers are represented in a low dimensional space in terms of a distance-association model. This model is particularly useful for contingency tables in which both the rows and the columns are characterized as profiles of sets of response variables. The parameters are estimated under a Poisson sampling scheme using a generalized EM algorithm. The performance of the model is tested in a Monte Carlo experiment, and an empirical data set is analyzed to illustrate the model.
Affiliation(s)
- J Fernando Vera
- Department of Statistics and O.R., Faculty of Sciences, University of Granada
| | - Mark De Rooij
- Methodology and Statistics Unit, Institute of Psychology, Leiden University
| |
|
41
|
Butler III RR, Kozlova A, Zhang H, Zhang S, Streit M, Sanders AR, Laudanski K, Pang ZP, Gejman PV, Duan J. The Genetic Relevance of Human Induced Pluripotent Stem Cell-Derived Microglia to Alzheimer's Disease and Major Neuropsychiatric Disorders. Molecular Neuropsychiatry 2020; 5:85-96. [PMID: 32399472 PMCID: PMC7206606 DOI: 10.1159/000501935] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/21/2019] [Accepted: 07/04/2019] [Indexed: 12/14/2022]
Abstract
Microglia are the primary innate immune cell type in the brain and have been implicated in the pathogenesis of several neurodegenerative and neuropsychiatric disorders, most notably Alzheimer's disease (AD) and schizophrenia. Microglia generated from human induced pluripotent stem cells (hiPSCs) represent a promising in vitro cellular model for studying the neuroimmune interactions involved in these disorders. Among several methods of generating hiPSC-derived microglia (iMG), which vary in duration and resultant purity, a recent protocol by Brownjohn et al. [Stem Cell Reports. 2018 Apr;10(4):1294-307] is particularly simple and efficient. However, the replicability of this method, the transcriptomic similarity of these iMG to primary adult microglia, and their genetic relevance to disease (i.e., enrichment of disease risk loci in genes preferentially expressed in these cells) remain unclear. Using two hiPSC lines, we demonstrated that Brownjohn's protocol can rapidly generate iMG that morphologically and functionally resembled microglia. The iMG cells we generated were found to be transcriptionally similar to previously reported iMG, as well as fetal and adult microglia. Furthermore, by using cell type-specific gene expression to partition disease heritability, we showed that iMG cells are genetically relevant to AD but found no significant enrichments of risk loci of Parkinson's disease, schizophrenia, major depressive disorder, bipolar disorder, autism spectrum disorder, or body mass index. Across a range of neuronal and immune cell types, we found only iMG, primary microglia, and microglia-like cell types exhibited a significant enrichment for AD heritability. Our results thus support the use of iMG as a human cellular model for understanding AD biology and underlying genetic factors, as well as for developing and efficiently screening new therapeutics.
Affiliation(s)
- Robert R. Butler III
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, USA
| | - Alena Kozlova
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, USA
| | - Hanwen Zhang
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
| | - Siwei Zhang
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, USA
| | - Michael Streit
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
| | - Alan R. Sanders
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, USA
| | - Krzysztof Laudanski
- Department of Anesthesiology and Critical Care, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Institute for Translational Medicine and Therapeutics, The University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Zhiping P. Pang
- Department of Neuroscience and Cell Biology and Child Health Institute of New Jersey, Rutgers University, New Brunswick, New Jersey, USA
| | - Pablo V. Gejman
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, USA
| | - Jubao Duan
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, Illinois, USA
- Department of Psychiatry and Behavioral Neuroscience, The University of Chicago, Chicago, Illinois, USA
| |
|
42
|
Malovichko YV, Shtark OY, Vasileva EN, Nizhnikov AA, Antonets KS. Transcriptomic Insights into Mechanisms of Early Seed Maturation in the Garden Pea (Pisum sativum L.). Cells 2020; 9:E779. [PMID: 32210065 PMCID: PMC7140803 DOI: 10.3390/cells9030779] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/31/2019] [Revised: 03/20/2020] [Accepted: 03/21/2020] [Indexed: 02/07/2023] Open
Abstract
The garden pea (Pisum sativum L.) is a legume crop of immense economic value. Extensive breeding has led to the emergence of numerous pea varieties, of which some are distinguished by accelerated development in various stages of ontogenesis. One such trait is rapid seed maturation, which, despite novel insights into the genetic control of seed development in legumes, remains poorly studied. This article presents an attempt to dissect mechanisms of early maturation in the pea line Sprint-2 by means of whole transcriptome RNA sequencing in two developmental stages. By using a de novo assembly approach, we have obtained a reference transcriptome of 25,756 non-redundant entries expressed in pea seeds at either 10 or 20 days after pollination. Differential expression in Sprint-2 seeds has affected 13,056 transcripts. A comparison of the two pea lines with a common maturation rate demonstrates that, while Sprint-2 seeds at 10 days after pollination show developmental retardation linked to intensive photosynthesis, morphogenesis, and cell division, those at 20 days show a rapid onset of desiccation marked by the cessation of translation and cell anabolism and the accumulation of dehydration-protective and -storage moieties. Further inspection of certain transcript functional categories, including the chromatin constituent, transcription regulation, protein turnover, and hormonal regulation, has revealed transcriptomic trends unique to specific stages and cultivars. Among other remarkable features, Sprint-2 demonstrated an enhanced expression of transposable element-associated open reading frames and an altered expression of major maturation regulators and DNA methyltransferase genes. To the best of our knowledge, this is the first comparative transcriptomic study in which the issue of the seed maturation rate is addressed.
Affiliation(s)
- Yury V. Malovichko
- Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), Podbelskogo sh., 3, Pushkin, 196608 St. Petersburg, Russia
- Faculty of Biology, St. Petersburg State University, 199034 St. Petersburg, Russia
| | - Oksana Y. Shtark
- Department of Biotechnology, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), Podbelskogo sh., 3, Pushkin, 196608 St. Petersburg, Russia
| | - Ekaterina N. Vasileva
- Faculty of Biology, St. Petersburg State University, 199034 St. Petersburg, Russia
- Department of Biotechnology, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), Podbelskogo sh., 3, Pushkin, 196608 St. Petersburg, Russia
| | - Anton A. Nizhnikov
- Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), Podbelskogo sh., 3, Pushkin, 196608 St. Petersburg, Russia
- Faculty of Biology, St. Petersburg State University, 199034 St. Petersburg, Russia
| | - Kirill S. Antonets
- Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), Podbelskogo sh., 3, Pushkin, 196608 St. Petersburg, Russia
- Faculty of Biology, St. Petersburg State University, 199034 St. Petersburg, Russia
| |
|
43
|
Wang C, Li J. SINC: a scale-invariant deep-neural-network classifier for bulk and single-cell RNA-seq data. Bioinformatics 2020; 36:1779-1784. [PMID: 31647523 DOI: 10.1093/bioinformatics/btz801] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/12/2019] [Revised: 10/01/2019] [Accepted: 10/23/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Scaling by sequencing depth is usually the first step of analysis of bulk or single-cell RNA-seq data, but estimating sequencing depth accurately can be difficult, especially for single-cell data, risking the validity of downstream analysis. It is thus of interest to eliminate the use of sequencing depth and analyze the original count data directly. RESULTS: We call an analysis method 'scale-invariant' (SI) if it gives the same result under different estimates of sequencing depth and hence can use the original count data without scaling. For the problem of classifying samples into pre-specified classes, such as normal versus cancerous, we develop a deep-neural-network based SI classifier named scale-invariant deep neural-network classifier (SINC). On nine bulk and single-cell datasets, the classification accuracy of SINC is better than or competitive with the best of eight other classifiers. SINC is easier to use and more reliable on data where proper sequencing depth is hard to determine. AVAILABILITY AND IMPLEMENTATION: The source code of SINC is available at https://www.nd.edu/~jli9/SINC.zip. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
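The scale-invariance property defined in the abstract above can be illustrated with a toy example. The sketch below is not SINC and does not use a neural network; it is a deliberately simple nearest-centroid classifier on within-sample proportions, chosen only because dividing each count by the sample total cancels any per-sample scaling factor. The function names, class centroids, and counts are all hypothetical.

```python
def to_proportions(counts):
    """Convert raw counts to within-sample proportions.

    Multiplying every count by the same positive constant (for
    example, a different sequencing-depth estimate) leaves the
    proportions unchanged, which is what makes the classifier
    below scale-invariant.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must contain at least one positive count")
    return [c / total for c in counts]


def classify(counts, centroids):
    """Return the label of the centroid closest to the sample's proportions."""
    p = to_proportions(counts)

    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(p, centroids[label]))

    return min(centroids, key=squared_distance)


# Hypothetical class centroids over three genes (proportions sum to 1).
centroids = {
    "normal": [0.5, 0.3, 0.2],
    "cancerous": [0.2, 0.3, 0.5],
}

raw_sample = [10, 30, 60]
# The same sample after dividing by an arbitrary depth estimate.
rescaled_sample = [c / 7.3 for c in raw_sample]

if __name__ == "__main__":
    print(classify(raw_sample, centroids))
    print(classify(rescaled_sample, centroids))
```

Both calls assign the same label, because the rescaling changes every count by the same factor and the proportions absorb it.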
Affiliation(s)
- Chuanqi Wang
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
| | - Jun Li
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
| |
|
44
|
|
45
|
Koçhan N, Tutuncu GY, Smyth GK, Gandolfo LC, Giner G. qtQDA: quantile transformed quadratic discriminant analysis for high-dimensional RNA-seq data. PeerJ 2020; 7:e8260. [PMID: 31976167 PMCID: PMC6967023 DOI: 10.7717/peerj.8260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/23/2019] [Accepted: 11/20/2019] [Indexed: 11/26/2022]
Abstract
Classification on the basis of gene expression data derived from RNA-seq promises to become an important part of modern medicine. We propose a new classification method based on a model where the data are marginally negative binomial but dependent, thereby incorporating the dependence known to be present between measurements from different genes. The method, called qtQDA, works by first performing a quantile transformation (qt) and then applying Gaussian quadratic discriminant analysis (QDA) using regularized covariance matrix estimates. We show that qtQDA has excellent performance when applied to real data sets and has advantages over some existing approaches. An R package implementing the method is also available at https://github.com/goknurginer/qtQDA.
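The quantile-transformation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the qtQDA package: the negative binomial is parameterized here by a mean `mu` and dispersion `phi` (variance `mu + phi * mu**2`), and the mid-point correction used to keep the probability strictly inside (0, 1) is a choice made for this sketch. The subsequent QDA step with regularized covariance estimates is not shown.

```python
import math
from statistics import NormalDist

def nb_pmf(k, mu, phi):
    """Negative binomial pmf with mean mu and dispersion phi."""
    r = 1.0 / phi                 # size parameter
    p = r / (r + mu)              # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def nb_mid_cdf(k, mu, phi):
    """P(X < k) + 0.5 * P(X = k); assumed continuity correction."""
    below = sum(nb_pmf(j, mu, phi) for j in range(k))
    return below + 0.5 * nb_pmf(k, mu, phi)

def quantile_transform(k, mu, phi):
    """Map a count to a standard normal score via its NB quantile."""
    return NormalDist().inv_cdf(nb_mid_cdf(k, mu, phi))

# Toy usage: counts below the mean map to negative scores, above to positive.
for k in (2, 10, 30):
    print(k, round(quantile_transform(k, mu=10.0, phi=0.1), 3))
```

After this transformation each gene is approximately Gaussian, so a Gaussian classifier such as QDA can model the dependence between genes through a covariance matrix.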
Affiliation(s)
- Necla Koçhan
- Department of Mathematics, Izmir University of Economics, Izmir, Turkey
| | - G Yazgi Tutuncu
- Department of Mathematics, Izmir University of Economics, Izmir, Turkey
| | - Gordon K Smyth
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Luke C Gandolfo
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Göknur Giner
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia; Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
| |
|
46
|
Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2019; 20:295. [PMID: 31870412 PMCID: PMC6927135 DOI: 10.1186/s13059-019-1861-6] [Citation(s) in RCA: 219] [Impact Index Per Article: 43.8] [Received: 03/12/2019] [Accepted: 10/15/2019] [Indexed: 12/23/2022]
Abstract
Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
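Feature selection by deviance, as mentioned in this abstract, can be sketched with a per-gene binomial deviance against a null model in which the gene has the same relative abundance in every cell. The code below is a minimal illustration assuming that binomial approximation to the multinomial; the function names and the toy counts are hypothetical and are not taken from the authors' software.

```python
import math

def _x_log_ratio(x, expected):
    """x * log(x / expected), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x / expected)

def binomial_deviance(gene_counts, cell_totals):
    """Deviance of one gene against constant relative abundance across cells."""
    pi_hat = sum(gene_counts) / sum(cell_totals)   # pooled relative abundance
    dev = 0.0
    for y, n in zip(gene_counts, cell_totals):
        dev += _x_log_ratio(y, n * pi_hat)
        dev += _x_log_ratio(n - y, n * (1.0 - pi_hat))
    return 2.0 * dev

def rank_genes_by_deviance(counts):
    """counts maps gene name -> list of UMI counts, one per cell."""
    n_cells = len(next(iter(counts.values())))
    cell_totals = [sum(g[i] for g in counts.values()) for i in range(n_cells)]
    scores = {gene: binomial_deviance(y, cell_totals)
              for gene, y in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: four cells, each with 100 UMIs in total.
toy = {
    "flat":     [10, 10, 10, 10],   # constant share of each cell
    "bursty":   [0, 20, 0, 20],     # on/off pattern
    "abundant": [90, 70, 90, 70],   # absorbs the remaining UMIs
}
print(rank_genes_by_deviance(toy))  # ['bursty', 'abundant', 'flat']
```

Genes that fit the constant-abundance null receive a deviance near zero and fall to the bottom of the ranking, so the most informative genes can be kept without first normalizing or log-transforming the counts.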
Affiliation(s)
- F. William Townes
- Department of Biostatistics, Harvard University, Cambridge, MA USA
- Present Address: Department of Computer Science, Princeton University, Princeton, NJ USA
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
| | - Martin J. Aryee
- Department of Biostatistics, Harvard University, Cambridge, MA USA
- Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA USA
- Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA USA
- Department of Pathology, Harvard Medical School, Boston, MA USA
| | - Rafael A. Irizarry
- Department of Biostatistics, Harvard University, Cambridge, MA USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
| |
|
47
|
Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2019; 20:295. [PMID: 31870412 DOI: 10.1101/574574] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/12/2019] [Accepted: 10/15/2019] [Indexed: 05/24/2023]
Affiliation(s)
- F William Townes
- Department of Biostatistics, Harvard University, Cambridge, MA, USA
- Present Address: Department of Computer Science, Princeton University, Princeton, NJ, USA
| | - Stephanie C Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | - Martin J Aryee
- Department of Biostatistics, Harvard University, Cambridge, MA, USA
- Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA, USA
- Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
| | - Rafael A Irizarry
- Department of Biostatistics, Harvard University, Cambridge, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| |
|
48
|
Jiang S, Xiao G, Koh AY, Kim J, Li Q, Zhan X. A Bayesian zero-inflated negative binomial regression model for the integrative analysis of microbiome data. Biostatistics 2019; 22:522-540. [PMID: 31844880 DOI: 10.1093/biostatistics/kxz050] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/18/2018] [Revised: 10/07/2019] [Accepted: 10/09/2019] [Indexed: 12/13/2022]
Abstract
Microbiome omics approaches can reveal intriguing relationships between the human microbiome and certain disease states. Along with identification of specific bacterial taxa associated with diseases, recent scientific advancements provide mounting evidence that metabolism, genetics, and environmental factors can all modulate these microbial effects. However, the current methods for integrating microbiome data and other covariates are severely lacking. Hence, we present an integrative Bayesian zero-inflated negative binomial regression model that can both distinguish differentially abundant taxa with distinct phenotypes and quantify covariate-taxa effects. Our model demonstrates good performance using simulated data. Furthermore, we successfully integrated microbiome taxonomies and metabolomics in two real microbiome datasets to provide biologically interpretable findings. In all, we propose a novel integrative Bayesian regression model that features bacterial differential abundance analysis and quantification of microbiome-covariate effects, which makes it suitable for general microbiome studies.
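The zero-inflated negative binomial (ZINB) distribution at the core of such a model mixes a point mass at zero with an ordinary negative binomial count. The sketch below shows only that probability mass function, under an assumed mean and size parameterization; the regression structure, priors, and posterior inference of the published model are not reproduced, and all names are hypothetical.

```python
import math

def nb_log_pmf(y, mu, size):
    """Negative binomial log-pmf with mean mu and size (inverse dispersion)."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))

def zinb_pmf(y, mu, size, pi_zero):
    """ZINB pmf: with probability pi_zero the count is a structural zero."""
    nb = math.exp(nb_log_pmf(y, mu, size))
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb

# Toy comparison: zero inflation raises P(Y = 0) above the plain NB value.
mu, size, pi_zero = 5.0, 2.0, 0.3
print("NB   P(0):", math.exp(nb_log_pmf(0, mu, size)))
print("ZINB P(0):", zinb_pmf(0, mu, size, pi_zero))
```

In a regression setting, `mu` would typically depend on covariates through a log link, which is where covariate-taxa effects would enter.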
Affiliation(s)
- Shuang Jiang
- Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA
| | - Guanghua Xiao
- Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Andrew Y Koh
- Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA and Department of Microbiology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Jiwoong Kim
- Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Qiwei Li
- Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, USA
| | - Xiaowei Zhan
- Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| |
|
49
|
Co-option of wing-patterning genes underlies the evolution of the treehopper helmet. Nat Ecol Evol 2019; 4:250-260. [DOI: 10.1038/s41559-019-1054-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 05/29/2019] [Accepted: 10/25/2019] [Indexed: 12/18/2022]
|
50
|
Paukszto L, Mikolajczyk A, Szeszko K, Smolinska N, Jastrzebski JP, Kaminski T. Transcription analysis of the response of the porcine adrenal cortex to a single subclinical dose of lipopolysaccharide from Salmonella Enteritidis. Int J Biol Macromol 2019; 141:1228-1245. [PMID: 31520703 DOI: 10.1016/j.ijbiomac.2019.09.067] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/08/2019] [Revised: 09/03/2019] [Accepted: 09/09/2019] [Indexed: 12/20/2022]
Abstract
Lipopolysaccharide (LPS) is a bacterial endotoxin that can participate in the induction of inflammatory responses. LPS may also play a significant role in some neurodegenerative, oncological and metabolic disorders. The aim of the current study was to determine the effect of a single subclinical low dose of LPS from Salmonella Enteritidis, administered in vivo, on the transcriptome of porcine adrenal cortex cells, in particular gene expression levels, long non-coding RNA (lncRNA) profiles, alternative splicing events and RNA editing sites, using RNA-seq technology. The subclinical dose of LPS changed the expression of 354 genes, 27 lncRNA loci and other unclassified RNAs. An analysis of alternative splicing events revealed 104 genes with differentially expressed splice junction sites, and the single nucleotide variant calling approach supported the identification of 376 canonical RNA editing candidates and 7249 allele-specific expression variants. The obtained results suggest that the RIG-I-like receptor signaling pathway may play a more important role than the Toll-like signaling pathway after the administration of a subclinical dose of LPS. A single subclinical dose of LPS can affect the expression profiles of genes coding for peptide hormones, steroidogenic enzymes and transcription factors, and modulate the endocrine functions of the gland.
Affiliation(s)
- Lukasz Paukszto
- Department of Plant Physiology, Genetics and Biotechnology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Oczapowskiego 1A, 10-719 Olsztyn, Poland
| | - Anita Mikolajczyk
- Department of Public Health, Faculty of Health Sciences, Collegium Medicum, University of Warmia and Mazury in Olsztyn, Warszawska 30, 10-082 Olsztyn, Poland
| | - Karol Szeszko
- Department of Animal Anatomy and Physiology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Oczapowskiego 1A, 10-719 Olsztyn, Poland
| | - Nina Smolinska
- Department of Animal Anatomy and Physiology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Oczapowskiego 1A, 10-719 Olsztyn, Poland
| | - Jan P Jastrzebski
- Department of Plant Physiology, Genetics and Biotechnology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Oczapowskiego 1A, 10-719 Olsztyn, Poland
| | - Tadeusz Kaminski
- Department of Animal Anatomy and Physiology, Faculty of Biology and Biotechnology, University of Warmia and Mazury in Olsztyn, Oczapowskiego 1A, 10-719 Olsztyn, Poland
| |
|