1
Qi G, Battle A. Computational methods for allele-specific expression in single cells. Trends Genet 2024:S0168-9525(24)00169-0. [PMID: 39127549 DOI: 10.1016/j.tig.2024.07.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 08/12/2024]
Abstract
Allele-specific expression (ASE) is a powerful signal that can be used to investigate multiple molecular mechanisms, such as cis-regulatory effects and imprinting. Single-cell RNA-sequencing (scRNA-seq) enables ASE characterization at the resolution of individual cells. In this review, we highlight the computational methods for processing and analyzing single-cell ASE data. We first describe a bioinformatics pipeline, synthesized from previous literature, to obtain ASE counts from raw reads. We then discuss statistical methods for detecting allelic imbalance and its variability across conditions using scRNA-seq data. In addition, we describe other methods that use single-cell ASE to address specific biological questions. Finally, we discuss future directions and emphasize the need for an integrated, optimized bioinformatics pipeline and for further development of statistical methods for different technologies.
Affiliation(s)
- Guanghao Qi
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21205, USA.
2
Adduri A, Kim S. Ornaments for efficient allele-specific expression estimation with bias correction. Am J Hum Genet 2024; 111:1770-1781. [PMID: 39047729 PMCID: PMC11339617 DOI: 10.1016/j.ajhg.2024.06.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 06/22/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024]
Abstract
Allele-specific expression plays a crucial role in unraveling various biological mechanisms, including genomic imprinting and gene expression controlled by cis-regulatory variants. However, existing methods for quantification from RNA-sequencing (RNA-seq) reads do not adequately and efficiently remove various allele-specific read mapping biases, such as reference bias arising from reads containing the alternative allele that do not map to the reference transcriptome or ambiguous mapping bias caused by reads containing the reference allele that map differently from reads containing the alternative allele. We present Ornaments, a computational tool for rapid and accurate estimation of allele-specific transcript expression at unphased heterozygous loci from RNA-seq reads while correcting for allele-specific read mapping biases. Ornaments removes reference bias by mapping reads to a personalized transcriptome and ambiguous mapping bias by probabilistically assigning reads to multiple transcripts and variant loci they map to. Ornaments is a lightweight extension of kallisto, a popular tool for fast RNA-seq quantification, that improves the efficiency and accuracy of WASP, a popular tool for bias correction in allele-specific read mapping. In experiments with simulated and human lymphoblastoid cell-line RNA-seq reads with the genomes of the 1000 Genomes Project, we demonstrate that Ornaments improves the accuracy of WASP and kallisto, is nearly as efficient as kallisto, and is an order of magnitude faster than WASP per sample, with the additional cost of constructing a personalized index for multiple samples. Additionally, we show that Ornaments finds imprinted transcripts with higher sensitivity than WASP, which detects imprinted signals only at gene level.
Affiliation(s)
- Abhinav Adduri
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Seyoung Kim
- Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA 15261, USA.
3
Zhang J, Zhao H. eQTL studies: from bulk tissues to single cells. J Genet Genomics 2023; 50:925-933. [PMID: 37207929 PMCID: PMC10656365 DOI: 10.1016/j.jgg.2023.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/21/2023] [Revised: 05/02/2023] [Accepted: 05/04/2023] [Indexed: 05/21/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of specific genes that can be either nearby or distant. The identification of eQTLs for different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and the implications of functional genes and variants for complex traits and diseases. Although most eQTL studies have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University, Atlanta, GA 30322, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 208034, USA.
4
Yoon JH, Kim S. Learning gene networks under SNP perturbation using SNP and allele-specific expression data. bioRxiv 2023:2023.10.23.563661. [PMID: 37961468 PMCID: PMC10634764 DOI: 10.1101/2023.10.23.563661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Allele-specific expression quantification from RNA-seq reads provides opportunities to study the control of gene regulatory networks by cis-acting and trans-acting genetic variants. Many existing methods performed a single-gene and single-SNP association analysis to identify expression quantitative trait loci (eQTLs), and placed the eQTLs against known gene networks for functional interpretation. Instead, we view eQTL data as a capture of the effects of perturbation of gene regulatory system by a large number of genetic variants and reconstruct a gene network perturbed by eQTLs. We introduce a statistical framework called CiTruss for simultaneously learning a gene network and cis-acting and trans-acting eQTLs that perturb this network, given population allele-specific expression and SNP data. CiTruss uses a multi-level conditional Gaussian graphical model to model trans-acting eQTLs perturbing the expression of both alleles in gene network at the top level and cis-acting eQTLs perturbing the expression of each allele at the bottom level. We derive a transformation of this model that allows efficient learning for large-scale human data. Our analysis of the GTEx and LG×SM advanced intercross line mouse data for multiple tissue types with CiTruss provides new insights into genetics of gene regulation. CiTruss revealed that gene networks consist of local subnetworks over proximally located genes and global subnetworks over genes scattered across genome, and that several aspects of gene regulation by eQTLs such as the impact of genetic diversity, pleiotropy, tissue-specific gene regulation, and local and long-range linkage disequilibrium among eQTLs can be explained through these local and global subnetworks.
Affiliation(s)
- Jun Ho Yoon
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, United States of America
5
Qi G, Strober BJ, Popp JM, Keener R, Ji H, Battle A. Single-cell allele-specific expression analysis reveals dynamic and cell-type-specific regulatory effects. Nat Commun 2023; 14:6317. [PMID: 37813843 PMCID: PMC10562474 DOI: 10.1038/s41467-023-42016-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 09/27/2023] [Indexed: 10/11/2023]
Abstract
Differential allele-specific expression (ASE) is a powerful tool to study context-specific cis-regulation of gene expression. Such effects can reflect the interaction between genetic or epigenetic factors and a measured context or condition. Single-cell RNA sequencing (scRNA-seq) allows the measurement of ASE at individual-cell resolution, but there is a lack of statistical methods to analyze such data. We present Differential Allelic Expression using Single-Cell data (DAESC), a powerful method for differential ASE analysis using scRNA-seq from multiple individuals, with statistical behavior confirmed through simulation. DAESC accounts for non-independence between cells from the same individual and incorporates implicit haplotype phasing. Application to data from 105 induced pluripotent stem cell (iPSC) lines identifies 657 genes dynamically regulated during endoderm differentiation, with enrichment for changes in chromatin state. Application to a type-2 diabetes dataset identifies several differentially regulated genes between patients and controls in pancreatic endocrine cells. DAESC is a powerful method for single-cell ASE analysis and can uncover novel insights on gene regulation.
Affiliation(s)
- Guanghao Qi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Benjamin J Strober
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Joshua M Popp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Rebecca Keener
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
- Hongkai Ji
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA.
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA.
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 21205, USA.
6
Wu EY, Singh NP, Choi K, Zakeri M, Vincent M, Churchill GA, Ackert-Bicknell CL, Patro R, Love MI. SEESAW: detecting isoform-level allelic imbalance accounting for inferential uncertainty. Genome Biol 2023; 24:165. [PMID: 37438847 PMCID: PMC10337143 DOI: 10.1186/s13059-023-03003-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/01/2022] [Accepted: 06/29/2023] [Indexed: 07/14/2023]
Abstract
Detecting allelic imbalance at the isoform level requires accounting for inferential uncertainty, caused by multi-mapping of RNA-seq reads. Our proposed method, SEESAW, uses Salmon and Swish to offer analysis at various levels of resolution, including gene, isoform, and aggregating isoforms to groups by transcription start site. The aggregation strategies strengthen the signal for transcripts with high uncertainty. The SEESAW suite of methods is shown to have higher power than other allelic imbalance methods when there is isoform-level allelic imbalance. We also introduce a new test for detecting imbalance that varies across a covariate, such as time.
Affiliation(s)
- Euphy Y Wu
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA
- Noor P Singh
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Mohsen Zakeri
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Cheryl L Ackert-Bicknell
- Department of Orthopedics, School of Medicine, University of Colorado, Anschutz Campus, Aurora, CO, USA
- Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Michael I Love
- Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
- Department of Genetics, University of North Carolina-Chapel Hill, Chapel Hill, NC, USA.
7
Little P, Liu S, Zhabotynsky V, Li Y, Lin DY, Sun W. A computational method for cell type-specific expression quantitative trait loci mapping using bulk RNA-seq data. Nat Commun 2023; 14:3030. [PMID: 37231002 PMCID: PMC10212972 DOI: 10.1038/s41467-023-38795-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 05/16/2023] [Indexed: 05/27/2023]
Abstract
Mapping cell type-specific gene expression quantitative trait loci (ct-eQTLs) is a powerful way to investigate the genetic basis of complex traits. A popular method for ct-eQTL mapping is to assess the interaction between the genotype of a genetic locus and the abundance of a specific cell type using a linear model. However, this approach requires transforming RNA-seq count data, which distorts the relation between gene expression and cell type proportions and results in reduced power and/or inflated type I error. To address this issue, we have developed a statistical method called CSeQTL that allows for ct-eQTL mapping using bulk RNA-seq count data while taking advantage of allele-specific expression. We validated the results of CSeQTL through simulations and real data analysis, comparing CSeQTL results to those obtained from purified bulk RNA-seq data or single cell RNA-seq data. Using our ct-eQTL findings, we were able to identify cell types relevant to 21 categories of human traits.
Affiliation(s)
- Paul Little
- Biostatistics Program, Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Si Liu
- Biostatistics Program, Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Vasyl Zhabotynsky
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Wei Sun
- Biostatistics Program, Public Health Science Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
8
Zhang J, Zhao H. eQTL Studies: from Bulk Tissues to Single Cells. arXiv 2023:arXiv:2302.11662v1. [PMID: 36866231 PMCID: PMC9980190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
An expression quantitative trait locus (eQTL) is a chromosomal region where genetic variants are associated with the expression levels of certain genes that can be either nearby or distant. The identification of eQTLs for different tissues, cell types, and contexts has led to a better understanding of the dynamic regulation of gene expression and the implications of functional genes and variants for complex traits and diseases. Although most eQTL studies to date have been performed on data collected from bulk tissues, recent studies have demonstrated the importance of cell-type-specific and context-dependent gene regulation in biological processes and disease mechanisms. In this review, we discuss statistical methods that have been developed to enable the detection of cell-type-specific and context-dependent eQTLs from bulk tissues, purified cell types, and single cells. We also discuss the limitations of the current methods and future research opportunities.
Affiliation(s)
- Jingfei Zhang
- Information Systems and Operations Management, Emory University
- Hongyu Zhao
- Department of Biostatistics, Yale University
9
Genetic regulators of cytokine responses upon BCG vaccination in children from West Africa. J Genet Genomics 2023:S1673-8527(23)00008-5. [PMID: 36681271 DOI: 10.1016/j.jgg.2023.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 12/21/2022] [Accepted: 01/03/2023] [Indexed: 01/19/2023]
Abstract
Genetic variation is a key factor influencing cytokine production capacity, but which genetic loci regulate cytokine production before and after vaccination, particularly in African populations, is unknown. Here, we aimed to identify single-nucleotide polymorphisms (SNPs) controlling cytokine responses (cQTLs) after microbial stimulation in infants of West-African ancestry, comprising low-birth-weight neonates randomized to bacillus Calmette-Guérin (BCG) vaccine-at-birth (intervention) or to the usual delayed BCG (control). Genome-wide cytokine QTL mapping revealed 12 independent cQTL loci, of which the LINC01082-LINC00917 locus influenced more than half of the cytokine-stimulation pairs assessed. Furthermore, nine distinct cQTLs were found among infants randomized to BCG. Functional validation confirmed that several complement genes affect cytokine response after BCG vaccination. We observed a limited overlap of common cQTLs between the West-African infants and cohorts of Western European individuals. These data reveal strong population-specific genetic effects on cytokine production and may indicate new opportunities for therapeutic intervention and vaccine development in African populations.
10
Lee IH, Kong SW. ADGR: Admixture-Informed Differential Gene Regulation. Genes (Basel) 2023; 14:147. [PMID: 36672888 PMCID: PMC9859415 DOI: 10.3390/genes14010147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 12/15/2022] [Accepted: 01/03/2023] [Indexed: 01/06/2023]
Abstract
The regulatory elements in proximal and distal regions of genes are involved in the regulation of gene expression. Risk alleles in intronic and intergenic regions may alter gene expression by modifying the binding affinity and stability of diverse DNA-binding proteins implicated in gene expression regulation. By focusing on the local ancestral structure of coding and regulatory regions using the paired whole-genome sequence and tissue-wide transcriptome datasets from the Genotype-Tissue Expression project, we investigated the impact of genetic variants, in aggregate, on tissue-specific gene expression regulation. Local ancestral origins of the coding region, immediate and distant upstream regions, and distal regulatory region were determined using RFMix with the reference panel from the 1000 Genomes Project. For each tissue, inter-individual variation of gene expression levels explained by concordant or discordant local ancestry between coding and regulatory regions was estimated. Compared with individuals of European descent, those of African descent showed more frequent changes in local ancestral structure, with shorter haplotype blocks. The expression level of the Adenosine Deaminase Like (ADAL) gene was significantly associated with admixed ancestral structure in the regulatory region across multiple tissue types. Further validation is required to understand the impact of the local ancestral structure of regulatory regions on gene expression regulation in humans and other species.
Affiliation(s)
- In-Hee Lee
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, USA
- Sek Won Kong
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA 02215, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA
11
Bankier S, Michoel T. eQTLs as causal instruments for the reconstruction of hormone linked gene networks. Front Endocrinol (Lausanne) 2022; 13:949061. [PMID: 36060942 PMCID: PMC9428692 DOI: 10.3389/fendo.2022.949061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 07/25/2022] [Indexed: 11/17/2022]
Abstract
Hormones act within highly dynamic systems, and much of the phenotypic response to variation in hormone levels is mediated by changes in gene expression. The increase in the number and power of large genetic association studies has led to the identification of hormone linked genetic variants. However, the biological mechanisms underpinning the majority of these loci are poorly understood. The advent of affordable, high throughput next generation sequencing and readily available transcriptomic databases has shown that many of these genetic variants also associate with variation in gene expression levels as expression Quantitative Trait Loci (eQTLs). In addition to further dissecting complex genetic variation, eQTLs have been applied as tools for causal inference. Many hormone networks are driven by transcription factors, and many of these genes can be linked to eQTLs. In this mini-review, we demonstrate how causal inference and gene networks can be used to describe the impact of hormone linked genetic variation upon the transcriptome within an endocrinology context.
Affiliation(s)
- Sean Bankier
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway