1
Wei J, Wu BJ, Daoud SS. Whole-Exome Sequencing (WES) Reveals Novel Sex-Specific Gene Variants in Non-Alcoholic Steatohepatitis (MASH). Genes (Basel) 2024; 15:357. [PMID: 38540416; PMCID: PMC10969913; DOI: 10.3390/genes15030357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 03/06/2024] [Accepted: 03/11/2024] [Indexed: 06/14/2024]
Abstract
Non-alcoholic steatohepatitis (NASH, also known as MASH) is a severe form of non-alcoholic fatty liver disease (NAFLD, also known as MASLD). Emerging data indicate that the progression of the disease to MASH is higher in postmenopausal women and that genetic susceptibility increases the risk of MASH-related cirrhosis. This study aimed to investigate the association between genetic polymorphisms in MASH and sexual dimorphism. We applied whole-exome sequencing (WES) to identify gene variants in 8 age-adjusted matched pairs of livers from both male and female patients. Sequencing alignment, variant calling, and annotation were performed using standard methods. Polymerase chain reaction (PCR) coupled with Sanger sequencing and immunoblot analysis were used to validate specific gene variants. cBioPortal and Gene Set Enrichment Analysis (GSEA) were used for actionable target analysis. We identified 148,881 gene variants, representing 57,121 and 50,150 variants in the female and male cohorts, respectively, of which 251 were highly significant and MASH sex-specific (p < 0.0286). Polymorphisms in CAPN14, SLC37A3, BAZ1A, SRP54, MYH11, ABCC1, and RNFT1 were highly expressed in male liver samples. In female samples, polymorphisms in RGSL1, SLC17A2, HFE, NLRC5, ACTN4, SBF1, and ALPK2 were identified. A heterozygous variant 1151G>T located on 18q21.32 for ALPK2 (rs3809983) was validated by Sanger sequencing and expressed only in female samples. Immunoblot analysis confirmed that the protein level of β-catenin in female samples was 2-fold higher than normal, whereas ALPK2 expression was 0.5-fold lower than normal. No changes in the protein levels of either ALPK2 or β-catenin were observed in male samples. Our study suggests that the perturbation of canonical Wnt/β-catenin signaling observed in postmenopausal women with MASH could be the result of polymorphisms in ALPK2.
Affiliation(s)
- Sayed S. Daoud
- Department of Pharmaceutical Sciences, College of Pharmacy and Pharmaceutical Sciences, Washington State University Health Sciences, Spokane, WA 99202, USA; (J.W.); (B.J.W.)
2
Letcher B, Hunt M, Iqbal Z. Gramtools enables multiscale variation analysis with genome graphs. Genome Biol 2021; 22:259. [PMID: 34488837; PMCID: PMC8420074; DOI: 10.1186/s13059-021-02474-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/10/2021] [Accepted: 08/20/2021] [Indexed: 11/24/2022]
Abstract
Genome graphs allow very general representations of genetic variation; depending on the model and implementation, variation at different length-scales (single nucleotide polymorphisms (SNPs), structural variants) and on different sequence backgrounds can be incorporated with different levels of transparency. We implement a model which handles this multiscale variation and develop a JSON extension of VCF (jVCF) allowing for variant calls on multiple references, both implemented in our software gramtools. We find gramtools outperforms existing methods for genotyping SNPs overlapping large deletions in M. tuberculosis and is able to genotype on multiple alternate backgrounds in P. falciparum, revealing previously hidden recombination.
Affiliation(s)
- Martin Hunt
- EMBL-EBI, Hinxton, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
3
AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models. Entropy 2021; 23:e23050530. [PMID: 33925812; PMCID: PMC8146440; DOI: 10.3390/e23050530] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/01/2021] [Revised: 04/19/2021] [Accepted: 04/22/2021] [Indexed: 12/28/2022]
Abstract
Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, triggering emergent challenges of increasing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, in the literature, the number of specific protein sequence compressors is relatively low. Moreover, these specialized compressors marginally improve the compression ratio over the best general-purpose compressors. In this paper, we present AC2, a new lossless data compressor for protein (or amino acid) sequences. AC2 uses a neural network to mix experts with a stacked generalization approach and individual cache-hash memory models to the highest-context orders. Compared to the previous compressor (AC), we show gains of 2–9% and 6–7% in reference-free and reference-based modes, respectively. These gains come at the cost of three times slower computations. AC2 also improves memory usage against AC, with requirements about seven times lower, without being affected by the input size of the sequences. As an analysis application, we use AC2 to measure the similarity between each SARS-CoV-2 protein sequence and each viral protein sequence from the whole UniProt database. The results consistently show higher similarity to the pangolin coronavirus, followed by the bat and human coronaviruses, contributing critical results to a currently controversial subject. AC2 is available for free download under the GPLv3 license.
4
Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, Chang X, Seaman JD, Rounthwaite R, Ebler J, Rautiainen M, Garg S, Paten B, Marschall T, Sirén J, Garrison E. Pangenome Graphs. Annu Rev Genomics Hum Genet 2020; 21:139-162. [PMID: 32453966; DOI: 10.1146/annurev-genom-120219-080406] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Indexed: 12/12/2022]
Abstract
Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.
Affiliation(s)
- Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Adam M Novak
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Jonas A Sibbesen
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Simon Heumos
- Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
- Ali Ghaffaari
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
- Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Xian Chang
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Josiah D Seaman
- Royal Botanic Gardens, Kew, Richmond TW9 3AB, United Kingdom; School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
- Robin Rounthwaite
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Jana Ebler
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
- Mikko Rautiainen
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
- Shilpa Garg
- Departments of Genetics and Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02215, USA; Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
- Benedict Paten
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Tobias Marschall
- Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany; Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, California 95064, USA
- Erik Garrison
- Genomics Institute, University of California, Santa Cruz, California 95064, USA