1
Bravo JI, Mizrahi CR, Kim S, Zhang L, Suh Y, Benayoun BA. An eQTL-based approach reveals candidate regulators of LINE-1 RNA levels in lymphoblastoid cells. PLoS Genet 2024; 20:e1011311. [PMID: 38848448 PMCID: PMC11189215 DOI: 10.1371/journal.pgen.1011311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 06/20/2024] [Accepted: 05/21/2024] [Indexed: 06/09/2024] Open
Abstract
Long interspersed element 1 (LINE-1; L1) is a family of transposons that occupies ~17% of the human genome. Though a small number of L1 copies remain capable of autonomous transposition, the overwhelming majority of copies are degenerate and immobile. Nevertheless, both mobile and immobile L1s can exert pleiotropic effects (promoting genome instability, inflammation, or cellular senescence) on their hosts, and L1's contributions to aging and aging diseases are an area of active research. However, because of the cell type-specific nature of transposon control, the catalogue of L1 regulators remains incomplete. Here, we employ an expression quantitative trait locus (eQTL) approach leveraging transcriptomic and genomic data from the GEUVADIS and 1000 Genomes projects to computationally identify new candidate regulators of L1 RNA levels in lymphoblastoid cell lines. To cement the role of candidate genes in L1 regulation, we experimentally modulate the levels of top candidates in vitro, including IL16, STARD5, HSD17B12, and RNF5, and assess changes in transposable element (TE) family expression by Gene Set Enrichment Analysis (GSEA). Remarkably, we observe subtle but widespread upregulation of TE family expression following IL16 and STARD5 overexpression. Moreover, a short-term 24-hour exposure to recombinant human IL16 was sufficient to transiently induce subtle, but widespread, upregulation of L1 subfamilies. Finally, we find that many L1 expression-associated genetic variants are co-associated with aging traits across genome-wide association study databases. Our results expand the catalogue of genes implicated in L1 RNA control and further suggest that L1-derived RNA contributes to aging processes. Given the ever-increasing availability of paired genomic and transcriptomic data, we anticipate this new approach to be a starting point for more comprehensive computational scans for regulators of transposon RNA levels.
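As a rough illustration of the eQTL idea in the abstract above, the sketch below regresses an expression value on genotype dosage for a single variant. It is not the authors' pipeline: the function name `eqtl_slope`, the 0/1/2 dosage coding, and the toy numbers are assumptions made purely for illustration, and a real analysis would add covariates, normalization, and multiple-testing correction.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage.

    `dosages` holds hypothetical alternate-allele counts (0, 1, or 2) per
    sample; `expression` holds one expression value per sample.
    """
    if len(dosages) != len(expression) or len(dosages) < 3:
        raise ValueError("need at least 3 paired observations")
    g_bar = mean(dosages)
    e_bar = mean(expression)
    # Sums of cross-products and squares around the means.
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    syy = sum((e - e_bar) ** 2 for e in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx               # expression change per extra allele
    r = sxy / (sxx * syy) ** 0.5    # Pearson correlation
    return slope, r


# Toy data (made up): expression rises with each extra alternate allele.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
toy_slope, toy_r = eqtl_slope(toy_dosages, toy_expression)
```

In this toy setting, a slope far from zero together with a strong correlation would flag the variant as a candidate eQTL for that transcript; a slope near zero would suggest no association.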
Affiliation(s)
- Juan I. Bravo
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, California, United States of America
  - Graduate program in the Biology of Aging, University of Southern California, Los Angeles, California, United States of America
- Chanelle R. Mizrahi
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, California, United States of America
  - USC Gerontology Enriching MSTEM to Enhance Diversity in Aging Program, University of Southern California, Los Angeles, California, United States of America
- Seungsoo Kim
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, United States of America
- Lucia Zhang
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, California, United States of America
  - Quantitative and Computational Biology Department, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, California, United States of America
- Yousin Suh
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, New York, United States of America
  - Department of Genetics and Development, Columbia University Irving Medical Center, New York, New York, United States of America
- Bérénice A. Benayoun
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, California, United States of America
  - Molecular and Computational Biology Department, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, California, United States of America
  - Biochemistry and Molecular Medicine Department, USC Keck School of Medicine, Los Angeles, California, United States of America
  - USC Norris Comprehensive Cancer Center, Epigenetics and Gene Regulation, Los Angeles, California, United States of America
  - USC Stem Cell Initiative, Los Angeles, California, United States of America
2
Bravo JI, Mizrahi CR, Kim S, Zhang L, Suh Y, Benayoun BA. An eQTL-based Approach Reveals Candidate Regulators of LINE-1 RNA Levels in Lymphoblastoid Cells. bioRxiv 2023:2023.08.15.553416. [PMID: 37645920 PMCID: PMC10461994 DOI: 10.1101/2023.08.15.553416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Long interspersed element 1 (L1) is a family of autonomous, actively mobile transposons that occupies ~17% of the human genome. A number of pleiotropic effects induced by L1 (promoting genome instability, inflammation, or cellular senescence) have been observed, and L1's contributions to aging and aging diseases are an area of active research. However, because of the cell type-specific nature of transposon control, the catalogue of L1 regulators remains incomplete. Here, we employ an expression quantitative trait locus (eQTL) approach leveraging transcriptomic and genomic data from the GEUVADIS and 1000 Genomes projects to computationally identify new candidate regulators of L1 RNA levels in lymphoblastoid cell lines. To cement the role of candidate genes in L1 regulation, we experimentally modulate the levels of top candidates in vitro, including IL16, STARD5, HSD17B12, and RNF5, and assess changes in transposable element (TE) family expression by Gene Set Enrichment Analysis (GSEA). Remarkably, we observe subtle but widespread upregulation of TE family expression following IL16 and STARD5 overexpression. Moreover, a short-term 24-hour exposure to recombinant human IL16 was sufficient to transiently induce subtle, but widespread, upregulation of L1 subfamilies. Finally, we find that many L1 expression-associated genetic variants are co-associated with aging traits across genome-wide association study databases. Our results expand the catalogue of genes implicated in L1 RNA control and further suggest that L1-derived RNA contributes to aging processes. Given the ever-increasing availability of paired genomic and transcriptomic data, we anticipate this new approach to be a starting point for more comprehensive computational scans for transposon transcriptional regulators.
Affiliation(s)
- Juan I. Bravo
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA 90089, USA
  - Graduate program in the Biology of Aging, University of Southern California, Los Angeles, CA 90089, USA
- Chanelle R. Mizrahi
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA 90089, USA
  - USC Gerontology Enriching MSTEM to Enhance Diversity in Aging Program, University of Southern California, Los Angeles, CA 90089, USA
- Seungsoo Kim
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Lucia Zhang
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA 90089, USA
  - Quantitative and Computational Biology Department, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, CA 90089, USA
- Yousin Suh
  - Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY 10032, USA
  - Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY 10032, USA
- Bérénice A. Benayoun
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA 90089, USA
  - Molecular and Computational Biology Department, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, CA 90089, USA
  - Biochemistry and Molecular Medicine Department, USC Keck School of Medicine, Los Angeles, CA 90089, USA
  - USC Norris Comprehensive Cancer Center, Epigenetics and Gene Regulation, Los Angeles, CA 90089, USA
  - USC Stem Cell Initiative, Los Angeles, CA 90089, USA
3
Zhang C, Cui X, Lian S, Xiao R, Qiao H, Li S, Lou Y, Feng Y, Zhuang L, Du J, Liu X. Intelligent algorithm for dynamic functional brain network complexity from CN to AD. Int J Intell Syst 2022. [DOI: 10.1002/int.22737] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Affiliation(s)
- Chenghui Zhang
  - School of Computer Science, Qufu Normal University, Rizhao, China
  - Department of Psychology and Behavioral Sciences, Zhejiang University, Hangzhou, China
- Xinchun Cui
  - School of Computer Science, Qufu Normal University, Rizhao, China
  - Guangxi Key Laboratory of Cryptography and Information Security, Guilin, China
- Shujun Lian
  - School of Management, Qufu Normal University, Rizhao, China
- Ruyi Xiao
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Hong Qiao
  - School of Business, Shandong Normal University, Jinan, China
- Shancang Li
  - Department of Computer Science, University of the West of England, Bristol, UK
- Yue Lou
  - Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Yue Feng
  - Department of Radiology, Zhejiang Hospital, Hangzhou, China
- Liying Zhuang
  - Department of Neurology, Zhejiang Hospital, Hangzhou, China
- Jianzong Du
  - Respiratory Medicine, Zhejiang Hospital, Hangzhou, China
- Xiaoli Liu
  - Department of Neurology, Zhejiang Hospital, Hangzhou, China
4
Du J, Lin D, Yuan R, Chen X, Liu X, Yan J. Graph Embedding Based Novel Gene Discovery Associated With Diabetes Mellitus. Front Genet 2021; 12:779186. [PMID: 34899863 PMCID: PMC8657768 DOI: 10.3389/fgene.2021.779186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2021] [Accepted: 10/20/2021] [Indexed: 11/25/2022] Open
Abstract
Diabetes mellitus is a group of complex metabolic disorders that affects hundreds of millions of patients worldwide. The underlying pathogenesis of the various types of diabetes is still unclear, which hinders the development of more effective therapies. Although many genes have been found to be associated with diabetes mellitus, more novel genes still need to be discovered to complete the picture of the underlying mechanism. With the development of complex molecular networks, network-based disease-gene prediction methods have been widely proposed. However, most existing methods are based on the hypothesis of guilt-by-association and often handcraft node features based on local topological structures. Advances in graph embedding techniques have enabled automatic extraction of global features from molecular networks. Inspired by the successful applications of cutting-edge graph embedding methods to complex diseases, we proposed a computational framework to investigate novel genes associated with diabetes mellitus. There are three main steps in the framework: network feature extraction based on graph embedding methods; feature denoising and regeneration using a stacked autoencoder; and disease-gene prediction based on machine learning classifiers. We compared the performance of different graph embedding methods and machine learning classifiers and designed the best workflow for predicting genes associated with diabetes mellitus. Functional enrichment analysis based on Human Phenotype Ontology (HPO), KEGG, and GO biological process terms, together with a publication search, further evaluated the predicted novel genes.
Affiliation(s)
- Jing Yan
  - Zhejiang Hospital, Hangzhou, China
  - Zhejiang Provincial Key Lab of Geriatrics, Zhejiang Hospital, Hangzhou, China
5
Liu H, Hou L, Xu S, Li H, Chen X, Gao J, Wang Z, Han B, Liu X, Wan S. Discovering Cerebral Ischemic Stroke Associated Genes Based on Network Representation Learning. Front Genet 2021; 12:728333. [PMID: 34539754 PMCID: PMC8442767 DOI: 10.3389/fgene.2021.728333] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/21/2021] [Accepted: 07/26/2021] [Indexed: 11/13/2022] Open
Abstract
Cerebral ischemic stroke (IS) is a complex disease caused by multiple factors, including vascular risk factors, genetic factors, and environmental factors, which makes it difficult to discover the corresponding disease-related genes. Identifying the genes associated with IS is critical for understanding the biological mechanism of IS, which would significantly benefit the diagnosis and clinical treatment of cerebral IS. However, existing methods to predict IS-related genes are mainly based on the hypothesis of guilt-by-association (GBA). These methods cannot capture the global structure information of the whole protein-protein interaction (PPI) network. Inspired by the success of network representation learning (NRL) in the field of network analysis, we apply NRL to the discovery of disease-related genes and present a framework to identify the disease-related genes of cerebral IS. The framework contains three main parts: capturing the topological information of the PPI network with NRL, denoising the gene features with a stacked autoencoder (SAE), and optimizing a support vector machine (SVM) classifier to identify IS-related genes. Our framework yields more accurate results than existing methods for IS-related gene prediction. The case study also shows that the proposed method can identify IS-related genes.
Affiliation(s)
- Haijie Liu
  - Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Liping Hou
  - Department of Clinical Laboratory, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Shanhu Xu
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
- He Li
  - Department of Automation, College of Information Science and Engineering, Tianjin Tianshi College, Tianjin, China
- Xiuju Chen
  - Department of Neurology, Tianjin Nankai Hospital, Tianjin, China
- Juan Gao
  - Department of Neurology, Baoding No. 1 Central Hospital, Baoding, China
- Ziwen Wang
  - Graduate School of Chengde Medical College, Chengde, China
- Bo Han
  - Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Xiaoli Liu
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Shu Wan
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
6
Zhang T, Choi J, Dilshat R, Einarsdóttir BÓ, Kovacs MA, Xu M, Malasky M, Chowdhury S, Jones K, Bishop DT, Goldstein AM, Iles MM, Landi MT, Law MH, Shi J, Steingrímsson E, Brown KM. Cell-type-specific meQTLs extend melanoma GWAS annotation beyond eQTLs and inform melanocyte gene-regulatory mechanisms. Am J Hum Genet 2021; 108:1631-1646. [PMID: 34293285 PMCID: PMC8456160 DOI: 10.1016/j.ajhg.2021.06.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/20/2021] [Accepted: 06/23/2021] [Indexed: 01/09/2023] Open
Abstract
Although expression quantitative trait loci (eQTLs) have been powerful in identifying susceptibility genes from genome-wide association study (GWAS) findings, most trait-associated loci are not explained by eQTLs alone. Alternative QTLs, including DNA methylation QTLs (meQTLs), are emerging, but cell-type-specific meQTLs using cells of disease origin have been lacking. Here, we established an meQTL dataset by using primary melanocytes from 106 individuals and identified 1,497,502 significant cis-meQTLs. Multi-QTL colocalization with meQTLs, eQTLs, and mRNA splice-junction QTLs from the same individuals together with imputed methylome-wide and transcriptome-wide association studies identified candidate susceptibility genes at 63% of melanoma GWAS loci. Among the three molecular QTLs, meQTLs were the single largest contributor. To compare melanocyte meQTLs with those from malignant melanomas, we performed meQTL analysis on skin cutaneous melanomas from The Cancer Genome Atlas (n = 444). A substantial proportion of meQTL probes (45.9%) in primary melanocytes is preserved in melanomas, while a smaller fraction of eQTL genes is preserved (12.7%). Integration of melanocyte multi-QTLs and melanoma meQTLs identified candidate susceptibility genes at 72% of melanoma GWAS loci. Beyond GWAS annotation, meQTL-eQTL colocalization in melanocytes suggested that 841 unique genes potentially share a causal variant with a nearby methylation probe in melanocytes. Finally, melanocyte trans-meQTLs identified a hotspot for rs12203592, a cis-eQTL of a transcription factor, IRF4, with 131 candidate target CpGs. Motif enrichment and IRF4 ChIP-seq analysis demonstrated that these target CpGs are enriched in IRF4 binding sites, suggesting an IRF4-mediated regulatory network. Our study highlights the utility of cell-type-specific meQTLs.
Affiliation(s)
- Tongwu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Jiyeon Choi
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Ramile Dilshat
  - Department of Biochemistry and Molecular Biology, BioMedical Center, Faculty of Medicine, University of Iceland, Sturlugata 8, 101 Reykjavik, Iceland
- Berglind Ósk Einarsdóttir
  - Department of Biochemistry and Molecular Biology, BioMedical Center, Faculty of Medicine, University of Iceland, Sturlugata 8, 101 Reykjavik, Iceland
- Michael A Kovacs
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Mai Xu
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Michael Malasky
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Salma Chowdhury
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Kristine Jones
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- D Timothy Bishop
  - Leeds Institute for Data Analytics, School of Medicine, University of Leeds, Leeds LS9 7TF, UK
- Alisa M Goldstein
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Mark M Iles
  - Leeds Institute for Data Analytics, School of Medicine, University of Leeds, Leeds LS9 7TF, UK
- Maria Teresa Landi
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Matthew H Law
  - Statistical Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
  - School of Biomedical Sciences, Faculty of Health, and Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove, QLD 4059, Australia
- Jianxin Shi
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Eiríkur Steingrímsson
  - Department of Biochemistry and Molecular Biology, BioMedical Center, Faculty of Medicine, University of Iceland, Sturlugata 8, 101 Reykjavik, Iceland
- Kevin M Brown
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
7
Quan W, Liu B, Wang Y. Fast and SNP-aware short read alignment with SALT. BMC Bioinformatics 2021; 22:172. [PMID: 34433415 PMCID: PMC8386087 DOI: 10.1186/s12859-021-04088-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2021] [Accepted: 03/17/2021] [Indexed: 11/23/2022] Open
Abstract
Background: DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignments directly affects the accuracy of downstream analyses, such as variant calling and quantitative analysis of transcriptome; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, such a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variations in the population. This limitation can introduce bias and impact the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome and a large number of genetic variants. However, compared to linear reference aligners, an aligner that can store and index all genetic variants has a high cost in memory (RAM) space and leads to extremely long run time. Aligning reads to a graph-model-based index that includes all types of variants is ultimately an NP-hard problem in theory. By contrast, considering only single nucleotide polymorphism (SNP) information will reduce the complexity of the index and improve the speed of sequence alignment. Results: The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short read alignment tool. SALT uses 5.8 GB of RAM to index a human reference genome (GRCh38) and incorporates 12.8M UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has a similar speed but higher accuracy. Conclusions: Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome that incorporates an SNP database. We benchmarked SALT using simulated and real datasets. The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.
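To make the notion of "SNP-aware" matching in the abstract above concrete, here is a minimal sketch that accepts a read base if it equals either the reference base or a known alternate allele at that position. It does not reproduce SALT's index or algorithm; the function name `matches_with_snps`, the dictionary-of-sets SNP representation, and the toy sequences are hypothetical choices made only for illustration.

```python
def matches_with_snps(reference, snps, read, start):
    """Return True if `read` matches `reference` at `start`.

    `snps` is a hypothetical mapping from 0-based reference position to a
    set of known alternate alleles; a read base is accepted if it equals
    the reference base or any listed alternate allele at that position.
    """
    if start < 0 or start + len(read) > len(reference):
        return False
    for offset, base in enumerate(read):
        pos = start + offset
        allowed = {reference[pos]} | snps.get(pos, set())
        if base not in allowed:
            return False
    return True


# Toy example (made up): position 2 carries a known G>T SNP.
toy_reference = "ACGTACGT"
toy_snps = {2: {"T"}}
ref_allele_ok = matches_with_snps(toy_reference, toy_snps, "ACGT", 0)
alt_allele_ok = matches_with_snps(toy_reference, toy_snps, "ACTT", 0)
unknown_variant_ok = matches_with_snps(toy_reference, toy_snps, "ACAT", 0)
```

A purely linear-reference comparison would reject the read carrying the alternate allele, which is the kind of reference bias the abstract describes.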
Affiliation(s)
- Wei Quan
  - School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Harbin, China
- Bo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Harbin, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi Street, Harbin, China
8
Wang T, Liu Y, Ruan J, Dong X, Wang Y, Peng J. A pipeline for RNA-seq based eQTL analysis with automated quality control procedures. BMC Bioinformatics 2021; 22:403. [PMID: 34433407 PMCID: PMC8386049 DOI: 10.1186/s12859-021-04307-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/13/2021] [Accepted: 07/06/2021] [Indexed: 11/10/2022] Open
Abstract
Background: Advances in the expression quantitative trait loci (eQTL) studies have provided valuable insights into the mechanism of diseases and traits-associated genetic variants. However, it remains challenging to evaluate and control the quality of multi-source heterogeneous eQTL raw data for researchers with limited computational background. There is an urgent need to develop a powerful and user-friendly tool to automatically process the raw datasets in various formats and perform the eQTL mapping afterward. Results: In this work, we present a pipeline for eQTL analysis, termed eQTLQC, featured with automated data preprocessing for both genotype data and gene expression data. Our pipeline provides a set of quality control and normalization approaches, and utilizes automated techniques to reduce manual intervention. We demonstrate the utility and robustness of this pipeline by performing eQTL case studies using multiple independent real-world datasets with RNA-seq data and whole genome sequencing (WGS) based genotype data. Conclusions: eQTLQC provides a reliable computational workflow for eQTL analysis. It provides standard quality control and normalization as well as eQTL mapping procedures for eQTL raw data in multiple formats. The source code, demo data, and instructions are freely available at https://github.com/stormlovetao/eQTLQC.
Affiliation(s)
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang’an District, Xi’an, China
  - School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi St., Harbin, China
- Yongzhuang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi St., Harbin, China
- Junpeng Ruan
  - School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang’an District, Xi’an, China
- Xianjun Dong
  - Brigham and Women’s Hospital, Harvard Medical School, 75 Francis St., Boston, USA
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, West Dazhi St., Harbin, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, 1 Dongxiang Road, Chang’an District, Xi’an, China
9
Proteome-wide Systems Genetics to Identify Functional Regulators of Complex Traits. Cell Syst 2021; 12:5-22. [PMID: 33476553 DOI: 10.1016/j.cels.2020.10.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/22/2020] [Revised: 09/15/2020] [Accepted: 10/07/2020] [Indexed: 02/08/2023]
Abstract
Proteomic technologies now enable the rapid quantification of thousands of proteins across genetically diverse samples. Integration of these data with systems-genetics analyses is a powerful approach to identify new regulators of economically important or disease-relevant phenotypes in various populations. In this review, we summarize the latest proteomic technologies and discuss technical challenges for their use in population studies. We demonstrate how the analysis of correlation structure and loci mapping can be used to identify genetic factors regulating functional protein networks and complex traits. Finally, we provide an extensive summary of the use of proteome-wide systems genetics throughout fungi, plant, and animal kingdoms and discuss the power of this approach to identify candidate regulators and drug targets in large human consortium studies.
10
Quan W, Guan D, Quan G, Liu B, Wang Y. Short Read Alignment Based on Maximal Approximate Match Seeds. Front Mol Biosci 2020; 7:572934. [PMID: 33251246 PMCID: PMC7674947 DOI: 10.3389/fmolb.2020.572934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2020] [Accepted: 10/09/2020] [Indexed: 11/13/2022] Open
Abstract
Sequence alignment is a critical step in many genomic studies, such as variant calling, quantitative transcriptome analysis (RNA-seq), and metagenomic sequence classification. However, alignment performance is largely affected by repetitive sequences in the reference genome, which are widespread in species from bacteria to mammals. Aligning repetitive sequences can yield an enormous number of candidate locations, imposing a heavy computational burden. Thus, most alignment tools simply discard highly repetitive seeds, but this may cause the true alignment to be missed. Using maximal exact matches (MEMs) as seeds is an option, but MEM seeds may fail when sequencing errors or genomic variations fall within them. Here, we propose a novel sequence alignment algorithm, named MAM, which can efficiently align short DNA sequences. MAM first builds a modified Burrows-Wheeler transform (BWT) structure of a reference genome to accelerate approximate seed matching. Then, MAM uses maximal approximate match (MAM) seeds to reduce the candidate locations. Finally, MAM applies affine-gap-penalty dynamic programming to extend the MAM seeds. Experimental results on simulated and real sequencing datasets show that MAM achieves better performance in speed than other state-of-the-art alignment tools. The source code is available at https://github.com/weiquan/mam.
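For readers unfamiliar with BWT-based seed matching, the sketch below shows the plain textbook version: building a Burrows-Wheeler transform from sorted rotations and counting exact pattern occurrences by backward search. It is not the modified BWT structure or the approximate matching described in the abstract; the function names `bwt` and `count_occurrences` and the toy strings are assumptions for illustration, and the rotation sort is only practical for very short texts.

```python
def bwt(text):
    """Burrows-Wheeler transform of text + '$', built from sorted rotations."""
    s = text + "$"  # '$' is assumed absent from text and sorts before letters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern via FM-index style backward search."""
    # c_table[ch] = number of characters in the text strictly smaller than ch.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    c_table = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_str[:i]; a linear scan keeps the toy simple.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ(ch, lo)
        hi = c_table[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


toy_bwt = bwt("banana")
ana_hits = count_occurrences(toy_bwt, "ana")
dna_hits = count_occurrences(bwt("ACGTACGT"), "CGT")
```

Production aligners replace the linear `occ` scan with sampled rank tables and build the transform from a suffix array, but the range-narrowing loop is the same idea.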
Affiliation(s)
- Wei Quan
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Dengfeng Guan
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Guangri Quan
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Bo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - *Correspondence: Yadong Wang
11
Wang T, Peng Q, Liu B, Liu Y, Wang Y. Disease Module Identification Based on Representation Learning of Complex Networks Integrated From GWAS, eQTL Summaries, and Human Interactome. Front Bioeng Biotechnol 2020; 8:418. [PMID: 32435638 PMCID: PMC7218106 DOI: 10.3389/fbioe.2020.00418] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/06/2020] [Accepted: 04/14/2020] [Indexed: 12/18/2022] Open
Abstract
The study of disease-relevant gene modules is one of the main methods to discover disease pathways and potential drug targets. Recent studies have found that most disease proteins tend to form many separate connected components and scatter across the protein-protein interaction network. However, most of the research on discovering disease modules is biased toward well-studied seed genes and tends to extend seed genes into a single connected subnetwork. In this paper, we propose N2V-HC, an algorithm framework aiming to unbiasedly discover the scattered disease modules based on deep representation learning of integrated multi-layer biological networks. Our method first predicts disease-associated genes based on summary data of Genome-wide Association Studies (GWAS) and expression Quantitative Trait Loci (eQTL) studies, and generates an integrated network on the basis of the human interactome. The features of nodes in the network are then extracted by deep representation learning. Hierarchical clustering with the dynamic tree cut method is then applied to discover the modules that are enriched with disease-associated genes. Evaluation on real networks and simulated networks shows that N2V-HC performs better than existing methods in network module discovery. Case studies on Parkinson's disease and Alzheimer's disease show that N2V-HC can be used to discover biologically meaningful modules related to the pathways underlying complex diseases.
Affiliation(s)
- Tao Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Qidi Peng
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Bo Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yongzhuang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China