1
Liang Q, Abraham A, Capra JA, Kostka D. Disease-specific prioritization of non-coding GWAS variants based on chromatin accessibility. HGG ADVANCES 2024; 5:100310. [PMID: 38773771 PMCID: PMC11259938 DOI: 10.1016/j.xhgg.2024.100310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 05/15/2024] [Accepted: 05/16/2024] [Indexed: 05/24/2024] Open
Abstract
Non-protein-coding genetic variants are a major driver of the genetic risk for human disease; however, identifying which non-coding variants contribute to diseases and their mechanisms remains challenging. In silico variant prioritization methods quantify a variant's severity, but for most methods, the specific phenotype and disease context of the prediction remain poorly defined. For example, many commonly used methods provide a single, organism-wide score for each variant, while other methods summarize a variant's impact in certain tissues and/or cell types. Here, we propose a complementary disease-specific variant prioritization scheme, which is motivated by the observation that variants contributing to disease often operate through specific biological mechanisms. We combine tissue/cell-type-specific variant scores (e.g., GenoSkyline, FitCons2, DNA accessibility) into disease-specific scores with a logistic regression approach and apply it to ∼25,000 non-coding variants spanning 111 diseases. We show that this disease-specific aggregation significantly improves the association of common non-coding genetic variants with disease (average precision: 0.151, baseline = 0.09), compared with organism-wide scores (GenoCanyon, LINSIGHT, GWAVA, Eigen, CADD; average precision: 0.129, baseline = 0.09). Furthermore, disease similarities based on data-driven aggregation weights highlight meaningful disease groups and provide information about the tissues and cell types that drive these similarities. We also show that the learned similarities are complementary to genetic similarities as quantified by genetic correlation. Overall, our approach demonstrates the strengths of disease-specific variant prioritization, leads to improvement in non-coding variant prioritization, and enables interpretable models that link variants to disease via specific tissues and/or cell types.
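As a rough illustration of the aggregation idea described in this abstract, the sketch below fits a logistic regression that combines two tissue-specific scores into one disease-specific score and then evaluates the resulting ranking with average precision. Everything concrete here is an assumption made for illustration: the toy scores, the tissue names, the plain gradient-descent fit, and the helper names `fit_logistic`, `disease_score`, and `average_precision` are not taken from the paper.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent (illustrative only)."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, xi))) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def disease_score(x, weights, bias):
    """Aggregate tissue-specific scores into a single disease-specific score."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def average_precision(scores, labels):
    """Mean of precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical toy data: one row per variant, columns are made-up
# accessibility scores in two tissues ("tissue_a", "tissue_b").
X = [[0.9, 0.2], [0.8, 0.8], [0.7, 0.5],   # disease-associated variants
     [0.1, 0.2], [0.2, 0.8], [0.3, 0.5]]   # control variants
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(X, y)
scores = [disease_score(x, weights, bias) for x in X]
ap = average_precision(scores, y)
baseline = sum(y) / len(y)  # fraction of positives
print("weights:", weights, "AP:", ap, "baseline:", baseline)
```

In this toy setting only the first tissue separates the two groups, so the fitted weight for it dominates; weights of this kind are what make such a model interpretable in terms of tissues. For a random ranking, average precision is expected to sit near the fraction of positives, which is presumably why the abstract quotes it next to a baseline.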
Affiliation(s)
- Qianqian Liang
- Department of Computational & Systems Biology and Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Department of Human Genetics, University of Pittsburgh School of Public Health, Pittsburgh, PA, USA
- Abin Abraham
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- John A Capra
- Department of Epidemiology & Biostatistics and Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Dennis Kostka
- Department of Computational & Systems Biology and Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
2
Lee CL, Chuang CK, Chiu HC, Chang YH, Tu YR, Lo YT, Lin HY, Lin SP. Application of whole exome sequencing in the diagnosis of muscular disorders: a study of Taiwanese pediatric patients. Front Genet 2024; 15:1365729. [PMID: 38818036 PMCID: PMC11137626 DOI: 10.3389/fgene.2024.1365729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 04/23/2024] [Indexed: 06/01/2024] Open
Abstract
Background: Muscular dystrophies and congenital myopathies encompass various inherited muscular disorders that present diagnostic challenges due to clinical complexity and genetic heterogeneity. Methods: This study aimed to investigate the use of whole exome sequencing (WES) in diagnosing muscular disorders in pediatric patients in Taiwan. Out of 161 pediatric patients suspected to have genetic/inherited myopathies, 115 received a molecular diagnosis through conventional tests, single gene testing, and gene panels. The remaining 46 patients were divided into three groups: Group 1 (multiplex ligation-dependent probe amplification-negative Duchenne muscular dystrophy) with three patients (6.5%), Group 2 (various forms of muscular dystrophies) with 21 patients (45.7%), and Group 3 (congenital myopathies) with 22 patients (47.8%). Results: WES analysis of these groups found pathogenic variants in 100.0% (3/3), 57.1% (12/21), and 68.2% (15/22) of patients in Groups 1 to 3, respectively. WES had a diagnostic yield of 65.2% (30 patients out of 46), detecting 30 pathogenic or potentially pathogenic variants across 28 genes. Conclusion: WES enables the diagnosis of rare diseases with symptoms and characteristics similar to congenital myopathies and muscular dystrophies, such as muscle weakness. Consequently, this approach facilitates targeted therapy implementation and appropriate genetic counseling.
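The percentages in this abstract follow directly from the reported patient counts, and a few lines of arithmetic reproduce them. The variable and function names below are illustrative only; the counts are the ones stated in the abstract.

```python
# Counts as stated in the abstract:
# (patients in group, patients in whom WES found a pathogenic variant).
groups = {
    "Group 1": (3, 3),
    "Group 2": (21, 12),
    "Group 3": (22, 15),
}

total_patients = sum(n for n, _ in groups.values())
total_diagnosed = sum(d for _, d in groups.values())

def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100.0 * part / whole, 1)

# Share of the 46 WES patients falling in each group.
group_share = {name: pct(n, total_patients) for name, (n, _) in groups.items()}
# Diagnostic yield within each group.
group_yield = {name: pct(d, n) for name, (n, d) in groups.items()}
# Overall diagnostic yield of WES.
overall_yield = pct(total_diagnosed, total_patients)

for name in groups:
    print(name, "share:", group_share[name], "yield:", group_yield[name])
print("Overall:", total_diagnosed, "of", total_patients, "=", overall_yield)
```

The overall yield is a count-weighted figure, so it is not the plain average of the three group yields: the small Group 1 contributes only three patients to the total.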
Affiliation(s)
- Chung-Lin Lee
- Department of Pediatrics, MacKay Memorial Hospital, Taipei, Taiwan
- Institute of Clinical Medicine, National Yang-Ming Chiao-Tung University, Taipei, Taiwan
- Department of Rare Disease Center, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Medicine, Mackay Medical College, Taipei, Taiwan
- Mackay Junior College of Medicine, Nursing and Management, Taipei, Taiwan
- Chih-Kuang Chuang
- Division of Genetics and Metabolism, Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
- College of Medicine, Fu-Jen Catholic University, Taipei, Taiwan
- Huei-Ching Chiu
- Department of Pediatrics, MacKay Memorial Hospital, Taipei, Taiwan
- Ya-Hui Chang
- Department of Pediatrics, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Rare Disease Center, MacKay Memorial Hospital, Taipei, Taiwan
- Yuan-Rong Tu
- Division of Genetics and Metabolism, Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
- Yun-Ting Lo
- Department of Rare Disease Center, MacKay Memorial Hospital, Taipei, Taiwan
- Hsiang-Yu Lin
- Department of Pediatrics, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Rare Disease Center, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Medicine, Mackay Medical College, Taipei, Taiwan
- Mackay Junior College of Medicine, Nursing and Management, Taipei, Taiwan
- Division of Genetics and Metabolism, Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
- Shuan-Pei Lin
- Department of Pediatrics, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Rare Disease Center, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Medicine, Mackay Medical College, Taipei, Taiwan
- Division of Genetics and Metabolism, Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
- Department of Infant and Child Care, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan
3
Wang C, Chen C, Lei B, Qin S, Zhang Y, Li K, Zhang S, Liu Y. Constructing eRNA-mediated gene regulatory networks to explore the genetic basis of muscle and fat-relevant traits in pigs. Genet Sel Evol 2024; 56:28. [PMID: 38594607 PMCID: PMC11003151 DOI: 10.1186/s12711-024-00897-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 04/03/2024] [Indexed: 04/11/2024] Open
Abstract
BACKGROUND: Enhancer RNAs (eRNAs) play a crucial role in transcriptional regulation. While significant progress has been made in understanding epigenetic regulation mediated by eRNAs, research on the construction of eRNA-mediated gene regulatory networks (eGRN) and the identification of critical network components that influence complex traits is lacking. RESULTS: Here, employing the pig as a model, we conducted a comprehensive study using H3K27ac histone ChIP-seq and RNA-seq data to construct eRNA expression profiles from multiple tissues of two distinct pig breeds, namely Enshi Black (ES) and Duroc. In addition to revealing the regulatory landscape of eRNAs at the tissue level, we developed an innovative network construction and refinement method by integrating RNA-seq, ChIP-seq, genome-wide association study (GWAS) signals and enhancer-modulating effects of single nucleotide polymorphisms (SNPs) measured by self-transcribing active regulatory region sequencing (STARR-seq) experiments. Using this approach, we unraveled eGRN that significantly influence the growth and development of muscle and fat tissues, and identified several novel genes that affect adipocyte differentiation in a cell line model. CONCLUSIONS: Our work not only provides novel insights into the genetic basis of economic pig traits, but also offers a generalizable approach to elucidate the eRNA-mediated transcriptional regulation underlying a wide spectrum of complex traits for diverse organisms.
Affiliation(s)
- Chao Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Choulin Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Bowen Lei
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China
- Shenghua Qin
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Yuanyuan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- School of Life Sciences, Henan University, Kaifeng, 475004, People's Republic of China
- Shenzhen Research Institute of Henan University, Shenzhen, 518000, People's Republic of China
- Kui Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China
- Song Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China.
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China.
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China.
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518124, People's Republic of China.
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education & Key Lab of Swine Genetics and Breeding of Ministry of Agriculture and Rural Affairs, Huazhong Agricultural University, Wuhan, 430070, People's Republic of China.
- Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan, 528226, People's Republic of China.
4
Wang X, Zhai Y, Zheng H. Deciphering the cellular heterogeneity of the insect brain with single-cell RNA sequencing. INSECT SCIENCE 2024; 31:314-327. [PMID: 37702319 DOI: 10.1111/1744-7917.13270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 07/27/2023] [Accepted: 07/31/2023] [Indexed: 09/14/2023]
Abstract
Insects show highly complex, adaptive, and sophisticated behaviors, including spatial orientation, learning, and social interaction. These behaviors are controlled by the insect brain, the central part of the nervous system. The tiny insect brain consists of millions of highly differentiated and interconnected cells forming a complex network. Decades of research have gone into understanding which parts of the insect brain govern particular behaviors, but exactly how they do so remains to be clarified. A detailed description of the brain and behavior is required to decipher the complexity of cell types, as well as their connectivity and function. Single-cell RNA sequencing (scRNA-seq) has recently emerged as a breakthrough technology for profiling the transcriptome at cellular resolution. With scRNA-seq, it is possible to uncover the cellular heterogeneity of brain cells and elucidate their specific functions and states. In this review, we first summarize the basic structure of insect brains and their links to insect behaviors, focusing mainly on learning and memory. We then introduce applications of scRNA-seq to insect brains through representative studies. scRNA-seq has allowed researchers to classify cell subpopulations within different insect brain regions, pinpoint single-cell developmental trajectories, and identify gene regulatory networks. These developments drive advances in neuroscience and shed light on the intricate problems of understanding insect brain functions and behaviors.
Affiliation(s)
- Xiaofei Wang
- Institute of Plant Protection, Shandong Academy of Agricultural Sciences, Jinan, China
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing, China
- Yifan Zhai
- Institute of Plant Protection, Shandong Academy of Agricultural Sciences, Jinan, China
- Key Laboratory of Natural Enemies Insects, Ministry of Agriculture and Rural Affairs, Jinan, China
- Shandong Provincial Engineering Technology Research Center on Biocontrol of Crops Diseases and Insect Pests, Jinan, China
- Hao Zheng
- Institute of Plant Protection, Shandong Academy of Agricultural Sciences, Jinan, China
- College of Food Science and Nutritional Engineering, China Agricultural University, Beijing, China
5
Li H, Yu Z, Du F, Song L, Gao Y, Shi F. sscNOVA: a semi-supervised convolutional neural network for predicting functional regulatory variants in autoimmune diseases. Front Immunol 2024; 15:1323072. [PMID: 38380333 PMCID: PMC10876991 DOI: 10.3389/fimmu.2024.1323072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 01/15/2024] [Indexed: 02/22/2024] Open
Abstract
Genome-wide association studies (GWAS) have identified thousands of variants in the human genome associated with autoimmune diseases. However, identifying functional regulatory variants associated with autoimmune diseases remains challenging, largely because of insufficient experimental validation data. We adopt the concept of semi-supervised learning by combining labeled and unlabeled data to develop a deep learning-based algorithm framework, sscNOVA, to predict functional regulatory variants in autoimmune diseases and analyze the functional characteristics of these regulatory variants. Compared to traditional supervised learning methods, our approach leverages data from more variants to explore the relationship between functional regulatory variants and autoimmune diseases. Based on the experimentally curated testing dataset and evaluation metrics, we find that sscNOVA outperforms other state-of-the-art methods. Furthermore, we illustrate that sscNOVA can help to improve the prioritization of functional regulatory variants from lead single-nucleotide polymorphisms and the proxy variants in autoimmune GWAS data.
Affiliation(s)
- Haibo Li
- School of Information Engineering, Ningxia University, Yinchuan, China
- Zhenhua Yu
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Fang Du
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Lijuan Song
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
- Yang Gao
- School of Medical Technology, North Minzu University, Yinchuan, China
- Fangyuan Shi
- School of Information Engineering, Ningxia University, Yinchuan, China
- Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan, Ningxia University, Yinchuan, China
6
Traniello IM, Bukhari SA, Dibaeinia P, Serrano G, Avalos A, Ahmed AC, Sankey AL, Hernaez M, Sinha S, Zhao SD, Catchen J, Robinson GE. Single-cell dissection of aggression in honeybee colonies. Nat Ecol Evol 2023; 7:1232-1244. [PMID: 37264201 DOI: 10.1038/s41559-023-02090-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/10/2022] [Accepted: 05/09/2023] [Indexed: 06/03/2023]
Abstract
Understanding how genotypic variation results in phenotypic variation is especially difficult for collective behaviour because group phenotypes arise from complex interactions among group members. A genome-wide association study identified hundreds of genes associated with colony-level variation in honeybee aggression, many of which also showed strong signals of positive selection, but the influence of these 'colony aggression genes' on brain function was unknown. Here we use single-cell (sc) transcriptomics and gene regulatory network (GRN) analyses to test the hypothesis that genetic variation for colony aggression influences individual differences in brain gene expression and/or gene regulation. We compared soldiers, which respond to territorial intrusion with stinging attacks, and foragers, which do not. Colony environment showed stronger influences on soldier-forager differences in brain gene regulation compared with brain gene expression. GRN plasticity was strongly associated with colony aggression, with larger differences in GRN dynamics detected between soldiers and foragers from more aggressive relative to less aggressive colonies. The regulatory dynamics of subnetworks composed of genes associated with colony aggression genes were more strongly correlated with each other across different cell types and brain regions relative to other genes, especially in brain regions involved with olfaction and vision and multimodal sensory integration, which are known to mediate bee aggression. These results show how group genetics can shape a collective phenotype by modulating individual brain gene regulatory network architecture.
Affiliation(s)
- Ian M Traniello
- Neuroscience Program, University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, USA.
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA.
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
- Guillermo Serrano
- Computational Biology Program, CIMA University of Navarra, Pamplona, Spain
- Arian Avalos
- Honey Bee Breeding, Genetics and Physiology Research Laboratory, Agricultural Research Services, United States Department of Agriculture, Baton Rouge, LA, USA
- Amy Cash Ahmed
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Alison L Sankey
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Mikel Hernaez
- Computational Biology Program, CIMA University of Navarra, Pamplona, Spain
- Saurabh Sinha
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Department of Computer Science, UIUC, Urbana, IL, USA
- Sihai Dave Zhao
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Department of Statistics, UIUC, Urbana, IL, USA
- Julian Catchen
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA
- Department of Evolution, Ecology and Behavior, UIUC, Urbana, IL, USA
- Gene E Robinson
- Neuroscience Program, University of Illinois at Urbana-Champaign (UIUC), Urbana, IL, USA.
- Carl R Woese Institute for Genomic Biology, UIUC, Urbana, IL, USA.
- Department of Entomology, UIUC, Urbana, IL, USA.
7
Baur B, Shin J, Schreiber J, Zhang S, Zhang Y, Manjunath M, Song JS, Stafford Noble W, Roy S. Leveraging epigenomes and three-dimensional genome organization for interpreting regulatory variation. PLoS Comput Biol 2023; 19:e1011286. [PMID: 37428809 PMCID: PMC10358954 DOI: 10.1371/journal.pcbi.1011286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 06/20/2023] [Indexed: 07/12/2023] Open
Abstract
Understanding the impact of regulatory variants on complex phenotypes is a significant challenge because the genes and pathways that are targeted by such variants and the cell type context in which regulatory variants operate are typically unknown. Cell-type-specific long-range regulatory interactions that occur between a distal regulatory sequence and a gene offer a powerful framework for examining the impact of regulatory variants on complex phenotypes. However, high-resolution maps of such long-range interactions are available only for a handful of cell types. Furthermore, identifying specific gene subnetworks or pathways that are targeted by a set of variants is a significant challenge. We have developed L-HiC-Reg, a Random Forests regression method to predict high-resolution contact counts in new cell types, and a network-based framework to identify candidate cell-type-specific gene networks targeted by a set of variants from a genome-wide association study (GWAS). We applied our approach to predict interactions in 55 Roadmap Epigenomics Mapping Consortium cell types, which we used to interpret regulatory single nucleotide polymorphisms (SNPs) in the NHGRI-EBI GWAS catalogue. Using our approach, we performed an in-depth characterization of fifteen different phenotypes including schizophrenia, coronary artery disease (CAD) and Crohn's disease. We found differentially wired subnetworks consisting of known as well as novel gene targets of regulatory SNPs. Taken together, our compendium of interactions and the associated network-based analysis pipeline leverage long-range regulatory interactions to examine the context-specific impact of regulatory variation in complex phenotypes.
Affiliation(s)
- Brittany Baur
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Junha Shin
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Jacob Schreiber
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- Shilu Zhang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Yi Zhang
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Mohith Manjunath
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jun S Song
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- William Stafford Noble
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
8
Exploration of Tools for the Interpretation of Human Non-Coding Variants. Int J Mol Sci 2022; 23:ijms232112977. [PMID: 36361767 PMCID: PMC9654743 DOI: 10.3390/ijms232112977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/17/2022] [Accepted: 10/23/2022] [Indexed: 02/01/2023] Open
Abstract
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis regards the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground-truth and allowing us to determine software performance.
9
Schipper M, Posthuma D. Demystifying non-coding GWAS variants: an overview of computational tools and methods. Hum Mol Genet 2022; 31:R73-R83. [PMID: 35972862 PMCID: PMC9585674 DOI: 10.1093/hmg/ddac198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/11/2022] [Accepted: 08/11/2022] [Indexed: 02/01/2023] Open
Abstract
Genome-wide association studies (GWAS) have found the majority of disease-associated variants to be non-coding. Major efforts into the charting of the non-coding regulatory landscapes have allowed for the development of tools and methods which aim to aid in the identification of causal variants and their mechanism of action. In this review, we give an overview of current tools and methods for the analysis of non-coding GWAS variants in disease. We provide a workflow that allows for the accumulation of in silico evidence to generate novel hypotheses on mechanisms underlying disease and prioritize targets for follow-up study using non-coding GWAS variants. Lastly, we discuss the need for comprehensive benchmarks and novel tools for the analysis of non-coding variants.
Affiliation(s)
- Marijn Schipper
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105 1081HV Amsterdam, The Netherlands
- Danielle Posthuma
- Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, VU University Amsterdam, De Boelelaan 1105 1081HV Amsterdam, The Netherlands
10
Cao Z, Huang Y, Duan R, Jin P, Qin ZS, Zhang S. Disease category-specific annotation of variants using an ensemble learning framework. Brief Bioinform 2021; 23:6394995. [PMID: 34643213 DOI: 10.1093/bib/bbab438] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/09/2021] [Revised: 09/03/2021] [Accepted: 09/22/2021] [Indexed: 02/01/2023] Open
Abstract
Understanding the impact of non-coding sequence variants on complex diseases is an essential problem. We present a novel ensemble learning framework, CASAVA, to predict genomic loci in terms of disease category-specific risk. Using disease-associated variants identified by GWAS as training data, and diverse sequencing-based genomics and epigenomics profiles as features, CASAVA provides risk prediction of 24 major categories of diseases throughout the human genome. Our studies showed that CASAVA scores at a genomic locus provide a reasonable prediction of the disease-specific and disease category-specific risk for non-coding variants located within the locus. Taking MHC2TA and immune system diseases as an example, we demonstrate the potential of CASAVA in revealing variant-disease associations. A website (http://zhanglabtools.org/CASAVA) has been built to facilitate easy access to CASAVA scores.
Affiliation(s)
- Zhen Cao
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yanting Huang
- Department of Computer Science, Emory University, Atlanta, GA 30322, USA
- Ran Duan
- Department of Software Engineering, Yunnan University, Kunming 650500, China
- Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Zhaohui S Qin
- Department of Computer Science, Emory University, Atlanta, GA 30322, USA; Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
|
11
|
Jin Y, Jiang J, Wang R, Qin ZS. Systematic Evaluation of DNA Sequence Variations on in vivo Transcription Factor Binding Affinity. Front Genet 2021; 12:667866. [PMID: 34567058 PMCID: PMC8458901 DOI: 10.3389/fgene.2021.667866] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/15/2021] [Accepted: 08/02/2021] [Indexed: 02/01/2023] Open
Abstract
The majority of the single nucleotide variants (SNVs) identified by genome-wide association studies (GWAS) fall outside of the protein-coding regions. Elucidating the functional implications of these variants has been a major challenge. A possible mechanism for functional non-coding variants is that they disrupt canonical transcription factor (TF) binding sites and thereby affect the in vivo binding of the TF. However, their impact varies since many positions within a TF binding motif are not well conserved. Therefore, simply annotating all variants located in putative TF binding sites may overestimate the functional impact of these SNVs. We conducted a comprehensive survey to study the effect of SNVs on TF binding affinity. A sequence-based machine learning method was used to estimate the change in binding affinity for each SNV located inside a putative motif site. From the results obtained on 18 TF binding motifs, we found that there is substantial variation in an SNV’s impact on TF binding affinity. We found that only about 20% of SNVs located inside putative TF binding sites are likely to have a significant impact on TF-DNA binding.
Affiliation(s)
- Yutong Jin
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
- Jiahui Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
- Ruixuan Wang
- College of Environmental Sciences and Engineering, Peking University, Beijing, China
- Zhaohui S Qin
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, United States
|
12
|
Xie X, Kendzior MC, Ge X, Mainzer LS, Sinha S. VarSAn: associating pathways with a set of genomic variants using network analysis. Nucleic Acids Res 2021; 49:8471-8487. [PMID: 34313777 PMCID: PMC8421213 DOI: 10.1093/nar/gkab624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2020] [Revised: 05/18/2021] [Accepted: 07/20/2021] [Indexed: 02/01/2023] Open
Abstract
There is a pressing need today to mechanistically interpret sets of genomic variants associated with diseases. Here we present a tool called ‘VarSAn’ that uses a network analysis algorithm to identify pathways relevant to a given set of variants. VarSAn analyzes a configurable network whose nodes represent variants, genes and pathways, using a Random Walk with Restarts algorithm to rank pathways for relevance to the given variants, and reports P-values for pathway relevance. It treats non-coding and coding variants differently, properly accounts for the number of pathways impacted by each variant and identifies relevant pathways even if many variants do not directly impact genes of the pathway. We use VarSAn to identify pathways relevant to variants related to cancer and several other diseases, as well as drug response variation. We find VarSAn's pathway ranking to be complementary to the standard approach of enrichment tests on genes related to the query set. We adopt a novel benchmarking strategy to quantify its advantage over this baseline approach. Finally, we use VarSAn to discover key pathways, including the VEGFA-VEGFR2 pathway, related to de novo variants in patients with Hypoplastic Left Heart Syndrome, a rare and severe congenital heart defect.
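The Random Walk with Restarts idea named in this abstract can be illustrated with a minimal sketch. This is not VarSAn's implementation: the toy network, the node names (`v1`, `g1`, `pA`, ...), the unweighted edges, and the restart probability of 0.5 are all assumptions chosen for illustration, and the P-value computation described in the abstract is not reproduced.

```python
def random_walk_with_restarts(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at the seed nodes.

    adjacency maps each node to a list of its neighbours (hypothetical toy format).
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    r = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(r)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * r[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # A node without edges returns its mass to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * r[n]
                continue
            share = (1 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: two variants hit gene g1; g1 belongs to pathway pA
# and interacts with g2, which belongs to pathway pB.
network = {
    "v1": ["g1"],
    "v2": ["g1"],
    "g1": ["v1", "v2", "pA", "g2"],
    "g2": ["g1", "pB"],
    "pA": ["g1"],
    "pB": ["g2"],
}
scores = random_walk_with_restarts(network, seeds={"v1", "v2"})
pathway_ranking = sorted(["pA", "pB"], key=lambda n: scores[n], reverse=True)
```

In this toy case the pathway closer to the seed variants (`pA`) receives more probability mass than the more distant one (`pB`), which is the intuition behind ranking pathways by proximity to a variant set.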
Affiliation(s)
- Xiaoman Xie
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Matthew C Kendzior
- National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Xiyu Ge
- Department of Molecular and Integrative Physiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Liudmila S Mainzer
- National Center for Supercomputing Applications, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Saurabh Sinha
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL, 61801, USA; Cancer Center of Illinois, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
|
13
|
Yang H, Zhuang Z, Pan W. A graph convolutional neural network for gene expression data analysis with multiple gene networks. Stat Med 2021; 40:5547-5564. [PMID: 34258781 DOI: 10.1002/sim.9140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2020] [Revised: 04/07/2021] [Accepted: 06/21/2021] [Indexed: 02/01/2023]
Abstract
Spectral graph convolutional neural networks (GCN) are proposed to incorporate important information contained in graphs such as gene networks. In a standard spectral GCN, there is only one gene network to describe the relationships among genes. However, for genomic applications, due to condition- or tissue-specific gene function and regulation, multiple gene networks may be available; it is unclear how to apply GCNs to disease classification with multiple networks. Besides, which gene networks may provide more effective prior information for a given learning task is unknown a priori and is not straightforward to discover in many cases. A deep multiple graph convolutional neural network is therefore developed here to meet the challenge. The new approach not only computes a feature of a gene as the weighted average of those of itself and its neighbors through spectral GCNs, but also extracts features from gene-specific expression (or other feature) profiles via a feed-forward neural network (FNN). We also provide two measures, the importance of a given gene and the relative importance score of each gene network, for the genes' and gene networks' contributions, respectively, to the learning task. To evaluate the new method, we conduct real data analyses using several breast cancer and diffuse large B-cell lymphoma datasets and incorporating multiple gene networks obtained from "GIANT 2.0". Compared with the standard FNN, GCN, and random forest, the new method not only yields high classification accuracy but also prioritizes the most important genes confirmed to be highly associated with cancer, strongly suggesting the usefulness of the new method in incorporating multiple gene networks.
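The phrase "a feature of a gene as the weighted average of those of itself and its neighbors" can be made concrete with a small sketch. It assumes the commonly used symmetric normalization with self-loops, D^{-1/2}(A + I)D^{-1/2}X, which may differ from the exact propagation rule in the paper; the two-gene network and feature values are hypothetical, and no learned weights or multiple-network combination are included.

```python
import math


def propagate(adjacency, features):
    """One graph-convolution smoothing step: D^{-1/2} (A + I) D^{-1/2} X."""
    n = len(adjacency)
    # Add self-loops so each gene's own feature contributes to its average.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the self-looped adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    dim = len(features[0])
    return [[sum(norm[i][k] * features[k][d] for k in range(n)) for d in range(dim)]
            for i in range(n)]


# Hypothetical example: two genes joined by one edge, one expression feature each.
adjacency = [[0, 1],
             [1, 0]]
features = [[2.0], [4.0]]
smoothed = propagate(adjacency, features)
```

Here each gene ends up with the average of its own value and its neighbor's, so both smoothed features equal 3.0.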
Affiliation(s)
- Hu Yang
- School of Information, Central University of Finance and Economics, Beijing, China
- Zhong Zhuang
- Department of EECE, University of Minnesota, Minneapolis, Minnesota, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
|
14
|
Lu Y, Wu Y, Liu Y, Li Y, Jing R, Li M. Prediction of disease-associated functional variants in noncoding regions through a comprehensive analysis by integrating datasets and features. Hum Mutat 2021; 42:667-684. [PMID: 33822436 DOI: 10.1002/humu.24203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2020] [Revised: 02/01/2021] [Accepted: 03/31/2021] [Indexed: 02/01/2023]
Abstract
One of the greatest challenges in human genetics is deciphering the link between functional variants in noncoding sequences and the pathophysiology of complex diseases. To address this issue, many methods have been developed to distinguish functional single-nucleotide variants (SNVs) from neutral SNVs in noncoding regions. In this study, we integrated well-established features and commonly used datasets, merged them into large-scale datasets, and built a random forest model on them, which yielded promising performance and outperformed some cutting-edge approaches. Our analyses of feature importance and data coverage also provide certain clues for future research in enhancing the prediction of functional noncoding SNVs.
Affiliation(s)
- Yu Lu
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
- Yiming Wu
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Yuan Liu
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
- Yizhou Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
- Runyu Jing
- College of Cybersecurity, Sichuan University, Chengdu, Sichuan, China
- Menglong Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan, China
|
15
|
Discover novel disease-associated genes based on regulatory networks of long-range chromatin interactions. Methods 2020; 189:22-33. [PMID: 33096239 DOI: 10.1016/j.ymeth.2020.10.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/25/2020] [Revised: 08/29/2020] [Accepted: 10/18/2020] [Indexed: 02/01/2023] Open
Abstract
Identifying genes and non-coding genetic variants that are genetically associated with complex diseases and the underlying mechanisms is one of the most important questions in functional genomics. Due to limited statistical power and the lack of mechanistic modeling, traditional genome-wide association studies (GWAS) are limited in their ability to fully address this question. Based on multi-omics data integration, cell-type specific regulatory networks can be built to improve GWAS analysis. In this study, we developed a new computational infrastructure, APRIL, to incorporate 3D chromatin interactions into regulatory network construction, which can extend the networks to include long-range cis-regulatory links between non-coding GWAS SNPs and target genes. Combinatorial transcription factors that co-regulate groups of genes are also inferred to further expand the networks with trans-regulation. A suite of machine learning predictions and statistical tests are incorporated in APRIL to predict novel disease-associated genes based on the expanded regulatory networks. Important features of non-coding regulatory elements and genetic variants are prioritized in network-based predictions, providing systems-level insights on the mechanisms of transcriptional dysregulation associated with complex diseases.
|
16
|
Zhu C, Miller M, Zeng Z, Wang Y, Mahlich Y, Aptekmann A, Bromberg Y. Computational Approaches for Unraveling the Effects of Variation in the Human Genome and Microbiome. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-030320-041014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 02/01/2023]
Abstract
The past two decades of analytical efforts have highlighted how much more remains to be learned about the human genome and, particularly, its complex involvement in promoting disease development and progression. While numerous computational tools exist for the assessment of the functional and pathogenic effects of genome variants, their precision is far from satisfactory, particularly for clinical use. Accumulating evidence also suggests that the human microbiome's interaction with the human genome plays a critical role in determining health and disease states. While numerous microbial taxonomic groups and molecular functions of the human microbiome have been associated with disease, the reproducibility of these findings is lacking. The human microbiome–genome interaction in healthy individuals is even less well understood. This review summarizes the available computational methods built to analyze the effect of variation in the human genome and microbiome. We address the applicability and precision of these methods across their possible uses. We also briefly discuss the exciting, necessary, and now possible integration of the two types of data to improve the understanding of pathogenicity mechanisms.
Affiliation(s)
- Chengsheng Zhu
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Zishuo Zeng
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Yanran Wang
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Yannick Mahlich
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Ariel Aptekmann
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey 08873, USA
- Department of Genetics, Rutgers University, Piscataway, New Jersey 08854, USA
|
17
|
Liu S, Yu Y, Zhang S, Cole JB, Tenesa A, Wang T, McDaneld TG, Ma L, Liu GE, Fang L. Epigenomics and genotype-phenotype association analyses reveal conserved genetic architecture of complex traits in cattle and human. BMC Biol 2020; 18:80. [PMID: 32620158 PMCID: PMC7334855 DOI: 10.1186/s12915-020-00792-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/16/2019] [Accepted: 05/12/2020] [Indexed: 02/01/2023] Open
Abstract
Background: Lack of comprehensive functional annotations across a wide range of tissues and cell types severely hinders the biological interpretations of phenotypic variation, adaptive evolution, and domestication in livestock. Here we used a combination of comparative epigenomics, genome-wide association study (GWAS), and selection signature analysis, to shed light on potential adaptive evolution in cattle.
Results: We cross-mapped 8 histone marks of 1300 samples from human to cattle, covering 178 unique tissues/cell types. By uniformly analyzing 723 RNA-seq and 40 whole genome bisulfite sequencing (WGBS) datasets in cattle, we validated that cross-mapped histone marks captured tissue-specific expression and methylation, reflecting tissue-relevant biology. Through integrating cross-mapped tissue-specific histone marks with large-scale GWAS and selection signature results, we for the first time detected relevant tissues and cell types for 45 economically important traits and artificial selection in cattle. For instance, immune tissues are significantly associated with health and reproduction traits, multiple tissues for milk production and body conformation traits (reflecting their highly polygenic architecture), and thyroid for the different selection between beef and dairy cattle. Similarly, we detected relevant tissues for 58 complex traits and diseases in humans and observed that immune and fertility traits in humans significantly correlated with those in cattle in terms of relevant tissues, which facilitated the identification of causal genes for such traits. For instance, PIK3CG, a gene highly specifically expressed in mononuclear cells, was significantly associated with both age-at-menopause in human and daughter-still-birth in cattle. ICAM, a T cell-specific gene, was significantly associated with both allergic diseases in human and metritis in cattle.
Conclusion: Collectively, our results highlighted that comparative epigenomics in conjunction with GWAS and selection signature analyses could provide biological insights into the phenotypic variation and adaptive evolution. Cattle may serve as a model for human complex traits, by providing additional information beyond laboratory model organisms, particularly when more novel phenotypes become available in the near future.
Affiliation(s)
- Shuli Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA; College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Ying Yu
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Shengli Zhang
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- John B Cole
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA
- Albert Tenesa
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK; The Roslin Institute, University of Edinburgh, Edinburgh, EH25 9RG, UK
- Ting Wang
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, 63110, USA
- Tara G McDaneld
- US Meat Animal Research Center, Agricultural Research Service, USDA, Clay Center, NE, 68933, USA
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20742, USA
- George E Liu
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA
- Lingzhao Fang
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, BARC-East, Beltsville, MD, 20705, USA; MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK; Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20742, USA
|
18
|
Ma X, Sun P, Gong M. An integrative framework of heterogeneous genomic data for cancer dynamic modules based on matrix decomposition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 19:305-316. [PMID: 32750874 DOI: 10.1109/tcbb.2020.3004808] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/02/2023]
Abstract
Cancer progression is dynamic, and tracking dynamic modules is promising for cancer diagnosis and therapy. Accumulated genomic data provide us an opportunity to investigate the underlying mechanisms of cancers. However, as far as we know, no algorithm has been designed for dynamic modules by integrating heterogeneous omics data. To address this issue, we propose an integrative framework for dynamic module detection based on regularized nonnegative matrix factorization method (DrNMF) by integrating the gene expression and protein interaction network. To remove the heterogeneity of genomic data, we divide the samples of expression profiles into groups to construct gene co-expression networks. To characterize the dynamics of modules, the temporal smoothness framework is adopted, in which the gene co-expression network at the previous stage and protein interaction network are incorporated into the objective function of DrNMF via regularization. The experimental results demonstrate that DrNMF is superior to state-of-the-art methods in terms of accuracy. For breast cancer data, the obtained dynamic modules are more enriched by the known pathways, and can be used to predict the stages of cancers and survival time of patients. The proposed model and algorithm provide an effective integrative analysis of heterogeneous genomic data for cancer progression.
|
19
|
Ohnmacht J, May P, Sinkkonen L, Krüger R. Missing heritability in Parkinson's disease: the emerging role of non-coding genetic variation. J Neural Transm (Vienna) 2020; 127:729-748. [PMID: 32248367 PMCID: PMC7242266 DOI: 10.1007/s00702-020-02184-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/27/2020] [Accepted: 03/24/2020] [Indexed: 02/01/2023]
Abstract
Parkinson's disease (PD) is a neurodegenerative disorder caused by a complex interplay of genetic and environmental factors. For the stratification of PD patients and the development of advanced clinical trials, including causative treatments, a better understanding of the underlying genetic architecture of PD is required. Despite substantial efforts, genome-wide association studies have not been able to explain most of the observed heritability. The majority of PD-associated genetic variants are located in non-coding regions of the genome. A systematic assessment of their functional role is hampered by our incomplete understanding of genotype-phenotype correlations, for example through differential regulation of gene expression. Here, the recent progress and remaining challenges for the elucidation of the role of non-coding genetic variants is reviewed with a focus on PD as a complex disease with multifactorial origins. The function of gene regulatory elements and the impact of non-coding variants on them, and the means to map these elements on a genome-wide level, will be delineated. Moreover, examples of how the integration of functional genomic annotations can serve to identify disease-associated pathways and to prioritize disease- and cell type-specific regulatory variants will be given. Finally, strategies for functional validation and considerations for suitable model systems are outlined. Together this emphasizes the contribution of rare and common genetic variants to the complex pathogenesis of PD and points to remaining challenges for the dissection of genetic complexity that may allow for better stratification, improved diagnostics and more targeted treatments for PD in the future.
Affiliation(s)
- Jochen Ohnmacht
- LCSB, University of Luxembourg, Belvaux, Luxembourg
- Department of Life Sciences and Medicine (DLSM), University of Luxembourg, Belvaux, Luxembourg
- Patrick May
- LCSB, University of Luxembourg, Belvaux, Luxembourg
- Lasse Sinkkonen
- Department of Life Sciences and Medicine (DLSM), University of Luxembourg, Belvaux, Luxembourg
- Rejko Krüger
- LCSB, University of Luxembourg, Belvaux, Luxembourg
- Luxembourg Institute of Health (LIH), Transversal Translational Medicine, Strassen, Luxembourg
- Parkinson Research Clinic, Centre Hospitalier de Luxembourg (CHL), Luxembourg, Luxembourg
|
20
|
Yao Y, Ramsey SA. CERENKOV3: Clustering and molecular network-derived features improve computational prediction of functional noncoding SNPs. Pacific Symposium on Biocomputing 2020; 25:535-546. [PMID: 31797625 PMCID: PMC6897322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Identification of causal noncoding single nucleotide polymorphisms (SNPs) is important for maximizing the knowledge dividend from human genome-wide association studies (GWAS). Recently, diverse machine learning-based methods have been used for functional SNP identification; however, this task remains a fundamental challenge in computational biology. We report CERENKOV3, a machine learning pipeline that leverages clustering-derived and molecular network-derived features to improve prediction accuracy of regulatory SNPs (rSNPs) in the context of post-GWAS analysis. The clustering-derived feature, locus size (number of SNPs in the locus), derives from our locus partitioning procedure and represents the sizes of clusters based on SNP locations. We generated two molecular network-derived features from representation learning on a network representing SNP-gene and gene-gene relations. Based on empirical studies using a ground-truth SNP dataset, CERENKOV3 significantly improves rSNP recognition performance in AUPRC, AUROC, and AVGRANK (a locus-wise rank-based measure of classification accuracy we previously proposed).
Affiliation(s)
- Yao Yao
- School of Electrical Engineering and Computer Science, Oregon State University
- Stephen A. Ramsey
- School of Electrical Engineering and Computer Science, Oregon State University; Department of Biomedical Sciences, Oregon State University, Corvallis, OR 97330, USA
|
21
|
Golicz AA, Bayer PE, Bhalla PL, Batley J, Edwards D. Pangenomics Comes of Age: From Bacteria to Plant and Animal Applications. Trends Genet 2019; 36:132-145. [PMID: 31882191 DOI: 10.1016/j.tig.2019.11.006] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Received: 09/30/2019] [Revised: 11/09/2019] [Accepted: 11/12/2019] [Indexed: 02/01/2023]
Abstract
The pangenome refers to a collection of genomic sequence found in the entire species or population rather than in a single individual; the sequence can be core, present in all individuals, or accessory (variable or dispensable), found in a subset of individuals only. While pangenomic studies were first undertaken in bacterial species, developments in genome sequencing and assembly approaches have allowed construction of pangenomes for eukaryotic organisms, fungi, plants, and animals, including two large-scale human pangenome projects. Analysis of these pangenomes revealed key differences, most likely stemming from divergent evolutionary histories, but also surprising similarities.
Affiliation(s)
- Agnieszka A Golicz
- Plant Molecular Biology and Biotechnology Laboratory, Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, VIC, Australia
- Philipp E Bayer
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Crawley, WA, Australia
- Prem L Bhalla
- Plant Molecular Biology and Biotechnology Laboratory, Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, VIC, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Crawley, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Crawley, WA, Australia
|
22
|
Pachganov S, Murtazalieva K, Zarubin A, Sokolov D, Chartier DR, Tatarinova TV. TransPrise: a novel machine learning approach for eukaryotic promoter prediction. PeerJ 2019; 7:e7990. [PMID: 31695967 PMCID: PMC6827441 DOI: 10.7717/peerj.7990] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/09/2019] [Accepted: 10/04/2019] [Indexed: 02/01/2023] Open
Abstract
As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determination of precise positions of transcription start sites. In this paper we present TransPrise, an efficient deep learning tool for prediction of positions of eukaryotic transcription start sites. Our pipeline consists of two parts: a binary classifier runs first, and if a sequence is classified as TSS-containing, a regression step follows in which the precise location of the TSS is identified. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise classification and regression models with the TSSPlant approach for the well annotated genome of Oryza sativa. Using a computer equipped with a graphics processing unit, the run time of TransPrise is 250 minutes on a 374 Mb genome. The Matthews correlation coefficient value for TransPrise is 0.79, more than two times larger than the 0.31 for TSSPlant classification models. This represents a high level of prediction accuracy. Additionally, the mean absolute error for the regression model is 29.19 nt, allowing for accurate prediction of TSS location. TransPrise was also tested in Homo sapiens, where mean absolute error of the regression model was 47.986 nt. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, are available at http://compubioverne.group/. The source code is ready to use and customizable to predict TSS in any eukaryotic organism.
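For readers unfamiliar with the two metrics quoted in this abstract, the following sketch shows how the Matthews correlation coefficient (for the classification step) and the mean absolute error (for the regression step) are computed. The confusion-matrix counts and TSS positions are hypothetical illustration values, not data from the paper.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is set to 0 when a marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true positions (in nt)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical classifier outcome on 200 sequences.
mcc = matthews_corrcoef(tp=90, fp=10, tn=85, fn=15)

# Hypothetical predicted vs. annotated TSS positions within three sequences.
mae = mean_absolute_error([100, 250, 400], [120, 240, 430])
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect classification), which is why it is often preferred over accuracy when classes are imbalanced.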
Affiliation(s)
- Stepan Pachganov
- Ugra Research Institute of Information Technologies, Khanty-Mansiysk, Russia
- Khalimat Murtazalieva
- Vavilov Institute for General Genetics, Moscow, Russia; Institute of Bioinformatics, Moscow, Russia
- Aleksei Zarubin
- Tomsk National Research Medical Center of the Russian Academy of Sciences, Research Institute of Medical Genetics, Tomsk, Russia
- Duane R Chartier
- International Center for Art Intelligence, Inc., Los Angeles, CA, United States of America
- Tatiana V Tatarinova
- Vavilov Institute for General Genetics, Moscow, Russia; Department of Biology, University of La Verne, La Verne, CA, United States of America; A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Siberian Federal University, Krasnoyarsk, Russia
|
23
|
Rojano E, Seoane P, Ranea JAG, Perkins JR. Regulatory variants: from detection to predicting impact. Brief Bioinform 2019; 20:1639-1654. [PMID: 29893792 PMCID: PMC6917219 DOI: 10.1093/bib/bby039] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/13/2018] [Revised: 04/18/2018] [Indexed: 02/01/2023] Open
Abstract
Variants within non-coding genomic regions can greatly affect disease. In recent years, increasing focus has been given to these variants, and how they can alter regulatory elements, such as enhancers, transcription factor binding sites and DNA methylation regions. Such variants can be considered regulatory variants. Concurrently, much effort has been put into establishing international consortia to undertake large projects aimed at discovering regulatory elements in different tissues, cell lines and organisms, and probing the effects of genetic variants on regulation by measuring gene expression. Here, we describe methods and techniques for discovering disease-associated non-coding variants using sequencing technologies. We then explain the computational procedures that can be used for annotating these variants using the information from the aforementioned projects, and for predicting their putative effects, including potential pathogenicity, based on rule-based and machine learning approaches. We provide the details of techniques to validate these predictions, by mapping chromatin-chromatin and chromatin-protein interactions, and introduce Clustered Regularly Interspaced Short Palindromic Repeats-Associated Protein 9 (CRISPR-Cas9) technology, which has already been used in this field and is likely to have a big impact on its future evolution. We also give examples of regulatory variants associated with multiple complex diseases. This review is aimed at bioinformaticians interested in the characterization of regulatory variants, molecular biologists and geneticists interested in understanding more about the nature and potential role of such variants from a functional point of view, and clinicians who may wish to learn about variants in non-coding genomic regions associated with a given disease and find out what to do next to uncover how they affect the underlying mechanisms.
Affiliation(s)
- Elena Rojano
  - Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Pedro Seoane
  - Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- Juan A G Ranea
  - CIBER de Enfermedades Raras, ISCIII, Madrid, Spain and Department of Molecular Biology and Biochemistry, University of Malaga (UMA), 29010 Malaga, Spain
- James R Perkins
  - Research laboratory, IBIMA-Regional University Hospital of Malaga, UMA, Malaga 29009, Spain
|
24
|
Conte F, Fiscon G, Licursi V, Bizzarri D, D'Antò T, Farina L, Paci P. A paradigm shift in medicine: A comprehensive review of network-based approaches. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2019; 1863:194416. [PMID: 31382052 DOI: 10.1016/j.bbagrm.2019.194416] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 06/07/2019] [Revised: 07/19/2019] [Accepted: 07/28/2019] [Indexed: 02/01/2023]
Abstract
Network medicine is a rapidly evolving new field of medical research that combines principles and approaches of systems biology and network science, holding the promise of uncovering the causes of human diseases and revolutionizing their diagnosis and treatment. This new paradigm reflects the fact that human diseases are not caused by single molecular defects, but driven by complex interactions among a variety of molecular mediators. The complexity of these interactions embraces different types of information: from the cellular-molecular level of protein-protein interactions to correlational studies of gene expression and regulation, to metabolic and disease pathways up to drug-disease relationships. The analysis of these complex networks can reveal new disease genes and/or disease pathways and identify possible targets for new drug development, as well as new uses for existing drugs. In this review, we offer a comprehensive overview of network types and algorithms used in the framework of network medicine. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Federica Conte
  - Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
- Giulia Fiscon
  - Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
- Valerio Licursi
  - Biology and Biotechnology Department "Charles Darwin" (BBCD), Sapienza University of Rome, Rome, Italy
- Daniele Bizzarri
  - Department of Internal Medicine and Medical Specialties, Sapienza University of Rome, Rome, Italy
- Tommaso D'Antò
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Lorenzo Farina
  - Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Paola Paci
  - Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
|
25
|
van der Wijst MGP, de Vries DH, Brugge H, Westra HJ, Franke L. An integrative approach for building personalized gene regulatory networks for precision medicine. Genome Med 2018; 10:96. [PMID: 30567569 PMCID: PMC6299585 DOI: 10.1186/s13073-018-0608-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 02/01/2023] Open
Abstract
Only a small fraction of patients respond to the drug prescribed to treat their disease, which means that most are at risk of unnecessary exposure to side effects through ineffective drugs. This inter-individual variation in drug response is driven by differences in gene interactions caused by each patient's genetic background, environmental exposures, and the proportions of specific cell types involved in disease. These gene interactions can now be captured by building gene regulatory networks, by taking advantage of RNA velocity (the time derivative of the gene expression state), the ability to study hundreds of thousands of cells simultaneously, and the falling price of single-cell sequencing. Here, we propose an integrative approach that leverages these recent advances in single-cell data with the sensitivity of bulk data to enable the reconstruction of personalized, cell-type- and context-specific gene regulatory networks. We expect this approach will allow the prioritization of key driver genes for specific diseases and will provide knowledge that opens new avenues towards improved personalized healthcare.
Affiliation(s)
- Monique G P van der Wijst
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Dylan H de Vries
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm Brugge
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Harm-Jan Westra
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Lude Franke
  - Department of Genetics, 5th floor ERIBA building, Antonius Deusinglaan 1, 9713AV Groningen, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
|
26
|
Capriotti E, Ozturk K, Carter H. Integrating molecular networks with genetic variant interpretation for precision medicine. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2018; 11:e1443. [PMID: 30548534 PMCID: PMC6450710 DOI: 10.1002/wsbm.1443] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 08/15/2018] [Revised: 10/23/2018] [Accepted: 10/30/2018] [Indexed: 02/01/2023]
Abstract
More reliable and cheaper sequencing technologies have revealed the vast mutational landscapes characteristic of many phenotypes. The analysis of such genetic variants has led to successful identification of altered proteins underlying many Mendelian disorders. Nevertheless the simple one‐variant one‐phenotype model valid for many monogenic diseases does not capture the complexity of polygenic traits and disorders. Although experimental and computational approaches have improved detection of functionally deleterious variants and important interactions between gene products, the development of comprehensive models relating genotype and phenotypes remains a challenge in the field of genomic medicine. In this context, a new view of the pathologic state as significant perturbation of the network of interactions between biomolecules is crucial for the identification of biochemical pathways associated with complex phenotypes. Seminal studies in systems biology combined the analysis of genetic variation with protein–protein interaction networks to demonstrate that even as biological systems evolve to be robust to genetic variation, their topologies create disease vulnerabilities. More recent analyses model the impact of genetic variants as changes to the “wiring” of the interactome to better capture heterogeneity in genotype–phenotype relationships. These studies lay the foundation for using networks to predict variant effects at scale using machine‐learning or algorithmic approaches. A wealth of databases and resources for the annotation of genotype–phenotype relationships have been developed to support developments in this area. This overview describes how study of the molecular interactome has generated insights linking the organization of biological systems to disease mechanism, and how this information can enable precision medicine. This article is categorized under:
- Translational, Genomic, and Systems Medicine > Translational Medicine
- Biological Mechanisms > Cell Signaling
- Models of Systems Properties and Processes > Mechanistic Models
- Analytical and Computational Methods > Computational Methods
Affiliation(s)
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Kivilcim Ozturk
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California
- Hannah Carter
  - Department of Medicine and Institute for Genomic Medicine, University of California, San Diego, La Jolla, California
|
27
|
Malkowska M, Zubek J, Plewczynski D, Wyrwicz LS. ShapeGTB: the role of local DNA shape in prioritization of functional variants in human promoters with machine learning. PeerJ 2018; 6:e5742. [PMID: 30519505 PMCID: PMC6275119 DOI: 10.7717/peerj.5742] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/10/2017] [Accepted: 09/13/2018] [Indexed: 02/01/2023] Open
Abstract
Motivation: The identification of functional sequence variations in regulatory DNA regions is one of the major challenges of modern genetics. Here, we report results of a combined multifactor analysis of properties characterizing functional sequence variants located in promoter regions of genes.
Results: We demonstrate that the GC content of local sequence fragments and local DNA shape features play a significant role in the prioritization of functional variants and outscore features related to histone modifications, transcription factor binding sites, or evolutionary conservation descriptors. These observations allowed us to build a specialized machine learning classifier, ShapeGTB, that identifies functional single nucleotide polymorphisms within promoter regions. We compared our method with more general tools predicting pathogenicity of all non-coding variants. ShapeGTB outperformed them by a wide margin (average precision 0.93 vs. 0.47–0.55). On the external validation set based on the ClinVar database it displayed worse performance but was still competitive with other methods (average precision 0.47 vs. 0.23–0.42). Such results suggest unique characteristics of mutations located within promoter regions and are a promising signal for the development of more accurate variant prioritization tools in the future.
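Average precision, the metric quoted in this abstract, can be computed from a ranked list of variants as in the minimal Python sketch below. The function name and input layout are illustrative assumptions, not part of ShapeGTB; tied scores are ordered arbitrarily here, which a production implementation would handle more carefully.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive (label 1) occurs."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    # Rank variants from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` variants
    return total / n_pos
```

For a random ranking, the expected average precision is roughly the fraction of positives in the data, which is why such a fraction is commonly reported as the baseline alongside the score.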
Affiliation(s)
- Maja Malkowska
  - Laboratory of Bioinformatics and Biostatistics, Maria Sklodowska-Curie Memorial Cancer Centre and Institute of Oncology, Warsaw, Poland
- Julian Zubek
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Lucjan S Wyrwicz
  - Laboratory of Bioinformatics and Biostatistics, Maria Sklodowska-Curie Memorial Cancer Centre and Institute of Oncology, Warsaw, Poland
|
28
|
Awany D, Allali I, Chimusa ER. Tantalizing dilemma in risk prediction from disease scoring statistics. Brief Funct Genomics 2018; 18:211-219. [PMID: 30605512 PMCID: PMC6609536 DOI: 10.1093/bfgp/ely040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/17/2018] [Revised: 08/17/2018] [Accepted: 11/29/2018] [Indexed: 02/01/2023] Open
Abstract
Over the past decade, human host genome-wide association studies (GWASs) have contributed greatly to our understanding of the impact of host genetics on phenotypes. Recently, the microbiome has been recognized as a complex trait in host genetic variation, leading to microbiome GWASs (mGWASs). For these, many different statistical methods and software tools have been developed for association mapping. Applications of these methods and tools have revealed several important findings; however, establishing causal factors and the direction of causality in the interplay between human genetic polymorphisms, the microbiome and host phenotypes is still a huge challenge. Here, we review disease scoring approaches in host GWAS and mGWAS and their underlying statistical methods and tools. We highlight the challenges in pinpointing the genetically associated causal factors in host GWAS and mGWAS and discuss the role of a multi-omic approach in disease scoring statistics, which may provide a better understanding of human phenotypic variation by enabling further systems biology experiments to establish causality.
Affiliation(s)
- Denis Awany
  - Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, South Africa
- Imane Allali
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, South Africa
- Emile R Chimusa
  - Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, South Africa
|
29
|
Zhang E, Ma X. Regularized Multi-View Subspace Clustering for Common Modules Across Cancer Stages. Molecules 2018; 23:1016. [PMID: 29701681 PMCID: PMC6102576 DOI: 10.3390/molecules23051016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/04/2018] [Revised: 04/23/2018] [Accepted: 04/23/2018] [Indexed: 02/01/2023] Open
Abstract
Discovering the common modules that are co-expressed across various stages can lead to an improved understanding of the underlying molecular mechanisms of cancers. There is a shortage of efficient tools for integrative analysis of gene expression and protein interaction networks for discovering common modules associated with cancer progression. To address this issue, we propose a novel regularized multi-view subspace clustering (rMV-spc) algorithm to obtain a representation matrix for each stage and a joint representation matrix that balances the agreement across various stages. To avoid the heterogeneity of data, the protein interaction network is incorporated into the objective of rMV-spc via regularization. Based on the interior point algorithm, we solve the optimization problem to obtain the common modules. By using artificial networks, we demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy. Furthermore, the rMV-spc discovers common modules in breast cancer networks based on the breast data, and these modules serve as biomarkers to predict stages of breast cancer. The proposed model and algorithm effectively integrate heterogeneous data for dynamic modules.
Affiliation(s)
- Enli Zhang
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
- Xiaoke Ma
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, Shaanxi, China
|