1
Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E627. [PMID: 33286399 PMCID: PMC7517167 DOI: 10.3390/e22060627] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/30/2020] [Revised: 05/31/2020] [Accepted: 06/03/2020] [Indexed: 12/30/2022]
Abstract
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address problems in data compression and communication over (noisy) communication channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and have played a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences, and biological sciences. In this article we review the basic information theory-based concepts and describe their key applications in multiple major areas of research in computational biology: gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.
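The two basic quantities behind the applications listed in this abstract, Shannon entropy and mutual information, can be illustrated with a minimal sketch. It assumes simple plug-in (empirical frequency) estimates on discrete data; the function names and the toy vectors are hypothetical and are not taken from the review.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy data (hypothetical): three binary variables observed on four samples.
x = [0, 0, 1, 1]
y = [0, 0, 1, 1]  # identical to x, so fully informative about x
z = [0, 1, 0, 1]  # empirically independent of x

print(entropy(x))                # 1.0
print(mutual_information(x, y))  # 1.0
print(mutual_information(x, z))  # 0.0
```

Plug-in estimates are biased for small samples, so a real analysis would normally add a bias correction or a permutation test; this sketch only shows the definitions.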
Affiliation(s)
- Pritam Chanda
- Corteva Agriscience™, Indianapolis, IN 46268, USA
- Computer and Information Science, Indiana University-Purdue University, Indianapolis, IN 46202, USA
- Eduardo Costa
- Corteva Agriscience™, Mogi Mirim, Sao Paulo 13801-540, Brazil
- Jie Hu
- Corteva Agriscience™, Indianapolis, IN 46268, USA
- Rasna Walia
- Corteva Agriscience™, Johnston, IA 50131, USA
2
Abstract
Genome-wide association studies are moving to genome-wide interaction studies, as the genetic background of many diseases appears to be more complex than previously supposed. Thus, many statistical approaches have been proposed to detect gene-gene (GxG) interactions, among them numerous information theory-based methods, inspired by the concept of entropy. These are suggested as particularly powerful and, because of their nonlinearity, as better able to capture nonlinear relationships between genetic variants and/or variables. However, the introduced entropy-based estimators differ to a surprising extent in their construction and even with respect to the basic definition of interactions. Also, not every entropy-based measure for interaction is accompanied by a proper statistical test. To shed light on this, a systematic review of the literature is presented answering the following questions: (1) How are GxG interactions defined within the framework of information theory? (2) Which entropy-based test statistics are available? (3) Which underlying distribution do the test statistics follow? (4) What are the given strengths and limitations of these test statistics?
Affiliation(s)
- Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, Lübeck, Germany
- Corresponding author. Inke R. König, Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany. Tel.: +49 451 500 50610; Fax: +49 451 500 50604; E-Mail:
3
Shi R, Li J, He J, Meng Q, Qian Z, Shi D, Liu Q, Cai Y, Li X, Chen X. Association of with-no-lysine kinase 1 and Serine/Threonine kinase 39 gene polymorphisms and haplotypes with essential hypertension in Tibetans. ENVIRONMENTAL AND MOLECULAR MUTAGENESIS 2018; 59:151-160. [PMID: 28945285 DOI: 10.1002/em.22140] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/17/2017] [Revised: 08/24/2017] [Accepted: 09/01/2017] [Indexed: 06/07/2023]
Abstract
Tibetans have a higher prevalence of essential hypertension than other ethnic groups in China. The reason might be their unique environment, as well as genetic factors. However, few studies have focused on Tibetan genetics and its association with hypertension. The aim of this study was to investigate the association between With-No-Lysine (K) Kinase 1 (WNK1) and Serine/Threonine Kinase 39 (STK39) gene variants and hypertension in the Tibetan population. 204 Tibetan hypertensive patients and 305 normotensive controls were recruited in an epidemiological survey conducted at 2 sites in the Ganzi Tibetan autonomous region. Patients were genotyped for nineteen WNK1 candidate tag single nucleotide polymorphisms (SNPs) and three STK39 SNPs, and haplotype analysis was performed. Results showed that the allele A in rs1468326 was overrepresented in hypertensive patients versus controls (53.4% vs 42.9%, P < 0.05). The multivariable-adjusted odds ratio (OR) for hypertension among carriers of the CA + AA genotypes was 1.60 (95% CI: 1.02-2.62, P < 0.05), and they also had a higher systolic blood pressure (136.5 ± 28.6 vs 131.7 ± 24.8 mmHg, P < 0.05). However, the TT genotype ratio in rs6749447 was lower in hypertensives (5.4% vs 10.8%, P < 0.05), and the hypertension risk for the TT genotype carriers in rs6749447 decreased after adjustment (OR 0.49, 95% CI 0.19-0.95, P < 0.05). Subjects with haplotype AGACAGGAATCGT showed 1.57 times higher risk of hypertension (95% CI 1.02-2.41, P < 0.05). In conclusion, SNP rs1468326 of WNK1, rs6749447 of STK39, and WNK1 haplotype AGACAGGAATCGT were associated with hypertension in Tibetan individuals.
Affiliation(s)
- Rufeng Shi
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Jiangbo Li
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Jiyun He
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Qingtao Meng
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Zhiping Qian
- Ganzi Tibetan Autonomous Prefecture People's Hospital, Kangding 626000, Tibetan Autonomous Prefecture, PRC
- Di Shi
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Qi Liu
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Yali Cai
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Xinran Li
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
- Xiaoping Chen
- Department of Cardiology, West China Hospital, Sichuan University, Chengdu, Sichuan, 610000, PRC
4
Mielniczuk J, Teisseyre P. A deeper look at two concepts of measuring gene-gene interactions: logistic regression and interaction information revisited. Genet Epidemiol 2017; 42:187-200. [PMID: 29265411 DOI: 10.1002/gepi.22108] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/11/2017] [Revised: 10/23/2017] [Accepted: 11/15/2017] [Indexed: 11/09/2022]
Abstract
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures.
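As a rough illustration of the measure discussed in this abstract, the sketch below computes interaction information for two genetic variants and a phenotype. It assumes the convention II(X1; X2; Y) = I((X1, X2); Y) - I(X1; Y) - I(X2; Y), where a positive value means the pair carries information about Y beyond the two marginal effects; sign conventions differ between authors, and the plug-in estimator, function names, and toy data here are hypothetical rather than the paper's own procedure.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_information(x1, x2, y):
    """II(X1;X2;Y) = I((X1,X2);Y) - I(X1;Y) - I(X2;Y) (assumed convention)."""
    joint = list(zip(x1, x2))  # treat the pair of variants as one variable
    return (mutual_information(joint, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy XOR example (hypothetical): neither variant alone predicts the
# phenotype, but the pair determines it completely.
snp1 = [0, 0, 1, 1]
snp2 = [0, 1, 0, 1]
pheno = [a ^ b for a, b in zip(snp1, snp2)]

print(mutual_information(snp1, pheno))             # 0.0
print(interaction_information(snp1, snp2, pheno))  # 1.0
```

The XOR pattern is a pure interaction with no marginal effects, which is the kind of dependence that single-variant tests miss and that this measure is designed to pick up.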
Affiliation(s)
- Jan Mielniczuk
- Institute of Computer Science, Polish Academy of Sciences, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Poland
- Paweł Teisseyre
- Institute of Computer Science, Polish Academy of Sciences, Poland