1
Kolesnikov NA, Kharkov VN, Vagaitseva KV, Zarubin AA, Stepanov VA. Blocks identical by descent in the genomes of the indigenous population of Siberia demonstrate genetic links between populations. Vavilovskii Zhurnal Genet Selektsii 2023; 27:55-62. [PMID: 36923483 PMCID: PMC10009479 DOI: 10.18699/vjgb-23-08] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 01/17/2023] [Accepted: 01/24/2023] [Indexed: 03/18/2023]
Abstract
The gene pool of the indigenous population of Siberia is a unique system for studying population and evolutionary genetic processes, analyzing genetic diversity, and reconstructing the genetic history of populations. High ethnic diversity is a feature of Siberia, one of the regions on the periphery of modern human settlement. The vast expanses of this region and the small size of its aboriginal populations contributed to significant territorial and genetic subdivision. About 40 indigenous peoples live within the Siberian historical and ethnographic province. In this work, a large-scale population study of the gene pool of the indigenous peoples of Siberia was carried out for the first time using high-density biochips (SNP microarrays), filling a significant gap in the genogeographic picture of the Eurasian population. To this end, we analyzed DNA fragments that each pair of individuals had inherited without recombination from their recent common ancestor, that is, segments (blocks) identical by descent (IBD). The distribution of IBD blocks in the populations of Siberia agrees well with the geographical proximity of the populations and with their linguistic affiliation. Among the Siberian populations, the Chukchi, Koryaks, and Nivkhs form a cluster separate from the main Siberian group, with the Chukchi and Koryaks being more closely related. Within the Siberian cluster, Evenks and Yakuts, and Kets and Chulyms, form separate subclusters. Analysis of the SNPs that fell into the largest numbers of IBD segments in the analyzed populations yielded a list of 5358 genes. Biological processes enriched for these genes are associated with the detection of chemical stimuli involved in the sensory perception of smell.
Molecular pathways enriched for these genes are associated with linoleic acid, arachidonic acid, and tyrosine metabolism and with olfactory transduction. At the same time, a review of the literature showed that some of the selected genes, which were found in larger numbers of IBD blocks in several populations at once, may play a role in genetic adaptation to environmental factors.
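The gene list rests on counting, for each SNP, how many pairwise IBD blocks cover it. The abstract does not give the exact procedure, so the following is only a sketch under assumptions: IBD segments are taken as half-open `(start, end)` base-pair intervals on one chromosome (for example, output of an IBD caller), and the helper `count_ibd_coverage` is hypothetical. It uses binary search over sorted start and end coordinates rather than checking every segment against every SNP.

```python
from bisect import bisect_right


def count_ibd_coverage(segments, snp_positions):
    """Count how many IBD segments cover each SNP position.

    segments: list of (start, end) base-pair intervals, half-open [start, end).
    snp_positions: SNP base-pair positions on the same chromosome.
    Returns a list of counts aligned with snp_positions.
    """
    starts = sorted(s for s, _ in segments)
    ends = sorted(e for _, e in segments)
    counts = []
    for pos in snp_positions:
        # Segments that begin at or before pos ...
        opened = bisect_right(starts, pos)
        # ... minus those that have already ended (end is exclusive).
        closed = bisect_right(ends, pos)
        counts.append(opened - closed)
    return counts


segments = [(100, 500), (300, 800), (450, 600)]
print(count_ibd_coverage(segments, [50, 350, 500, 700]))
```

SNPs with the highest counts across pairs could then be mapped to genes for enrichment analysis; that mapping step is outside this sketch.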
Affiliation(s)
- N A Kolesnikov
- Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
- V N Kharkov
- Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
- K V Vagaitseva
- Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
- A A Zarubin
- Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
- V A Stepanov
- Research Institute of Medical Genetics, Tomsk National Research Medical Center of the Russian Academy of Sciences, Tomsk, Russia
2
Kling D, Phillips C, Kennett D, Tillmar A. Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Sci Int Genet 2021; 52:102474. [PMID: 33592389 DOI: 10.1016/j.fsigen.2021.102474] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 10/02/2020] [Revised: 01/12/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022]
Abstract
Investigative genetic genealogy (IGG) has emerged as a new, rapidly growing field of forensic science. We describe the process whereby dense SNP data, commonly comprising more than half a million markers, are used to infer distant relationships, by which we mean degrees of relatedness more distant than first cousins. We review how relationship matching and large-scale SNP analysis are used in a forensic setting to identify a suspect in a criminal investigation or a missing person. Forensic genetics currently has a strong need not only to understand the underlying models used to infer relatedness but also to fully explore the DNA technologies and data used in IGG. This review brings these topics together, examines their effectiveness and operational limits, and suggests future directions for their forensic validation. We also investigated the methods used by the major direct-to-consumer (DTC) genetic ancestry testing companies and submitted a questionnaire in which providers of forensic genetic genealogy summarized their operations and services. Although most of the DTC market, and genetic genealogy in general, relies on undisclosed, proprietary algorithms, we review the current knowledge where information has been discussed and published more openly.
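Relationship matching ultimately maps the amount of DNA two people share IBD to a degree of relatedness. Because the DTC algorithms are proprietary, this sketch instead uses a publicly documented convention borrowed from the KING software: the kinship coefficient is phi = k1/4 + k2/2, where k1 and k2 are the genome fractions shared IBD1 and IBD2, and degree d is assigned when 2^-(d+1.5) < phi <= 2^-(d+0.5). The function names are hypothetical, and real distant-relative inference must also account for the large variance in sharing beyond first cousins.

```python
import math


def kinship_from_ibd(k1, k2):
    """Kinship coefficient from genome fractions shared IBD1 (k1) and IBD2 (k2)."""
    return k1 / 4 + k2 / 2


def relationship_degree(phi, max_degree=9):
    """Map a kinship coefficient to a degree of relatedness.

    KING-style bins: degree d when 2**-(d + 1.5) < phi <= 2**-(d + 0.5),
    i.e. d = floor(-log2(phi) - 0.5). Degree 0 = duplicates / identical twins.
    Returns None if phi is non-positive or the degree exceeds max_degree.
    """
    if phi <= 0:
        return None
    d = max(math.floor(-math.log2(phi) - 0.5), 0)
    return d if d <= max_degree else None


# Parent-child (k1 = 1), first cousins (k1 = 1/4), second cousins (k1 = 1/16).
for k1 in (1.0, 0.25, 1 / 16):
    print(k1, relationship_degree(kinship_from_ibd(k1, 0.0)))
```

Working with the kinship coefficient keeps full siblings (k1 = 1/2, k2 = 1/4) and parent-child pairs in the same first-degree bin, as expected, since both have phi = 1/4.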
Affiliation(s)
- Daniel Kling
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway.
- Christopher Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain.
- Debbie Kennett
- Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Andreas Tillmar
- Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Faculty of Medicine and Health Sciences, Linköping University, Linköping, Sweden
3
Orzechowski P, Pańszczyk A, Huang X, Moore JH. runibic: a Bioconductor package for parallel row-based biclustering of gene expression data. Bioinformatics 2018; 34:4302-4304. [PMID: 29939213 PMCID: PMC6289127 DOI: 10.1093/bioinformatics/bty512] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/15/2018] [Revised: 05/27/2018] [Accepted: 06/22/2018] [Indexed: 11/13/2022]
Abstract
Motivation: Biclustering is an unsupervised technique for simultaneously clustering the rows and columns of an input matrix. Of the many biclustering algorithms proposed, UniBic remains one of the most accurate developed so far. Results: In this paper we introduce runibic, a Bioconductor package with a parallel implementation of UniBic. For convenience, the algorithm was reimplemented, parallelized and wrapped within the R package runibic. The package includes: (i) a parallel version that is several times faster than the original sequential algorithm, (ii) much more efficient memory management, (iii) modularity, which allows new methods to be built on top of the provided one, and (iv) integration with modern Bioconductor packages such as SummarizedExperiment, ExpressionSet and biclust. Availability and implementation: The package is implemented in R and is available from Bioconductor (starting from version 3.6) at http://bioconductor.org/packages/runibic, with installation instructions and a tutorial. Supplementary information: Supplementary data are available at Bioinformatics online.
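As background not stated in this abstract, UniBic as originally published seeds trend-preserving biclusters from the longest common subsequence (LCS) of the column orderings of pairs of rows. The following pure-Python sketch shows only that seed idea under those assumptions; it is not runibic's API and omits discretization, bicluster expansion, and the parallelism the package adds. All function names are hypothetical.

```python
def column_order(row):
    """Column indices sorted by the row's values (ascending)."""
    return sorted(range(len(row)), key=lambda j: row[j])


def longest_common_subsequence(a, b):
    """Dynamic-programming LCS of two sequences, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def shared_trend_columns(row_x, row_y):
    """Columns along which both rows increase in the same order."""
    return longest_common_subsequence(column_order(row_x), column_order(row_y))


print(shared_trend_columns([1.0, 3.0, 2.0, 5.0], [2.0, 6.0, 4.0, 1.0]))
```

Here both rows increase along columns 0, 2, 1, which is the kind of shared ordering a trend-preserving bicluster captures even when the absolute values differ.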
Affiliation(s)
- Patryk Orzechowski
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, Krakow, Poland
- Artur Pańszczyk
- Department of Automatics and Biomedical Engineering, AGH University of Science and Technology, Krakow, Poland
- Xiuzhen Huang
- Department of Computer Science, Arkansas State University, Jonesboro, AR, USA
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
4
Bao F, Deng Y, Du M, Ren Z, Zhang Q, Zhao Y, Suo J, Zhang Z, Wang M, Dai Q. Probabilistic natural mapping of gene-level tests for genome-wide association studies. Brief Bioinform 2018; 19:545-553. [PMID: 28200018 DOI: 10.1093/bib/bbx002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/10/2016] [Indexed: 11/14/2022]
Abstract
Genome-wide association studies (GWASs) generally focus on single markers, which limits the elucidation of the genetic architecture of complex traits. Herein, we present a new computational framework, termed probabilistic natural mapping (PALM), for performing gene-level association tests. PALM robustly reveals the inherent genomic structure of genes and generates feature representations that can be seamlessly incorporated into conventional statistical tests. Our approach substantially improves the detection of associations arising from a subgroup of variants with weak effects, a known challenge for existing methods. We applied PALM to a gastric cancer GWAS and identified two additional gastric cancer susceptibility genes, NOC3L and RUNDC2A. The susceptibility genes discovered by PALM are widely supported by existing studies from other biological perspectives. PALM will be useful for future GWAS analytical strategies that use gene-level analyses.
Affiliation(s)
- Feng Bao
- Department of Automation, Tsinghua University, China
- Yue Deng
- School of Pharmacy, University of California, San Francisco, USA
- Mulong Du
- Department of Environmental Genomics, Nanjing Medical University, China
- Zhiquan Ren
- Department of Automation, Tsinghua University, China
- Qingzhao Zhang
- School of Economics and Wang Yanan Institute for Studies in Economics, Xiamen University, China
- Yanyu Zhao
- Department of Biomedical Engineering, Boston University, USA
- Jinli Suo
- Department of Automation, Tsinghua University, China
- Zhengdong Zhang
- Department of Genetic Toxicology, Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, and a PI in Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center For Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
- Meilin Wang
- Department of Genetic Toxicology, Key Laboratory of Modern Toxicology of Ministry of Education, School of Public Health, and a PI in Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center For Cancer Personalized Medicine, Nanjing Medical University, Nanjing, China
- Qionghai Dai
- Department of Automation, Tsinghua University, China
5
Povysil G, Hochreiter S. IBD Sharing between Africans, Neandertals, and Denisovans. Genome Biol Evol 2016; 8:3406-3416. [PMID: 28158547 PMCID: PMC5381509 DOI: 10.1093/gbe/evw234] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 09/19/2016] [Indexed: 12/03/2022]
Abstract
Interbreeding between ancestors of humans and other hominins outside of Africa has been studied intensively, while their common history within Africa still lacks proper attention. However, shedding light on this poorly understood period of human evolution is essential for understanding subsequent events outside of Africa. We investigate the genetic relationships of humans, Neandertals, and Denisovans by identifying very short DNA segments in the 1000 Genomes Phase 3 data that these hominins share identical by descent (IBD). By focusing on low-frequency and rare variants, we identify very short IBD segments with high confidence. These segments reveal events from the very distant past, because shorter IBD segments are presumably older than longer ones. We extracted two types of very old IBD segments that are shared not only among humans but also with Neandertals and/or Denisovans. The first type contains longer segments found primarily in Asians and Europeans, with more segments found in South Asians than in East Asians for both Neandertals and Denisovans. These longer segments indicate complex admixture events outside of Africa. The second type consists of shorter segments shared mainly by Africans, which may therefore indicate events involving ancestors of humans and other ancient hominins within Africa. Our results from the autosomes are further supported by an analysis of chromosome X, on which segments that are shared by Africans and match the Neandertal and/or Denisovan genome were even more prominent. Our results indicate that interbreeding with other hominins was a common feature of human evolution, beginning long before the ancestors of modern humans left Africa.
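The premise that shorter IBD segments are older can be made concrete with a standard simplified model, used here as an assumption rather than taken from the paper: a segment inherited from a common ancestor g generations back has passed through about 2g meioses, so its genetic length is roughly exponential with mean 100/(2g) cM. Inverting the mean gives a rough age estimate. The helper names are hypothetical, and the paper itself works from rare-variant patterns rather than this formula.

```python
import random


def expected_length_cm(generations):
    """Mean IBD segment length (cM) after 2 * generations meioses."""
    return 100.0 / (2 * generations)


def generations_from_mean_length(mean_cm):
    """Invert the exponential model: g = 50 / (mean length in cM)."""
    return 50.0 / mean_cm


def simulate_lengths(generations, n, seed=0):
    """Draw n segment lengths (cM); breakpoints occur at rate 2g per Morgan."""
    rng = random.Random(seed)
    rate_per_cm = 2 * generations / 100.0
    return [rng.expovariate(rate_per_cm) for _ in range(n)]


lengths = simulate_lengths(10, 20000, seed=1)
print(sum(lengths) / len(lengths), expected_length_cm(10))
```

Under this model a common ancestor 10 generations back leaves segments averaging about 5 cM, while one 1,000 generations back leaves segments averaging about 0.05 cM, which is why very old sharing shows up only as very short segments.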
Affiliation(s)
- Gundula Povysil
- Institute of Bioinformatics, Johannes Kepler University Linz, Austria
- Sepp Hochreiter
- Institute of Bioinformatics, Johannes Kepler University Linz, Austria
6
Clevert DA, Unterthiner T, Povysil G, Hochreiter S. Rectified factor networks for biclustering of omics data. Bioinformatics 2017; 33:i59-i66. [PMID: 28881961 PMCID: PMC5870657 DOI: 10.1093/bioinformatics/btx226] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/11/2022]
Abstract
MOTIVATION Biclustering has become a major tool for analyzing large datasets given as a matrix of samples times features, and has been successfully applied in the life sciences and e-commerce for drug design and recommender systems, respectively. Factor Analysis for Bicluster Acquisition (FABIA), one of the most successful biclustering methods, is a generative model that represents each bicluster by two sparse membership vectors: one for the samples and one for the features. However, FABIA is restricted to about 20 code units because of the high computational complexity of computing the posterior. Furthermore, code units are sometimes insufficiently decorrelated, and sample membership is difficult to determine. We propose to use the recently introduced unsupervised Deep Learning approach Rectified Factor Networks (RFNs) to overcome the drawbacks of existing biclustering methods. RFNs efficiently construct very sparse, non-linear, high-dimensional representations of the input via their posterior means. RFN learning is a generalized alternating minimization algorithm based on the posterior regularization method, which enforces non-negative and normalized posterior means. Each code unit represents a bicluster: the samples for which the code unit is active, and the features that have activating weights to the code unit, belong to the bicluster. RESULTS On 400 benchmark datasets and on three gene expression datasets with known clusters, RFN outperformed 13 other biclustering methods, including FABIA. On data from the 1000 Genomes Project, RFN identified DNA segments indicating that interbreeding with other hominins began before the ancestors of modern humans left Africa. AVAILABILITY AND IMPLEMENTATION https://github.com/bioinf-jku/librfn. CONTACT djork-arne.clevert@bayer.com or hochreit@bioinf.jku.at.
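The membership rule described above (samples where a code unit is active, features with activating weights) can be read off a fitted model roughly as follows. This plain-Python sketch assumes the weight matrix and the non-negative posterior means have already been computed, for example by librfn, whose actual interface is not shown here; `bicluster_from_unit` and both thresholds are hypothetical.

```python
def bicluster_from_unit(weights, activations, unit, w_threshold, h_threshold):
    """Read one bicluster off a fitted factor model.

    weights[f][u]: weight of feature f on code unit u.
    activations[u][s]: posterior mean of code unit u for sample s.
    Samples where the unit's activation exceeds h_threshold, and features whose
    weight to the unit exceeds w_threshold, form the bicluster.
    """
    samples = [s for s, h in enumerate(activations[unit]) if h > h_threshold]
    features = [f for f, row in enumerate(weights) if row[unit] > w_threshold]
    return samples, features


# Toy model: 3 features, 2 code units, 4 samples.
W = [[0.9, 0.0], [0.05, 0.7], [0.8, 0.1]]
H = [[0.0, 1.2, 0.8, 0.0], [0.5, 0.0, 0.0, 0.9]]
for u in range(2):
    print(u, bicluster_from_unit(W, H, u, w_threshold=0.5, h_threshold=0.1))
```

Because RFN posterior means are sparse and non-negative, most activations are zero, which is what keeps the sample side of each bicluster small.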
Affiliation(s)
- Thomas Unterthiner
- Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria
- Gundula Povysil
- Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria
- Sepp Hochreiter
- Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria
7
Liu XQ, Fazio J, Hu P, Paterson AD. Identity-by-descent mapping for diastolic blood pressure in unrelated Mexican Americans. BMC Proc 2016; 10:263-267. [PMID: 27980647 PMCID: PMC5133517 DOI: 10.1186/s12919-016-0041-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/28/2023]
Abstract
Population-based identity by descent (IBD) mapping is a statistical method for detecting disease loci at which “unrelated” pairs of individuals share an ancestral segment. As a complement to genome-wide association studies, IBD mapping is robust to allelic heterogeneity and may identify rare inherited variants when combined with sequence data. Our objective was to identify the causal genes for diastolic blood pressure (DBP). We applied a population-based IBD mapping method to 105 unrelated individuals selected from the family data provided for the Genetic Analysis Workshop 19. Using the genome-wide association study data (ie, the microarray data), we scanned chromosome 3 for IBD-sharing segments among all pairs of these individuals. At the chromosomal region with the most significant relationship between IBD sharing and DBP, the whole genome sequence data were examined to identify risk variants for DBP. The most significant relationship between IBD sharing and DBP was found at 3q12.3 (p = 0.0016), although it did not reach the chromosome-wide significance level (p = 0.00012). This region contains one gene, ZPLD1, which has been reported to be associated with cerebral cavernous malformations, a disease with enlarged small blood vessels (capillaries) in the brain. Although 24 deleterious variants were identified in this region, no significant association was found between these variants and DBP (p = 0.40). We present a mapping strategy that combines a population-based IBD mapping method with sequence data analyses. One gene was located in a chromosomal region identified by this method for DBP; however, further study with a larger sample size is needed to assess this result.
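The abstract does not spell out the statistic that links pairwise IBD sharing to DBP. One classical way to relate IBD sharing to a quantitative trait is Haseman–Elston regression, named here as a swapped-in illustration rather than the method used in the paper: regress the squared trait difference of each pair on the pair's IBD sharing at a locus, where a negative slope suggests that pairs sharing the region have more similar trait values. The sketch computes the ordinary-least-squares slope in plain Python; the data layout and function name are hypothetical.

```python
def haseman_elston_slope(ibd_sharing, trait_a, trait_b):
    """OLS slope of squared pairwise trait difference on IBD sharing.

    ibd_sharing[k]: IBD sharing at the locus for pair k (e.g. 0, 0.5 or 1).
    trait_a[k], trait_b[k]: trait values of the two members of pair k.
    A negative slope means pairs sharing more IBD have more similar traits.
    """
    y = [(a - b) ** 2 for a, b in zip(trait_a, trait_b)]
    n = len(y)
    mean_x = sum(ibd_sharing) / n
    mean_y = sum(y) / n
    sxy = sum((x - mean_x) * (yy - mean_y) for x, yy in zip(ibd_sharing, y))
    sxx = sum((x - mean_x) ** 2 for x in ibd_sharing)
    if sxx == 0:
        raise ValueError("IBD sharing does not vary across pairs")
    return sxy / sxx


# Toy pairs: the two pairs sharing the locus have closer DBP values.
print(haseman_elston_slope([0, 0, 1, 1], [80, 90, 80, 85], [82, 88, 81, 84]))
```

In a chromosome scan the slope would be computed at each position and its significance assessed, for example by permutation; that step is not shown.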
Affiliation(s)
- Xiao-Qing Liu
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Manitoba, Winnipeg, MB R3E 3P4 Canada ; Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3P4 Canada ; The Children's Hospital Research Institute of Manitoba, Winnipeg, MB R3E 3P4 Canada
- Jillian Fazio
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of Manitoba, Winnipeg, MB R3E 3P4 Canada
- Pingzhao Hu
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3P4 Canada ; George and Fay Yee Centre for Healthcare Innovation, University of Manitoba, Winnipeg, MB R3A 1R9 Canada
- Andrew D Paterson
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4 Canada ; Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5G 0A4 Canada
8
Bunte K, Leppäaho E, Saarinen I, Kaski S. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics 2016; 32:2457-63. [PMID: 27153643 DOI: 10.1093/bioinformatics/btw207] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 12/28/2015] [Accepted: 04/10/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION Modelling methods that find structure in data are necessary given the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on a single data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple data sources, extending a recent method, Group Factor Analysis, to have a biclustering interpretation with additional sparsity assumptions. The resulting method enables data-driven detection of linear structure present in parts of the data sources. RESULTS Our simulation studies show that the proposed method reliably infers biclusters from heterogeneous data sources. We tested the method on data from the NCI-DREAM drug sensitivity prediction challenge, achieving excellent prediction accuracy. Moreover, the predictions are based on several biclusters that provide insight into the data sources, in this case gene expression, DNA methylation, protein abundance, exome sequence, functional connectivity fingerprints and drug sensitivity. AVAILABILITY AND IMPLEMENTATION http://research.cs.aalto.fi/pml/software/GFAsparse/ CONTACTS: kerstin.bunte@googlemail.com or samuel.kaski@aalto.fi.
Affiliation(s)
- Kerstin Bunte
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland
- Eemeli Leppäaho
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland
- Inka Saarinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland
- Samuel Kaski
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland
9
Fedorova L, Qiu S, Dutta R, Fedorov A. Atlas of Cryptic Genetic Relatedness Among 1000 Human Genomes. Genome Biol Evol 2016; 8:777-90. [PMID: 26907499 PMCID: PMC4824066 DOI: 10.1093/gbe/evw034] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/21/2022]
Abstract
A novel computational method for detecting identical-by-descent (IBD) chromosomal segments between sequenced genomes is presented. It uses the distribution patterns of very rare genetic variants (vrGVs), which have minor allele frequencies <0.2%. Unlike existing probabilistic approaches, our method is largely deterministic, because it considers groups of very rare events that cannot occur together by chance alone. The method was applied in an exhaustive computational search for shared IBD segments among 1,092 sequenced individuals from 14 populations. It demonstrated that clusters of vrGVs are unique and powerful markers of genetic relatedness that uncover IBD chromosomal segments between and within populations, irrespective of whether divergence was recent or occurred hundreds to thousands of years ago. We found that several IBD segments are shared by practically every possible pair of individuals belonging to the same population. Moreover, shared short IBD segments (median size 183 kb) were found in 10% of inter-continental human pairs, each comprising a person from sub-Saharan Africa and a person from Southern Europe. The shortest shared IBD segments (median size 54 kb) were found in 0.42% of inter-continental pairs composed of individuals from Chinese/Japanese populations and Africans from Kenya and Nigeria. Knowledge of the inheritance of IBD segments is important in clinical case–control and cohort studies, since unknown distant familial relationships could compromise interpretation of the collected data. Clusters of vrGVs should be useful markers for familial relationships and common multifactorial disorders.
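The core idea, that several very rare variants carried by the same two people within a short region are unlikely to be shared by chance, can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: variants are given as hypothetical `(position, carriers)` records already filtered to MAF < 0.2%, and a pair is flagged when at least `min_shared` shared variants form a run in which consecutive variants are no more than `window_bp` apart. Both parameters and the function name are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def candidate_ibd_pairs(rare_variants, window_bp, min_shared):
    """Flag pairs sharing a cluster of very rare variants on one chromosome.

    rare_variants: list of (position, set_of_carrier_ids), pre-filtered to very
    rare variants. Returns {(id1, id2): [(first_pos, last_pos), ...]}.
    """
    # Collect, for every pair of carriers, the positions of variants they share.
    shared_positions = defaultdict(list)
    for pos, carriers in sorted(rare_variants, key=lambda v: v[0]):
        for pair in combinations(sorted(carriers), 2):
            shared_positions[pair].append(pos)

    clusters = {}
    for pair, positions in shared_positions.items():
        run, found = [positions[0]], []
        for pos in positions[1:]:
            if pos - run[-1] <= window_bp:
                run.append(pos)  # still inside the current cluster
            else:
                if len(run) >= min_shared:
                    found.append((run[0], run[-1]))
                run = [pos]  # start a new candidate cluster
        if len(run) >= min_shared:
            found.append((run[0], run[-1]))
        if found:
            clusters[pair] = found
    return clusters


variants = [(1000, {"A", "B"}), (1500, {"A", "B", "C"}),
            (1800, {"A", "B"}), (90000, {"A", "C"})]
print(candidate_ibd_pairs(variants, window_bp=1000, min_shared=3))
```

Only the pair A–B is flagged here: it shares three very rare variants within 800 bp, whereas A–C share two variants almost 90 kb apart.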
Affiliation(s)
- Shuhao Qiu
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo; Department of Medicine, University of Toledo
- Rajib Dutta
- Program in Biomedical Sciences, University of Toledo
- Alexei Fedorov
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo; Department of Medicine, University of Toledo
10
Rodriguez-Flores JL, Fakhro K, Agosto-Perez F, Ramstetter MD, Arbiza L, Vincent TL, Robay A, Malek JA, Suhre K, Chouchane L, Badii R, Al-Nabet Al-Marri A, Abi Khalil C, Zirie M, Jayyousi A, Salit J, Keinan A, Clark AG, Crystal RG, Mezey JG. Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations. Genome Res 2016; 26:151-62. [PMID: 26728717 PMCID: PMC4728368 DOI: 10.1101/gr.191478.115] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 03/13/2015] [Accepted: 12/15/2015] [Indexed: 12/26/2022]
Abstract
An open question in the history of human migration is the identity of the earliest Eurasian populations that have left contemporary descendants. The Arabian Peninsula was the initial site of the out-of-Africa migrations that occurred between 125,000 and 60,000 yr ago, leading to the hypothesis that the first Eurasian populations were established on the Peninsula and that contemporary indigenous Arabs are direct descendants of these ancient peoples. To assess this hypothesis, we sequenced the entire genomes of 104 unrelated natives of the Arabian Peninsula at high coverage, including 56 of indigenous Arab ancestry. The indigenous Arab genomes defined a cluster distinct from other ancestral groups, and these genomes showed clear hallmarks of an ancient out-of-Africa bottleneck. Similar to other Middle Eastern populations, the indigenous Arabs had higher levels of Neanderthal admixture compared to Africans but had lower levels than Europeans and Asians. These levels of Neanderthal admixture are consistent with an early divergence of Arab ancestors after the out-of-Africa bottleneck but before the major Neanderthal admixture events in Europe and other regions of Eurasia. When compared to worldwide populations sampled in the 1000 Genomes Project, although the indigenous Arabs had a signal of admixture with Europeans, they clustered in a basal, outgroup position to all 1000 Genomes non-Africans when considering pairwise similarity across the entire genome. These results place indigenous Arabs as the most distant relatives of all other contemporary non-Africans and identify these people as direct descendants of the first Eurasian populations established by the out-of-Africa migrations.
Affiliation(s)
- Juan L Rodriguez-Flores
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York 10065, USA
- Khalid Fakhro
- Sidra Medical and Research Center, Doha, Qatar; Department of Genetic Medicine, Weill Cornell Medical College-Qatar, Doha, Qatar
- Francisco Agosto-Perez
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
- Monica D Ramstetter
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
- Leonardo Arbiza
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
- Thomas L Vincent
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York 10065, USA
- Amal Robay
- Department of Genetic Medicine, Weill Cornell Medical College-Qatar, Doha, Qatar
- Joel A Malek
- Department of Genetic Medicine, Weill Cornell Medical College-Qatar, Doha, Qatar
- Karsten Suhre
- Bioinformatics Core, Weill Cornell Medical College-Qatar, Doha, Qatar
- Lotfi Chouchane
- Department of Genetic Medicine, Weill Cornell Medical College-Qatar, Doha, Qatar
- Ramin Badii
- Laboratory Medicine and Pathology, Hamad Medical Corporation, Doha, Qatar
- Charbel Abi Khalil
- Department of Genetic Medicine, Weill Cornell Medical College-Qatar, Doha, Qatar
- Mahmoud Zirie
- Department of Medicine, Hamad Medical Corporation, Doha, Qatar
- Amin Jayyousi
- Department of Medicine, Hamad Medical Corporation, Doha, Qatar
- Jacqueline Salit
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York 10065, USA
- Alon Keinan
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
- Andrew G Clark
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
- Ronald G Crystal
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York 10065, USA
- Jason G Mezey
- Department of Genetic Medicine, Weill Cornell Medical College, New York, New York 10065, USA; Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14850, USA
11
Epps CW, Keyghobadi N. Landscape genetics in a changing world: disentangling historical and contemporary influences and inferring change. Mol Ecol 2015; 24:6021-40. [DOI: 10.1111/mec.13454] [Citation(s) in RCA: 163] [Impact Index Per Article: 18.1] [Received: 08/17/2015] [Revised: 10/29/2015] [Accepted: 11/02/2015] [Indexed: 12/15/2022]
Affiliation(s)
- Clinton W. Epps
- Oregon State University; Nash Hall Room 104 Corvallis OR 97331 USA
- Nusha Keyghobadi
- Department of Biology; Western University; London ON N6A 5B7 Canada
12
Upton A, Trelles O, Cornejo-García JA, Perkins JR. Review: High-performance computing to detect epistasis in genome scale data sets. Brief Bioinform 2015; 17:368-79. [PMID: 26272945 DOI: 10.1093/bib/bbv058] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 03/31/2015] [Indexed: 11/14/2022]
Abstract
It is becoming clear that most human diseases have a complex etiology that cannot be explained by single nucleotide polymorphisms (SNPs) or simple additive combinations; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of this focus on single genetic markers. A more promising approach is to take into account epistasis, by considering the association of multiple SNP interactions with disease. However, as genomic data continues to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: the use of high-performance computing, which allows us to consider a larger number of variables, and heuristics to make the solution more tractable, essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests and give our views on future directions in this field in light of advances in sequencing technology and variants in non-coding regions.
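The rapid growth in the number of variant combinations can be quantified directly: an exhaustive search over k-way interactions among n SNPs must test n choose k combinations. The sketch below uses Python's `math.comb`; the SNP counts are illustrative assumptions (roughly the size of a dense genotyping array), and `interaction_tests` is a hypothetical helper.

```python
from math import comb


def interaction_tests(n_snps, order=2):
    """Number of distinct SNP combinations of a given order in an exhaustive scan."""
    return comb(n_snps, order)


for n in (10_000, 500_000):
    print(f"{n} SNPs: {interaction_tests(n, 2):,} pairs, "
          f"{interaction_tests(n, 3):,} triples")
```

Pairwise testing of 500,000 SNPs already requires about 1.25 × 10^11 tests, and each additional interaction order multiplies the count by roughly n/k, which is why the review pairs high-performance computing with heuristics.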
13
Klambauer G, Wischenbart M, Mahr M, Unterthiner T, Mayr A, Hochreiter S. Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map. Bioinformatics 2015; 31:3392-4. [PMID: 26088801 DOI: 10.1093/bioinformatics/btv373] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 02/11/2015] [Accepted: 06/11/2015] [Indexed: 01/27/2023]
Abstract
We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried against the widely used ChEMBL, DrugBank and Connectivity Map databases. Rchemcpp uses the best-performing similarity functions, i.e. molecule kernels, as measures of structural similarity. Molecule kernels have been shown to outperform other similarity measures and currently excel in machine learning challenges. To reduce computation time considerably, and thereby make a web service feasible, a novel efficient prefiltering strategy has been developed that maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial to the drug development process, such as prioritizing compounds after screening or reducing adverse side effects during late development phases. Rchemcpp was used in the DeepTox pipeline, which won the Tox21 Data Challenge, and is frequently used by researchers in pharmaceutical companies. Availability and implementation: The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor. Contact: hochreit@bioinf.jku.at. Supplementary information: Supplementary data are available at Bioinformatics online.
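The abstract does not spell out the prefiltering strategy. As a hedged illustration of how a prefilter can cut computation without losing sensitivity, the Python sketch below substitutes Tanimoto similarity on binary fingerprints (sets of on-bit indices) for molecule kernels, and skips any database entry whose bit count alone proves it cannot reach the similarity threshold. This bound-based filter is a well-known technique for fingerprint search, not necessarily the one used in Rchemcpp; the names `tanimoto` and `find_analogs` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def find_analogs(
    query: Fingerprint, database: Dict[str, Fingerprint], threshold: float
) -> List[Tuple[str, float]]:
    """Return (name, similarity) for entries with Tanimoto >= threshold, best first."""
    q = len(query)
    hits = []
    for name, fp in database.items():
        n = len(fp)
        # Prefilter: |a & b| <= min(|a|, |b|) and |a | b| >= max(|a|, |b|),
        # so Tanimoto <= min/max. Skipping entries whose bound falls below
        # the threshold never discards a true hit.
        if max(q, n) > 0 and min(q, n) / max(q, n) < threshold:
            continue
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    query = frozenset({1, 2, 3, 4})
    database = {
        "A": frozenset({1, 2, 3, 4, 5}),  # bound 4/5 passes; Tanimoto 0.8
        "B": frozenset({1, 2}),           # bound 2/4 < 0.7, skipped
        "C": frozenset(range(1, 20)),     # bound 4/19 < 0.7, skipped
    }
    print(find_analogs(query, database, 0.7))  # [('A', 0.8)]
```

Because the bound only ever overestimates similarity, the filtered search returns exactly the same hits as a brute-force scan while skipping the more expensive exact comparison for most dissimilar entries.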
Affiliation(s)
- Günter Klambauer: Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Martin Wischenbart: Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Michael Mahr: Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Thomas Unterthiner: Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Andreas Mayr: Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
- Sepp Hochreiter: Institute of Bioinformatics, Johannes Kepler University Linz, Altenbergerstr. 69, 4040 Linz, Austria
14
Park DS, Baran Y, Hormozdiari F, Eng C, Torgerson DG, Burchard EG, Zaitlen N. PIGS: improved estimates of identity-by-descent probabilities by probabilistic IBD graph sampling. BMC Bioinformatics 2015; 16 Suppl 5:S9. [PMID: 25860540 PMCID: PMC4402697 DOI: 10.1186/1471-2105-16-s5-s9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/19/2022]
Abstract
Identifying genomic segments that are identical-by-descent (IBD) between individuals is a fundamental task in genetics. IBD data are used for numerous applications, including demographic inference, heritability estimation, and mapping disease loci. Simultaneous detection of IBD over multiple haplotypes has proven computationally difficult. To overcome this, many state-of-the-art methods estimate the probability of IBD between each pair of haplotypes separately. While computationally efficient, these methods fail to leverage the clique structure of IBD, resulting in less powerful IBD identification, especially for small IBD segments. We develop a hybrid approach (PIGS) that combines the computational efficiency of pairwise methods with the power of multiway methods. It leverages the IBD graph structure to compute the probability of IBD conditional on all pairwise estimates simultaneously. We show, via extensive simulations and analysis of real data, that our method produces a substantial increase in the number of identified small IBD segments.
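The clique structure comes from transitivity: if haplotypes A and B are IBD at a locus, and B and C are too, then A and C must be as well, so any consistent IBD graph is a union of disjoint cliques (a partition of the haplotypes). The Python sketch below illustrates conditioning on all pairwise estimates jointly by enumerating every partition of a handful of haplotypes and weighting each partition by the pairwise probabilities, treated as independent evidence. This is a toy exhaustive version under that independence assumption, not the PIGS algorithm, which samples IBD graphs rather than enumerating them; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def joint_ibd_posteriors(n: int, pairwise: Dict[Pair, float]) -> Dict[Pair, float]:
    """Re-estimate pairwise IBD probabilities over transitive (clique) IBD graphs.

    Assumptions: every p_ij lies strictly between 0 and 1, and the estimates
    are independent evidence, so a partition's weight is the product of p_ij
    for pairs in the same block and (1 - p_ij) for pairs in different blocks.
    """
    pairs = list(combinations(range(n), 2))
    mass = {pair: 0.0 for pair in pairs}
    total = 0.0
    for partition in set_partitions(list(range(n))):
        block_of = {h: b for b, block in enumerate(partition) for h in block}
        weight = 1.0
        for i, j in pairs:
            p = pairwise[(i, j)]
            weight *= p if block_of[i] == block_of[j] else 1.0 - p
        total += weight
        for i, j in pairs:
            if block_of[i] == block_of[j]:
                mass[(i, j)] += weight
    return {pair: mass[pair] / total for pair in pairs}


if __name__ == "__main__":
    # A-B and B-C look strongly IBD, A-C only weakly: transitivity pulls A-C up.
    estimates = {(0, 1): 0.9, (1, 2): 0.9, (0, 2): 0.3}
    posteriors = joint_ibd_posteriors(3, estimates)
    print(round(posteriors[(0, 2)], 3))  # 0.649
```

The number of partitions is the Bell number (15 for four haplotypes, 52 for five) and grows very quickly, which is why exhaustive enumeration only works for toy inputs and a sampling approach is needed in practice.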
15
Al-Khudhair A, Qiu S, Wyse M, Chowdhury S, Cheng X, Bekbolsynov D, Saha-Mandal A, Dutta R, Fedorova L, Fedorov A. Inference of distant genetic relations in humans using "1000 genomes". Genome Biol Evol 2015; 7:481-92. [PMID: 25573959 PMCID: PMC4350174 DOI: 10.1093/gbe/evv003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/06/2023]
Abstract
Nucleotide sequence differences on the whole-genome scale have been computed for 1,092 people from 14 populations made publicly available by the 1000 Genomes Project. The total number of differences in genetic variants was calculated for 96,464 pairs of individuals. The distributions of these differences for pairs of individuals of European, Asian, or African origin were characterized by narrow unimodal peaks with mean values of 3.8, 3.5, and 5.1 million, respectively, and standard deviations of 0.03–0.1 million. The total numbers of genomic differences between pairs of all known relatives were significantly lower than their respective population means, and the reduction diminished as the degree of consanguinity became more distant. By counting the total number of genomic differences, it is possible to infer familial relationships for people who share as little as 6% of their loci identical-by-descent. Detection of familial relationships can be radically improved when only very rare genetic variants are taken into account. Counting the total number of shared very rare single nucleotide polymorphisms (SNPs) in whole-genome sequences makes it possible to establish distant familial relationships for persons with eighth and ninth degrees of relationship. Using this analysis, we predicted 271 distant pairwise familial relationships among the 1,092 individuals that had not been declared by the 1000 Genomes Project. In particular, among 89 British and 97 Chinese individuals, we found three British–Chinese pairs with distant genetic relationships. Individuals in these pairs share identical-by-descent DNA fragments that represent 0.001%, 0.004%, and 0.01% of their genomes. With affordable whole-genome sequencing techniques, very rare SNPs should become important genetic markers for familial relationships and population stratification.
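The two pairwise statistics this abstract relies on, total genomic differences and the number of shared very rare variants, can be sketched in Python on a genotype matrix coded as 0/1/2 alternate-allele copies. Measuring a difference as the absolute allele-count difference per site, and calling a variant "very rare" when it has at most two alternate-allele copies in the whole sample, are assumptions made for illustration; the authors' exact definitions and thresholds may differ, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Rows are individuals, columns are variant sites; each entry is the number
# of alternate-allele copies (0, 1 or 2) carried at that site.
Genotypes = List[List[int]]
Pair = Tuple[int, int]


def pairwise_differences(g: Genotypes) -> Dict[Pair, int]:
    """Total allele-count difference between every pair of individuals.

    Assumption: a site contributes |g_a - g_b|, i.e. 0, 1 or 2 differences.
    """
    return {
        (a, b): sum(abs(x - y) for x, y in zip(g[a], g[b]))
        for a, b in combinations(range(len(g)), 2)
    }


def shared_rare_variants(g: Genotypes, max_alt_copies: int = 2) -> Dict[Pair, int]:
    """Number of very rare sites at which both members of a pair carry the alt allele.

    Assumption: "very rare" means 1..max_alt_copies alternate copies in the whole sample.
    """
    n_sites = len(g[0])
    rare = [s for s in range(n_sites) if 0 < sum(row[s] for row in g) <= max_alt_copies]
    return {
        (a, b): sum(1 for s in rare if g[a][s] > 0 and g[b][s] > 0)
        for a, b in combinations(range(len(g)), 2)
    }


if __name__ == "__main__":
    genotypes = [
        [1, 1, 0, 0, 2],  # individuals 0 and 1 share two rare alleles
        [1, 1, 0, 0, 1],
        [0, 0, 1, 0, 2],
        [0, 0, 0, 1, 1],
    ]
    shared = shared_rare_variants(genotypes)
    print(max(shared, key=shared.get))  # (0, 1)
    print(pairwise_differences(genotypes)[(0, 1)])  # 1
```

In this toy setting, a pair that shares several sample-wide rare alleles and shows a low total difference count stands out as a candidate relationship; on real data the thresholds would presumably need calibration against known relatives.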
Affiliation(s)
- Ahmed Al-Khudhair: Program in Bioinformatics and Proteomics/Genomics, University of Toledo
- Shuhao Qiu: Program in Biomedical Sciences, University of Toledo; Department of Medicine, University of Toledo
- Meghan Wyse: Program in Biomedical Sciences, University of Toledo
- Xi Cheng: Program in Biomedical Sciences, University of Toledo
- Arnab Saha-Mandal: Program in Bioinformatics and Proteomics/Genomics, University of Toledo
- Rajib Dutta: Program in Biomedical Sciences, University of Toledo; Department of Medicine, University of Toledo
- Alexei Fedorov: Program in Bioinformatics and Proteomics/Genomics, University of Toledo; Department of Medicine, University of Toledo