1
Tagami D, Bisschop G, Kelleher J. tstrait: a quantitative trait simulator for ancestral recombination graphs. Bioinformatics (Oxford, England) 2024; 40:btae334. [PMID: 38796683 DOI: 10.1093/bioinformatics/btae334] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/24/2024] [Indexed: 05/28/2024]
Abstract
Summary: Ancestral recombination graphs (ARGs) encode the ensemble of correlated genealogical trees arising from recombination in a compact and efficient structure and are of fundamental importance in population and statistical genetics. Recent breakthroughs have made it possible to simulate and infer ARGs at biobank scale, and there is now intense interest in using ARG-based methods across a broad range of applications, particularly in genome-wide association studies (GWAS). Sophisticated methods exist to simulate ARGs using population genetics models, but there is currently no software to simulate quantitative traits directly from these ARGs. To apply existing quantitative trait simulators, users must export genotype data, losing important information about ancestral processes and producing prohibitively large files when applied to the biobank-scale datasets currently of interest in GWAS. We present tstrait, an open-source Python library to simulate quantitative traits on ARGs, and show how this user-friendly software can quickly simulate phenotypes for biobank-scale datasets on a laptop computer.
Availability and implementation: tstrait is available for download on the Python Package Index. Full documentation with examples and workflow templates is available on https://tskit.dev/tstrait/docs/, and the development version is maintained on GitHub (https://github.com/tskit-dev/tstrait).
Affiliation(s)
- Daiki Tagami
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, United Kingdom
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Gertjan Bisschop
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, United Kingdom
2
Tagami D, Bisschop G, Kelleher J. tstrait: a quantitative trait simulator for ancestral recombination graphs. bioRxiv: the preprint server for biology 2024:2024.03.13.584790. [PMID: 38559118 PMCID: PMC10980058 DOI: 10.1101/2024.03.13.584790] [Indexed: 04/04/2024]
Abstract
Summary: Ancestral recombination graphs (ARGs) encode the ensemble of correlated genealogical trees arising from recombination in a compact and efficient structure, and are of fundamental importance in population and statistical genetics. Recent breakthroughs have made it possible to simulate and infer ARGs at biobank scale, and there is now intense interest in using ARG-based methods across a broad range of applications, particularly in genome-wide association studies (GWAS). Sophisticated methods exist to simulate ARGs using population genetics models, but there is currently no software to simulate quantitative traits directly from these ARGs. To apply existing quantitative trait simulators, users must export genotype data, losing important information about ancestral processes and producing prohibitively large files when applied to the biobank-scale datasets currently of interest in GWAS. We present tstrait, an open-source Python library to simulate quantitative traits on ARGs, and show how this user-friendly software can quickly simulate phenotypes for biobank-scale datasets on a laptop computer.
Availability and implementation: tstrait is available for download on the Python Package Index. Full documentation with examples and workflow templates is available on https://tskit.dev/tstrait/docs/, and the development version is maintained on GitHub (https://github.com/tskit-dev/tstrait).
Contact: daiki.tagami@hertford.ox.ac.uk.
Affiliation(s)
- Daiki Tagami
  - Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, United Kingdom
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Old Road Campus, Oxford OX3 7LF, United Kingdom
- Gertjan Bisschop
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Old Road Campus, Oxford OX3 7LF, United Kingdom
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Old Road Campus, Oxford OX3 7LF, United Kingdom
3
Lewanski AL, Grundler MC, Bradburd GS. The era of the ARG: An introduction to ancestral recombination graphs and their significance in empirical evolutionary genomics. PLoS Genet 2024; 20:e1011110. [PMID: 38236805 PMCID: PMC10796009 DOI: 10.1371/journal.pgen.1011110] [Indexed: 01/22/2024]
Abstract
In the presence of recombination, the evolutionary relationships between a set of sampled genomes cannot be described by a single genealogical tree. Instead, the genomes are related by a complex, interwoven collection of genealogies formalized in a structure called an ancestral recombination graph (ARG). An ARG extensively encodes the ancestry of the genome(s) and thus is replete with valuable information for addressing diverse questions in evolutionary biology. Despite its potential utility, technological and methodological limitations, along with a lack of approachable literature, have severely restricted awareness and application of ARGs in evolution research. Excitingly, recent progress in ARG reconstruction and simulation has made ARG-based approaches feasible for many questions and systems. In this review, we provide an accessible introduction to and exploration of ARGs, survey recent methodological breakthroughs, and describe the potential for ARGs to further existing goals and open avenues of inquiry that were previously inaccessible in evolutionary genomics. Through this discussion, we aim to more widely disseminate the promise of ARGs in evolutionary genomics and encourage the broader development and adoption of ARG-based inference.
Affiliation(s)
- Alexander L. Lewanski
  - Department of Integrative Biology, Michigan State University, East Lansing, Michigan, United States of America
  - W.K. Kellogg Biological Station, Michigan State University, Hickory Corners, Michigan, United States of America
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, Michigan, United States of America
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Michael C. Grundler
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
- Gideon S. Bradburd
  - W.K. Kellogg Biological Station, Michigan State University, Hickory Corners, Michigan, United States of America
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, United States of America
4
Zhu K, Shi J, Yang R, Zhou C, Liu Z. Evidence based on Mendelian randomization: Causal relationship between mitochondrial biological function and lung cancer and its subtypes. Neoplasia 2023; 46:100950. [PMID: 37976568 PMCID: PMC10685044 DOI: 10.1016/j.neo.2023.100950] [Received: 10/18/2023] [Revised: 11/10/2023] [Accepted: 11/10/2023] [Indexed: 11/19/2023]
Abstract
Objective: This study aimed to investigate the causal relationship between mitochondrial biological function and lung cancer, including its subtypes, via Mendelian randomization (MR).
Methods: SNPs significantly associated with lung cancer and its subtypes were employed as instrumental variables. MR-Egger regression, simple mode, weighted mode, simple median, and weighted median were used to determine the causal relationship between the exposure factor and the occurrence of lung cancer and its subtypes.
Results: NADH dehydrogenase (ubiquinone) flavoprotein 2 and transmembrane protein 70 were found to have a causal relationship with lung adenocarcinoma, acting as protective factors. Mitochondrial import inner membrane translocase subunit and NADH dehydrogenase (ubiquinone) iron-sulfur protein 4 were causally related to small-cell lung cancer, acting as risk factors. NADH dehydrogenase (ubiquinone) 1 beta subcomplex subunit 8 exhibited a causal relationship with small-cell lung cancer, acting as a protective factor. Furthermore, NAD-dependent protein deacylase sirtuin-5 was causally linked to lung squamous cell carcinoma, serving as a protective factor. A funnel plot demonstrated the symmetrical distribution of the SNPs. The pleiotropy test (P > 0.05) and "leave-one-out" test validated the relative stability of the results.
Conclusion: This study established a causal relationship between mitochondrial biological function and lung cancer, including its subtypes.
Affiliation(s)
- Kangle Zhu
  - Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Thoracic Surgery, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, Jiangsu 210008, China
- Jingwei Shi
  - Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Thoracic Surgery, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, Jiangsu 210008, China
- Rusong Yang
  - Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Thoracic Surgery, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, Jiangsu 210008, China
- Chu Zhou
  - Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Thoracic Surgery, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, Jiangsu 210008, China
- Zhengcheng Liu
  - Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Thoracic Surgery, Nanjing Drum Tower Hospital, Affiliated Hospital of Medical School, Nanjing University, Nanjing, Jiangsu 210008, China
5
Tanigawa Y, Kellis M. Power of inclusion: Enhancing polygenic prediction with admixed individuals. Am J Hum Genet 2023; 110:1888-1902. [PMID: 37890495 PMCID: PMC10645553 DOI: 10.1016/j.ajhg.2023.09.013] [Received: 01/23/2023] [Revised: 09/22/2023] [Accepted: 09/22/2023] [Indexed: 10/29/2023]
Abstract
Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R² = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R² = 0.115), which exceeds the best predictive performance for the White British group (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.
Affiliation(s)
- Yosuke Tanigawa
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA