1
Yoo S, Garg E, Elliott LT, Hung RJ, Halevy AR, Brooks JD, Bull SB, Gagnon F, Greenwood C, Lawless JF, Paterson AD, Sun L, Zawati MH, Lerner-Ellis J, Abraham R, Birol I, Bourque G, Garant JM, Gosselin C, Li J, Whitney J, Thiruvahindrapuram B, Herbrick JA, Lorenti M, Reuter MS, Adeoye OO, Liu S, Allen U, Bernier FP, Biggs CM, Cheung AM, Cowan J, Herridge M, Maslove DM, Modi BP, Mooser V, Morris SK, Ostrowski M, Parekh RS, Pfeffer G, Suchowersky O, Taher J, Upton J, Warren RL, Yeung R, Aziz N, Turvey SE, Knoppers BM, Lathrop M, Jones S, Scherer SW, Strug LJ. HostSeq: a Canadian whole genome sequencing and clinical data resource. BMC Genom Data 2023; 24:26. [PMID: 37131148 PMCID: PMC10152008 DOI: 10.1186/s12863-023-01128-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 02/22/2023] [Indexed: 05/04/2023]
Abstract
HostSeq was launched in April 2020 as a national initiative to integrate whole genome sequencing data from 10,000 Canadians infected with SARS-CoV-2 with clinical information related to their disease experience. The mandate of HostSeq is to support the Canadian and international research communities in their efforts to understand the risk factors for disease and associated health outcomes and to support the development of interventions such as vaccines and therapeutics. HostSeq is a collaboration among 13 independent epidemiological studies of SARS-CoV-2 across five provinces in Canada. Aggregated data collected by HostSeq are made available to the public through two data portals: a phenotype portal showing summaries of major variables and their distributions, and a variant search portal enabling queries in a genomic region. Individual-level data are available to the global research community for health research through a Data Access Agreement and Data Access Compliance Office approval. Here we provide an overview of the collective project design along with summary-level information for HostSeq. We highlight several statistical considerations for researchers using the HostSeq platform regarding data aggregation, sampling mechanism, covariate adjustment, and X chromosome analysis. In addition to serving as a rich data source, the diversity of study designs, sample sizes, and research objectives among the participating studies provides unique opportunities for the research community.
Affiliation(s)
- S Yoo
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Ottawa, Ottawa, ON, Canada
- E Garg
  - Simon Fraser University, Burnaby, BC, Canada
- L T Elliott
  - Simon Fraser University, Burnaby, BC, Canada
- R J Hung
  - University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- A R Halevy
  - The Hospital for Sick Children, Toronto, ON, Canada
- J D Brooks
  - University of Toronto, Toronto, ON, Canada
- S B Bull
  - University of Toronto, Toronto, ON, Canada
  - Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- F Gagnon
  - University of Toronto, Toronto, ON, Canada
- C M T Greenwood
  - McGill University, Montreal, QC, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC, Canada
- J F Lawless
  - University of Waterloo, Waterloo, ON, Canada
- A D Paterson
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
- L Sun
  - University of Toronto, Toronto, ON, Canada
- J Lerner-Ellis
  - University of Toronto, Toronto, ON, Canada
  - Sinai Health System, Toronto, ON, Canada
- R J S Abraham
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- I Birol
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- G Bourque
  - McGill University, Montreal, QC, Canada
- J-M Garant
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- C Gosselin
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- J Li
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- J Whitney
  - The Hospital for Sick Children, Toronto, ON, Canada
- J-A Herbrick
  - The Hospital for Sick Children, Toronto, ON, Canada
- M Lorenti
  - The Hospital for Sick Children, Toronto, ON, Canada
- M S Reuter
  - The Hospital for Sick Children, Toronto, ON, Canada
- O O Adeoye
  - The Hospital for Sick Children, Toronto, ON, Canada
- S Liu
  - The Hospital for Sick Children, Toronto, ON, Canada
- U Allen
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
- F P Bernier
  - University of Calgary, Calgary, AB, Canada
  - Alberta Children's Hospital, Calgary, AB, Canada
- C M Biggs
  - University of British Columbia, Vancouver, BC, Canada
  - BC Children's Hospital, Vancouver, BC, Canada
  - St. Paul's Hospital, Vancouver, BC, Canada
- A M Cheung
  - University Health Network, Toronto, ON, Canada
- J Cowan
  - University of Ottawa, Ottawa, ON, Canada
  - The Ottawa Hospital Research Institute, Ottawa, ON, Canada
- M Herridge
  - University Health Network, Toronto, ON, Canada
- B P Modi
  - BC Children's Hospital, Vancouver, BC, Canada
- V Mooser
  - McGill University, Montreal, QC, Canada
- S K Morris
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
- M Ostrowski
  - University of Toronto, Toronto, ON, Canada
  - St. Michael's Hospital, Unity Health, Toronto, ON, Canada
- R S Parekh
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
  - Women's College Hospital, Toronto, ON, Canada
- G Pfeffer
  - University of Calgary, Calgary, AB, Canada
- J Taher
  - University of Toronto, Toronto, ON, Canada
  - Sinai Health System, Toronto, ON, Canada
- J Upton
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
- R L Warren
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- R S M Yeung
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
- N Aziz
  - The Hospital for Sick Children, Toronto, ON, Canada
- S E Turvey
  - University of British Columbia, Vancouver, BC, Canada
  - BC Children's Hospital, Vancouver, BC, Canada
- M Lathrop
  - McGill University, Montreal, QC, Canada
- S J M Jones
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
- S W Scherer
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
- L J Strug
  - The Hospital for Sick Children, Toronto, ON, Canada
  - University of Toronto, Toronto, ON, Canada
2
Titmuss E, Corbett RD, Davidson S, Abbasi S, Williamson LM, Pleasance ED, Shlien A, Renouf DJ, Jones SJM, Laskin J, Marra MA. TMBur: a distributable tumor mutation burden approach for whole genome sequencing. BMC Med Genomics 2022; 15:190. [PMID: 36071521 PMCID: PMC9450342 DOI: 10.1186/s12920-022-01348-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/04/2022] [Accepted: 09/01/2022] [Indexed: 12/02/2022]
Abstract
Background: Tumor mutation burden (TMB) is a key characteristic used in a tumor-type agnostic context to inform the use of immune checkpoint inhibitors (ICIs). Accurate and consistent measurement of TMB is crucial as it can significantly impact patient selection for therapy and clinical trials, with a threshold of 10 mutations/Mb commonly used as an inclusion criterion. Studies have shown that the most significant contributor to variability in mutation counts in whole genome sequence (WGS) data is differences in analysis methods, even more than differences in extraction or library construction methods. Therefore, tools for improving consistency in whole genome TMB estimation are of clinical importance.
Methods: We developed a distributable TMB analysis suite, TMBur, to address the need for genomic TMB estimate consistency in projects that span jurisdictions. TMBur is implemented in Nextflow and performs all analysis steps to generate TMB estimates directly from FASTQ files, incorporating somatic variant calling with Manta, Strelka2, and Mutect2, and microsatellite instability profiling with MSISensor. These tools are provided in a Singularity container downloaded by the workflow at runtime, allowing the entire workflow to be run identically on most computing platforms. To test the reproducibility of TMBur TMB estimates, we performed replicate runs on WGS data derived from the COLO829 and COLO829BL cell lines at multiple research centres. The clinical value of derived TMB estimates was then evaluated using a cohort of 90 patients with advanced, metastatic cancer who received ICIs following WGS analysis. Patients were split into groups based on a threshold of 10/Mb, and time to progression from initiation of ICIs was examined using Kaplan–Meier and Cox proportional hazards analyses.
Results: TMBur produced identical TMB estimates across replicates and at multiple analysis centres. The clinical utility of TMBur-derived TMB estimates was validated, with a genomic TMB ≥ 10/Mb demonstrating improved time to progression, even after correcting for differences in tumor type (HR = 0.39, p = 0.012).
Conclusions: TMBur, a shareable workflow, generates consistent whole genome derived TMB estimates predictive of response to ICIs across multiple analysis centres. Reproducible TMB estimates from this approach can improve collaboration and ensure equitable treatment and clinical trial access spanning jurisdictions.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-022-01348-z.
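The grouping step described in the abstract (patients split at 10 mutations/Mb) can be illustrated with a minimal Python sketch. This is not TMBur's implementation: the function names and the choice of denominator (a count of callable bases supplied by the caller) are assumptions made for illustration, and the abstract does not state how TMBur defines the genome size it divides by.

```python
# Minimal sketch of a whole-genome TMB calculation and the 10/Mb grouping.
# Hypothetical helper names; the denominator definition is an assumption.

TMB_THRESHOLD_PER_MB = 10.0  # threshold quoted in the abstract


def tmb_per_mb(somatic_mutations: int, callable_bases: int) -> float:
    """Return mutations per megabase given a mutation count and genome size in bases."""
    if somatic_mutations < 0:
        raise ValueError("somatic_mutations must be non-negative")
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return somatic_mutations / (callable_bases / 1_000_000)


def is_tmb_high(tmb: float, threshold: float = TMB_THRESHOLD_PER_MB) -> bool:
    """Classify a sample as TMB-high when its TMB is at or above the threshold."""
    return tmb >= threshold


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the study.
    tmb = tmb_per_mb(somatic_mutations=45_000, callable_bases=3_000_000_000)
    print(f"TMB = {tmb:.1f}/Mb, high = {is_tmb_high(tmb)}")
```

Because the result depends directly on both the mutation count and the denominator, two centres that use different variant callers or different callable-region definitions can place the same tumor on opposite sides of the threshold, which is the consistency problem the abstract raises.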
Affiliation(s)
- Emma Titmuss
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Richard D Corbett
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Scott Davidson
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, The Peter Gilgan Centre for Research and Learning, Toronto, ON, Canada
- Sanna Abbasi
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Laura M Williamson
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Erin D Pleasance
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Adam Shlien
  - Program of Genetics and Genome Biology, The Hospital for Sick Children, The Peter Gilgan Centre for Research and Learning, Toronto, ON, Canada
- Daniel J Renouf
  - Department of Medical Oncology, BC Cancer, Vancouver, BC, Canada
- Steven J M Jones
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Janessa Laskin
  - Department of Medical Oncology, BC Cancer, Vancouver, BC, Canada
- Marco A Marra
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
3
Alladio E, Poggiali B, Cosenza G, Pilli E. Multivariate statistical approach and machine learning for the evaluation of biogeographical ancestry inference in the forensic field. Sci Rep 2022; 12:8974. [PMID: 35643723 PMCID: PMC9148302 DOI: 10.1038/s41598-022-12903-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/14/2021] [Accepted: 04/13/2022] [Indexed: 11/24/2022]
Abstract
The biogeographical ancestry (BGA) of a trace or a person/skeleton refers to the component of ethnicity, constituted of biological and cultural elements, that is biologically determined. Nowadays, many individuals are interested in exploring their genealogy, and the capability to distinguish biogeographic information about population groups and subgroups via DNA analysis plays an essential role in several fields, such as forensics. In fact, for investigative and intelligence purposes, it is beneficial to infer the biogeographical origins of perpetrators of crimes or victims of unsolved cold cases when no reference profile from perpetrators or database hits for comparative purposes are available. Current approaches for biogeographical ancestry estimation using SNP data are usually based on PCA and the Structure software. The present study provides an alternative method that involves multivariate data analysis and machine learning strategies to evaluate the BGA discriminating power for unknown samples using different commercial panels. Starting from the 1000 Genomes Project, Simons Genome Diversity Project and Human Genome Diversity Project datasets involving African, American, Asian, European and Oceanian individuals, and moving towards further and more geographically restricted populations, powerful multivariate techniques such as Partial Least Squares-Discriminant Analysis (PLS-DA) and machine learning techniques such as XGBoost were employed, and their discriminating power was compared. The PLS-DA method provided more robust classifications than the XGBoost method, showing that the adopted approach might be an interesting tool for forensic experts to infer BGA information from the DNA profile of unknown individuals, but also highlighting that the commercial forensic panels could be inadequate to discriminate populations at the intra-continental level.
Affiliation(s)
- Eugenio Alladio
  - Department of Chemistry, University of Turin, Turin, Italy
  - Centro Regionale Antidoping e di Tossicologia "A. Bertinaria", Orbassano, Torino, Italy
- Brando Poggiali
  - Department of Biology, Forensic Molecular Anthropology Laboratory, University of Florence, Florence, Italy
- Giulia Cosenza
  - Department of Biology, Forensic Molecular Anthropology Laboratory, University of Florence, Florence, Italy
- Elena Pilli
  - Department of Biology, Forensic Molecular Anthropology Laboratory, University of Florence, Florence, Italy