1
Stark Z, Glazer D, Hofmann O, Rendon A, Marshall CR, Ginsburg GS, Lunt C, Allen N, Effingham M, Hastings Ward J, Hill SL, Ali R, Goodhand P, Page A, Rehm HL, North KN, Scott RH. A call to action to scale up research and clinical genomic data sharing. Nat Rev Genet 2024:10.1038/s41576-024-00776-0. [PMID: 39375561 DOI: 10.1038/s41576-024-00776-0] [Accepted: 08/29/2024] [Indexed: 10/09/2024]
Abstract
Genomic data from millions of individuals have been generated worldwide to drive discovery and clinical impact in precision medicine. Lowering the barriers to using these data collectively is needed to equitably realize the benefits of the diversity and scale of population data. We examine the current landscape of global genomic data sharing, including the evolution of data sharing models from data aggregation through to data visiting, and for certain use cases, cross-cohort analysis using federated approaches across multiple environments. We highlight emerging examples of best practice relating to participant, patient and community engagement; evolution of technical standards, tools and infrastructure; and impact of research and health-care policy. We outline 12 actions we can all take together to scale up efforts to enable safe global data sharing and move beyond projects demonstrating feasibility to routinely cross-analysing research and clinical data sets, optimizing benefit.
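The federated, data-visiting model described above can be illustrated with a minimal Python sketch. Purely for illustration, it assumes that each site holds per-variant alternate-allele dosages and releases only aggregate counts. The site data, variant IDs and function names are hypothetical and do not come from the paper.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical site-level data: variant ID -> alternate-allele dosage (0, 1 or 2)
# for each participant. Under data visiting, these records stay at the site.
SiteGenotypes = Dict[str, List[int]]
# Aggregate released by a site: variant ID -> (alternate allele count, total alleles)
SiteSummary = Dict[str, Tuple[int, int]]


def local_summary(site: SiteGenotypes) -> SiteSummary:
    """Runs inside each site's environment; returns counts, not genotypes."""
    return {
        variant: (sum(dosages), 2 * len(dosages))  # diploid: 2 alleles per person
        for variant, dosages in site.items()
    }


def pooled_allele_frequencies(summaries: Iterable[SiteSummary]) -> Dict[str, float]:
    """Runs at the coordinating node; combines site aggregates only."""
    alt_totals: Dict[str, int] = {}
    allele_totals: Dict[str, int] = {}
    for summary in summaries:
        for variant, (alt, total) in summary.items():
            alt_totals[variant] = alt_totals.get(variant, 0) + alt
            allele_totals[variant] = allele_totals.get(variant, 0) + total
    return {v: alt_totals[v] / allele_totals[v] for v in alt_totals if allele_totals[v] > 0}


if __name__ == "__main__":
    site_a = {"var1": [0, 1, 2], "var2": [0, 0, 1]}
    site_b = {"var1": [1, 1], "var2": [2, 0]}
    print(pooled_allele_frequencies([local_summary(site_a), local_summary(site_b)]))
```

Only the (count, total) pairs cross site boundaries. A real deployment would add authentication, access controls and usually disclosure checks such as minimum cell sizes, none of which this sketch includes.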
Affiliation(s)
- Zornitza Stark
- Australian Genomics, Melbourne, Victoria, Australia
- Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- University of Melbourne, Melbourne, Victoria, Australia
- David Glazer
- Verily Life Sciences, South San Francisco, CA, USA.
- Oliver Hofmann
- Australian Genomics, Melbourne, Victoria, Australia
- University of Melbourne, Melbourne, Victoria, Australia
- University of Melbourne Centre for Cancer Research, Melbourne, Victoria, Australia
- Christian R Marshall
- Division of Genome Diagnostics, Pediatric Laboratory Medicine Department, The Hospital for Sick Children, Toronto, Ontario, Canada
- Geoffrey S Ginsburg
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Chris Lunt
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Naomi Allen
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- UK Biobank, Stockport, UK
- Sue L Hill
- National Health Service England, London, UK
- Raghib Ali
- Our Future Health, Manchester, UK
- Oxford University Hospitals NHS Trust, Oxford, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
- Peter Goodhand
- Global Alliance for Genomics and Health, Toronto, Ontario, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Angela Page
- Global Alliance for Genomics and Health, Toronto, Ontario, Canada
- Heidi L Rehm
- Global Alliance for Genomics and Health, Toronto, Ontario, Canada
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Kathryn N North
- Australian Genomics, Melbourne, Victoria, Australia
- University of Melbourne, Melbourne, Victoria, Australia
- Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- Richard H Scott
- Genomics England, London, UK
- Great Ormond Street Hospital for Children, London, UK
- UCL Great Ormond Street Institute of Child Health, London, UK
2
Ellard S, Morgan S, Wynn SL, Walker S, Parrish A, Mein R, Juett A, Ahn JW, Berry I, Cassidy EJ, Durkie M, Fish L, Hall R, Howard E, Rankin J, Wright CF, Deans ZC, Scott RH, Hill SL, Baple EL, Taylor RW. Rare disease genomic testing in the UK and Ireland: promoting timely and equitable access. J Med Genet 2024:jmg-2024-110228. [PMID: 39327040 DOI: 10.1136/jmg-2024-110228] [Received: 07/09/2024] [Accepted: 09/09/2024] [Indexed: 09/28/2024]
Abstract
PURPOSE AND SCOPE The aim of this position statement is to provide recommendations regarding the delivery of genomic testing to patients with rare disease in the UK and Ireland. The statement has been developed to facilitate timely and equitable access to genomic testing with reporting of results within commissioned turnaround times. METHODS OF STATEMENT DEVELOPMENT A 1-day workshop was convened by the UK Association for Clinical Genomic Science and attended by key stakeholders within the NHS Genomic Medicine Service, including clinical scientists, clinical geneticists and patient support group representatives. The aim was to identify best practice and innovations for streamlined, geographically consistent services delivering timely results. Attendees and senior responsible officers for genomic testing services in the UK nations and Ireland were invited to contribute. RESULTS AND CONCLUSIONS We identified eight fundamental requirements and describe these together with key enablers in the form of specific recommendations. These relate to laboratory practice (proportionate variant analysis, bioinformatics pipelines, multidisciplinary team working model and test request monitoring), compliance with national guidance (variant classification, incidental findings, reporting and reanalysis), service development and improvement (multimodal testing and innovation through research, informed by patient experience), service demand, capacity management, workforce (recruitment, retention and development), and education and training for service users. This position statement was developed to provide best practice guidance for the specialist genomics workforce within the UK and Ireland but is relevant to any publicly funded healthcare system seeking to deliver timely rare disease genomic testing in the context of high demand and limited resources.
Affiliation(s)
- Sian Ellard
- Genomics Laboratory, Royal Devon and Exeter NHS Foundation Trust, Exeter, UK
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Exeter, UK
- Sian Morgan
- All Wales Genetics Laboratory, University Hospital of Wales, Cardiff, UK
- Sarah L Wynn
- Rare Chromosome Disorder Support Group, Unique, Surrey, UK
- Andrew Parrish
- Genomics Laboratory, Royal Devon and Exeter NHS Foundation Trust, Exeter, UK
- South West Genomic Medicine Service, England, UK
- Ana Juett
- South West Genomic Medicine Service, England, UK
- Joo Wook Ahn
- Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Ian Berry
- South West Genomic Medicine Service, England, UK
- Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- Emma-Jane Cassidy
- Wessex Genomics Laboratory Service, University Hospital Southampton NHS Foundation Trust, Salisbury, UK
- Miranda Durkie
- Sheffield Diagnostic Genetics Service, Sheffield Children's NHS Foundation Trust, Sheffield, UK
- Emma Howard
- Manchester University NHS Foundation Trust, Manchester, UK
- Julia Rankin
- South West Genomic Medicine Service, England, UK
- Peninsula Clinical Genetics Service, Exeter, UK
- Caroline F Wright
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Exeter, UK
- Zandra C Deans
- GenQA, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh, UK
- Richard H Scott
- Genomics England Limited, London, UK
- Department of Clinical Genetics, Great Ormond Street Hospital for Children, London, UK
- Emma L Baple
- Department of Clinical and Biomedical Sciences, University of Exeter Medical School, Exeter, UK
- South West Genomic Medicine Service, England, UK
- Peninsula Clinical Genetics Service, Exeter, UK
- Robert W Taylor
- Mitochondrial Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- NHS Highly Specialised Service for Rare Mitochondrial Disorders, North East and Yorkshire Genomic Laboratory Hub, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
3
Nicol D, Nielsen J, Archer M. Data access arrangements in genomic research consortia. Sci Rep 2024; 14:21685. [PMID: 39289472 PMCID: PMC11408512 DOI: 10.1038/s41598-024-72653-z] [Received: 04/04/2024] [Accepted: 09/09/2024] [Indexed: 09/19/2024]
Abstract
One of the most common terms used to describe entities responsible for sharing genomic data for research purposes is 'genomic research consortium'. However, there is a lack of clarity around the language used by consortia to describe their data sharing arrangements. Calls have been made for more uniform terminology. This article reports on a review of the genomic research consortium literature illustrating a wide diversity in the language that has been used over time to describe the access arrangements of these entities. The second component of this research involved an examination of publicly available information from a dataset of 98 consortia. This analysis further illustrates the wide diversity in the access arrangements adopted by genomic research consortia. A total of 12 different access arrangements were identified, including four simple forms (open, consortium, managed and registered access) and eight more complex tiered forms (for example, a combination of consortium, managed and open access). The majority of consortia utilised some form of tiered access, often following the policy requirements of funders like the US National Institutes of Health and the UK Wellcome Trust. It was not always easy to precisely identify the access arrangements of individual consortia. Greater consistency, clarity and transparency are likely to be of benefit to donors, depositors and accessors alike. More work needs to be done to achieve this end.
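The review's taxonomy of four simple access forms and tiered combinations of them can be expressed as a small data model. The sketch below is an illustration under a stated simplification, and its enum and function names are hypothetical: it treats a tiered arrangement simply as a set of two or more simple forms, ignoring any distinction the review may draw by data type or tier order.

```python
from enum import Enum
from itertools import combinations
from typing import FrozenSet


class Access(Enum):
    """The four simple access forms named in the review."""
    OPEN = "open"
    REGISTERED = "registered"
    CONSORTIUM = "consortium"
    MANAGED = "managed"


def describe(arrangement: FrozenSet[Access]) -> str:
    """Label an arrangement as simple (one form) or tiered (several forms)."""
    if not arrangement:
        raise ValueError("an access arrangement needs at least one form")
    kind = "simple" if len(arrangement) == 1 else "tiered"
    return f"{kind}: " + " + ".join(sorted(form.value for form in arrangement))


# Number of distinct multi-form sets that can be built from the four simple forms.
possible_tiered = sum(
    1 for k in range(2, len(Access) + 1) for _ in combinations(Access, k)
)
```

Under this set-based simplification, 11 multi-form combinations are possible (6 pairs, 4 triples and 1 set of all four). The review reports observing eight tiered forms among the 98 consortia.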
Affiliation(s)
- Dianne Nicol
- Centre for Law and Genetics, University of Tasmania, Hobart, TAS, Australia
- Jane Nielsen
- Centre for Law and Genetics, University of Tasmania, Hobart, TAS, Australia
- Madeleine Archer
- Faculty of Business and Law, Australian Centre for Health Law Research, Queensland University of Technology, Brisbane, QLD, Australia
4
Leo S, Crusoe MR, Rodríguez-Navas L, Sirvent R, Kanitz A, De Geest P, Wittner R, Pireddu L, Garijo D, Fernández JM, Colonnelli I, Gallo M, Ohta T, Suetake H, Capella-Gutierrez S, de Wit R, Kinoshita BP, Soiland-Reyes S. Recording provenance of workflow runs with RO-Crate. PLoS One 2024; 19:e0309210. [PMID: 39255315 PMCID: PMC11386446 DOI: 10.1371/journal.pone.0309210] [Received: 03/04/2024] [Accepted: 08/08/2024] [Indexed: 09/12/2024]
Abstract
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org to capture the provenance of the execution of computational workflows at different levels of granularity and bundle together all their associated objects (inputs, outputs, code, etc.). The model is supported by a diverse, open community that runs regular meetings, discussing development, maintenance and adoption aspects. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two use cases of machine learning in the digital image analysis domain.
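To show the kind of structure Workflow Run RO-Crate builds on, here is a minimal Python sketch. It emits an RO-Crate 1.1 metadata graph in which a schema.org `CreateAction` links a workflow (`instrument`) to its inputs (`object`) and outputs (`result`). It is a simplified illustration, not a conformant Workflow Run Crate. The file names and the `#run-1` identifier are hypothetical. A real crate would also declare the Workflow Run profile via `conformsTo` and include further required metadata, so check the RO-Crate and Workflow Run Crate specifications for the current requirements.

```python
import json
from typing import Any, Dict, List


def build_run_crate(workflow_file: str, inputs: List[str], outputs: List[str],
                    start_time: str, end_time: str) -> Dict[str, Any]:
    """Return a simplified RO-Crate JSON-LD document describing one workflow run."""
    data_files = [*inputs, *outputs]
    graph: List[Dict[str, Any]] = [
        {   # Metadata descriptor: identifies this document as RO-Crate 1.1 metadata
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # Root dataset: bundles the workflow and its data, and mentions the run
            "@id": "./",
            "@type": "Dataset",
            "mainEntity": {"@id": workflow_file},
            "hasPart": [{"@id": f} for f in [workflow_file, *data_files]],
            "mentions": [{"@id": "#run-1"}],
        },
        {   # The workflow definition itself
            "@id": workflow_file,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
        *[{"@id": f, "@type": "File"} for f in data_files],
        {   # Provenance of one execution: what ran, on what, producing what
            "@id": "#run-1",
            "@type": "CreateAction",
            "instrument": {"@id": workflow_file},
            "object": [{"@id": f} for f in inputs],
            "result": [{"@id": f} for f in outputs],
            "startTime": start_time,
            "endTime": end_time,
        },
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


if __name__ == "__main__":
    crate = build_run_crate("workflow.cwl", ["reads.fastq"], ["variants.vcf"],
                            "2024-01-01T10:00:00Z", "2024-01-01T10:30:00Z")
    print(json.dumps(crate, indent=2))
```

Keeping inputs, outputs and the workflow as separate entities in one graph allows a consumer to compare runs from different workflow systems by querying the same `CreateAction` properties.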
Affiliation(s)
- Simone Leo
- Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula (CA), Italy
- Michael R. Crusoe
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- DTL Projects, Utrecht, The Netherlands
- Forschungszentrum Jülich, Jülich, Germany
- Raül Sirvent
- Barcelona Supercomputing Center, Barcelona, Spain
- Alexander Kanitz
- Biozentrum, University of Basel, Basel, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Rudolf Wittner
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Institute of Computer Science, Masaryk University, Brno, Czech Republic
- BBMRI-ERIC, Graz, Austria
- Luca Pireddu
- Center for Advanced Studies, Research, and Development in Sardinia (CRS4), Pula (CA), Italy
- Daniel Garijo
- Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
- Iacopo Colonnelli
- Computer Science Department, Università degli Studi di Torino, Torino, Italy
- Matej Gallo
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Tazro Ohta
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Shizuoka, Japan
- Institute for Advanced Academic Research, Chiba University, Chiba, Japan
- Renske de Wit
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Stian Soiland-Reyes
- Department of Computer Science, The University of Manchester, Manchester, United Kingdom
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
5
Holden NJ. Data sharing considerations to maximize the use of pathogen biological and genomics resources data for public health. J Appl Microbiol 2024; 135:lxae204. [PMID: 39113269 DOI: 10.1093/jambio/lxae204] [Received: 05/24/2024] [Revised: 07/17/2024] [Accepted: 08/06/2024] [Indexed: 09/05/2024]
Abstract
Public sector data associated with health are a highly valuable resource with multiple potential end users, including health practitioners, researchers, public bodies, policy makers, and industry. Data for infectious disease agents are used for epidemiological investigations, disease tracking and assessing emerging biological threats. Yet there are challenges in collating and reusing these data. Data may be derived from multiple sources and generated and collected for different purposes. While public sector data should be open access, providers from public health settings or from agriculture, food, or environmental sources must meet sensitivity criteria, with ethical restrictions on how the data can be reused. At the same time, shareable datasets need to describe the pathogens with sufficient contextual metadata for maximal utility, e.g. associated disease or disease potential and the pathogen source. Because the data comprise the physical resources of pathogen collections and potentially associated sequences, there is the added, emerging technical issue of integrating omics 'big data'. Thus, there is a need to identify suitable means to integrate and safely access diverse data for pathogens. Established genomics alliances and platforms interpret and meet these challenges in different ways depending on their own context. Nonetheless, their templates and frameworks provide a solution that can be adapted to pathogen datasets.
Affiliation(s)
- Nicola J Holden
- Scotland's Rural College, Department of Rural Land Use, Craibstone Campus, Aberdeen AB21 9YA, United Kingdom
6
Kullo IJ. Promoting equity in polygenic risk assessment through global collaboration. Nat Genet 2024; 56:1780-1787. [PMID: 39103647 DOI: 10.1038/s41588-024-01843-2] [Received: 05/11/2023] [Accepted: 06/24/2024] [Indexed: 08/07/2024]
Abstract
The long delay before genomic technologies become available in low- and middle-income countries is a concern from both scientific and ethical standpoints. Polygenic risk scores (PRSs), a relatively recent advance in genomics, could have a substantial impact on promoting health by improving disease risk prediction and guiding preventive strategies. However, clinical use of PRSs in their current forms might widen global health disparities, as their portability to diverse groups is limited. This Perspective highlights the need for global collaboration to develop and implement PRSs that perform equitably across the world. Such collaboration requires capacity building and the generation of new data in low-resource settings, the sharing of harmonized genotype and phenotype data securely across borders, novel population genetics and statistical methods to improve PRS performance, and thoughtful clinical implementation in diverse settings. All this needs to occur while considering the ethical, legal and social implications, with support from regulatory and funding agencies and policymakers.
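For readers less familiar with PRSs, the standard additive form sums, over a set of variants, each variant's effect-size weight multiplied by the number of effect alleles the individual carries. The Python sketch below implements that weighted sum. The variant IDs and weights are hypothetical. The portability concern raised in the Perspective arises because weights estimated in one population may not reflect effect sizes, allele frequencies or linkage disequilibrium patterns in another.

```python
from typing import Mapping


def polygenic_risk_score(weights: Mapping[str, float], dosages: Mapping[str, int]) -> float:
    """Additive PRS: sum over variants of weight (e.g. log odds ratio) x effect-allele dosage.

    weights: variant ID -> per-allele effect size from a discovery GWAS (hypothetical here)
    dosages: variant ID -> number of effect alleles carried (0, 1 or 2)
    """
    missing = set(weights) - set(dosages)
    if missing:
        # Real pipelines impute or otherwise handle missing genotypes; this sketch refuses.
        raise KeyError(f"no genotype for variants: {sorted(missing)}")
    for variant in weights:
        if dosages[variant] not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1 or 2")
    return sum(beta * dosages[variant] for variant, beta in weights.items())
```

The score is only comparable between individuals when it is computed with the same weights and variant set, which is one reason why harmonized genotype data matter for cross-border PRS work.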
Affiliation(s)
- Iftikhar J Kullo
- Department of Cardiovascular Medicine and the Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA
7
Wright MW, Thaxton CL, Nelson T, DiStefano MT, Savatt JM, Brush MH, Cheung G, Mandell ME, Wulf B, Ward TJ, Goehringer S, O'Neill T, Weller P, Preston CG, Keseler IM, Goldstein JL, Strande NT, McGlaughon J, Azzariti DR, Cordova I, Dziadzio H, Babb L, Riehle K, Milosavljevic A, Martin CL, Rehm HL, Plon SE, Berg JS, Riggs ER, Klein TE. Generating Clinical-Grade Gene-Disease Validity Classifications Through the ClinGen Data Platforms. Annu Rev Biomed Data Sci 2024; 7:31-50. [PMID: 38663031 DOI: 10.1146/annurev-biodatasci-102423-112456] [Indexed: 08/25/2024]
Abstract
Clinical genetic laboratories must have access to clinically validated biomedical data for precision medicine. A lack of accessibility, normalized structure, and consistency in evaluation complicates interpretation of disease causality, resulting in confusion in assessing the clinical validity of genes and genetic variants for diagnosis. A key goal of the Clinical Genome Resource (ClinGen) is to fill the knowledge gap concerning the strength of evidence supporting the role of a gene in a monogenic disease, which is achieved through a process known as Gene-Disease Validity curation. Here we review the work of ClinGen in developing a curation infrastructure that supports the standardization, harmonization, and dissemination of Gene-Disease Validity data through the creation of frameworks and the utilization of common data standards. This infrastructure is based on several applications, including the ClinGen GeneTracker, Gene Curation Interface, Data Exchange, GeneGraph, and website.
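As an illustration of how a Gene-Disease Validity framework turns curated evidence into a classification, the Python sketch below applies point thresholds as we understand them from the published ClinGen Gene-Disease Validity standard operating procedure. These are genetic evidence capped at 12 points and experimental evidence at 6; Limited from 0.1 to 6 points; Moderate from 7 to 11; Strong from 12 to 18; and Definitive when a Strong-level total has also been replicated over time. These thresholds, and the handling of non-integer totals between bands, are assumptions to be checked against the current SOP. The Disputed and Refuted categories depend on expert review of contradictory evidence and are deliberately not modelled. The function name is hypothetical.

```python
GENETIC_CAP = 12.0       # assumed maximum genetic evidence points
EXPERIMENTAL_CAP = 6.0   # assumed maximum experimental evidence points


def provisional_classification(genetic_points: float, experimental_points: float,
                               replicated_over_time: bool) -> str:
    """Map capped evidence points to a provisional gene-disease validity label.

    Thresholds follow our reading of the ClinGen SOP and are assumptions; real
    curation also involves expert-panel review, which this sketch ignores.
    """
    if genetic_points < 0 or experimental_points < 0:
        raise ValueError("evidence points cannot be negative")
    total = min(genetic_points, GENETIC_CAP) + min(experimental_points, EXPERIMENTAL_CAP)
    if total == 0:
        return "No Known Disease Relationship"
    if total < 7:
        return "Limited"
    if total < 12:
        return "Moderate"
    return "Definitive" if replicated_over_time else "Strong"
```

Encoding the thresholds once, rather than in each curator's head, is the kind of standardization the review attributes to shared curation infrastructure.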
Affiliation(s)
- Matt W Wright
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
- Courtney L Thaxton
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Marina T DiStefano
- Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Matthew H Brush
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Gloria Cheung
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
- Mark E Mandell
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
- Bryan Wulf
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
- T J Ward
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Terry O'Neill
- Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Christine G Preston
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
- Ingrid M Keseler
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
- Jennifer L Goldstein
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Jennifer McGlaughon
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Danielle R Azzariti
- Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Hannah Dziadzio
- Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Lawrence Babb
- Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Kevin Riehle
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Heidi L Rehm
- Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Sharon E Plon
- Department of Pediatrics, Division of Hematology-Oncology, Baylor College of Medicine, Houston, Texas, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Jonathan S Berg
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Teri E Klein
- Departments of Medicine (Biomedical Informatics Research) and Genetics, Stanford University School of Medicine, Stanford, California, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, USA
8
Cho H, Froelicher D, Dokmai N, Nandi A, Sadhuka S, Hong MM, Berger B. Privacy-Enhancing Technologies in Biomedical Data Science. Annu Rev Biomed Data Sci 2024; 7:317-343. [PMID: 39178425 PMCID: PMC11346580 DOI: 10.1146/annurev-biodatasci-120423-120107] [Indexed: 08/25/2024]
Abstract
The rapidly growing scale and variety of biomedical data repositories raise important privacy concerns. Conventional frameworks for collecting and sharing human subject data offer limited privacy protection, often necessitating the creation of data silos. Privacy-enhancing technologies (PETs) promise to safeguard these data and broaden their usage by providing means to share and analyze sensitive data while protecting privacy. Here, we review prominent PETs and illustrate their role in advancing biomedicine. We describe key use cases of PETs and their latest technical advances and highlight recent applications of PETs in a range of biomedical domains. We conclude by discussing outstanding challenges and social considerations that need to be addressed to facilitate a broader adoption of PETs in biomedical data science.
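One widely used PET building block is additive secret sharing, which underlies many secure multiparty computation protocols. The Python sketch below is a generic textbook construction chosen for illustration, not a method taken from the review. It lets several data holders compute the sum of their private counts (for example, per-site case counts) while each compute party sees only uniformly random shares. It assumes honest-but-curious parties, at least two compute parties, non-negative inputs and a total below the modulus. It provides no protection against parties that deviate from the protocol. The function names are hypothetical.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # public prime modulus; inputs and their total must stay below it


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split a value into n additive shares that sum to it modulo MODULUS."""
    if n_parties < 2:
        raise ValueError("secret sharing needs at least two parties")
    if not 0 <= value < MODULUS:
        raise ValueError("value must be in [0, MODULUS)")
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int], n_parties: int = 3) -> int:
    """Simulate an n-party secure sum over the data holders' private values."""
    # Each data holder splits its value and sends share p to compute party p.
    all_shares = [make_shares(v, n_parties) for v in private_values]
    # Each compute party adds up the shares it received; any single party's view
    # is uniformly random and reveals nothing about an individual input.
    partial_sums = [sum(s[p] for s in all_shares) % MODULUS for p in range(n_parties)]
    # Combining all partial sums reveals only the total.
    return sum(partial_sums) % MODULUS
```

For example, `secure_sum([12, 7, 30])` returns 49 whatever random shares are drawn, because the shares cancel exactly modulo the prime.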
Affiliation(s)
- Hyunghoon Cho
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- David Froelicher
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Natnatee Dokmai
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- Anupama Nandi
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- Shuvom Sadhuka
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Matthew M Hong
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
9
Sharma V, McDermott J, Keen J, Foster S, Whelan P, Newman W. Pharmacogenetics Clinical Decision Support Systems for Primary Care in England: Co-Design Study. J Med Internet Res 2024; 26:e49230. [PMID: 39042886 PMCID: PMC11303890 DOI: 10.2196/49230] [Received: 05/23/2023] [Revised: 12/22/2023] [Accepted: 05/13/2024] [Indexed: 07/25/2024]
Abstract
BACKGROUND Pharmacogenetics can impact patient care and outcomes through personalizing the selection of medicines, resulting in improved efficacy and a reduction in harmful side effects. Despite the existence of compelling clinical evidence and international guidelines highlighting the benefits of pharmacogenetics in clinical practice, implementation within the National Health Service in the United Kingdom is limited. An important barrier to overcome is the development of IT solutions that support the integration of pharmacogenetic data into health care systems. This necessitates a better understanding of the role of electronic health records (EHRs) and the design of clinical decision support systems that are acceptable to clinicians, particularly those in primary care. OBJECTIVE Explore the needs and requirements of a pharmacogenetic service from the perspective of primary care clinicians with a view to co-design a prototype solution. METHODS We used ethnographic and think-aloud observations, user research workshops, and prototyping. The participants for this study included general practitioners and pharmacists. In total, we undertook 5 sessions of ethnographic observation to understand current practices and workflows. This was followed by 3 user research workshops, each with its own topic guide starting with personas and early ideation, through to exploring the potential of clinical decision support systems and prototype design. We subsequently analyzed workshop data using affinity diagramming and refined the key requirements for the solution collaboratively as a multidisciplinary project team. RESULTS User research results identified that pharmacogenetic data must be incorporated within existing EHRs rather than through a stand-alone portal. The information presented through clinical decision support systems must be clear, accessible, and user-friendly as the service will be used by a range of end users. 
Critically, the information should be displayed within the prescribing workflow, rather than discrete results stored statically in the EHR. Finally, the prescribing recommendations should be authoritative to provide confidence in the validity of the results. Based on these findings we co-designed an interactive prototype, demonstrating pharmacogenetic clinical decision support integrated within the prescribing workflow of an EHR. CONCLUSIONS This study marks a significant step forward in the design of systems that support pharmacogenetic-guided prescribing in primary care settings. Clinical decision support systems have the potential to enhance the personalization of medicines, provided they are effectively implemented within EHRs and present pharmacogenetic data in a user-friendly, actionable, and standardized format. Achieving this requires the development of a decoupled, standards-based architecture that allows for the separation of data from application, facilitating integration across various EHRs through the use of application programming interfaces (APIs). More globally, this study demonstrates the role of health informatics and user-centered design in realizing the potential of personalized medicine at scale and ensuring that the benefits of genomic innovation reach patients and populations effectively.
Affiliation(s)
- Videha Sharma
- Centre for Health Informatics, Division of Informatics, Imaging and Data Science, University of Manchester, Manchester, United Kingdom
- Pankhurst Institute for Health Technology Research and Innovation, University of Manchester, Manchester, United Kingdom
- John McDermott
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
- Division of Evolution, Infection and Genomics, School of Biological Sciences, University of Manchester, Manchester, United Kingdom
- Jessica Keen
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
- Simon Foster
- Centre for Health Informatics, Division of Informatics, Imaging and Data Science, University of Manchester, Manchester, United Kingdom
- Pauline Whelan
- Centre for Health Informatics, Division of Informatics, Imaging and Data Science, University of Manchester, Manchester, United Kingdom
- William Newman
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
10
Zalis M, Viana Veloso GG, Aguiar Jr. PN, Gimenes N, Reis MX, Matsas S, Ferreira CG. Next-generation sequencing impact on cancer care: applications, challenges, and future directions. Front Genet 2024; 15:1420190. [PMID: 39045325 PMCID: PMC11263191 DOI: 10.3389/fgene.2024.1420190] [Received: 04/19/2024] [Accepted: 06/13/2024] [Indexed: 07/25/2024]
Abstract
Fundamentally, precision oncology illustrates how molecular profiling of tumors can illuminate their biological behavior, diversity, and likely outcomes by identifying the distinct genetic mutations, protein levels, and other biomarkers that underpin cancer progression. Next-generation sequencing has become an indispensable tool for diagnosis and treatment guidance in current clinical practice. Tissue analysis is now further supported by methods such as comprehensive genomic profiling and liquid biopsies. However, precision medicine in oncology presents specific hurdles, such as the cost-benefit balance and widespread accessibility, particularly in low- and middle-income countries. A key issue is how to extend next-generation sequencing effectively to all cancer patients, thus empowering treatment decision-making. Concerns also extend to the quality and preservation of tissue samples, as well as to the evaluation of health technologies. Moreover, as technology advances, novel next-generation sequencing assessments are being developed, including fragmentomics. Our objective was therefore to delineate the primary uses of next-generation sequencing, discussing its applications, limitations, and prospective paths forward in oncology.
Affiliation(s)
- Mariano Zalis
- Oncoclínicas&Co/MedSir, Rio de Janeiro, Brazil
- Medical School of the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Gilson Gabriel Viana Veloso
- Oncoclínicas&Co/MedSir, Rio de Janeiro, Brazil
- Santa Casa de Misericórdia de Belo Horizonte, Belo Horizonte, Brazil
- Silvio Matsas
- Centro de Estudos e Pesquisas de Hematologia e Oncologia (CEPHO), Sao Paulo, Brazil
11
Valenti A, Falcone I, Valenti F, Ricciardi E, Di Martino S, Maccallini MT, Cerro M, Desiderio F, Miseo L, Russillo M, Guerrisi A. Biobanks as an Indispensable Tool in the "Era" of Precision Medicine: Key Role in the Management of Complex Diseases, Such as Melanoma. J Pers Med 2024; 14:731. [PMID: 39063985 PMCID: PMC11278009 DOI: 10.3390/jpm14070731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2024] [Revised: 07/02/2024] [Accepted: 07/03/2024] [Indexed: 07/28/2024]
Abstract
In recent years, medicine has undergone profound changes, entering a new phase defined as the "era of precision medicine". In this context, patient clinical management involves various scientific approaches that allow for a comprehensive evaluation of the pathology: from preventive processes (where applicable) to genetic and diagnostic studies. In this scenario, biobanks play an important role and, over the years, have gained increasing prominence, growing from small deposits into large collections of samples of various kinds. Disease-oriented biobanks are developing rapidly because they provide useful information for the management of complex diseases such as melanoma. Indeed, melanoma, given its highly heterogeneous characteristics, is one of the oncologic diseases with the greatest clinical and therapeutic management complexity. The possibility of extracting tissue, genetic and imaging data from dedicated biobanks could therefore enable more selective study approaches. In this review, we analyze the various types of biobank to evaluate their role in technology development, patient monitoring and research into new biomarkers, especially in the context of melanoma.
Affiliation(s)
- Alessandro Valenti
- Radiology and Diagnostic Imaging Unit, Department of Clinical and Dermatological Research, San Gallicano Dermatological Institute IRCCS, 00144 Rome, Italy
- Italia Falcone
- SAFU, Department of Research, Advanced Diagnostics, and Technological Innovation, IRCCS-Regina Elena National Cancer Institute, 00144 Rome, Italy
- Fabio Valenti
- UOC Oncological Translational Research, IRCCS-Regina Elena National Cancer Institute, 00144 Rome, Italy
- Elena Ricciardi
- UOC Oncological Translational Research, IRCCS-Regina Elena National Cancer Institute, 00144 Rome, Italy
- Simona Di Martino
- UOC Pathology Unit, Biobank IRCCS-Regina Elena National Cancer Institute, 00144 Rome, Italy
- Maria Teresa Maccallini
- Department of Clinical and Molecular Medicine, Università La Sapienza di Roma, 00185 Rome, Italy
- Marianna Cerro
- Department of Clinical and Molecular Medicine, Università La Sapienza di Roma, 00185 Rome, Italy
- Flora Desiderio
- Radiology and Diagnostic Imaging Unit, Department of Clinical and Dermatological Research, San Gallicano Dermatological Institute IRCCS, 00144 Rome, Italy
- Ludovica Miseo
- Radiology and Diagnostic Imaging Unit, Department of Clinical and Dermatological Research, San Gallicano Dermatological Institute IRCCS, 00144 Rome, Italy
- Michelangelo Russillo
- Division of Medical Oncology, IRCCS-Regina Elena National Cancer Institute, 00144 Rome, Italy
- Antonino Guerrisi
- Radiology and Diagnostic Imaging Unit, Department of Clinical and Dermatological Research, San Gallicano Dermatological Institute IRCCS, 00144 Rome, Italy
12
Abueg LAL, Afgan E, Allart O, Awan AH, Bacon WA, Baker D, Bassetti M, Batut B, Bernt M, Blankenberg D, Bombarely A, Bretaudeau A, Bromhead CJ, Burke ML, Capon PK, Čech M, Chavero-Díez M, Chilton JM, Collins TJ, Coppens F, Coraor N, Cuccuru G, Cumbo F, Davis J, De Geest PF, de Koning W, Demko M, DeSanto A, Begines JMD, Doyle MA, Droesbeke B, Erxleben-Eggenhofer A, Föll MC, Formenti G, Fouilloux A, Gangazhe R, Genthon T, Goecks J, Beltran ANG, Goonasekera NA, Goué N, Griffin TJ, Grüning BA, Guerler A, Gundersen S, Gustafsson OJR, Hall C, Harrop TW, Hecht H, Heidari A, Heisner T, Heyl F, Hiltemann S, Hotz HR, Hyde CJ, Jagtap PD, Jakiela J, Johnson JE, Joshi J, Jossé M, Jum’ah K, Kalaš M, Kamieniecka K, Kayikcioglu T, Konkol M, Kostrykin L, Kucher N, Kumar A, Kuntz M, Lariviere D, Lazarus R, Bras YL, Corguillé GL, Lee J, Leo S, Liborio L, Libouban R, Tabernero DL, Lopez-Delisle L, Los LS, Mahmoud A, Makunin I, Marin P, Mehta S, Mok W, Moreno PA, Morier-Genoud F, Mosher S, Müller T, Nasr E, Nekrutenko A, Nelson TM, Oba AJ, Ostrovsky A, Polunina PV, Poterlowicz K, Price EJ, Price GR, Rasche H, Raubenolt B, Royaux C, Sargent L, Savage MT, Savchenko V, Savchenko D, Schatz MC, Seguineau P, Serrano-Solano B, Soranzo N, Srikakulam SK, Suderman K, Syme AE, Tangaro MA, Tedds JA, Tekman M, Cheng (Mike) Thang W, Thanki AS, Uhl M, van den Beek M, Varshney D, Vessio J, Videm P, Von Kuster G, Watson GR, Whitaker-Allen N, Winter U, Wolstencroft M, Zambelli F, Zierep P, Zoabi R. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res 2024; 52:W83-W94. [PMID: 38769056 PMCID: PMC11223835 DOI: 10.1093/nar/gkae410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/18/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024]
Abstract
Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which together enable complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD) has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating access to general-purpose graphics processing units (GPGPU) for cutting-edge methods, and support for licensed tools. Engagement with global research consortia is increasing through the development of more workflows in Galaxy and the resourcing of public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size and accessibility, through learning paths and direct integration with the Galaxy tools featured in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping to engage users and developers, reminding them of their role in sustainability, by displaying the estimated CO2 emissions generated by each Galaxy job.
13
Ho CH. Secondary Use of Health Data for Medical AI: A Cross-Regional Examination of Taiwan and the EU. Asian Bioeth Rev 2024; 16:407-422. [PMID: 39022371 PMCID: PMC11250748 DOI: 10.1007/s41649-024-00279-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 07/20/2024]
Abstract
This paper conducts a comparative analysis of data governance mechanisms concerning the secondary use of health data in Taiwan and the European Union (EU). Both regions have adopted distinctive approaches and regulations for utilizing health data beyond primary care, encompassing areas such as medical research and healthcare system enhancement. Through an examination of these models, this study seeks to elucidate the strategies, frameworks, and legal structures employed by Taiwan and the EU to strike a delicate balance between the imperative of data-driven healthcare innovation and the safeguarding of individual privacy rights. This paper examines and compares several key aspects of the secondary use of health data in Taiwan and the EU. These aspects include data governance frameworks, legal and regulatory frameworks, data access and sharing mechanisms, and privacy and security considerations. This comparative exploration offers invaluable insights into the evolving global landscape of health data governance. It provides a deeper understanding of the strategies implemented by these regions to harness the potential of health data while upholding the ethical and legal considerations surrounding its secondary use. The findings aim to inform best practices for responsible and effective health data utilization, particularly in the context of medical AI applications.
Affiliation(s)
- Chih-hsing Ho
- Institute of European and American Studies, Academia Sinica, Taipei, Taiwan
14
Suetake H, Tanjo T, Ishii M, Kinoshita BP, Fujino T, Hachiya T, Kodama Y, Fujisawa T, Ogasawara O, Shimizu A, Arita M, Fukusato T, Igarashi T, Ohta T. Sapporo: A workflow execution service that encourages the reuse of workflows in various languages in bioinformatics. F1000Res 2024; 11:889. [PMID: 39070189 PMCID: PMC11282396 DOI: 10.12688/f1000research.122924.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/14/2024] [Indexed: 07/30/2024]
Abstract
The increased demand for efficient computation in data analysis encourages researchers in biomedical science to use workflow systems. Workflow systems, or so-called workflow languages, are used to describe and execute a set of data analysis steps. Workflow systems increase researcher productivity, particularly in fields that use high-throughput DNA sequencing, where scalable computation is required. As these systems have improved the portability of data analysis workflows, research communities have been able to share workflows and reduce the cost of building routine analysis procedures. However, the coexistence of multiple workflow systems in one research field has fragmented effort across different workflow system communities. Because each workflow system has its own characteristics, it is not feasible to learn every system in order to use publicly shared workflows. We therefore developed Sapporo, an application that provides a unified workflow execution layer over the differences between workflow systems. Sapporo has two components: an application programming interface (API) that receives workflow run requests, and a browser-based client for the API. The API follows the Workflow Execution Service (WES) API standard proposed by the Global Alliance for Genomics and Health (GA4GH). The current implementation supports the execution of workflows in four languages: Common Workflow Language, Workflow Description Language, Snakemake, and Nextflow. With its extensible and scalable design, Sapporo can help the research community make use of valuable data analysis resources.
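The WES API mentioned above can be illustrated with a minimal sketch that builds the request a client would send. The sketch assumes the core `RunRequest` fields of the GA4GH WES 1.0 specification (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`, `tags`, `workflow_engine_parameters`). These are submitted as multipart form data to `POST /runs`, and the run is then polled through `GET /runs/{run_id}/status`. The function names, the server URL and the workflow URL are hypothetical. Sapporo's own engine-selection fields and any extensions are not shown, and the set of terminal states is taken from the WES 1.0 state list.

```python
import json

# States after which a WES run no longer changes (assumed from the WES 1.0
# State enum; later versions of the specification may add more).
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def build_wes_run_request(base_url, workflow_url, workflow_type,
                          workflow_type_version, workflow_params,
                          tags=None, workflow_engine_parameters=None):
    """Return (endpoint, form_fields) for a hypothetical WES POST /runs call."""
    endpoint = base_url.rstrip("/") + "/runs"
    # Object-valued RunRequest fields travel as JSON-encoded strings
    # inside a multipart/form-data body.
    form = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,                  # e.g. "CWL" or "WDL"
        "workflow_type_version": workflow_type_version,  # e.g. "v1.0"
        "workflow_params": json.dumps(workflow_params),
        "tags": json.dumps(tags or {}),
        "workflow_engine_parameters": json.dumps(workflow_engine_parameters or {}),
    }
    return endpoint, form


def status_url(base_url, run_id):
    """URL for GET /runs/{run_id}/status, which reports the run's state."""
    return f"{base_url.rstrip('/')}/runs/{run_id}/status"


def is_terminal(state):
    """True once the run has completed, failed or been cancelled."""
    return state in TERMINAL_STATES


if __name__ == "__main__":
    # Hypothetical server and workflow URLs, for illustration only.
    endpoint, form = build_wes_run_request(
        "http://wes.example.org/",
        "https://example.org/workflows/qc.cwl",
        "CWL",
        "v1.0",
        {"reads": {"class": "File", "location": "https://example.org/sample.fq"}},
    )
    print(endpoint)
    print(form["workflow_params"])
```

Sending the request is left to an HTTP client. With `requests`, for example, passing `files={k: (None, v) for k, v in form.items()}` forces a multipart body. Because the request shape is shared across WES implementations, pointing the same client at a Nextflow or WDL workflow changes only `workflow_type`, `workflow_type_version` and `workflow_url`, provided the server supports that language.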
Affiliation(s)
- Hirotaka Suetake
- Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo, Tokyo, Japan
- Tomoya Tanjo
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Manabu Ishii
- Genome Analytics Japan Inc, Shinjuku, Tokyo, Japan
- Bruno P. Kinoshita
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Curii Corporation, Somerville, MA, USA
- Takeshi Fujino
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
- Yuichi Kodama
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Takatomo Fujisawa
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Osamu Ogasawara
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Atsushi Shimizu
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Masanori Arita
- Bioinformation and DDBJ Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Tsukasa Fukusato
- Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo, Tokyo, Japan
- Takeo Igarashi
- Department of Creative Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo, Tokyo, Japan
- Tazro Ohta
- Institute for Advanced Academic Research, Chiba University, Chiba, Japan
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Mishima, Shizuoka, Japan
- Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Chiba, Chiba, Japan
15
Sedlić F, Sertić J, Markotić A, Primorac D, Slavica A, Zibar L, Vlahoviček K, Kušec V, Barić I, Paar V, Borovečki F, Žmak L, Kurolt IC, Canki-Klain N, Roksandić S, Rinčić I, Jurić H, Škaro V, Marjanović D, Projić P, Primorac D, Starčević A, Vujaklija D, Šikić M, Križanović K, Gamulin S. The Applied Genomics Development Strategy by the Croatian Academy of Sciences and Arts paves the way for the future development of applied genomics in Croatia. Croat Med J 2024; 65:297-302. [PMID: 38868976 PMCID: PMC11157260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2024]
Affiliation(s)
- Dragan Primorac
- St. Catherine Specialty Hospital, Zagreb, Croatia
16
Robertson AJ, Mallett AJ, Stark Z, Sullivan C. It Is in Our DNA: Bringing Electronic Health Records and Genomic Data Together for Precision Medicine. JMIR Bioinform Biotechnol 2024; 5:e55632. [PMID: 38935958 PMCID: PMC11211701 DOI: 10.2196/55632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 03/08/2024] [Accepted: 04/09/2024] [Indexed: 06/29/2024]
Abstract
Health care is at a turning point. We are shifting from protocolized medicine to precision medicine, and digital health systems are facilitating this shift. By providing clinicians with detailed information for each patient and analytic support for decision-making at the point of care, digital health technologies are enabling a new era of precision medicine. Genomic data also provide clinicians with information that can improve the accuracy and timeliness of diagnosis, optimize prescribing, and target risk reduction strategies, all of which are key elements for precision medicine. However, genomic data are predominantly seen as diagnostic information and are not routinely integrated into the clinical workflows of electronic medical records. The use of genomic data holds significant potential for precision medicine; however, as genomic data are fundamentally different from the information collected during routine practice, special considerations are needed to use this information in a digital health setting. This paper outlines the potential of genomic data integration with electronic records, and how these data can enable precision medicine.
Affiliation(s)
- Alan J Robertson
- Faculty of Medicine, University of Queensland, Herston, Australia
- Medical Genomics Group, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Queensland Digital Health Centre, University of Queensland, Brisbane, Australia
- The Genomic Institute, Department of Health, Queensland Government, Brisbane, Australia
- Andrew J Mallett
- Department of Renal Medicine, Townsville University Hospital, Townsville, Australia
- College of Medicine and Dentistry, James Cook University, Townsville, Australia
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Australia
- Zornitza Stark
- Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Melbourne, Australia
- Australian Genomics, Melbourne, Australia
- University of Melbourne, Melbourne, Australia
- Clair Sullivan
- Queensland Digital Health Centre, University of Queensland, Brisbane, Australia
- Centre for Health Services Research, Faculty of Medicine, University of Queensland, Woolloongabba, Australia
- Metro North Hospital and Health Service, Department of Health, Queensland Government, Brisbane, Australia
17
Danis D, Bamshad MJ, Bridges Y, Cacheiro P, Carmody LC, Chong JX, Coleman B, Dalgleish R, Freeman PJ, Graefe ASL, Groza T, Jacobsen JOB, Klocperk A, Kusters M, Ladewig MS, Marcello AJ, Mattina T, Mungall CJ, Munoz-Torres MC, Reese JT, Rehburg F, Reis BCS, Schuetz C, Smedley D, Strauss T, Sundaramurthi JC, Thun S, Wissink K, Wagstaff JF, Zocche D, Haendel MA, Robinson PN. A corpus of GA4GH Phenopackets: case-level phenotyping for genomic diagnostics and discovery. medRxiv 2024:2024.05.29.24308104. [PMID: 38854034 PMCID: PMC11160806 DOI: 10.1101/2024.05.29.24308104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms. Here, we present phenopacket-store. Version 0.1.12 of phenopacket-store includes 4916 phenopackets representing 277 Mendelian and chromosomal diseases associated with 236 genes, and 2872 unique pathogenic alleles curated from 605 different publications. This represents the first large-scale collection of case-level, standardized phenotypic information derived from case reports in the literature with detailed descriptions of the clinical data and will be useful for many purposes, including the development and testing of software for prioritizing genes and diseases in diagnostic genomics, machine learning analysis of clinical phenotype data, patient stratification, and genotype-phenotype correlations. This corpus also provides best-practice examples for curating literature-derived data using the GA4GH Phenopacket Schema.
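The following minimal sketch shows what a phenopacket contains. It builds a small phenopacket as a plain Python dictionary, using field names assumed from the JSON form of version 2.0 of the Phenopacket Schema (`subject`, `phenotypicFeatures`, `excluded`, `metaData.resources`, `phenopacketSchemaVersion`). It is neither the official `phenopackets` library nor a validator, and it covers only a small subset of the schema. The identifiers, curator name and ontology version string are hypothetical. A helper also checks that every ontology prefix used in the features is declared in `metaData.resources`, which is intended to let downstream software resolve the terms.

```python
import json
from datetime import datetime, timezone


def make_phenopacket(packet_id, subject_id, sex, features, hpo_version,
                     created_by="hypothetical-curator"):
    """Build a minimal v2-style phenopacket.

    features: list of (hpo_id, label, excluded) tuples.
    """
    phenotypic_features = []
    for hpo_id, label, excluded in features:
        feature = {"type": {"id": hpo_id, "label": label}}
        if excluded:
            # Records that the phenotype was assessed and found absent.
            feature["excluded"] = True
        phenotypic_features.append(feature)
    return {
        "id": packet_id,
        "subject": {"id": subject_id, "sex": sex},  # e.g. "FEMALE", "MALE"
        "phenotypicFeatures": phenotypic_features,
        "metaData": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "createdBy": created_by,
            "resources": [{
                "id": "hp",
                "name": "human phenotype ontology",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": hpo_version,
                "namespacePrefix": "HP",
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }],
            "phenopacketSchemaVersion": "2.0",
        },
    }


def undeclared_prefixes(packet):
    """Return ontology prefixes used by features but missing from resources."""
    declared = {r["namespacePrefix"] for r in packet["metaData"]["resources"]}
    used = {f["type"]["id"].split(":", 1)[0] for f in packet["phenotypicFeatures"]}
    return used - declared


if __name__ == "__main__":
    packet = make_phenopacket(
        "example-case-1", "patient-1", "FEMALE",
        [("HP:0001250", "Seizure", False), ("HP:0000252", "Microcephaly", True)],
        hpo_version="hypothetical-release",
    )
    print(json.dumps(packet, indent=2))
    print(undeclared_prefixes(packet))
```

Recording absent phenotypes with `excluded`, rather than omitting them, is a deliberate choice. Phenotype-driven diagnostic software can then treat a feature that was checked and found absent differently from one that was never assessed.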
Affiliation(s)
- Daniel Danis
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Michael J Bamshad
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
- Department of Pediatrics, Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98195, USA
- Yasemin Bridges
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Pilar Cacheiro
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Leigh C Carmody
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Jessica X Chong
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Brotman-Baty Institute for Precision Medicine, 1959 NE Pacific Street, Box 357657, Seattle WA 98195, USA
- Ben Coleman
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- Raymond Dalgleish
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Peter J Freeman
- Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- Adam S L Graefe
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Tudor Groza
- Rare Care Centre, Perth Children's Hospital, Nedlands, WA 6009, Australia
- SingHealth Duke-NUS Institute of Precision Medicine, 5 Hospital Drive Level 9, Singapore 169609, Singapore
- Telethon Kids Institute, Nedlands, WA 6009, Australia
- Julius O B Jacobsen
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Adam Klocperk
- Department of Immunology, 2nd Faculty of Medicine, Charles University and University Hospital in Motol, Prague, Czech Republic
- Maaike Kusters
- Department of Paediatric Immunology, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- University College London Institute of Child Health, London, United Kingdom
- Markus S Ladewig
- Department of Ophthalmology, University Clinic Marburg - Campus Fulda, Fulda, Germany
- Anthony J Marcello
- Department of Pediatrics, Division of Genetic Medicine, University of Washington, 1959 NE Pacific Street, Box 357371, Seattle, WA 98195, USA
- Teresa Mattina
- Medical Genetics, University of Catania, Italy
- Morgagni foundation and Clinic, Catania, Italy
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Monica C Munoz-Torres
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
- Justin T Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Filip Rehburg
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Bárbara C S Reis
- Department of Immunology, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- High Complexity Laboratory, National Institute of Women's, Children's and Adolescents' Health Fernandes Figueira, Rio de Janeiro, Brazil
- Catharina Schuetz
- Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Damian Smedley
- William Harvey Research Institute, Queen Mary University of London, London, UK
- Timmy Strauss
- Department of Pediatrics, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- University Center for Rare Diseases, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Sylvia Thun
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Kyran Wissink
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Utrecht University, Utrecht, the Netherlands
- David Zocche
- North West Thames Regional Genetics Service, Northwick Park & St Mark's Hospitals, London, UK
- Peter N Robinson
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- The Jackson Institute for Genomic Medicine, 10 Discovery Drive, Farmington CT 06032, USA
- ELLIS-European Laboratory for Learning and Intelligent Systems
18
Rujano MA, Boiten JW, Ohmann C, Canham S, Contrino S, David R, Ewbank J, Filippone C, Connellan C, Custers I, van Nuland R, Mayrhofer MT, Holub P, Álvarez EG, Bacry E, Hughes N, Freeberg MA, Schaffhauser B, Wagener H, Sánchez-Pla A, Bertolini G, Panagiotopoulou M. Sharing sensitive data in life sciences: an overview of centralized and federated approaches. Brief Bioinform 2024; 25:bbae262. [PMID: 38836701 DOI: 10.1093/bib/bbae262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 04/19/2024] [Indexed: 06/06/2024]
Abstract
Biomedical data are generated and collected from various sources, including medical imaging, laboratory tests and genome sequencing. Sharing these data for research can help address unmet health needs, contribute to scientific breakthroughs, accelerate the development of more effective treatments and inform public health policy. Due to the potential sensitivity of such data, however, privacy concerns have led to policies that restrict data sharing. In addition, sharing sensitive data requires a secure and robust infrastructure with appropriate storage solutions. Here, we examine and compare the centralized and federated data sharing models through the prism of five large-scale and real-world use cases of strategic significance within the European data sharing landscape: the French Health Data Hub, the BBMRI-ERIC Colorectal Cancer Cohort, the federated European Genome-phenome Archive, the Observational Medical Outcomes Partnership/OHDSI network and the EBRAINS Medical Informatics Platform. Our analysis indicates that centralized models facilitate data linkage, harmonization and interoperability, while federated models facilitate scaling up and legal compliance, as the data typically reside on the data generator's premises, allowing for better control of how data are shared. This comparative study thus offers guidance on the selection of the most appropriate sharing strategy for sensitive datasets and provides key insights for informed decision-making in data sharing efforts.
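The federated side of the comparison above can be illustrated with a toy sketch. Each hypothetical site keeps its individual-level genotypes and returns only aggregate allele counts, and a coordinator pools those counts into a single allele frequency. The `Site` class, the minimum-cohort-size rule and the data are illustrative assumptions. The sketch does not reproduce how any of the platforms named in the abstract are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical data holder; its records never leave this object."""
    name: str
    # Alternate-allele count per individual: 0, 1 or 2.
    genotypes: list = field(default_factory=list)

    def answer(self, min_cell=5):
        """Return only aggregate counts, or None if the cohort is too small."""
        n = len(self.genotypes)
        if n < min_cell:
            # Illustrative disclosure-control rule: small cohorts decline.
            return None
        return {"alt_alleles": sum(self.genotypes), "total_alleles": 2 * n}


def federated_allele_frequency(sites, min_cell=5):
    """Pool per-site aggregates into one alternate-allele frequency."""
    alt = total = 0
    responding = []
    for site in sites:
        summary = site.answer(min_cell)
        if summary is None:
            continue  # this site contributes nothing to the pooled estimate
        alt += summary["alt_alleles"]
        total += summary["total_alleles"]
        responding.append(site.name)
    if total == 0:
        return None, responding
    return alt / total, responding


if __name__ == "__main__":
    sites = [Site("A", [0, 1, 2, 0, 1]), Site("B", [1, 1, 0, 0, 0, 0]), Site("C", [2, 2])]
    print(federated_allele_frequency(sites))
```

In a centralized model, the coordinator would instead receive the individual-level genotype lists and compute the frequency itself. For sites above the threshold the result is the same, but the raw data leave the sites.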
Affiliation(s)
- Maria A Rujano
- European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France
- Jan-Willem Boiten
- Foundation Lygature, Jaarbeursplein 6, 3521 AL, Utrecht, The Netherlands
- Christian Ohmann
- European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France
- Steve Canham
- European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France
- Sergio Contrino
- European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France
- Romain David
- European Research Infrastructure on Highly Pathogenic Agents (ERINHA AISBL), rue du Trône 98/Boîte 4B, 1050, Brussels, Belgium
- Jonathan Ewbank
- European Research Infrastructure on Highly Pathogenic Agents (ERINHA AISBL), rue du Trône 98/Boîte 4B, 1050, Brussels, Belgium
- Claudia Filippone
- European Research Infrastructure on Highly Pathogenic Agents (ERINHA AISBL), rue du Trône 98/Boîte 4B, 1050, Brussels, Belgium
- Claire Connellan
- European Research Infrastructure on Highly Pathogenic Agents (ERINHA AISBL), rue du Trône 98/Boîte 4B, 1050, Brussels, Belgium
- Ilse Custers
- Foundation Lygature, Jaarbeursplein 6, 3521 AL, Utrecht, The Netherlands
- Rick van Nuland
- Foundation Lygature, Jaarbeursplein 6, 3521 AL, Utrecht, The Netherlands
- Michaela Th Mayrhofer
- Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-ERIC), Neue Stiftingtalstrasse 2/B/6, 8010, Graz, Austria
- Petr Holub
- Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-ERIC), Neue Stiftingtalstrasse 2/B/6, 8010, Graz, Austria
- Eva García Álvarez
- Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-ERIC), Neue Stiftingtalstrasse 2/B/6, 8010, Graz, Austria
- Emmanuel Bacry
- Health Data Hub (HDH), rue Georges Pitard 9, 75015, Paris, France
- Nigel Hughes
- Janssen Research and Development, Antwerpseweg 15, 2340, Beerse, Belgium
- Mallory A Freeberg
- European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute (EBI), Wellcome Genome Campus, CB10 1SD, Hinxton, Cambridgeshire, United Kingdom
- Birgit Schaffhauser
- Department of Clinical Neurosciences, Centre Hospitalier Universitaire Vaudois (CHUV), Rue du Bugnon 21, 1011, Lausanne, Switzerland
- Harald Wagener
- Center for Digital Health, BIH@Charité University Medicine, Anna-Louisa-Karsch-Straße 2, 10178, Berlin, Germany
- Alex Sánchez-Pla
- Department of Genetics, Microbiology and Statistics, Universitat de Barcelona, Diagonal 643, 08028, Barcelona, Spain
- Guido Bertolini
- Laboratory of Clinical Epidemiology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via GB Camozzi 3, 24020, Ranica (Bergamo), Italy
- Maria Panagiotopoulou
- European Clinical Research Infrastructure Network (ECRIN), Boulevard Saint Jacques 30, 75014, Paris, France
19
Kim E, Davidsen T, Davis-Dusenbery BN, Baumann A, Maggio A, Chen Z, Meerzaman D, Casas-Silva E, Pot D, Pihl T, Otridge J, Shalley E, Barnholtz-Sloan JS, Kerlavage AR. NCI Cancer Research Data Commons: Lessons Learned and Future State. Cancer Res 2024; 84:1404-1409. [PMID: 38488510 PMCID: PMC11063686 DOI: 10.1158/0008-5472.can-23-2730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 01/19/2024] [Accepted: 03/05/2024] [Indexed: 05/03/2024]
Abstract
More than ever, scientific progress in cancer research hinges on our ability to combine datasets and extract meaningful interpretations to better understand diseases and ultimately inform the development of better treatments and diagnostic tools. To enable the successful sharing and use of big data, the NCI developed the Cancer Research Data Commons (CRDC), providing access to a large, comprehensive, and expanding collection of cancer data. The CRDC is a cloud-based data science infrastructure that eliminates the need for researchers to download and store large-scale datasets by allowing them to perform analysis where data reside. Over the past 10 years, the CRDC has made significant progress in providing access to data and tools along with training and outreach to support the cancer research community. In this review, we provide an overview of the history and the impact of the CRDC to date, lessons learned, and future plans to further promote data sharing, accessibility, interoperability, and reuse. See related articles by Brady et al., p. 1384, Wang et al., p. 1388, and Pot et al., p. 1396.
Collapse
Affiliation(s)
- Erika Kim
- Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
| | - Tanja Davidsen
- Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- Zhaoyi Chen
- Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- NIH, Bethesda, Maryland
- Daoud Meerzaman
- Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- Esmeralda Casas-Silva
- Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- David Pot
- General Dynamics Information Technology, Falls Church, Virginia
- Todd Pihl
- Frederick National Laboratory for Cancer Research, Frederick, Maryland
- John Otridge
- Frederick National Laboratory for Cancer Research, Frederick, Maryland
- Eve Shalley
- Essex, an Emmes Company, Rockville, Maryland
- Jill S. Barnholtz-Sloan
- Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland
- Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland
- Anthony R. Kerlavage
- Center for Biomedical Informatics and Information Technology, NCI, Rockville, Maryland

20
Augustin RC, Cai WL, Luke JJ, Bao R. Facts and Hopes in Using Omics to Advance Combined Immunotherapy Strategies. Clin Cancer Res 2024; 30:1724-1732. [PMID: 38236069 PMCID: PMC11062841 DOI: 10.1158/1078-0432.ccr-22-2241] [Received: 07/22/2023] [Revised: 09/28/2023] [Accepted: 12/22/2023] [Indexed: 01/19/2024]
Abstract
The field of oncology has been transformed by immune checkpoint inhibitors (ICI) and other immune-based agents; however, many patients do not receive a durable benefit. While biomarker assessments from pivotal ICI trials have uncovered certain mechanisms of resistance, results thus far have only scraped the surface. Mechanisms of resistance are as complex as the tumor microenvironment (TME) itself, and the development of effective therapeutic strategies will only be possible by building accurate models of the tumor-immune interface. With advancement of multi-omic technologies, high-resolution characterization of the TME is now possible. In addition to sequencing of bulk tumor, single-cell transcriptomic, proteomic, and epigenomic data as well as T-cell receptor profiling can now be simultaneously measured and compared between responders and nonresponders to ICI. Spatial sequencing and imaging platforms have further expanded the dimensionality of existing technologies. Rapid advancements in computation and data sharing strategies enable development of biologically interpretable machine learning models to integrate data from high-resolution, multi-omic platforms. These models catalyze the identification of resistance mechanisms and predictors of benefit in ICI-treated patients, providing a scientific foundation for novel clinical trials. Moving forward, we propose a framework by which in silico screening, functional validation, and clinical trial biomarker assessment can be used for the advancement of combined immunotherapy strategies.
Affiliation(s)
- Ryan C. Augustin
- UPMC Hillman Cancer Center, Pittsburgh, PA
- University of Pittsburgh, Department of Medicine, Pittsburgh, PA
- Mayo Clinic, Department of Medical Oncology, Rochester, MN
- Wesley L. Cai
- University of Pittsburgh, Department of Medicine, Pittsburgh, PA
- Jason J. Luke
- UPMC Hillman Cancer Center, Pittsburgh, PA
- University of Pittsburgh, Department of Medicine, Pittsburgh, PA
- Riyue Bao
- UPMC Hillman Cancer Center, Pittsburgh, PA
- University of Pittsburgh, Department of Medicine, Pittsburgh, PA

21
Jentsch M, Schneider-Lunitz V, Taron U, Braun M, Ishaque N, Wagener H, Conrad C, Twardziok S. Creating cloud platforms for supporting FAIR data management in biomedical research projects. F1000Res 2024; 13:8. [PMID: 38779317 PMCID: PMC11109697 DOI: 10.12688/f1000research.140624.3] [Accepted: 04/25/2024] [Indexed: 05/25/2024]
Abstract
Biomedical research projects are becoming increasingly complex and require technological solutions that support all phases of the data lifecycle and application of the FAIR principles. At the Berlin Institute of Health (BIH), we have developed and established a flexible and cost-effective approach to building customized cloud platforms for supporting research projects. The approach is based on a microservice architecture and on the management of a portfolio of supported services. On this basis, we created and maintained cloud platforms for several international research projects. In this article, we present our approach and argue that building customized cloud platforms can offer multiple advantages over using multi-project platforms. Our approach is transferable to other research environments and can be easily adapted by other projects and other service providers.
Affiliation(s)
- Marcel Jentsch
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Valentin Schneider-Lunitz
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Ulrike Taron
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Martin Braun
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Naveed Ishaque
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Harald Wagener
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Christian Conrad
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Sven Twardziok
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany

22
Claussnitzer M, Parikh VN, Wagner AH, Arbesfeld JA, Bult CJ, Firth HV, Muffley LA, Nguyen Ba AN, Riehle K, Roth FP, Tabet D, Bolognesi B, Glazer AM, Rubin AF. Minimum information and guidelines for reporting a multiplexed assay of variant effect. Genome Biol 2024; 25:100. [PMID: 38641812 PMCID: PMC11027375 DOI: 10.1186/s13059-024-03223-9] [Received: 06/18/2023] [Accepted: 03/25/2024] [Indexed: 04/21/2024]
Abstract
Multiplexed assays of variant effect (MAVEs) have emerged as a powerful approach for interrogating thousands of genetic variants in a single experiment. The flexibility and widespread adoption of these techniques across diverse disciplines have led to a heterogeneous mix of data formats and descriptions, which complicates the downstream use of the resulting datasets. To address these issues and promote reproducibility and reuse of MAVE data, we define a set of minimum information standards for MAVE data and metadata and outline a controlled vocabulary aligned with established biomedical ontologies for describing these experimental designs.
Affiliation(s)
- Melina Claussnitzer
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Cambridge, MA, 02142, USA
- Victoria N Parikh
- Stanford Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Alex H Wagner
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, 43210, USA
- Jeremy A Arbesfeld
- The Steve and Cindy Rasmussen Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, 43215, USA
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Carol J Bult
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Helen V Firth
- Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Dept of Medical Genetics, Cambridge University Hospitals NHS Trust, Cambridge, UK
- Lara A Muffley
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
- Alex N Nguyen Ba
- Department of Biology, University of Toronto at Mississauga, Mississauga, ON, Canada
- Kevin Riehle
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Frederick P Roth
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Daniel Tabet
- Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Benedetta Bolognesi
- Institute for Bioengineering of Catalunya (IBEC), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Andrew M Glazer
- Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
- Alan F Rubin
- Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia.
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia.

23
Lawson J, Rahimzadeh V, Baek J, Dove ES. Achieving Procedural Parity in Managing Access to Genomic and Related Health Data: A Global Survey of Data Access Committee Members. Biopreserv Biobank 2024; 22:123-129. [PMID: 37192473 PMCID: PMC11265613 DOI: 10.1089/bio.2022.0205] [Indexed: 05/18/2023]
Abstract
Data access committees (DACs) are critical players in the data sharing ecosystem. DACs review requests for access to data held in one or more repositories, where specific constraints determine how the data may be used and by whom. Our team surveyed DAC members affiliated with genomic data repositories worldwide to understand standard processes and procedures, operational metrics, bottlenecks, and efficiencies, as well as their perspectives on possible improvements to quality review. We found that DAC operations and systemic issues were common across repositories globally. In general, DAC members endeavored to achieve an appropriate balance of review efficiency, quality, and compliance. Our results suggest a similarly proportionate path forward that helps DACs pursue mutual improvements to efficiency and compliance without sacrificing review quality.
Affiliation(s)
- Jonathan Lawson
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Vasiliki Rahimzadeh
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Jinyoung Baek
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Edward S. Dove
- School of Law, University of Edinburgh, Edinburgh, United Kingdom

24
Ohta T, Hananoe A, Fukushima-Nomura A, Ashizaki K, Sekita A, Seita J, Kawakami E, Sakurada K, Amagai M, Koseki H, Kawasaki H. Best practices for multimodal clinical data management and integration: An atopic dermatitis research case. Allergol Int 2024; 73:255-263. [PMID: 38102028 DOI: 10.1016/j.alit.2023.11.006] [Received: 05/12/2023] [Revised: 10/06/2023] [Accepted: 11/03/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND In clinical research on multifactorial diseases such as atopic dermatitis, data-driven medical research has become more widely used as a means to clarify diverse pathological conditions and to realize precision medicine. However, modern clinical data, characterized as large-scale, multimodal, and multi-center, cause difficulties in data integration and management, which limits productivity in clinical data science. METHODS We designed a generic data management flow to collect, cleanse, and integrate data to handle different types of data generated at multiple institutions by 10 types of clinical studies. We developed MeDIA (Medical Data Integration Assistant), a software to browse the data in an integrated manner and extract subsets for analysis. RESULTS MeDIA integrates and visualizes data and information on research participants obtained from multiple studies. It then provides a sophisticated interface that supports data management and helps data scientists retrieve the data sets they need. Furthermore, the system promotes the use of unified terms such as identifiers or sampling dates to reduce the cost of pre-processing by data analysts. We also propose best practices in clinical data management flow, which we learned from the development and implementation of MeDIA. CONCLUSIONS The MeDIA system solves the problem of multimodal clinical data integration, from complex text data such as medical records to big data such as omics data from a large number of patients. The system and the proposed best practices can be applied not only to allergic diseases but also to other diseases to promote data-driven medical research.
Affiliation(s)
- Tazro Ohta
- Medical Data Mathematical Reasoning Team, Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Kanagawa, Japan; Institute for Advanced Academic Research, Chiba University, Chiba, Japan; Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Chiba, Japan
- Ayaka Hananoe
- Medical Data Mathematical Reasoning Team, Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Kanagawa, Japan; Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan; Department of Dermatology, Keio University School of Medicine, Tokyo, Japan
- Koichi Ashizaki
- Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan; Department of Dermatology, Keio University School of Medicine, Tokyo, Japan; Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Kanagawa, Japan
- Aiko Sekita
- Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan
- Jun Seita
- Laboratory for Integrative Genomics, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan; Medical Data Deep Learning Team, Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Kanagawa, Japan; Medical Data Sharing Unit, Infrastructure Research and Development Division, RIKEN Information R&D and Strategy Headquarters, RIKEN, Saitama, Japan
- Eiryo Kawakami
- Medical Data Mathematical Reasoning Team, Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Kanagawa, Japan; Institute for Advanced Academic Research, Chiba University, Chiba, Japan; Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Chiba, Japan
- Kazuhiro Sakurada
- Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, Kanagawa, Japan; Department of Extended Intelligence for Medicine, The Ishii-Ishibashi Laboratory, Keio University School of Medicine, Tokyo, Japan
- Masayuki Amagai
- Department of Dermatology, Keio University School of Medicine, Tokyo, Japan; Laboratory for Skin Homeostasis, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan
- Haruhiko Koseki
- Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan
- Hiroshi Kawasaki
- Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan; Department of Dermatology, Keio University School of Medicine, Tokyo, Japan; Laboratory for Skin Homeostasis, RIKEN Center for Integrative Medical Sciences, RIKEN, Kanagawa, Japan.

25
Nikolski M, Hovig E, Al-Shahrour F, Blomberg N, Scollen S, Valencia A, Saunders G. Roadmap for a European cancer data management and precision medicine infrastructure. Nat Cancer 2024; 5:367-372. [PMID: 38321342 DOI: 10.1038/s43018-023-00717-6] [Indexed: 02/08/2024]
Affiliation(s)
- Macha Nikolski
- University of Bordeaux, CNRS-IBGC, UMR 5095, Bordeaux, France.
- University of Bordeaux, Bordeaux Bioinformatics Center CBiB, Bordeaux, France.
- Eivind Hovig
- Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Fatima Al-Shahrour
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Serena Scollen
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Alfonso Valencia
- Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- ICREA, Barcelona, Spain
- Gary Saunders
- ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- EATRIS-ERIC, Amsterdam, the Netherlands

26
Grossman RL, Boyles RR, Davis-Dusenbery BN, Haddock A, Heath AP, O'Connor BD, Resnick AC, Taylor DM, Ahalt S. A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments. Sci Data 2024; 11:241. [PMID: 38409183 PMCID: PMC10897146 DOI: 10.1038/s41597-024-03041-5] [Received: 08/10/2023] [Accepted: 02/03/2024] [Indexed: 02/28/2024]
Affiliation(s)
- Robert L Grossman
- Center for Translational Data Science, University of Chicago, Chicago, IL, USA.
- Rebecca R Boyles
- RTI International, Research Triangle Park, NC, USA
- Adam C Resnick
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Deanne M Taylor
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Stan Ahalt
- University of North Carolina, Chapel Hill, Chapel Hill, NC, USA

27
Boutros M, Baumann M, Bigas A, Chaabane L, Guérin J, Habermann JK, Jobard A, Pelicci PG, Stegle O, Tonon G, Valencia A, Winkler EC, Blanc P, De Maria R, Medema RH, Nagy P, Tabernero J, Solary E. UNCAN.eu: Toward a European Federated Cancer Research Data Hub. Cancer Discov 2024; 14:30-35. [PMID: 38213296 PMCID: PMC10784740 DOI: 10.1158/2159-8290.cd-23-1111] [Indexed: 01/13/2024]
Abstract
To enable a collective effort that generates a new level of UNderstanding CANcer (UNCAN.eu) [Cancer Discov (2022) 12 (11): OF1], the European Union supports the creation of a sustainable platform that connects cancer research across Member States. A workshop hosted in Heidelberg gathered European cancer experts to identify ongoing initiatives that may contribute to building this platform and discuss the governance and long-term evolution of a European Federated Cancer Data Hub.
Affiliation(s)
- Michael Boutros
- German Cancer Research Center (DKFZ), Division of Signaling and Functional Genomics and Heidelberg University, Medical Faculty Heidelberg, Institute for Human Genetics, Heidelberg, Germany
- Anna Bigas
- Centro de Investigación Biomedica en Red-Oncología (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
- Linda Chaabane
- Euro-BioImaging ERIC, Med-Hub, National Research Council of Italy (CNR), Turin, Italy
- Jens K. Habermann
- Interdisciplinary Center for Biobanking-Lübeck (ICB-L), University of Lübeck, Lübeck, Germany
- Aurélien Jobard
- Institut National du Cancer (INCa), Boulogne Billancourt, France
- Pier Giuseppe Pelicci
- Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milano, Italy
- Oliver Stegle
- DKFZ, Division of Computational Genomics and Systems Genetics, Heidelberg, Germany
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Giovanni Tonon
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Alfonso Valencia
- Barcelona Supercomputing Center, Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Eva C. Winkler
- National Center for Tumor Diseases (NCT), Heidelberg University, Section Translational Medical Ethics, Heidelberg, Germany
- Ruggero De Maria
- Fondazione Policlinico Universitario A. Gemelli IRCCS, Università Cattolica del Sacro Cuore, Rome, Italy
- Rene H. Medema
- Oncode Institute and The Netherlands Cancer Institute, Amsterdam, the Netherlands
- Peter Nagy
- National Institute of Oncology and the National Tumor Biology Laboratory, Budapest, Department of Anatomy and Histology, HUN-REN–UVMB Laboratory of Redox Biology Research Group, University of Veterinary Medicine, and Chemistry Institute, University of Debrecen, Debrecen, Hungary
- Josep Tabernero
- DKFZ, Division of Computational Genomics and Systems Genetics, Heidelberg, Germany
- Vall d'Hebron Hospital Campus & Institute of Oncology (VHIO), Barcelona, Spain
- Eric Solary
- Université Paris-Saclay and INSERM, Gustave Roussy Cancer Center, Villejuif, France

28
Viner C, Ishak CA, Johnson J, Walker NJ, Shi H, Sjöberg-Herrera MK, Shen SY, Lardo SM, Adams DJ, Ferguson-Smith AC, De Carvalho DD, Hainer SJ, Bailey TL, Hoffman MM. Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet. Genome Biol 2024; 25:11. [PMID: 38191487 PMCID: PMC10773111 DOI: 10.1186/s13059-023-03070-0] [Received: 03/08/2023] [Accepted: 09/21/2023] [Indexed: 01/10/2024]
Abstract
BACKGROUND Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. RESULTS Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. CONCLUSIONS Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.
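The abstract describes adapting the position weight matrix (PWM) to a DNA alphabet that includes modified cytosines. The following minimal Python sketch illustrates that general idea only; it is not the authors' Cytomod or MEME Suite code. The symbols `m` (5-methylcytosine) and `h` (5-hydroxymethylcytosine), the background frequencies, the background-proportional pseudocount, and the toy E-box sites are all illustrative assumptions. The sketch builds a log-odds PWM from aligned sites and scores every window of a sequence, so a motif learned only from unmethylated sites scores the methylated variant of the same site lower.

```python
import math

# Illustrative expanded alphabet (an assumption for this sketch): the four
# standard bases plus 'm' (5-methylcytosine) and 'h' (5-hydroxymethylcytosine).
ALPHABET = ("A", "C", "G", "T", "m", "h")


def build_pwm(sites, background, pseudocount=0.5):
    """Build a log2-odds position weight matrix from aligned binding sites.

    sites: equal-length strings over ALPHABET.
    background: dict mapping each symbol to its background frequency.
    pseudocount: total pseudocount per position, spread in proportion
        to the background frequencies.
    Returns one {symbol: log2-odds score} dict per motif position.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        # Start every symbol at a background-weighted pseudocount.
        counts = {sym: pseudocount * background[sym] for sym in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against the background.
        pwm.append({sym: math.log2(counts[sym] / total / background[sym])
                    for sym in ALPHABET})
    return pwm


def best_hit(pwm, sequence):
    """Slide the PWM along sequence and return (best_score, start_offset)."""
    width = len(pwm)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        score = sum(pwm[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


if __name__ == "__main__":
    # Toy background and toy sites, not data from the cited study.
    background = {"A": 0.3, "T": 0.3, "C": 0.18, "G": 0.18, "m": 0.02, "h": 0.02}
    pwm = build_pwm(["CACGTG", "CACGTG", "CACGTG"], background)
    print(best_hit(pwm, "TTCACGTGTT"))  # unmethylated E-box
    print(best_hit(pwm, "TTCAmGTGTT"))  # same site with a methylated C
```

Spreading the pseudocount in proportion to the background is a deliberate choice here: with a flat pseudocount, a rare symbol such as `m` can receive a positive log-odds score at a position where it never occurs in the training sites, simply because its background frequency is so small.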
Affiliation(s)
- Coby Viner
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Charles A Ishak
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- James Johnson
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
- Nicolas J Walker
- Department of Genetics, University of Cambridge, Cambridge, England
- Hui Shi
- Department of Genetics, University of Cambridge, Cambridge, England
- Marcela K Sjöberg-Herrera
- Wellcome Sanger Institute, Cambridge, England
- Faculty of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Shu Yi Shen
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Santana M Lardo
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Daniel D De Carvalho
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Sarah J Hainer
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
- Timothy L Bailey
- Department of Pharmacology, University of Nevada, Reno, Reno, NV, USA
- Michael M Hoffman
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada.

29
Oliva A, Kaphle A, Reguant R, Sng LMF, Twine NA, Malakar Y, Wickramarachchi A, Keller M, Ranbaduge T, Chan EKF, Breen J, Buckberry S, Guennewig B, Haas M, Brown A, Cowley MJ, Thorne N, Jain Y, Bauer DC. Future-proofing genomic data and consent management: a comprehensive review of technology innovations. Gigascience 2024; 13:giae021. [PMID: 38837943 PMCID: PMC11152178 DOI: 10.1093/gigascience/giae021] [Received: 08/14/2023] [Revised: 01/15/2024] [Accepted: 04/09/2024] [Indexed: 06/07/2024]
Abstract
Genomic information is increasingly used to inform medical treatments and manage future disease risks. However, any personal and societal gains must be carefully balanced against the risk to individuals contributing their genomic data. Expanding our understanding of actionable genomic insights requires researchers to access large global datasets to capture the complexity of genomic contribution to diseases. Similarly, clinicians need efficient access to a patient's genome as well as population-representative historical records for evidence-based decisions. Both researchers and clinicians hence rely on participants to consent to the use of their genomic data, which in turn requires trust in the professional and ethical handling of this information. Here, we review existing and emerging solutions for secure and effective genomic information management, including storage, encryption, consent, and authorization that are needed to build participant trust. We discuss recent innovations in cloud computing, quantum-computing-proof encryption, and self-sovereign identity. These innovations can augment key developments from within the genomics community, notably GA4GH Passports and the Crypt4GH file container standard. We also explore how decentralized storage as well as the digital consenting process can offer culturally acceptable processes to encourage data contributions from ethnic minorities. We conclude that the individual and their right to self-determination need to be put at the center of any genomics framework, because only on an individual level can the received benefits be accurately balanced against the risk of exposing private information.
Affiliation(s)
- Adrien Oliva
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Anubhav Kaphle
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Roc Reguant
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Letitia M F Sng
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Natalie A Twine
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Yuwan Malakar
- Responsible Innovation Future Science Platform, Commonwealth Scientific and Industrial Research Organisation, Brisbane, 41 Boggo Rd, Dutton Park QLD 4102, Australia
- Anuradha Wickramarachchi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Marcel Keller
- Data61, Commonwealth Scientific and Industrial Research Organisation, Level 5/13 Garden St, Eveleigh NSW 2015, Australia
- Thilina Ranbaduge
- Data61, Commonwealth Scientific and Industrial Research Organisation, Building 101, Clunies Ross St, Black Mountain, Canberra, ACT 2601, Australia
- Eva K F Chan
- NSW Health Pathology, Sydney, 1 Reserve Road, St Leonards NSW 2065, Australia
- James Breen
- Telethon Kids Institute, Perth, WA 6009, Australia
- National Centre for Indigenous Genomics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Sam Buckberry
- Telethon Kids Institute, Perth, WA 6009, Australia
- National Centre for Indigenous Genomics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Boris Guennewig
- Sydney Medical School, Brain and Mind Centre, The University of Sydney, Sydney, 94 Mallett St, Camperdown NSW 2050, Australia
- Matilda Haas
- Australian Genomics, Parkville, VIC 3052, Australia
- Murdoch Children’s Research Institute, Parkville, Victoria 3052, Australia
| | - Alex Brown
- Telethon Kids Institute, Perth, WA 6009, Australia
- National Centre for Indigenous Genomics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
| | - Mark J Cowley
- Children’s Cancer Institute, Lowy Cancer Research Centre, Level 4, Lowy Cancer Research Centre Corner Botany & High Streets UNSW Kensington Campus UNSW Sydney, Kensington NSW 2052, Australia
- School of Clinical Medicine, UNSW Medicine & Health, Wallace Wurth Building (C27), Cnr High St & Botany St, UNSW Sydney, Kensington NSW 2052, Australia
| | - Natalie Thorne
- University of Melbourne, Melbourne, Parkville VIC 3052, Australia
- Melbourne Genomics Health Alliance, Melbourne 1G, Walter and Eliza Hall Institute/1G Royal Parade, Parkville VIC 3052, Australia
- Walter and Eliza Hall Institute, Melbourne, 1G, Walter and Eliza Hall Institute/1G Royal Parade, Parkville VIC 3052, Australia
| | - Yatish Jain
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Applied BioSciences 205B Culloden Rd Macquarie University, NSW 2109, Australia
| | - Denis C Bauer
- Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Applied BioSciences 205B Culloden Rd Macquarie University, NSW 2109, Australia
- Department of Biomedical Sciences, MQ Health General Practice - Macquarie University, Suite 305, Level 3/2 Technology Pl, Macquarie Park NSW 2109, Australia
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Gate 13, Kintore Avenue University of Adelaide, Adelaide SA 5000, Australia
| |
Collapse
|
30
|
Eloe-Fadrosh EA, Mungall CJ, Miller MA, Smith M, Patil SS, Kelliher JM, Johnson LYD, Rodriguez FE, Chain PSG, Hu B, Thornton MB, McCue LA, McHardy AC, Harris NL, Reddy TBK, Mukherjee S, Hunter CI, Walls R, Schriml LM. A Practical Approach to Using the Genomic Standards Consortium MIxS Reporting Standard for Comparative Genomics and Metagenomics. Methods Mol Biol 2024; 2802:587-609. [PMID: 38819573 DOI: 10.1007/978-1-0716-3838-5_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Comparative analysis of (meta)genomes necessitates aggregation, integration, and synthesis of well-annotated data using standards. The Genomic Standards Consortium (GSC) collaborates with the research community to develop and maintain the Minimum Information about any (x) Sequence (MIxS) reporting standard for genomic data. To facilitate the use of the GSC's MIxS reporting standard, we provide a description of the structure and terminology, how to navigate ontologies for required terms in MIxS, and demonstrate practical usage through a soil metagenome example.
Affiliation(s)
- Emiley A Eloe-Fadrosh
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- Christopher J Mungall
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Mark Andrew Miller
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Montana Smith
- Pacific Northwest National Laboratory, Richland, WA, USA
- Sujay Sanjeev Patil
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Julia M Kelliher
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Leah Y D Johnson
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Patrick S G Chain
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Bin Hu
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Michael B Thornton
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Lee Ann McCue
- Pacific Northwest National Laboratory, Richland, WA, USA
- Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Nomi L Harris
- Environmental Genomics and System Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- T B K Reddy
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Supratim Mukherjee
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christopher I Hunter
- GigaScience Press, Hong Kong Science Park, Pak Shek Kok, New Territories, Hong Kong
- Lynn M Schriml
- University of Maryland School of Medicine, Institute for Genome Sciences, Baltimore, MD, USA

31
Sánchez MC, Hernández Clemente JC, García López FJ. Public and Patients' Perspectives Towards Data and Sample Sharing for Research: An Overview of Empirical Findings. J Empir Res Hum Res Ethics 2023; 18:319-345. [PMID: 37936410 DOI: 10.1177/15562646231212644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2023]
Abstract
We aimed to review the attitudes and perspectives of the public and patients towards the sharing of data and biospecimens for research and to identify common dimensions, regardless of setting. Our review included systematic, scoping or thematic reviews of empirical studies retrieved from Medline (PubMed interface), Web of Science, Scopus, ProQuest and Cochrane Reviews. The main themes identified and synthesised across the 14 reviews were readiness and motivations; potential risks and safeguards; trust, transparency and accountability; autonomy and preferred type of consent; and factors influencing data and biospecimen sharing and consent. Sociodemographic factors and research and individual context remain relevant influencing factors in all settings, while preferences for types of consent are highly heterogeneous. Trusted environments and adapted consent options with participant engagement are relevant to improve research participation.

32
Abstract
Rare diseases are a leading cause of infant mortality and lifelong disability. To improve outcomes, timely diagnosis and effective treatments are needed. Genomic sequencing has transformed the traditional diagnostic process, providing rapid, accurate and cost-effective genetic diagnoses to many. Incorporating genomic sequencing into newborn screening programmes at the population scale holds the promise of substantially expanding the early detection of treatable rare diseases, with stored genomic data potentially benefitting health over a lifetime and supporting further research. As several large-scale newborn genomic screening projects launch internationally, we review the challenges and opportunities presented, particularly the need to generate evidence of benefit and to address the ethical, legal and psychosocial issues that genomic newborn screening raises.
Affiliation(s)
- Zornitza Stark
- Australian Genomics, Melbourne, Victoria, Australia.
- Victorian Clinical Genetics Services, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
- Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia.
- Richard H Scott
- Great Ormond Street Hospital for Children, London, UK
- UCL Great Ormond Street Institute of Child Health, London, UK
- Genomics England, London, UK

33
Huttenhower C, Finn RD, McHardy AC. Challenges and opportunities in sharing microbiome data and analyses. Nat Microbiol 2023; 8:1960-1970. [PMID: 37783751 DOI: 10.1038/s41564-023-01484-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/16/2021] [Accepted: 08/28/2023] [Indexed: 10/04/2023]
Abstract
Microbiome data, metadata and analytical workflows have become 'big' in terms of volume and complexity. Although the infrastructure and technologies to share data have been established, the interdisciplinary and multi-omic nature of the field can make resources difficult to identify and use. Following best practices for data deposition requires substantial effort, with sometimes little obvious reward. Gaps remain where microbiome-specific resources for data sharing or reproducibility do not yet exist. We outline available best practices, challenges to their adoption and opportunities in data sharing in microbiome research. We showcase examples of best practices and advocate for their enforcement and incentivization for data sharing. This includes recognition of data curation and sharing endeavours by individuals, institutions, journals and funders. Opportunities for progress include enabling microbiome-specific databases to incorporate future methods for data analysis, integration and reuse.
Affiliation(s)
- Curtis Huttenhower
- Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Departments of Biostatistics and Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.

34
Skantharajah N, Baichoo S, Boughtwood TF, Casas-Silva E, Chandrasekharan S, Dave SM, Fakhro KA, Falcon de Vargas AB, Gayle SS, Gupta VK, Hendricks-Sturrup R, Hobb AE, Li S, Llamas B, Lopez-Correa C, Machirori M, Melendez-Zajgla J, Millner MA, Page AJ, Paglione LD, Raven-Adams MC, Smith L, Thomas EM, Kumuthini J, Corpas M. Equity, diversity, and inclusion at the Global Alliance for Genomics and Health. Cell Genom 2023; 3:100386. [PMID: 37868041 PMCID: PMC10589617 DOI: 10.1016/j.xgen.2023.100386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2023]
Abstract
A lack of diversity in genomics for health continues to hinder equitable leadership and access to precision medicine approaches for underrepresented populations. To avoid perpetuating biases within the genomics workforce and genomic data collection practices, equity, diversity, and inclusion (EDI) must be addressed. This paper documents the journey taken by the Global Alliance for Genomics and Health (a genomics-based standard-setting and policy-framing organization) to create a more equitable, diverse, and inclusive environment for its standards and members. Initial steps include the creation of two groups: the Equity, Diversity, and Inclusion Advisory Group and the Regulatory and Ethics Diversity Group. Following a framework that we call "Reflected in our Teams, Reflected in our Standards," both groups address EDI at different stages in their policy development process.
Affiliation(s)
- Neerjah Skantharajah
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Tiffany F. Boughtwood
- Australian Genomics, Parkville, VIC, Australia
- Murdoch Children’s Research Institute, Parkville, VIC, Australia
- Sanjay M. Dave
- Department of Biotechnology, Hemchandracharya North Gujarat University, Patan, Gujarat, India
- Khalid A. Fakhro
- Department of Human Genetics, Sidra Medicine, Doha, Qatar
- Department of Genetic Medicine, Weill Cornell Medical College, Doha, Qatar
- Aida B. Falcon de Vargas
- Hospital Vargas de Caracas, Vargas Medical School, Universidad Central de Venezuela, Caracas, Venezuela
- Hospital de Clínicas Caracas, Caracas, Venezuela
- Vivek K. Gupta
- Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW, Australia
- Stephanie Li
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Broad Institute, Cambridge, MA, USA
- Bastien Llamas
- Australian Centre for Ancient DNA, School of Biological Sciences and The Environment Institute, University of Adelaide, Adelaide, SA, Australia
- ARC Centre of Excellence for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, SA, Australia
- National Centre for Indigenous Genomics, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
- Indigenous Genomics, Telethon Kids Institute, Adelaide, SA, Australia
- Mavis Machirori
- Ada Lovelace Institute, London, UK
- PEALS, Newcastle University, Newcastle Upon Tyne, UK
- Mareike A. Millner
- Maastricht University, Health Law and Governance Group, Maastricht, the Netherlands
- Angela J.H. Page
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Broad Institute, Cambridge, MA, USA
- Laura D. Paglione
- Spherical Cow Group, New York, NY, USA
- Laura Paglione LLC, New York, NY, USA
- Maili C. Raven-Adams
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Wellcome Sanger Institute, Hinxton, UK
- Lindsay Smith
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Global Alliance for Genomics and Health, Toronto, ON, Canada
- Ericka M. Thomas
- The All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Judit Kumuthini
- South African National Bioinformatics Institute, University of Western Cape, Cape Town, South Africa
- Manuel Corpas
- School of Life Sciences, University of Westminster, London, UK

35
Zhang JY. Commoning genomic solidarity to improve global health equality. Cell Genom 2023; 3:100405. [PMID: 37868031 PMCID: PMC10589616 DOI: 10.1016/j.xgen.2023.100405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2023]
Abstract
This article underlines two key asynchronies between prevailing governing logic and expanding practices in somatic human genome editing that are hindering an effective and orderly translation of the new technology into public good. The first is a "genomic sovereignty" framing adopted by a number of non-Western countries that may exacerbate data biases in global research and that directs policy attention away from the necessary structural changes required to achieve non-discriminatory and equitable genomic healthcare. The other is a global deficiency in attending to "science at large": the challenge of regulating new assemblages of societal interests that advocate controversial or experimental research, often outside of conventional institutions and aided by "policy shopping." Both issues point to the fact that genomic research does not represent a well-defined scientific commons but rather a domain that requires active "commoning," with the aim of fostering genomic solidarity that coordinates responsible research within and across national boundaries.
Affiliation(s)
- Joy Y. Zhang
- Centre for Global Science and Epistemic Justice, Division for the Study of Law, Society and Social Justice, University of Kent, Canterbury, UK

36
Stenzinger A, Moltzen EK, Winkler E, Molnar-Gabor F, Malek N, Costescu A, Jensen BN, Nowak F, Pinto C, Ottersen OP, Schirmacher P, Nordborg J, Seufferlein T, Fröhling S, Edsjö A, Garcia-Foncillas J, Normanno N, Lundgren B, Friedman M, Bolanos N, Tatton-Brown K, Hill S, Rosenquist R. Implementation of precision medicine in healthcare: A European perspective. J Intern Med 2023; 294:437-454. [PMID: 37455247 DOI: 10.1111/joim.13698] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 07/18/2023]
Abstract
The technical development of high-throughput sequencing technologies and the parallel development of targeted therapies in the last decade have enabled a transition from traditional medicine to personalized treatment and care. In this way, by using comprehensive genomic testing, more effective treatments with fewer side effects are provided to each patient, that is, precision or personalized medicine (PM). In several European countries, such as England, France, Denmark, and Spain, the governments have adopted national strategies and taken "top-down" decisions to invest in national infrastructure for PM. In other countries with regionally organized healthcare systems, such as Sweden, Germany, and Italy, the profession has instead taken "bottom-up" initiatives to build competence networks and infrastructure to enable equal access to PM. In this review, we summarize key learnings at the European level on the implementation process to establish sustainable governance and organization for PM at the regional, national, and EU/international levels. We also discuss critical ethical and legal aspects of implementing PM, and the importance of access to real-world data and performing clinical trials for evidence generation, as well as the need for improved reimbursement models, increased cross-disciplinary education and patient involvement. In summary, PM represents a paradigm shift and a modernization of healthcare, and all relevant stakeholders (healthcare, academia, policymakers, industry, and patients) must be involved in this system transformation to create a sustainable, non-siloed ecosystem for precision healthcare that benefits our patients and society at large.
Affiliation(s)
- Albrecht Stenzinger
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Centers for Personalized Medicine (ZPM), Germany
- Ejner K Moltzen
- Innovation Fund Denmark, International Consortium for Personalised Medicine (IC PerMed), Aarhus, Denmark
- Eva Winkler
- Section of Translational Medical Ethics, National Center for Tumour Diseases, University Hospital Heidelberg, Heidelberg, Germany
- Nisar Malek
- Centers for Personalized Medicine (ZPM), Germany
- Department for Internal Medicine, University Hospital Tübingen, Tübingen, Germany
- Carmine Pinto
- Medical Oncology, Comprehensive Cancer Centre, AUSL-IRCCS di Reggio Emilia, Reggio Emilia, Italy
- Peter Schirmacher
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Centers for Personalized Medicine (ZPM), Germany
- Jenni Nordborg
- Lif - The Research-Based Pharmaceutical Industry, Stockholm, Sweden
- Thomas Seufferlein
- Department of Internal Medicine I, Ulm University Hospital, Ulm, Germany
- Stefan Fröhling
- Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Anders Edsjö
- Department of Clinical Genetics, Pathology and Molecular Diagnostics, Office for Medical Services, Region Skåne, Lund, Sweden
- Division of Pathology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Genomic Medicine Sweden (GMS), Sweden
- Jesus Garcia-Foncillas
- Department of Oncology and Cancer Institute, Fundacion Jimenez Diaz University Hospital, Autonomous University, Madrid, Spain
- Nicola Normanno
- Cell Biology and Biotherapy Unit, Istituto Nazionale Tumori - IRCCS - Fondazione G. Pascale, Napoli, Italy
- Mikaela Friedman
- Genomic Medicine Sweden (GMS), Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Katrina Tatton-Brown
- National Genomics Education, NHS England, London, UK
- St George's University Hospitals NHS Foundation Trust, London, UK
- St George's University of London, London, UK
- Sue Hill
- Office of Chief Scientific Officer and the Genomics Unit, NHS England, London, UK
- Richard Rosenquist
- Genomic Medicine Sweden (GMS), Sweden
- Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm, Sweden
- Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden

37
Rinaldi E, Drenkhahn C, Gebel B, Saleh K, Tönnies H, von Loewenich FD, Thoma N, Baier C, Boeker M, Hinske LC, Diaz LAP, Behnke M, Ingenerf J, Thun S. Towards interoperability in infection control: a standard data model for microbiology. Sci Data 2023; 10:654. [PMID: 37741862 PMCID: PMC10517923 DOI: 10.1038/s41597-023-02560-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/12/2023] [Indexed: 09/25/2023]
Abstract
The COVID-19 pandemic has made it clear: sharing and exchanging data among research institutions is crucial in order to efficiently respond to global health threats. This can be facilitated by defining health data models based on interoperability standards. In Germany, a national effort is in progress to create common data models using international healthcare IT standards. In this context, collaborative work on a data set module for microbiology is of particular importance as the WHO has declared antimicrobial resistance one of the top global public health threats that humanity is facing. In this article, we describe how we developed a common model for microbiology data in an interdisciplinary collaborative effort and how we make use of the standard HL7 FHIR and terminologies such as SNOMED CT or LOINC to ensure syntactic and semantic interoperability. The use of international healthcare standards qualifies our data model to be adopted beyond the environment where it was first developed and used at an international level.
Affiliation(s)
- Eugenia Rinaldi
- Berlin Institute of Health, Charité Universitätsmedizin, Berlin, Germany.
- Cora Drenkhahn
- Institute of Medical Informatics (IMI), University of Lübeck, Lübeck, Germany
- Benjamin Gebel
- Klinik für Infektiologie und Mikrobiologie, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
- Kutaiba Saleh
- Data Integration Center, Jena University Hospital, Jena, Germany
- Norbert Thoma
- Institute for Hygiene and Environmental Medicine, Charité Universitätsmedizin, Berlin, Germany
- Claas Baier
- Hannover Medical School, Institute for Medical Microbiology and Hospital Epidemiology, Hannover, Germany
- Luis Alberto Peña Diaz
- Institute for Hygiene and Environmental Medicine, Charité Universitätsmedizin, Berlin, Germany
- Michael Behnke
- Institute for Hygiene and Environmental Medicine, Charité Universitätsmedizin, Berlin, Germany
- Josef Ingenerf
- Institute of Medical Informatics (IMI), University of Lübeck, Lübeck, Germany
- Sylvia Thun
- Berlin Institute of Health, Charité Universitätsmedizin, Berlin, Germany

38
Ziemann M, Poulain P, Bora A. The five pillars of computational reproducibility: bioinformatics and beyond. Brief Bioinform 2023; 24:bbad375. [PMID: 37870287 PMCID: PMC10591307 DOI: 10.1093/bib/bbad375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/26/2023] [Accepted: 09/30/2023] [Indexed: 10/24/2023]
Abstract
Computational reproducibility is a simple premise in theory, but is difficult to achieve in practice. Building upon past efforts and proposals to maximize reproducibility and rigor in bioinformatics, we present a framework called the five pillars of reproducible computational research. These include (1) literate programming, (2) code version control and sharing, (3) compute environment control, (4) persistent data sharing and (5) documentation. These practices will ensure that computational research work can be reproduced quickly and easily, long into the future. This guide is designed for bioinformatics data analysts and bioinformaticians in training, but should be relevant to other domains of study.
Affiliation(s)
- Mark Ziemann
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia
- Burnet Institute, Melbourne, Australia
- Pierre Poulain
- Université Paris Cité, CNRS, Institut Jacques Monod, Paris, France
- Anusuiya Bora
- Deakin University, School of Life and Environmental Sciences, Geelong, Australia

39
Rothe H, Lauer KB, Talbot-Cooper C, Sivizaca Conde DJ. Digital entrepreneurship from cellular data: How omics afford the emergence of a new wave of digital ventures in health. Electron Mark 2023; 33:48. [PMID: 37724180 PMCID: PMC10505108 DOI: 10.1007/s12525-023-00669-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 08/25/2023] [Indexed: 09/20/2023]
Abstract
Data has become an indispensable input, throughput, and output for the healthcare industry. In recent years, omics technologies such as genomics and proteomics have generated vast amounts of new data at the cellular level including molecular, structural, and functional levels. Cellular data holds the potential to innovate therapeutics, vaccines, diagnostics, consumer products, or even ancestry services. However, data at the cellular level is generated with rapidly evolving omics technologies. These technologies use scientific knowledge from resource-rich environments. This raises the question of how new ventures can use cellular-level data from omics technologies to create new products and scale their business. We report on a series of interviews and a focus group discussion with entrepreneurs, investors, and data providers. By conceptualizing omics technologies as external enablers, we show how characteristics of cellular-level data negatively affect the combination mechanisms that drive venture creation and growth. We illustrate how data characteristics set boundary conditions for innovation and entrepreneurship and highlight how ventures seek to mitigate their impact. Supplementary Information: The online version contains supplementary material available at 10.1007/s12525-023-00669-w.
Affiliation(s)
- Hannes Rothe
- University of Duisburg Essen, Institute for Computer Science and Business Information Systems, Essen, Germany

40
Deflaux N, Selvaraj MS, Condon HR, Mayo K, Haidermota S, Basford MA, Lunt C, Philippakis AA, Roden DM, Denny JC, Musick A, Collins R, Allen N, Effingham M, Glazer D, Natarajan P, Bick AG. Demonstrating paths for unlocking the value of cloud genomics through cross cohort analysis. Nat Commun 2023; 14:5419. [PMID: 37669985 PMCID: PMC10480504 DOI: 10.1038/s41467-023-41185-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/05/2022] [Accepted: 08/24/2023] [Indexed: 09/07/2023]
Abstract
Recently, large-scale genomic projects such as All of Us and the UK Biobank have introduced a new research paradigm where data are stored centrally in cloud-based Trusted Research Environments (TREs). To characterize the advantages and drawbacks of different TRE attributes in facilitating cross-cohort analysis, we conduct a Genome-Wide Association Study of standard lipid measures using two approaches: meta-analysis and pooled analysis. Comparison of full summary data from both approaches with an external study shows strong correlation of known loci with lipid levels (R² ~ 83-97%). Importantly, 90 variants meet the significance threshold only in the meta-analysis and 64 variants are significant only in pooled analysis, with approximately 20% of variants in each of those groups being most prevalent in non-European, non-Asian ancestry individuals. These findings have important implications, as technical and policy choices lead to cross-cohort analyses generating similar, but not identical results, particularly for non-European ancestral populations.
Affiliation(s)
- Margaret Sunitha Selvaraj
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Henry Robert Condon
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Kelsey Mayo
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Sara Haidermota
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Cardiology, Massachusetts General Hospital, Boston, MA, USA
- Melissa A Basford
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Chris Lunt
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Dan M Roden
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Joshua C Denny
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Anjene Musick
- All of Us Research Program, National Institutes of Health, Bethesda, MD, USA
- Rory Collins
- Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
- UK Biobank, Cheadle, Stockport, UK
- Naomi Allen
- Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
- UK Biobank, Cheadle, Stockport, UK
- Pradeep Natarajan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Division of Cardiology, Massachusetts General Hospital, Boston, MA, USA
- Alexander G Bick
- Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA.

41
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA): computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever-increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK.
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
- Meghan A Balk
- Natural History Museum, University of Oslo, Oslo, Norway
- Robyn L Ball
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Anita R Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Laura W Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Nicole Vasilevsky
- Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
42
Casaletto J, Bernier A, McDougall R, Cline MS. Federated Analysis for Privacy-Preserving Data Sharing: A Technical and Legal Primer. Annu Rev Genomics Hum Genet 2023; 24:347-368. [PMID: 37253596 PMCID: PMC10846631 DOI: 10.1146/annurev-genom-110122-084756] [Indexed: 06/01/2023]
Abstract
Continued advances in precision medicine rely on the widespread sharing of data that relate human genetic variation to disease. However, data sharing is severely limited by legal, regulatory, and ethical restrictions that safeguard patient privacy. Federated analysis addresses this problem by transferring the code to the data, providing the technical and legal capability to analyze the data within their secure home environment rather than transferring the data to another institution for analysis. This allows researchers to gain new insights from data that cannot be moved, while respecting patient privacy and the data stewards' legal obligations. Because federated analysis is a technical solution to the legal challenges inherent in data sharing, the technology and policy implications must be evaluated together. Here, we summarize the technical approaches to federated analysis and provide a legal analysis of their policy implications.
Affiliation(s)
- James Casaletto
- Genomics Institute, University of California, Santa Cruz, California, USA
- Alexander Bernier
- Centre of Genomics and Policy, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Robyn McDougall
- Centre of Genomics and Policy, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Melissa S Cline
- Genomics Institute, University of California, Santa Cruz, California, USA
43
Marrero RJ, Lamba JK. Current Landscape of Genome-Wide Association Studies in Acute Myeloid Leukemia: A Review. Cancers (Basel) 2023; 15:3583. [PMID: 37509244 PMCID: PMC10377605 DOI: 10.3390/cancers15143583] [Received: 06/07/2023] [Revised: 06/29/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023]
Abstract
Acute myeloid leukemia (AML) is a clonal hematopoietic disease that arises from chromosomal and genetic aberrations in myeloid precursor cells. AML is one of the most common types of acute leukemia in adults; however, it is relatively rare overall, comprising about 1% of all cancers. In the last decade or so, numerous genome-wide association studies (GWAS) have been conducted to screen between hundreds of thousands and millions of variants across many human genomes to discover genetic polymorphisms associated with a particular disease or phenotype. In oncology, GWAS has been performed in almost every commonly occurring cancer. Despite the increasing number of studies published regarding other malignancies, there is a paucity of GWAS studies for AML. In this review article, we will summarize the current status of GWAS in AML.
Affiliation(s)
- Richard J. Marrero
- Department of Pharmacotherapy and Translational Research, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
- Jatinder K. Lamba
- Department of Pharmacotherapy and Translational Research, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
- University of Florida Health Cancer Center, University of Florida, Gainesville, FL 32610, USA
- Center for Pharmacogenomics and Precision Medicine, College of Pharmacy, University of Florida, Gainesville, FL 32610, USA
44
Abondio P, Cilli E, Luiselli D. Human Pangenomics: Promises and Challenges of a Distributed Genomic Reference. Life (Basel) 2023; 13:1360. [PMID: 37374141 DOI: 10.3390/life13061360] [Received: 05/15/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023]
Abstract
A pangenome is a collection of the common and unique genomes that are present in a given species. It combines the genetic information of all the genomes sampled, resulting in a large and diverse range of genetic material. Pangenomic analysis offers several advantages compared to traditional genomic research. For example, a pangenome is not bound by the physical constraints of a single genome, so it can capture more genetic variability. Thanks to the introduction of the concept of pangenome, it is possible to use exceedingly detailed sequence data to study the evolutionary history of two different species, or how populations within a species differ genetically. In the wake of the Human Pangenome Project, this review aims at discussing the advantages of the pangenome around human genetic variation, which are then framed around how pangenomic data can inform population genetics, phylogenetics, and public health policy by providing insights into the genetic basis of diseases or determining personalized treatments, targeting the specific genetic profile of an individual. Moreover, technical limitations, ethical concerns, and legal considerations are discussed.
Affiliation(s)
- Paolo Abondio
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Elisabetta Cilli
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Donata Luiselli
- Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
45
Sheffield NC, LeRoy NJ, Khoroshevskyi O. Challenges to sharing sample metadata in computational genomics. Front Genet 2023; 14:1154198. [PMID: 37287537 PMCID: PMC10243526 DOI: 10.3389/fgene.2023.1154198] [Received: 02/02/2023] [Accepted: 05/09/2023] [Indexed: 06/09/2023]
Affiliation(s)
- Nathan C. Sheffield
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- School of Data Science, University of Virginia, Charlottesville, VA, United States
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Nathan J. LeRoy
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Oleksandr Khoroshevskyi
- Center for Public Health Genomics, School of Medicine, University of Virginia, Charlottesville, VA, United States
46
Zhang L, Yuan Y, Peng W, Tang B, Li MJ, Gui H, Wang Q, Li M. GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species. Genome Biol 2023; 24:76. [PMID: 37069653 PMCID: PMC10108510 DOI: 10.1186/s13059-023-02906-z] [Received: 04/15/2022] [Accepted: 03/22/2023] [Indexed: 04/19/2023]
Abstract
Whole-genome sequencing projects of millions of subjects contain enormous genotypes, entailing a huge memory burden and time for computation. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods to access and manage compressed large-scale genotypes while maintaining a competitive compression ratio. We also showed that conventional analysis would be substantially sped up if built on GBC to access genotypes of a large population. GBC's data structure and algorithms are valuable for accelerating large-scale genomic research.
Affiliation(s)
- Liubin Zhang
- Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
- Yangyang Yuan
- Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
- School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, China
- Wenjie Peng
- Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
- Bin Tang
- Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
- Mulin Jun Li
- The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin, China
- Hongsheng Gui
- Behavioral Health Services, Henry Ford Health, Detroit, MI, USA
- Center for Health Policy & Health Services Research, Henry Ford Health, Detroit, MI, USA
- Qiang Wang
- Mental Health Center, West China Hospital, Sichuan University, Chengdu, China
- Miaoxin Li
- Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-Sen University, Guangzhou, 510080, China.
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China.
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China.
- Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, 510080, China.
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China.
47
Wagner JK, Yu JH, Fullwiley D, Moore C, Wilson JF, Bamshad MJ, Royal CD. Guidelines for genetic ancestry inference created through roundtable discussions. HGG Adv 2023; 4:100178. [PMID: 36798092 PMCID: PMC9926022 DOI: 10.1016/j.xhgg.2023.100178] [Received: 09/02/2022] [Accepted: 01/03/2023] [Indexed: 01/15/2023]
Abstract
The use of genetic and genomic technology to infer ancestry is commonplace in a variety of contexts, particularly in biomedical research and for direct-to-consumer genetic testing. In 2013 and 2015, two roundtables engaged a diverse group of stakeholders toward the development of guidelines for inferring genetic ancestry in academia and industry. This report shares the stakeholder groups' work and provides an analysis of, commentary on, and views from the groundbreaking and sustained dialogue. We describe the engagement processes and the stakeholder groups' resulting statements and proposed guidelines. The guidelines focus on five key areas: application of genetic ancestry inference, assumptions and confidence/laboratory and statistical methods, terminology and population identifiers, impact on individuals and groups, and communication or translation of genetic ancestry inferences. We delineate the terms and limitations of the guidelines and discuss their critical role in advancing the development and implementation of best practices for inferring genetic ancestry and reporting the results. These efforts should inform both governmental regulation and self-regulation.
Affiliation(s)
- Jennifer K. Wagner
- School of Engineering Design and Innovation, Pennsylvania State University, University Park, PA 16802, USA
- Institute for Computational and Data Science, Pennsylvania State University, University Park, PA 16802, USA
- Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA
- Rock Ethics Institute, Pennsylvania State University, University Park, PA 16802, USA
- Penn State Law, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Joon-Ho Yu
- Department of Pediatrics and Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA
- Treuman Katz Center for Pediatric Bioethics, Seattle Children’s Hospital and Research Institute, Seattle, WA 98101, USA
- Duana Fullwiley
- Department of Anthropology, Stanford University, Stanford, CA 94305, USA
- James F. Wilson
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh EH8 9AG, Scotland
- Michael J. Bamshad
- Department of Pediatrics and Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Division of Genetic Medicine, Seattle Children’s Hospital, Seattle, WA 98101, USA
- Charmaine D. Royal
- Departments of African and African American Studies, Biology, Global Health, and Family Medicine and Community Health, Duke University, Durham, NC 27708, USA
- Genetic Ancestry Inference Roundtable Participants
- School of Engineering Design and Innovation, Pennsylvania State University, University Park, PA 16802, USA
- Institute for Computational and Data Science, Pennsylvania State University, University Park, PA 16802, USA
- Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA
- Rock Ethics Institute, Pennsylvania State University, University Park, PA 16802, USA
- Penn State Law, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Department of Pediatrics and Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA
- Treuman Katz Center for Pediatric Bioethics, Seattle Children’s Hospital and Research Institute, Seattle, WA 98101, USA
- Department of Anthropology, Stanford University, Stanford, CA 94305, USA
- The DNA Detectives, Dana Point, CA, USA
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh EH8 9AG, Scotland
- Department of Pediatrics and Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Division of Genetic Medicine, Seattle Children’s Hospital, Seattle, WA 98101, USA
- Departments of African and African American Studies, Biology, Global Health, and Family Medicine and Community Health, Duke University, Durham, NC 27708, USA
48
Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023; 24:235-250. [PMID: 36476810 PMCID: PMC10204111 DOI: 10.1038/s41576-022-00551-z] [Accepted: 10/27/2022] [Indexed: 12/12/2022]
Abstract
Genome sequencing and analysis allow researchers to decode the functional information hidden in DNA sequences as well as to study cell-to-cell variation within a cell population. Traditionally, the primary bottleneck in genomic analysis pipelines has been the sequencing itself, which has been much more expensive than the computational analyses that follow. However, an important consequence of the continued drive to expand the throughput of sequencing platforms at lower cost is that often the analytical pipelines are struggling to keep up with the sheer amount of raw data produced. Computational cost and efficiency have thus become of ever-increasing importance. Recent methodological advances, such as data sketching, accelerators and domain-specific libraries/languages, promise to address these modern computational challenges. However, despite being more efficient, these innovations come with a new set of trade-offs, both expected, such as accuracy versus memory and expense versus time, and more subtle, including the human expertise needed to use non-standard programming interfaces and set up complex infrastructure. In this Review, we discuss how to navigate these new methodological advances and their trade-offs.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Yun William Yu
- Department of Computer and Mathematical Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
- Tri-Campus Department of Mathematics, University of Toronto, Toronto, Ontario, Canada
49
Ahmed M, Kim HJ, Kim DR. Maximizing the utility of public data. Front Genet 2023; 14:1106631. [PMID: 37065493 PMCID: PMC10102460 DOI: 10.3389/fgene.2023.1106631] [Received: 11/25/2022] [Accepted: 03/21/2023] [Indexed: 04/03/2023]
Abstract
The human genome project galvanized the scientific community around an ambitious goal. Upon completion, the project delivered several discoveries, and a new era of research commenced. More importantly, novel technologies and analysis methods materialized during the project period. The cost reduction allowed many more labs to generate high-throughput datasets. The project also served as a model for other extensive collaborations that generated large datasets. These datasets were made public and continue to accumulate in repositories. As a result, the scientific community should consider how these data can be utilized effectively for the purposes of research and the public good. A dataset can be re-analyzed, curated, or integrated with other forms of data to enhance its utility. We highlight three important areas to achieve this goal in this brief perspective. We also emphasize the critical requirements for these strategies to be successful. We draw on our own experience and others in using publicly available datasets to support, develop, and extend our research interest. Finally, we underline the beneficiaries and discuss some risks involved in data reuse.
Affiliation(s)
- Mahmoud Ahmed
- Department of Biochemistry and Convergence Medical Sciences, Institute of Health Sciences, College of Medicine, Gyeongsang National University, Jinju, Republic of Korea
- Hyun Joon Kim
- Department of Anatomy and Convergence Medical Sciences, Institute of Health Sciences, College of Medicine, Gyeongsang National University, Jinju, Republic of Korea
- Deok Ryong Kim
- Department of Biochemistry and Convergence Medical Sciences, Institute of Health Sciences, College of Medicine, Gyeongsang National University, Jinju, Republic of Korea
- Correspondence: Deok Ryong Kim
50
Grossman RL. Ten lessons for data sharing with a data commons. Sci Data 2023; 10:120. [PMID: 36878917 PMCID: PMC9988927 DOI: 10.1038/s41597-023-02029-x] [Received: 08/10/2022] [Accepted: 02/17/2023] [Indexed: 03/08/2023]
Affiliation(s)
- Robert L Grossman
- University of Chicago, Center for Translational Data Science, Chicago, IL, 60615, USA.