1
Solar M, Castañeda V, Ñanculef R, Dombrovskaia L, Araya M. A Data Ingestion Procedure towards a Medical Images Repository. Sensors (Basel, Switzerland) 2024; 24:4985. [PMID: 39124032 PMCID: PMC11314906 DOI: 10.3390/s24154985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/02/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024]
Abstract
This article presents an ingestion procedure towards an interoperable repository called ALPACS (Anonymized Local Picture Archiving and Communication System). ALPACS provides services to clinical and hospital users, who can access the repository data through an Artificial Intelligence (AI) application called PROXIMITY. This article shows the automated procedure for data ingestion from the medical imaging provider to the ALPACS repository. The data ingestion procedure was successfully applied by the data provider (Hospital Clínico de la Universidad de Chile, HCUCH) using a pseudo-anonymization algorithm at the source, thereby ensuring that the privacy of patients' sensitive data is respected. Data transfer was carried out using international communication standards for health systems, which allows for replication of the procedure by other institutions that provide medical images. OBJECTIVES This article aims to create a repository of 33,000 medical CT images and 33,000 diagnostic reports with international standards (HL7 HAPI FHIR, DICOM, SNOMED). This goal requires devising a data ingestion procedure that can be replicated by other provider institutions, guaranteeing data privacy by implementing a pseudo-anonymization algorithm at the source, and generating labels from annotations via NLP. METHODOLOGY Our approach involves hybrid on-premise/cloud deployment of PACS and FHIR services, including transfer services for anonymized data to populate the repository through a structured ingestion procedure. We used NLP over the diagnostic reports to generate annotations, which were then used to train ML algorithms for content-based similar exam recovery. OUTCOMES We successfully implemented ALPACS and PROXIMITY 2.0, ingesting almost 19,000 thorax CT exams to date along with their corresponding reports.
Affiliation(s)
- Mauricio Solar
  - Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus Vitacura-Santiago, Vitacura 7660251, Chile
- Victor Castañeda
  - DETEM, Faculty of Medicine, Universidad de Chile, Independencia-Santiago, Santiago 8380453, Chile
- Ricardo Ñanculef
  - Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus San Joaquin-Santiago, Santiago 8940897, Chile
- Lioubov Dombrovskaia
  - Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus San Joaquin-Santiago, Santiago 8940897, Chile
- Mauricio Araya
  - Departamento de Informática, Universidad Tecnica Federico Santa Maria, Campus Casa Central-Valparaíso, Valparaíso 2390123, Chile
2
Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M. Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review. JMIR Bioinformatics and Biotechnology 2024; 5:e54332. [PMID: 38935957 PMCID: PMC11165293 DOI: 10.2196/54332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation. OBJECTIVE This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets. METHODS We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success. RESULTS From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants. CONCLUSIONS On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk.
3
Gooden A, Thaldar D. Toward an open access genomics database of South Africans: ethical considerations. Front Genet 2023; 14:1166029. [PMID: 37260770 PMCID: PMC10228717 DOI: 10.3389/fgene.2023.1166029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 05/03/2023] [Indexed: 06/02/2023]
Abstract
Genomics research holds the potential to improve healthcare. Yet, a very low percentage of the genomic data used in genomics research internationally relates to persons of African origin. Establishing a large-scale, open access genomics database of South Africans may contribute to solving this problem. However, this raises various ethics concerns, including privacy expectations and informed consent. The concept of open consent offers a potential solution to these concerns by (a) being explicit about the research participant's data being in the public domain and the associated privacy risks, and (b) setting a higher-than-usual benchmark for informed consent by making use of the objective assessment of prospective research participants' understanding. Furthermore, in the South African context, where local culture is infused with Ubuntu and its relational view of personhood, community engagement is vital for establishing and maintaining an open access genomics database of South Africans. The South African National Health Research Ethics Council is called upon to provide guidelines for genomics researchers, based on open consent and community engagement, on how to plan and implement open access genomics projects.
Affiliation(s)
- Amy Gooden
  - School of Law, University of KwaZulu-Natal, Durban, South Africa
- Donrich Thaldar
  - School of Law, University of KwaZulu-Natal, Durban, South Africa
  - Petrie-Flom Center for Health Law Policy, Biotechnology and Bioethics, Harvard Law School, Cambridge, MA, United States
4
Akyüz K, Goisauf M, Chassang G, Kozera Ł, Mežinska S, Tzortzatou-Nanopoulou O, Mayrhofer MT. Post-identifiability in changing sociotechnological genomic data environments. BioSocieties 2023:1-28. [PMID: 37359141 PMCID: PMC10042674 DOI: 10.1057/s41292-023-00299-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 01/13/2023] [Indexed: 03/30/2023]
Abstract
Data practices in biomedical research often rely on standards that build on normative assumptions regarding privacy and involve 'ethics work.' In an increasingly datafied research environment, identifiability gains a new temporal and spatial dimension, especially in regard to genomic data. In this paper, we analyze how genomic identifiability is considered as a specific data issue in a recent controversial case: publication of the genome sequence of the HeLa cell line. Considering developments in the sociotechnological and data environment, such as big data, biomedical, recreational, and research uses of genomics, our analysis highlights what it means to be (re-)identifiable in the postgenomic era. By showing how the risk of genomic identifiability is not a specificity of the HeLa controversy, but rather a systematic data issue, we argue that a new conceptualization is needed. With the notion of post-identifiability as a sociotechnological situation, we show how past assumptions and ideas about future possibilities come together in the case of genomic identifiability. We conclude by discussing how kinship, temporality, and openness are subject to renewed negotiations along with the changing understandings and expectations of identifiability and status of genomic data.
Affiliation(s)
- Kaya Akyüz
  - Department of Science and Technology Studies, University of Vienna, Universitätsstraße 7/Stiege II/6, Stock (NIG), 1010 Vienna, Austria
  - BBMRI-ERIC, Graz, Austria
- Melanie Goisauf
  - Department of Science and Technology Studies, University of Vienna, Universitätsstraße 7/Stiege II/6, Stock (NIG), 1010 Vienna, Austria
  - BBMRI-ERIC, Graz, Austria
- Gauthier Chassang
  - CERPOP, Université de Toulouse, Inserm, Université Paul Sabatier, Toulouse, France
  - Plateforme GenoToul Societal “Ethique et Biosciences”, Toulouse, France
- Signe Mežinska
  - Institute of Clinical and Preventive Medicine, University of Latvia, Riga, Latvia
  - BBMRI.LV, Riga, Latvia
5
Liang C, Wagstaff J, Aharony N, Schmit V, Manheim D. Managing the Transition to Widespread Metagenomic Monitoring: Policy Considerations for Future Biosurveillance. Health Secur 2023; 21:34-45. [PMID: 36629860 PMCID: PMC9940815 DOI: 10.1089/hs.2022.0029] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/12/2023]
Abstract
The technological possibilities and future public health importance of metagenomic sequencing have received extensive attention, but there has been little discussion about the policy and regulatory issues that need to be addressed if metagenomic sequencing is adopted as a key technology for biosurveillance. In this article, we introduce metagenomic monitoring as a possible path to eventually replacing current infectious disease monitoring models. Many key enablers are technological, whereas others are not. We therefore highlight key policy challenges and implementation questions that need to be addressed for "widespread metagenomic monitoring" to be possible. Policymakers must address pitfalls like fragmentation of the technological base, private capture of benefits, privacy concerns, the usefulness of the system during nonpandemic times, and how the future systems will enable better response. If these challenges are addressed, the technological and public health promise of metagenomic sequencing can be realized.
Affiliation(s)
- Chelsea Liang
  - Chelsea Liang is an Independent Researcher, University of New South Wales, School of Biotechnology and Biomolecular Sciences, Sydney, Australia
- James Wagstaff
  - James Wagstaff, PhD, is a Research Fellow, Future of Humanity Institute, University of Oxford, Oxford, UK
- Noga Aharony
  - Noga Aharony, MS, is a PhD Student, Department of Systems Biology, Columbia University, New York, NY
- Virginia Schmit
  - Virginia Schmit, PhD, is Director of Research, 1DaySooner, DE, and a Policy Specialist, National Institute of Allergy and Infectious Diseases, Bethesda, MD
- David Manheim
  - David Manheim, PhD, is Head of Policy and Research, ALTER, Rehovot, Israel; Lead Researcher, 1DaySooner, Claymont, DE; and Visiting Researcher, Humanities and Arts Department, Technion – Israel Institute of Technology, Haifa, Israel. Address correspondence to: David B. Manheim, 8734 First Avenue, Silver Spring, MD 20910
6
Krumm N. Organizational and Technical Security Considerations for Laboratory Cloud Computing. J Appl Lab Med 2023; 8:180-193. [PMID: 36610429 DOI: 10.1093/jalm/jfac118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 10/25/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND Clinical and anatomical pathology services are increasingly utilizing cloud information technology (IT) solutions to meet growing requirements for storage, computation, and other IT services. Cloud IT solutions are often considered on the promise of low cost of entry, durability and reliability, scalability, and features that are typically out of reach for small- or mid-sized IT organizations. However, use of cloud-based IT infrastructure also brings additional security and privacy risks to organizations, as unfamiliarity, public networks, and complex feature sets contribute to an increased surface area for attacks. CONTENT In this best-practices guide, we aim to help both managers and IT professionals in healthcare environments understand the requirements and risks when using cloud-based IT infrastructure within the laboratory environment. We will describe how technical, operational, and organizational best practices can help mitigate security, privacy, and other risks associated with the use of cloud infrastructure; furthermore, we identify how these best practices fit into healthcare regulatory frameworks. Among organizational best practices, we identify the need for specific hiring requirements, relationships with parent IT groups, mechanisms for reviewing and auditing security practices, and sound practices for onboarding and offboarding employees. Then, we highlight selected specific operational security, account security, and auditing/logging best practices. Finally, we describe how individual cloud technologies have specific resource-level security features. SUMMARY We emphasize that laboratory directors, managers, and IT professionals must ensure that the fundamental organizational and process-based requirements are addressed first, to establish the groundwork for technical security solutions and successful implementation of cloud infrastructure.
Affiliation(s)
- Niklas Krumm
  - Division of Informatics, Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA
7
Rahimzadeh V, Peng G, Cho M. A mixed-methods protocol to develop and validate a stewardship maturity matrix for human genomic data in the cloud. Front Genet 2022; 13:876869. [DOI: 10.3389/fgene.2022.876869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 09/28/2022] [Indexed: 11/13/2022]
Abstract
This article describes a mixed-methods protocol to develop and test the implementation of a stewardship maturity matrix (SMM) for repositories which govern access to human genomic data in the cloud. It is anticipated that the cloud will host most human genomic and related health datasets generated as part of publicly funded research in the coming years. However, repository managers lack practical tools for identifying what stewardship outcomes matter most to key stakeholders as well as how to track progress on their stewardship goals over time. In this article we describe a protocol that combines Delphi survey methods with SMM modeling first introduced in the earth and planetary sciences to develop a stewardship impact assessment tool for repositories that manage access to human genomic data. We discuss the strengths and limitations of this mixed-methods design and offer points to consider for wrangling both quantitative and qualitative data to enhance rigor and representativeness. We conclude with how the empirical methods bridged in this protocol have potential to improve evaluation of data stewardship systems and better align them with diverse stakeholder values in genomic data science.
8
Kadri S, Sboner A, Sigaras A, Roy S. Containers in Bioinformatics: Applications, Practical Considerations, and Best Practices in Molecular Pathology. J Mol Diagn 2022; 24:442-454. [PMID: 35189355 DOI: 10.1016/j.jmoldx.2022.01.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/29/2021] [Revised: 11/15/2021] [Accepted: 01/21/2022] [Indexed: 12/19/2022]
Abstract
Systematic implementation of bioinformatics resources for next generation sequencing (NGS)-based clinical testing is an arduous undertaking. One of the key challenges involves developing an ecosystem of information technology infrastructure for enabling scalable and reproducible bioinformatics services that is resilient and secure for handling genetic and protected health information, often embedded in an existing non-bioinformatics-oriented infrastructure. Container technology provides an ideal and infrastructure-agnostic solution for molecular laboratories developing and using bioinformatics pipelines, whether on-premise or using the cloud. A container is a technology that provides a consistent computational environment and enables reproducibility, scalability, and security when developing NGS bioinformatics analysis pipelines. Containers can increase the bioinformatics team's productivity by automating and simplifying the maintenance of complex bioinformatics resources, as well as facilitate validation, version control, and documentation necessary for clinical laboratory regulatory compliance. Although there is increasing popularity in adopting containers for developing NGS bioinformatics pipelines, there is wide variability and inconsistency in the usage of containers that may result in suboptimal performance and potentially compromise the security and privacy of protected health information. In this article, the authors highlight the current state and provide best or recommended practices for building and using containers in NGS bioinformatics solutions in a clinical setting, with a focus on scalability, optimization, maintainability, and data security.
Affiliation(s)
- Sabah Kadri
  - Department of Bioinformatics, Ann & Robert H Lurie Children's Hospital, Chicago, Illinois
- Andrea Sboner
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York; Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, New York; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York
- Alexandros Sigaras
  - Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, New York; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York
- Somak Roy
  - Department of Molecular Pathology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio
9
Muenzen KD, Amendola LM, Kauffman TL, Mittendorf KF, Bensen JT, Chen F, Green R, Powell BC, Kvale M, Angelo F, Farnan L, Fullerton SM, Robinson JO, Li T, Murali P, Lawlor JM, Ou J, Hindorff LA, Jarvik GP, Crosslin DR. Lessons learned and recommendations for data coordination in collaborative research: The CSER consortium experience. HGG Advances 2022; 3:100120. [PMID: 35707062 PMCID: PMC9190054 DOI: 10.1016/j.xhgg.2022.100120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/18/2022] [Accepted: 05/16/2022] [Indexed: 11/18/2022]
Abstract
Integrating data across heterogeneous research environments is a key challenge in multi-site, collaborative research projects. While it is important to allow for natural variation in data collection protocols across research sites, it is also important to achieve interoperability between datasets in order to reap the full benefits of collaborative work. However, there are few standards to guide the data coordination process from project conception to completion. In this paper, we describe the experiences of the Clinical Sequence Evidence-Generating Research (CSER) consortium Data Coordinating Center (DCC), which coordinated harmonized survey and genomic sequencing data from seven clinical research sites from 2020 to 2022. Using input from multiple consortium working groups and from CSER leadership, we first identify 14 lessons learned from CSER in the categories of communication, harmonization, informatics, compliance, and analytics. We then distill these lessons learned into 11 recommendations for future research consortia in the areas of planning, communication, informatics, and analytics. We recommend that planning and budgeting for data coordination activities occur as early as possible during consortium conceptualization and development to minimize downstream complications. We also find that clear, reciprocal, and continuous communication between consortium stakeholders and the DCC is equally important to maintaining a secure and centralized informatics ecosystem for pooling data. Finally, we discuss the importance of actively interrogating current approaches to data governance, particularly for research studies that straddle the research-clinical divide.
10
Alsaffar MM, Hasan M, McStay GP, Sedky M. Digital DNA lifecycle security and privacy: an overview. Brief Bioinform 2022; 23:6518049. [PMID: 35106557 DOI: 10.1093/bib/bbab607] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/13/2021] [Revised: 12/29/2021] [Accepted: 12/30/2021] [Indexed: 11/14/2022]
Abstract
DNA sequencing technologies have advanced significantly in the last few years leading to advancements in biomedical research which has improved personalised medicine and the discovery of new treatments for diseases. Sequencing technology advancement has also reduced the cost of DNA sequencing, which has led to the rise of direct-to-consumer (DTC) sequencing, e.g. 23andme.com, ancestry.co.uk, etc. In the meantime, concerns have emerged over privacy and security in collecting, handling, analysing and sharing DNA and genomic data. DNA data are unique and can be used to identify individuals. Moreover, those data provide information on people's current disease status and disposition, e.g. mental health or susceptibility for developing cancer. DNA privacy violation does not only affect the owner but also affects their close consanguinity due to its hereditary nature. This article introduces and defines the term 'digital DNA life cycle' and presents an overview of privacy and security threats and their mitigation techniques for predigital DNA and throughout the digital DNA life cycle. It covers DNA sequencing hardware, software and DNA sequence pipeline in addition to common privacy attacks and their countermeasures when DNA digital data are stored, queried or shared. Likewise, the article examines DTC genomic sequencing privacy and security.
Affiliation(s)
- Muhalb M Alsaffar
  - Department of Computing, AI and Robotics, School of Digital, Technologies and Arts, Staffordshire University, College Road, ST4 2DE, Staffordshire, United Kingdom
- Gavin P McStay
  - Department of Biological Sciences, School of Health, Science and Wellbeing, Staffordshire University, College Road, Stoke-on-Trent, Staffordshire, ST4 2DE, United Kingdom
- Mohamed Sedky
  - Department of Computing, AI and Robotics, School of Digital, Technologies and Arts, Staffordshire University, College Road, ST4 2DE, Staffordshire, United Kingdom
11
Hekel R, Budis J, Kucharik M, Radvanszky J, Pös Z, Szemes T. Privacy-preserving storage of sequenced genomic data. BMC Genomics 2021; 22:712. [PMID: 34600465 PMCID: PMC8487550 DOI: 10.1186/s12864-021-07996-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/22/2021] [Accepted: 09/10/2021] [Indexed: 11/23/2022]
Abstract
Background The current and future applications of genomic data may raise ethical and privacy concerns. Processing and storing these data introduce a risk of abuse by potential offenders, since the human genome contains sensitive personal information. For this reason, we have developed a privacy-preserving method, named Varlock, that provides secure storage of sequenced genomic data. We used a public set of population allele frequencies to mask the personal alleles detected in genomic reads. Each personal allele described by the public set is masked by a randomly selected population allele with respect to its frequency. Masked alleles are preserved in an encrypted confidential file that can be shared in whole or in part using public-key cryptography. Results Our method masked the personal variants and introduced new variants detected in a personal masked genome. Alternative alleles with lower population frequency were masked and introduced more often. We performed a joint PCA analysis of personal and masked VCFs, showing that the VCFs between the two groups cannot be trivially mapped. Moreover, the method is reversible and personal alleles in specific genomic regions can be unmasked on demand. Conclusion Our method masks personal alleles within genomic reads while preserving valuable non-sensitive properties of sequenced DNA fragments for further research. Personal alleles in the desired genomic regions may be restored and shared with patients, clinics, and researchers. We suggest that the method can provide an additional security layer for storing and sharing of the raw aligned reads. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07996-2.
Affiliation(s)
- Rastislav Hekel
  - Geneton s.r.o, Bratislava, Slovakia; Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; Slovak Centre of Scientific and Technical Information, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia
- Jaroslav Budis
  - Geneton s.r.o, Bratislava, Slovakia; Slovak Centre of Scientific and Technical Information, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia
- Marcel Kucharik
  - Geneton s.r.o, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia
- Jan Radvanszky
  - Geneton s.r.o, Bratislava, Slovakia; Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Biomedical Research Centre, Institute of Clinical and Translational Research, Slovak Academy of Sciences, Bratislava, Slovakia
- Zuzana Pös
  - Geneton s.r.o, Bratislava, Slovakia; Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia; Biomedical Research Centre, Institute of Clinical and Translational Research, Slovak Academy of Sciences, Bratislava, Slovakia
- Tomas Szemes
  - Geneton s.r.o, Bratislava, Slovakia; Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia; Comenius University Science Park, Bratislava, Slovakia
12
Öksüz AÇ, Ayday E, Güdükbay U. Privacy-preserving and robust watermarking on sequential genome data using belief propagation and local differential privacy. Bioinformatics 2021; 37:2668-2674. [PMID: 33630065 PMCID: PMC11025661 DOI: 10.1093/bioinformatics/btab128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Revised: 02/09/2021] [Accepted: 02/23/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Genome data have been a subject of study for both biology and computer science since the start of the Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes has become more and more available and affordable. Genome data can be shared on public websites or with service providers (SPs). However, this sharing compromises the privacy of donors even under partial sharing conditions. We mainly focus on the liability aspect ensued by the unauthorized sharing of these genome data. One of the techniques to address the liability issues in data sharing is the watermarking mechanism. RESULTS To detect malicious correspondents and SPs, whose aim is to share genome data without individuals' consent and undetected, we propose a novel watermarking method on sequential genome data using a belief propagation algorithm. In our method, we have two criteria to satisfy: (i) embedding robust watermarks so that malicious adversaries cannot tamper with the watermark by modification and are identified with high probability, and (ii) achieving ϵ-local differential privacy in all data sharings with SPs. For the preservation of system robustness against single SP and collusion attacks, we consider publicly available genomic information like Minor Allele Frequency, Linkage Disequilibrium, Phenotype Information and Familial Information. Our proposed scheme achieves 100% detection rate against the single SP attacks with only 3% watermark length. For the worst case scenario of collusion attacks (50% of SPs are malicious), 80% detection is achieved with 5% watermark length and 90% detection is achieved with 10% watermark length. For all cases, the impact of ϵ on precision remained negligible and high privacy is ensured. AVAILABILITY AND IMPLEMENTATION https://github.com/acoksuz/PPRW_SGD_BPLDP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Erman Ayday
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA
- Uğur Güdükbay
  - Department of Computer Engineering, Bilkent University, Ankara, Turkey
13
Rosenbaum JN, Berry AB, Church AJ, Crooks K, Gagan JR, López-Terrada D, Pfeifer JD, Rennert H, Schrijver I, Snow AN, Wu D, Ewalt MD. A Curriculum for Genomic Education of Molecular Genetic Pathology Fellows: A Report of the Association for Molecular Pathology Training and Education Committee. J Mol Diagn 2021; 23:1218-1240. [PMID: 34245921 DOI: 10.1016/j.jmoldx.2021.07.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/11/2019] [Revised: 06/16/2021] [Accepted: 07/01/2021] [Indexed: 12/19/2022]
Abstract
Molecular genetic pathology (MGP) is a subspecialty of pathology and medical genetics and genomics. Genomic testing, which we define as that which generates large data sets and interrogates large segments of the genome in a single assay, is increasingly recognized as essential for optimal patient care through precision medicine. The most common genomic testing technologies in clinical laboratories are next-generation sequencing and microarray. It is essential to train in these methods and to consider the data generated in the context of the diagnosis, medical history, and other clinical findings of individual patients. Accordingly, updating the MGP fellowship curriculum to include genomics is timely, important, and challenging. At the completion of training, an MGP fellow should be capable of independently interpreting and signing out results of a wide range of genomic assays and, given the appropriate context and institutional support, of developing and validating new assays in compliance with applicable regulations. The Genomics Task Force of the MGP Program Directors, a working group of the Association for Molecular Pathology Training and Education Committee, has developed a genomics curriculum framework and recommendations specific to the MGP fellowship. These recommendations are presented for consideration and implementation by MGP fellowship programs with the understanding that MGP programs exist in a diversity of clinical practice environments with a spectrum of available resources.
Affiliation(s)
- Jason N Rosenbaum
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Anna B Berry
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Swedish Cancer Institute and Institute of Systems Biology, Seattle, Washington
- Alanna J Church
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Boston Children's Hospital, Boston, Massachusetts
- Kristy Crooks
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, University of Colorado Anschutz Medical Campus, Aurora, Colorado
- Jeffrey R Gagan
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, University of Texas Southwestern Medical Center, Dallas, Texas
- Dolores López-Terrada
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Baylor College of Medicine, Houston, Texas
- John D Pfeifer
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Washington University School of Medicine, St. Louis, Missouri
- Hanna Rennert
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York
- Iris Schrijver
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Stanford University School of Medicine, Stanford, California
- Anthony N Snow
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, University of Iowa Hospitals and Clinics, Iowa City, Iowa
- David Wu
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington
- Mark D Ewalt
- Molecular Genetic Pathology Fellow Training in Genomics Task Force of the Training and Education Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York.
14
B A, S S. A survey on genomic data by privacy-preserving techniques perspective. Comput Biol Chem 2021; 93:107538. [PMID: 34246892 DOI: 10.1016/j.compbiolchem.2021.107538] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2021] [Revised: 06/15/2021] [Accepted: 06/26/2021] [Indexed: 11/27/2022]
Abstract
Human genomics is increasingly applied to health-related problems and to achieving time- and cost-efficient healthcare. As genomics research advances, privacy concerns regarding the querying, access, storage, and computation of genomic data need to be addressed. Although genomic data is widely accessible, privacy issues may emerge because untrusted third parties (adversaries or researchers) may reveal information or strategy plans regarding an individual's genome data when it is requested for research purposes. To mitigate this problem, many privacy-preserving techniques are used; these, along with cryptographic methods, are briefly discussed. Furthermore, the efficiency and accuracy of secure and private genomic data computation need further research.
Affiliation(s)
- Abinaya B
- Kalaignarkarunanidhi Institute of Technology, Coimbatore, India.
- Santhi S
- Kalaignarkarunanidhi Institute of Technology, Coimbatore, India.
15
Affiliation(s)
- Lee Swales
- School of Law, University of KwaZulu-Natal, Durban, South Africa
16
iResponse: An AI and IoT-Enabled Framework for Autonomous COVID-19 Pandemic Management. SUSTAINABILITY 2021. [DOI: 10.3390/su13073797] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Indexed: 12/23/2022]
Abstract
SARS-CoV-2, a tiny virus, is severely affecting the social, economic, and environmental sustainability of our planet, causing infections and deaths (2,674,151 deaths, as of 17 March 2021), relationship breakdowns, depression, economic downturn, riots, and much more. The lessons that have been learned from good practices by various countries include containing the virus rapidly; enforcing containment measures; growing COVID-19 testing capability; discovering cures; providing stimulus packages to the affected; easing monetary policies; developing new pandemic-related industries; support plans for controlling unemployment; and overcoming inequalities. Coordination and multi-term planning have been found to be the key among the successful national and global endeavors to fight the pandemic. The current research and practice have mainly focused on specific aspects of COVID-19 response. There is a need to automate the learning process such that we can learn from good and bad practices during pandemics and normal times. To this end, this paper proposes a technology-driven framework, iResponse, for coordinated and autonomous pandemic management, allowing pandemic-related monitoring and policy enforcement, resource planning and provisioning, and data-driven planning and decision-making. The framework consists of five modules: Monitoring and Break-the-Chain, Cure Development and Treatment, Resource Planner, Data Analytics and Decision Making, and Data Storage and Management. All modules collaborate dynamically to make coordinated and informed decisions. We provide the technical system architecture of a system based on the proposed iResponse framework along with the design details of each of its five components. The challenges related to the design of the individual modules and the whole system are discussed. We provide six case studies in the paper to elaborate on the different functionalities of the iResponse framework and how the framework can be implemented. 
These include a sentiment analysis case study, a case study on the recognition of human activities, and four case studies using deep learning and other data-driven methods to show how to develop sustainability-related optimal strategies for pandemic management using seven real-world datasets. A number of important findings are extracted from these case studies.
17
de Vries JJC, Brown JR, Couto N, Beer M, Le Mercier P, Sidorov I, Papa A, Fischer N, Oude Munnink BB, Rodriquez C, Zaheri M, Sayiner A, Hönemann M, Cataluna AP, Carbo EC, Bachofen C, Kubacki J, Schmitz D, Tsioka K, Matamoros S, Höper D, Hernandez M, Puchhammer-Stöckl E, Lebrand A, Huber M, Simmonds P, Claas ECJ, López-Labrador FX. Recommendations for the introduction of metagenomic next-generation sequencing in clinical virology, part II: bioinformatic analysis and reporting. J Clin Virol 2021; 138:104812. [PMID: 33819811 DOI: 10.1016/j.jcv.2021.104812] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 02/09/2021] [Accepted: 03/20/2021] [Indexed: 12/11/2022]
Abstract
Metagenomic next-generation sequencing (mNGS) is an untargeted technique for determination of microbial DNA/RNA sequences in a variety of sample types from patients with infectious syndromes. mNGS is still in its early stages of broader translation into clinical applications. To further support the development, implementation, optimization and standardization of mNGS procedures for virus diagnostics, the European Society for Clinical Virology (ESCV) Network on Next-Generation Sequencing (ENNGS) has been established. The aim of ENNGS is to bring together professionals involved in mNGS for viral diagnostics to share methodologies and experiences, and to develop application guidelines. Following the ENNGS publication Recommendations for the introduction of mNGS in clinical virology, part I: wet lab procedure in this journal, the current manuscript aims to provide practical recommendations for the bioinformatic analysis of mNGS data and reporting of results to clinicians.
Affiliation(s)
- Jutte J C de Vries
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Julianne R Brown
- Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom.
- Natacha Couto
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom.
- Martin Beer
- Friedrich-Loeffler-Institute, Institute of Diagnostic Virology, Greifswald, Germany.
- Igor Sidorov
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Anna Papa
- Department of Microbiology, Medical School, Aristotle University of Thessaloniki, Greece.
- Nicole Fischer
- University Medical Center Hamburg-Eppendorf, UKE Institute for Medical Microbiology, Virology and Hygiene, Germany.
- Christophe Rodriquez
- Department of Virology, University hospital Henri Mondor, Assistance Public des Hopitaux de Paris, Créteil, France.
- Maryam Zaheri
- Institute of Medical Virology, University of Zurich, Switzerland.
- Arzu Sayiner
- Dokuz Eylul University, Medical Faculty, Department of Medical Microbiology, Izmir, Turkey.
- Mario Hönemann
- Institute of Virology, Leipzig University, Leipzig, Germany.
- Alba Perez Cataluna
- Department of Preservation and Food Safety Technologies, IATA-CSIC, Paterna, Valencia, Spain.
- Ellen C Carbo
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- Jakub Kubacki
- Institute of Virology, University of Zurich, Switzerland.
- Dennis Schmitz
- RIVM National Institute for Public Health and Environment, Bilthoven, the Netherlands.
- Katerina Tsioka
- Department of Microbiology, Medical School, Aristotle University of Thessaloniki, Greece.
- Sébastien Matamoros
- Medical Microbiology and Infection Control, Amsterdam UMC, Amsterdam, the Netherlands.
- Dirk Höper
- Friedrich-Loeffler-Institute, Institute of Diagnostic Virology, Greifswald, Germany.
- Marta Hernandez
- Laboratory of Molecular Biology and Microbiology, Instituto Tecnologico Agrario de Castilla y Leon, Valladolid, Spain.
- Michael Huber
- Institute of Medical Virology, University of Zurich, Switzerland.
- Peter Simmonds
- Nuffield Department of Medicine, University of Oxford, Oxford, UK.
- Eric C J Claas
- Clinical Microbiological Laboratory, department of Medical Microbiology, Leiden University Medical Center, Leiden, the Netherlands.
- F Xavier López-Labrador
- Virology Laboratory, Genomics and Health Area, Centre for Public Health Research (FISABIO-Public Health), Valencia, Spain; Department of Microbiology, Medical School, University of Valencia, Spain; CIBERESP, Instituto de Salud Carlos III, Madrid, Spain.
18
Karimi S, Jiang X, Dolin RH, Kim M, Boxwala A. A secure system for genomics clinical decision support. J Biomed Inform 2020; 112:103602. [PMID: 33080397 PMCID: PMC8577277 DOI: 10.1016/j.jbi.2020.103602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2020] [Revised: 09/07/2020] [Accepted: 10/12/2020] [Indexed: 11/26/2022]
Abstract
We developed a prototype genomic archiving and communications system to securely store genome data and provide clinical decision support (CDS). This system operates on a client-server model. The client encrypts the data, and the server stores data and performs the computations necessary for CDS. Computations are directly performed on encrypted data, and the client decrypts results. The server cannot decrypt inputs or outputs, which provides strong guarantees of security. We have validated our system with three genomics-based CDS applications. The results demonstrate that it is possible to resolve a long-standing dilemma in genomic data privacy and accessibility, by using a principled cryptographical framework and a mathematical representation of genome data and CDS questions.
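The client-encrypts, server-computes, client-decrypts flow described in this abstract can be illustrated with a toy additively homomorphic scheme. The sketch below uses textbook Paillier encryption with tiny hard-coded primes. The abstract does not name the cryptosystem the authors used, so the choice of Paillier, the prime values, and all function names are illustrative assumptions, and the parameters are far too small for real use.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Build a toy Paillier key pair from two small primes (assumed, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """Client side: encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Server side: add two plaintexts by multiplying their ciphertexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    """Client side: recover the plaintext from ciphertext c."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Hypothetical usage: the server sums two encrypted counts without seeing them.
n, private_key = keygen()
encrypted_sum = add_encrypted(n, encrypt(n, 3), encrypt(n, 4))
recovered = decrypt(n, private_key, encrypted_sum)
```

Here the server only ever handles ciphertexts and the public modulus `n`, and only the holder of the private key can read `recovered`. That mirrors the division of roles in the abstract, though not necessarily the authors' actual construction.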
Affiliation(s)
- Xiaoqian Jiang
- UT Health School of Biomedical Informatics, Houston, TX, United States
- Miran Kim
- UT Health School of Biomedical Informatics, Houston, TX, United States
- Aziz Boxwala
- Elimu Informatics Inc., Richmond, CA, United States
19
Kuo TT, Jiang X, Tang H, Wang X, Bath T, Bu D, Wang L, Harmanci A, Zhang S, Zhi D, Sofia HJ, Ohno-Machado L. iDASH secure genome analysis competition 2018: blockchain genomic data access logging, homomorphic encryption on GWAS, and DNA segment searching. BMC Med Genomics 2020; 13:98. [PMID: 32693816 PMCID: PMC7372776 DOI: 10.1186/s12920-020-0715-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/14/2022]
Affiliation(s)
- Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Xiaoqian Jiang
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
- Haixu Tang
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
- XiaoFeng Wang
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
- Tyler Bath
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Diyue Bu
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
- Lei Wang
- School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, 47408, USA
- Arif Harmanci
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
- Shaojie Zhang
- Department of Computer Science, University of Southern Florida, Orlando, FL, 32816, USA
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
- Heidi J Sofia
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Lucila Ohno-Machado
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA.
- Division of Health Services Research & Development, VA San Diego Healthcare System, San Diego, CA, 92161, USA.
20
Krumm N, Hoffman N. Practical estimation of cloud storage costs for clinical genomic data. Pract Lab Med 2020; 21:e00168. [PMID: 32529017 PMCID: PMC7276491 DOI: 10.1016/j.plabm.2020.e00168] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/06/2019] [Revised: 04/29/2020] [Accepted: 05/06/2020] [Indexed: 01/29/2023]
Abstract
Background Laboratories performing clinical high-throughput sequencing for oncology and germline testing are increasingly migrating their data storage to cloud-based solutions. Cloud-based storage has several advantages, such as low per-GB prices, scalability, and minimal fixed costs; however, while these solutions tout ostensibly simple usage-based pricing plans, practical cost analysis of cloud storage for NGS data storage is not straightforward. Methods We developed an easy-to-use tool designed specifically for cost and usage estimation for laboratories performing clinical NGS testing (https://ngscosts.info). Our tool enables quick exploration of dozens of storage options across three major cloud providers, and provides complex cost and usage forecasts over 1–20 year timeframes. Parameters include current test volumes, growth rate, data compression, data retention policies, and case re-access rates. Outputs include an easy-to-visualize chart of total data stored, yearly and lifetime costs, and a “cost per test” estimate. Results Two factors were found to markedly decrease the average cost per test: 1) reducing total file size, including through the use of compression, 2) rapid transfer to “cold” or archival storage. In contrast, re-access of data from archival storage tiers was not found to dramatically increase the cost of storage per test. Conclusions Steady declines in cloud storage pricing, as well as new options for storage and retrieval, make storing clinical NGS data on the cloud economical and friendly to laboratory workflows. Our web-based tool makes it possible to explore and compare cloud storage solutions and provide forecasts specifically for clinical NGS laboratories.
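The kind of forecast this abstract describes can be approximated with a very small yearly model. The sketch below is not the authors' tool. The function name, the flat per-GB price, and the assumption that each year's data is billed for the full year are all hypothetical simplifications. Archival tiers and re-access fees are left out.

```python
def forecast_storage_cost(tests_per_year, gb_per_test, growth_rate, years,
                          compression_ratio=1.0, retention_years=None,
                          price_per_gb_month=0.01):
    """Return (lifetime_cost, cost_per_test) under a simple yearly model.

    All prices and defaults are placeholder assumptions, not real cloud rates.
    """
    if retention_years is not None and retention_years < 1:
        raise ValueError("retention_years must be at least 1")
    gb_added_per_year = []   # compressed GB added in each year
    lifetime_cost = 0.0
    total_tests = 0.0
    for year in range(years):
        tests = tests_per_year * (1 + growth_rate) ** year
        total_tests += tests
        gb_added_per_year.append(tests * gb_per_test / compression_ratio)
        # Keep only the cohorts that are still inside the retention window.
        if retention_years is None:
            retained = gb_added_per_year
        else:
            retained = gb_added_per_year[-retention_years:]
        lifetime_cost += sum(retained) * price_per_gb_month * 12
    return lifetime_cost, lifetime_cost / total_tests


# Hypothetical lab: 1,000 tests per year at 10 GB each, no growth, two years.
uncompressed = forecast_storage_cost(1000, 10, 0.0, 2)
compressed = forecast_storage_cost(1000, 10, 0.0, 2, compression_ratio=2.0)
```

In this toy model, doubling the compression ratio halves the cost per test, consistent with the abstract's finding that reducing file size lowers the average cost per test.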
Affiliation(s)
- Niklas Krumm
- Department of Laboratory Medicine, University of Washington, Seattle, WA, USA
- Noah Hoffman
- Department of Laboratory Medicine, University of Washington, Seattle, WA, USA
21
Carter AB, Zehnbauer B. Expanding the Scope of The Journal of Molecular Diagnostics to the Informatics Subdivision of the Association for Molecular Pathology. J Mol Diagn 2020; 21:539-541. [PMID: 31230765 DOI: 10.1016/j.jmoldx.2019.04.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/19/2019] [Accepted: 04/19/2019] [Indexed: 02/03/2023]
Abstract
This editorial describes the expanded scope of The Journal of Molecular Diagnostics, to include informatics-based articles.
Affiliation(s)
- Barbara Zehnbauer
- Department of Pathology, Emory School of Medicine, Atlanta, Georgia (Editor-in-Chief).
22
Rauluseviciute I, Drabløs F, Rye MB. DNA methylation data by sequencing: experimental approaches and recommendations for tools and pipelines for data analysis. Clin Epigenetics 2019; 11:193. [PMID: 31831061 PMCID: PMC6909609 DOI: 10.1186/s13148-019-0795-x] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 10/21/2019] [Accepted: 12/04/2019] [Indexed: 02/06/2023]
Abstract
Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most researched epigenetic mark involved in various processes in the human cell, including gene regulation and development of diseases, such as cancer. Increasing numbers of DNA methylation sequencing datasets from human genome are produced using various platforms-from methylated DNA precipitation to the whole genome bisulfite sequencing. Many of those datasets are fully accessible for repeated analyses. Sequencing experiments have become routine in laboratories around the world, while analysis of outcoming data is still a challenge among the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often not clear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable to complete all the essential steps to achieve the result. In the case of DNA methylation sequencing analysis, the goal of such pipeline is to map sequencing reads, calculate methylation levels, and distinguish differentially methylated positions and/or regions. The objective of this review is to describe basic principles and steps in the analysis of DNA methylation sequencing data that in particular have been used for mammalian genomes, and more importantly to present and discuss the most pronounced computational pipelines that can be used to analyze such data. 
We aim to provide a good starting point for scientists with limited experience in computational analyses of DNA methylation and hydroxymethylation data, and recommend a few tools that are powerful, but still easy enough to use for their own data analysis.
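Two of the pipeline steps named in this abstract, calculating methylation levels and flagging differentially methylated positions, can be sketched from per-position read counts. This is a simplified illustration and not one of the reviewed pipelines. The function names, the coverage and difference thresholds, and the input layout (a dict mapping a position to methylated and unmethylated read counts) are assumptions. Real tools typically apply statistical tests rather than a fixed cutoff.

```python
def methylation_level(methylated, unmethylated):
    """Fraction of reads supporting methylation at one cytosine, or None if uncovered."""
    coverage = methylated + unmethylated
    return methylated / coverage if coverage else None


def differential_positions(sample_a, sample_b, min_coverage=10, min_delta=0.25):
    """List (position, delta) pairs whose methylation level differs between samples.

    Each sample maps position -> (methylated_reads, unmethylated_reads).
    The thresholds are illustrative defaults, not recommended values.
    """
    hits = []
    for position in sorted(sample_a.keys() & sample_b.keys()):
        meth_a, unmeth_a = sample_a[position]
        meth_b, unmeth_b = sample_b[position]
        # Skip positions with too few reads in either sample.
        if meth_a + unmeth_a < min_coverage or meth_b + unmeth_b < min_coverage:
            continue
        delta = (methylation_level(meth_b, unmeth_b)
                 - methylation_level(meth_a, unmeth_a))
        if abs(delta) >= min_delta:
            hits.append((position, delta))
    return hits


# Hypothetical counts for three CpG positions in two samples.
control = {("chr1", 100): (18, 2), ("chr1", 200): (5, 15), ("chr1", 300): (1, 1)}
treated = {("chr1", 100): (4, 16), ("chr1", 200): (6, 14), ("chr1", 300): (2, 0)}
changed = differential_positions(control, treated)
```

With these made-up counts only the first position is reported: the second changes too little and the third has too few reads to call.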
Affiliation(s)
- Ieva Rauluseviciute
- Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway.
- Finn Drabløs
- Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway
- Morten Beck Rye
- Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, P.O. Box 8905, NO-7491, Trondheim, Norway; Clinic of Surgery, St. Olavs Hospital, Trondheim University Hospital, NO-7030, Trondheim, Norway
23
Gullapalli RR. Evaluation of Commercial Next-Generation Sequencing Bioinformatics Software Solutions. J Mol Diagn 2019; 22:147-158. [PMID: 31751676 DOI: 10.1016/j.jmoldx.2019.09.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/06/2019] [Revised: 09/03/2019] [Accepted: 09/23/2019] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing (NGS) diagnostics continue to expand rapidly in clinical medicine. An ever-expanding menu of molecular biomarkers is deemed important for diagnostic, prognostic, and therapeutic assessment in patients. The increasing role of NGS in the clinic is driven mainly by the falling costs of sequencing. However, the data-intensive nature of NGS makes bioinformatic analysis a major challenge to many clinical laboratories. Critically needed NGS bioinformatics personnel are hard to recruit and retain in small- to mid-size clinical laboratories. Also, NGS software often lacks the scalability necessary for expanded clinical laboratory testing volumes. Commercial software solutions aim to bridge the bioinformatics barrier via turnkey informatics solutions tailored specifically for the clinical workplace. Yet, there has been no systematic assessment of these software solutions thus far. This article presents an end-to-end vendor evaluation experience of commercial NGS bioinformatics solutions. Six different commercial vendor solutions were assessed systematically. Key metrics of NGS software evaluation to aid in the robust assessment of software solutions are described. Comprehensive feedback, provided by the TriCore Reference Laboratories molecular pathology team, enabled the final vendor selection. Many key lessons were learned during the software evaluation process, which are described herein. This article aims to provide a detailed road map for small- to mid-size clinical laboratories interested in evaluating commercial bioinformatics solutions available in the marketplace.
Affiliation(s)
- Rama R Gullapalli
- Departments of Pathology and Chemical and Biological Engineering, University of New Mexico, Albuquerque, New Mexico.
24
Zehnbauer BA. The Journal of Molecular Diagnostics: 20 Years Defining Professional Practice. J Mol Diagn 2019; 21:938-942. [PMID: 31635797 DOI: 10.1016/j.jmoldx.2019.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/09/2019] [Accepted: 09/09/2019] [Indexed: 01/09/2023]
Abstract
This editorial highlights 20 years of JMD defining professional practice.
Affiliation(s)
- Barbara A Zehnbauer
- Department of Pathology, Emory University School of Medicine, Atlanta, Georgia (Editor-in-Chief).