1
Berke SR, Kanchan K, Marazita ML, Tobin E, Ruczinski I. Custom Biomedical FAIR Data Analysis in the Cloud Using CAVATICA. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.06.27.24309340. [PMID: 38978644 PMCID: PMC11230316 DOI: 10.1101/2024.06.27.24309340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
The historically fragmented biomedical data ecosystem has moved towards harmonization under the findable, accessible, interoperable, and reusable (FAIR) data principles, creating more opportunities for cloud-based research. This shift is especially opportune for scientists across diverse domains interested in implementing creative, nonstandard computational analytic pipelines on large and varied datasets. However, executing custom cloud analyses may present difficulties, particularly for investigators lacking advanced computational expertise. Here, we present an accessible, streamlined approach for the cloud compute platform CAVATICA that offers a solution. We outline how we developed a custom workflow in the cloud, for analyzing whole genome sequences of case-parent trios to detect sex-specific genetic effects on orofacial cleft risk, which required several programming languages and custom software packages. The approach involves just three components: Docker to containerize software environments, tool creation for each analysis step, and a visual workflow editor to weave the tools into a Common Workflow Language (CWL) pipeline. Our approach should be accessible to any investigator with basic computational skills, is readily extended to implement any scalable high-throughput biomedical data analysis in the cloud, and is applicable to other commonly used compute platforms such as BioData Catalyst. We believe our approach empowers versatile data reuse and promotes accelerated biomedical discovery in a time of substantial FAIR data.
2
Theriault-Lauzier P, Cobin D, Tastet O, Langlais EL, Taji B, Kang G, Chong AY, So D, Tang A, Gichoya JW, Chandar S, Déziel PL, Hussin JG, Kadoury S, Avram R. A Responsible Framework for Applying Artificial Intelligence on Medical Images and Signals at the Point of Care: The PACS-AI Platform. Can J Cardiol 2024:S0828-282X(24)00427-6. [PMID: 38885787 DOI: 10.1016/j.cjca.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 05/09/2024] [Accepted: 05/26/2024] [Indexed: 06/20/2024]
Abstract
The potential of artificial intelligence (AI) in medicine lies in its ability to enhance clinicians' capacity to analyse medical images, thereby improving diagnostic precision and accuracy and thus enhancing current tests. However, the integration of AI within health care is fraught with difficulties. Heterogeneity among health care system applications, reliance on proprietary closed-source software, and rising cybersecurity threats pose significant challenges. Moreover, before their deployment in clinical settings, AI models must demonstrate their effectiveness across a wide range of scenarios and must be validated by prospective studies, but doing so requires testing in an environment mirroring the clinical workflow, which is difficult to achieve without dedicated software. Finally, the use of AI techniques in health care raises significant legal and ethical issues, such as the protection of patient privacy, the prevention of bias, and the monitoring of the device's safety and effectiveness for regulatory compliance. This review describes challenges to AI integration in health care and provides guidelines on how to move forward. We describe an open-source solution that we developed that integrates AI models into the Picture Archiving and Communication System (PACS), called PACS-AI. This approach aims to increase the evaluation of AI models by facilitating their integration and validation with existing medical imaging databases. PACS-AI may overcome many current barriers to AI deployment and offer a pathway toward responsible, fair, and effective deployment of AI models in health care. In addition, we propose a list of criteria and guidelines that AI researchers should adopt when publishing a medical AI model to enhance standardisation and reproducibility.
Affiliation(s)
- Pascal Theriault-Lauzier
  - Division of Cardiovascular Medicine, Stanford School of Medicine, Palo Alto, California, USA; Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Denis Cobin
  - Montréal Heart Institute, Montréal, Québec, Canada
- Bahareh Taji
  - Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Guson Kang
  - Division of Cardiovascular Medicine, Stanford School of Medicine, Palo Alto, California, USA
- Aun-Yeong Chong
  - Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Derek So
  - Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- An Tang
  - Department of Radiology, Radiation Oncology and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
- Judy Wawira Gichoya
  - Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, Georgia, USA
- Julie G Hussin
  - Montréal Heart Institute, Montréal, Québec, Canada; Mila-Québec AI Institute, Montréal, Québec, Canada; Faculty of Law, Université Laval, Québec, Québec, Canada
- Samuel Kadoury
  - Department of Radiology, Radiation Oncology and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada; Polytechnique Montréal, Montréal, Québec, Canada
- Robert Avram
  - Montréal Heart Institute, Montréal, Québec, Canada; Department of Medicine, Université de Montréal, Montréal, Québec, Canada
3
Jentsch M, Schneider-Lunitz V, Taron U, Braun M, Ishaque N, Wagener H, Conrad C, Twardziok S. Creating cloud platforms for supporting FAIR data management in biomedical research projects. F1000Res 2024; 13:8. [PMID: 38779317 PMCID: PMC11109697 DOI: 10.12688/f1000research.140624.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2024] [Indexed: 05/25/2024]
Abstract
Biomedical research projects are becoming increasingly complex and require technological solutions that support all phases of the data lifecycle and application of the FAIR principles. At the Berlin Institute of Health (BIH), we have developed and established a flexible and cost-effective approach to building customized cloud platforms for supporting research projects. The approach is based on a microservice architecture and on the management of a portfolio of supported services. On this basis, we created and maintained cloud platforms for several international research projects. In this article, we present our approach and argue that building customized cloud platforms can offer multiple advantages over using multi-project platforms. Our approach is transferable to other research environments and can be easily adapted by other projects and other service providers.
Affiliation(s)
- Marcel Jentsch
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Valentin Schneider-Lunitz
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Ulrike Taron
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Martin Braun
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Naveed Ishaque
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Harald Wagener
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Christian Conrad
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
- Sven Twardziok
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Center of Digital Health, Berlin, 10117, Germany
4
Sachdeva S, Bhatia S, Al Harrasi A, Shah YA, Anwer K, Philip AK, Shah SFA, Khan A, Ahsan Halim S. Unraveling the role of cloud computing in health care system and biomedical sciences. Heliyon 2024; 10:e29044. [PMID: 38601602 PMCID: PMC11004887 DOI: 10.1016/j.heliyon.2024.e29044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 03/24/2024] [Accepted: 03/28/2024] [Indexed: 04/12/2024]
Abstract
Cloud computing has emerged as a transformative force in healthcare and biomedical sciences, offering scalable, on-demand resources for managing vast amounts of data. This review explores the integration of cloud computing within these fields, highlighting its pivotal role in enhancing data management, security, and accessibility. We examine the application of cloud computing in various healthcare domains, including electronic medical records, telemedicine, and personalized patient care, as well as its impact on bioinformatics research, particularly in genomics, proteomics, and metabolomics. The review also addresses the challenges and ethical considerations associated with cloud-based healthcare solutions, such as data privacy and cybersecurity. By providing a comprehensive overview, we aim to assist readers in understanding the significance of cloud computing in modern medical applications and its potential to revolutionize both patient care and biomedical research.
Affiliation(s)
- Saurabh Bhatia
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
  - School of Health Science, University of Petroleum and Energy Studies, Prem Nagar, Dehradun, Uttarakhand, 248007, India
- Ahmed Al Harrasi
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
- Yasir Abbas Shah
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
- Khalid Anwer
  - Department of Pharmaceutics, College of Pharmacy, Prince Sattam Bin Abdulaziz University, Al-Kharj, 11942, Saudi Arabia
- Anil K. Philip
  - School of Pharmacy, University of Nizwa, Birkat Al Mouz, Nizwa, 616, Oman
- Syed Faisal Abbas Shah
  - Faculty of Computer Science & Information Technology, Virtual University of Pakistan, Lahore, 54000, Pakistan
- Ajmal Khan
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
- Sobia Ahsan Halim
  - Natural & Medical Sciences Research Center, University of Nizwa, P.O. Box 33, 616 Birkat Al Mauz, Nizwa, Oman
5
Hicks CB, Martinez TJ. Massively scalable workflows for quantum chemistry: BigChem and ChemCloud. J Chem Phys 2024; 160:142501. [PMID: 38591672 DOI: 10.1063/5.0190834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/14/2024] [Indexed: 04/10/2024]
Abstract
Electronic structure theory, i.e., quantum chemistry, is the fundamental building block for many problems in computational chemistry. We present a new distributed computing framework (BigChem), which allows for an efficient solution of many quantum chemistry problems in parallel. BigChem is designed to be easily composable and leverages industry-standard middleware (e.g., Celery, RabbitMQ, and Redis) for distributed approaches to large scale problems. BigChem can harness any collection of worker nodes, including ones on cloud providers (such as AWS or Azure), local clusters, or supercomputer centers (and any mixture of these). BigChem builds upon MolSSI packages, such as QCEngine, to standardize the operation of numerous computational chemistry programs, demonstrated here with Psi4, xtb, geomeTRIC, and TeraChem. BigChem delivers full utilization of compute resources at scale, offers a programmable canvas for designing sophisticated quantum chemistry workflows, and is fault tolerant to node failures and network disruptions. We demonstrate linear scalability of BigChem running computational chemistry workloads on up to 125 GPUs. Finally, we present ChemCloud, a web API to BigChem and successor to TeraChem Cloud. ChemCloud delivers scalable and secure access to BigChem over the Internet.
Affiliation(s)
- Colton B Hicks
  - Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California 94305, USA and SLAC National Accelerator Laboratory, Menlo Park, California 94025, USA
- Todd J Martinez
  - Department of Chemistry and The PULSE Institute, Stanford University, Stanford, California 94305, USA and SLAC National Accelerator Laboratory, Menlo Park, California 94025, USA
6
Simpson RL, Lee JA, Li Y, Kang YJ, Tsui C, Cimiotti JP. Medicare meets the cloud: the development of a secure platform for the storage and analysis of claims data. JAMIA Open 2024; 7:ooae007. [PMID: 38344670 PMCID: PMC10856805 DOI: 10.1093/jamiaopen/ooae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 09/05/2023] [Accepted: 01/13/2024] [Indexed: 02/18/2024]
Abstract
Introduction: Cloud-based solutions are a modern-day necessity for data-intensive computing. This case report describes in detail the development and implementation of Amazon Web Services (AWS) at Emory, a secure, reliable, and scalable platform to store and analyze identifiable research data from the Centers for Medicare and Medicaid Services (CMS). Materials and Methods: Interdisciplinary teams from CMS, MBL Technologies, and Emory University collaborated to ensure compliance with CMS policy that consolidates laws, regulations, and other drivers of information security and privacy. Results: A dedicated team of individuals ensured successful transition from a physical storage server to a cloud-based environment. This included implementing access controls, vulnerability scanning, and audit logs that are reviewed regularly with a remediation plan. User adaptation required specific training to overcome the challenges of cloud computing. Conclusion: Challenges created opportunities for lessons learned through the creation of an end-product accepted by CMS and shared across disciplines university-wide.
Affiliation(s)
- Roy L Simpson
  - Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA 30322, United States
- Joseph A Lee
  - Harvard University, Boston, MA 02138, United States
- Yin Li
  - Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA 30322, United States
- Yu Jin Kang
  - Byrdine F. Lewis College of Nursing and Health Professions, Georgia State University, Atlanta, GA 30303, United States
- Circe Tsui
  - Office of Information Technology, Emory University, Atlanta, GA 30322, United States
- Jeannie P Cimiotti
  - Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA 30322, United States
7
Grossman RL, Boyles RR, Davis-Dusenbery BN, Haddock A, Heath AP, O'Connor BD, Resnick AC, Taylor DM, Ahalt S. A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments. Sci Data 2024; 11:241. [PMID: 38409183 PMCID: PMC10897146 DOI: 10.1038/s41597-024-03041-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 02/03/2024] [Indexed: 02/28/2024]
Affiliation(s)
- Robert L Grossman
  - Center for Translational Data Science, University of Chicago, Chicago, IL, USA
- Rebecca R Boyles
  - RTI International, Research Triangle Park, Triangle Park, NC, USA
- Adam C Resnick
  - Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Deanne M Taylor
  - Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Stan Ahalt
  - University of North Carolina, Chapel Hill, Chapel Hill, NC, USA
8
Reilly JB, Kim JG, Cooney R, DeWaters AL, Holmboe ES, Mazotti L, Gonzalo JD. Breaking Down Silos Between Medical Education and Health Systems: Creating an Integrated Multilevel Data Model to Advance the Systems-Based Practice Competency. ACADEMIC MEDICINE : JOURNAL OF THE ASSOCIATION OF AMERICAN MEDICAL COLLEGES 2024; 99:146-152. [PMID: 37289829 DOI: 10.1097/acm.0000000000005294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
The complexity of improving health in the United States and the rising call for outcomes-based physician training present unique challenges and opportunities for both graduate medical education (GME) and health systems. GME programs have been particularly challenged to implement systems-based practice (SBP) as a core physician competency and educational outcome. Disparate definitions and educational approaches to SBP, as well as limited understanding of the complex interactions between GME trainees, programs, and their health system settings, contribute to current suboptimal educational outcomes related to SBP. To advance SBP competence at individual, program, and institutional levels, the authors present the rationale for an integrated multilevel systems approach to assess and evaluate SBP, propose a conceptual multilevel data model that integrates health system and educational SBP performance, and explore the opportunities and challenges of using multilevel data to promote an empirically driven approach to residency education. The development, study, and adoption of multilevel analytic approaches to GME are imperative to the successful operationalization of SBP and thereby imperative to GME's social accountability in meeting societal needs for improved health. The authors call for the continued collaboration of national leaders toward producing integrated and multilevel datasets that link health systems and their GME-sponsoring institutions to evolve SBP.
9
Bernier A, Knoppers BM, Bermudez P, Beauvais MJS, Thorogood A, Evans A. Open Data governance at the Canadian Open Neuroscience Platform (CONP): From the Walled Garden to the Arboretum. Gigascience 2024; 13:giad114. [PMID: 38217404 PMCID: PMC10787360 DOI: 10.1093/gigascience/giad114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 11/14/2023] [Accepted: 12/10/2023] [Indexed: 01/15/2024]
Abstract
Scientific research communities pursue dual imperatives in implementing strategies to share their data. These communities attempt to maximize the accessibility of biomedical data for downstream research use, in furtherance of open science objectives. Simultaneously, such communities safeguard the interests of research participants through data stewardship measures and the integration of suitable risk disclosures to the informed consent process. The Canadian Open Neuroscience Platform (CONP) convened an Ethics and Governance Committee composed of experts in bioethics, neuroethics, and law to develop holistic policy tools, organizational approaches, and technological supports to align the open governance of data with ethical and legal norms. The CONP has adopted novel platform governance methods that favor full data openness, legitimated through the use of robust deidentification processes and informed consent practices. The experience of the CONP is articulated as a potential template for other open science efforts to further build upon. This experience highlights informed consent guidance, deidentification practices, ethicolegal metadata, platform-level norms, and commercialization and publication policies as the principal pillars of a practicable approach to the governance of open data. The governance approach adopted by the CONP stands as a viable model for the broader neuroscience and open science communities to adopt for sharing data in full open access.
Affiliation(s)
- Alexander Bernier
  - Centre of Genomics and Policy, Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, 740, Dr Penfield Ave, suite 5200, Montréal, Québec H3A 0G1, Canada
- Bartha M Knoppers
  - Centre of Genomics and Policy, Department of Human Genetics, Faculty of Medicine and Health Sciences, McGill University, 740, Dr Penfield Ave, suite 5200, Montréal, Québec H3A 0G1, Canada
- Patrick Bermudez
  - McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, McGill University, Montréal, Québec H3A 2B4, Canada
- Michael J S Beauvais
  - Faculty of Law, University of Toronto, Falconer Hall, 84 Queens Park, Toronto, Ontario M5S 2C5, Canada
- Adrian Thorogood
  - The Terry Fox Research Institute, 110 Pine Ave W, Montreal, Quebec H2W 1R7, Canada
- Alan Evans
  - McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, McGill University, Montréal, Québec H3A 2B4, Canada
10
Lim HGM, Fann YC, Lee YCG. COWID: an efficient cloud-based genomics workflow for scalable identification of SARS-COV-2. Brief Bioinform 2023; 24:bbad280. [PMID: 37738400 PMCID: PMC10516370 DOI: 10.1093/bib/bbad280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 07/15/2023] [Accepted: 07/19/2023] [Indexed: 09/24/2023]
Abstract
Implementing a specific cloud resource to analyze extensive genomic data on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a challenge when resources are limited. To overcome this, we repurposed a cloud platform initially designed for use in research on cancer genomics (https://cgc.sbgenomics.com) to enable its use in research on SARS-CoV-2 to build Cloud Workflow for Viral and Variant Identification (COWID). COWID is a workflow based on the Common Workflow Language that realizes the full potential of sequencing technology for use in reliable SARS-CoV-2 identification and leverages cloud computing to achieve efficient parallelization. COWID outperformed other contemporary methods for identification by offering scalable identification and reliable variant findings with no false-positive results. COWID typically processed each sample of raw sequencing data within 5 min at a cost of only US$0.01. The COWID source code is publicly available (https://github.com/hendrick0403/COWID) and can be accessed on any computer with Internet access. COWID is designed to be user-friendly; it can be implemented without prior programming knowledge. Therefore, COWID is a time-efficient tool that can be used during a pandemic.
Affiliation(s)
- Hendrick Gao-Min Lim
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan 11031
  - Department of Medical Research, Tzu Chi Hospital Indonesia, Pantai Indah Kapuk, Greater Jakarta, Indonesia 14470
- Yang C Fann
  - IT and Bioinformatics Program, Division of Intramural, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA 20892
- Yuan-Chii Gladys Lee
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan 11031
11
Papachristou N, Kotronoulas G, Dikaios N, Allison SJ, Eleftherochorinou H, Rai T, Kunz H, Barnaghi P, Miaskowski C, Bamidis PD. Digital Transformation of Cancer Care in the Era of Big Data, Artificial Intelligence and Data-Driven Interventions: Navigating the Field. Semin Oncol Nurs 2023; 39:151433. [PMID: 37137770 DOI: 10.1016/j.soncn.2023.151433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 03/29/2023] [Indexed: 05/05/2023]
Abstract
OBJECTIVES: To navigate the field of digital cancer care and define and discuss key aspects and applications of big data analytics, artificial intelligence (AI), and data-driven interventions. DATA SOURCES: Peer-reviewed scientific publications and expert opinion. CONCLUSION: The digital transformation of cancer care, enabled by big data analytics, AI, and data-driven interventions, presents a significant opportunity to revolutionize the field. An increased understanding of the lifecycle and ethics of data-driven interventions will enhance development of innovative and applicable products to advance digital cancer care services. IMPLICATIONS FOR NURSING PRACTICE: As digital technologies become integrated into cancer care, nurse practitioners and scientists will be required to increase their knowledge and skills to effectively use these tools to the patient's benefit. An enhanced understanding of the core concepts of AI and big data, confident use of digital health platforms, and ability to interpret the outputs of data-driven interventions are key competencies. Nurses in oncology will play a crucial role in patient education around big data and AI, with a focus on addressing any arising questions, concerns, or misconceptions to foster trust in these technologies. Successful integration of data-driven innovations into oncology nursing practice will empower practitioners to deliver more personalized, effective, and evidence-based care.
Affiliation(s)
- Nikolaos Papachristou
  - Medical Physics and Digital Innovation Laboratory, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Nikolaos Dikaios
  - Centre for Vision Speech and Signal Processing, University of Surrey, Guildford, UK; Mathematics Research Centre, Academy of Athens, Athens, Greece
- Sarah J Allison
  - Department of Sport, Exercise and Rehabilitation, Faculty of Health and Life Sciences, Northumbria University, Newcastle, UK; School of Bioscience and Medicine, Faculty of Health & Medical Sciences, University of Surrey, Guildford, UK
- Taranpreet Rai
  - Centre for Vision Speech and Signal Processing, University of Surrey, Guildford, UK; Datalab, The Veterinary Health Innovation Engine (vHive), Guildford, UK
- Holger Kunz
  - Institute of Health Informatics, University College London, London, UK
- Payam Barnaghi
  - UK Dementia Research Institute Care Research and Technology Centre, Imperial College London, London, UK
- Christine Miaskowski
  - School of Nursing, University of California San Francisco, San Francisco, California, USA
- Panagiotis D Bamidis
  - Medical Physics and Digital Innovation Laboratory, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
12
Krumm N. Organizational and Technical Security Considerations for Laboratory Cloud Computing. J Appl Lab Med 2023; 8:180-193. [PMID: 36610429 DOI: 10.1093/jalm/jfac118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 10/25/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Clinical and anatomical pathology services are increasingly utilizing cloud information technology (IT) solutions to meet growing requirements for storage, computation, and other IT services. Cloud IT solutions are often considered on the promise of low cost of entry, durability and reliability, scalability, and features that are typically out of reach for small- or mid-sized IT organizations. However, use of cloud-based IT infrastructure also brings additional security and privacy risks to organizations, as unfamiliarity, public networks, and complex feature sets contribute to an increased surface area for attacks. CONTENT: In this best-practices guide, we aim to help both managers and IT professionals in healthcare environments understand the requirements and risks when using cloud-based IT infrastructure within the laboratory environment. We describe how technical, operational, and organizational best practices can help mitigate security, privacy, and other risks associated with the use of cloud infrastructure; furthermore, we identify how these best practices fit into healthcare regulatory frameworks. Among organizational best practices, we identify the need for specific hiring requirements, relationships with parent IT groups, mechanisms for reviewing and auditing security practices, and sound practices for onboarding and offboarding employees. Then, we highlight selected specific operational security, account security, and auditing/logging best practices. Finally, we describe how individual cloud technologies have specific resource-level security features. SUMMARY: We emphasize that laboratory directors, managers, and IT professionals must ensure that the fundamental organizational and process-based requirements are addressed first, to establish the groundwork for technical security solutions and successful implementation of cloud infrastructure.
Affiliation(s)
- Niklas Krumm
  - Division of Informatics, Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA
13
Dall'Alba G, Casa PL, Abreu FPD, Notari DL, de Avila E Silva S. A Survey of Biological Data in a Big Data Perspective. BIG DATA 2022; 10:279-297. [PMID: 35394342 DOI: 10.1089/big.2020.0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
The amount of available data is continuously growing. This phenomenon has given rise to a new concept, named big data. The key technologies related to big data are cloud computing (infrastructure) and Not Only SQL (NoSQL; data storage). In addition, for data analysis, machine learning algorithms such as decision trees, support vector machines, artificial neural networks, and clustering techniques present promising results. In a biological context, big data has many applications due to the large number of biological databases available. Some limitations of biological big data are related to the inherent features of these data, such as high degrees of complexity and heterogeneity, since biological systems provide information from an atomic level to interactions between organisms or their environment. Such characteristics make most bioinformatic-based applications difficult to build, configure, and maintain. Although the rise of big data is relatively recent, it has contributed to a better understanding of the underlying mechanisms of life. The main goal of this article is to provide a concise and reliable survey of the application of big data-related technologies in biology. As such, some fundamental concepts of information technology, including storage resources, analysis, and data sharing, are described along with their relation to biological data.
Affiliation(s)
- Gabriel Dall'Alba
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
- Genome Science and Technology Program, Faculty of Science, The University of British Columbia, Vancouver, Canada
| | - Pedro Lenz Casa
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| | - Fernanda Pessi de Abreu
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| | - Daniel Luis Notari
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| | - Scheila de Avila E Silva
- Computational Biology and Bioinformatics Laboratory, Biotechnology Institute, Department of Life Sciences, University of Caxias do Sul, Caxias do Sul, Brazil
| |
|
14
|
Diversifying the genomic data science research community. Genome Res 2022; 32:gr.276496.121. [PMID: 35858750 PMCID: PMC9341509 DOI: 10.1101/gr.276496.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Accepted: 06/02/2022] [Indexed: 11/25/2022]
Abstract
Over the past 20 years, the explosion of genomic data collection and the cloud computing revolution have made computational and data science research accessible to anyone with a web browser and an internet connection. However, students at institutions with limited resources have received relatively little exposure to curricula or professional development opportunities that lead to careers in genomic data science. To broaden participation in genomics research, the scientific community needs to support these programs in local education and research at underserved institutions (UIs). These include community colleges, historically Black colleges and universities, Hispanic-serving institutions, and tribal colleges and universities that support ethnically, racially, and socioeconomically underrepresented students in the United States. We have formed the Genomic Data Science Community Network to support students, faculty, and their networks to identify opportunities and broaden access to genomic data science. These opportunities include expanding access to infrastructure and data, providing UI faculty development opportunities, strengthening collaborations among faculty, recognizing UI teaching and research excellence, fostering student awareness, developing modular and open-source resources, expanding course-based undergraduate research experiences (CUREs), building curriculum, supporting student professional development and research, and removing financial barriers through funding programs and collaborator support.
|
15
|
Navale V, McAuliffe M. The Integration of a Canonical Workflow Framework with an Informatics System for Disease Area Research. DATA INTELLIGENCE 2022. [DOI: 10.1162/dint_a_00125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
Abstract
A recurring pattern of access to existing databases, data analyses, formulation of new hypotheses, use of an experimental design, institutional review board approvals, data collection, curation, and storage within trusted digital repositories is observable during clinical research work. The workflows that support the repeated nature of these activities can be ascribed as a Canonical Workflow Framework for Research (CWFR). Disease area clinical research is protocol specific, and during data collection, the electronic case report forms can use Common Data Elements (CDEs) that have precisely defined questions and are associated with the specified value(s) as responses. The CDE-based CWFR is integrated with a biomedical research informatics computing system, which consists of a complete stack of technical layers including the Protocol and Form Research Management System. The unique data dictionaries associated with the CWFR for Traumatic Brain Injury and Parkinson's Disease resulted in the development of the Federal Interagency Traumatic Brain Injury and Parkinson's Disease Biomarker systems. Due to a canonical workflow, these two systems can use similar tools, applications, and service modules to create findable, accessible, interoperable, and reusable Digital Objects. The Digital Objects for Traumatic Brain Injury and Parkinson's disease contain all relevant information needed from the time data is collected, validated, and maintained within a Storage Repository for future access. All Traumatic Brain Injury and Parkinson's Disease studies can be shared as Research Objects that can be produced by aggregating related resources as information packages and is findable on the Internet by using unique identifiers. Overall, the integration of CWFR with an informatics system has resulted in the reuse of software applications for several National Institutes of Health-supported biomedical research programs.
Affiliation(s)
- Vivek Navale
- Center for Information Technology, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Matthew McAuliffe
- Center for Information Technology, National Institutes of Health, Bethesda, Maryland 20892, USA
| |
|
16
|
Pillay NS, Ross OA, Christoffels A, Bardien S. Current Status of Next-Generation Sequencing Approaches for Candidate Gene Discovery in Familial Parkinson's Disease. Front Genet 2022; 13:781816. [PMID: 35299952 PMCID: PMC8921601 DOI: 10.3389/fgene.2022.781816] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Accepted: 01/12/2022] [Indexed: 11/13/2022] Open
Abstract
Parkinson’s disease (PD) is a neurodegenerative disorder with a heterogeneous genetic etiology. The advent of next-generation sequencing (NGS) technologies has aided novel gene discovery in several complex diseases, including PD. This Perspective article aimed to explore the use of NGS approaches to identify novel loci in familial PD, and to consider their current relevance. A total of 17 studies, spanning various populations (including Asian, Middle Eastern and European ancestry), were identified. All the studies used whole-exome sequencing (WES), with only one study incorporating both WES and whole-genome sequencing. It is worth noting how additional genetic analyses (including linkage analysis, haplotyping and homozygosity mapping) were incorporated to enhance the efficacy of some studies. Also, the use of consanguineous families and the specific search for de novo mutations appeared to facilitate the finding of causal mutations. Across the studies, similarities and differences in downstream analysis methods and the types of bioinformatic tools used were observed. Although these studies serve as a practical guide for novel gene discovery in familial PD, these approaches have not significantly resolved the “missing heritability” of PD. We speculate that what is needed is the use of third-generation sequencing technologies to identify complex genomic rearrangements and new sequence variation missed with existing methods. Additionally, the study of ancestrally diverse populations (in particular those of Black African ancestry), with the concomitant optimization and tailoring of sequencing and analytic workflows to these populations, is critical. Only then will this pave the way for exciting new discoveries in the field.
Affiliation(s)
- Nikita Simone Pillay
- South African National Bioinformatics Institute (SANBI), South African Medical Research Council Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
| | - Owen A. Ross
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, United States
- Department of Clinical Genomics, Mayo Clinic, Jacksonville, FL, United States
| | - Alan Christoffels
- South African National Bioinformatics Institute (SANBI), South African Medical Research Council Bioinformatics Unit, University of the Western Cape, Bellville, South Africa
- Africa Centres for Disease Control and Prevention, African Union Headquarters, Addis Ababa, Ethiopia
| | - Soraya Bardien
- Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- South African Medical Research Council/Stellenbosch University Genomics of Brain Disorders Research Unit, Cape Town, South Africa
- *Correspondence: Soraya Bardien,
| |
|
17
|
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022] Open
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora held in different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostics and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at prediction of personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize, and integrate large datasets combining structured and unstructured formats. This needs to be done while constrained by different levels of confidentiality, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in the design of successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we will review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
- Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
| | - Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
- Center for Complexity Sciences, Universidad Nacional Autónoma de Mexico, Mexico City, Mexico
| |
|
18
|
Lacar B. Generation of Centered Log-Ratio Normalized Antibody-Derived Tag Counts from Large Single-Cell Sequencing Datasets. Methods Mol Biol 2022; 2386:203-217. [PMID: 34766274 DOI: 10.1007/978-1-0716-1771-7_14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Recent developments in single-cell analysis have provided the ability to assay >50 surface-level proteins by combining oligo-conjugated antibodies with sequencing technology. These methods, such as CITE-seq and REAP-seq, have added another modality to single-cell analysis, enhancing insight across many biological subdisciplines. While packages like Seurat have greatly facilitated analysis of single-cell protein expression, the practical steps to carry out the analysis with increasingly large datasets have been fragmented. In addition, using data visualizations, I will highlight some details about the centered log-ratio (CLR) normalization of antibody-derived tag (ADT) counts that may be overlooked. In this method chapter, I provide detailed steps to generate CLR-normalized CITE-seq data using cloud computing from a large CITE-seq dataset.
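For readers unfamiliar with the centered log-ratio transform named in this abstract, the following is a minimal sketch of the textbook definition, CLR(x_i) = log(x_i) - mean_j(log(x_j)), applied to one vector of ADT counts. The function name `clr`, the `pseudocount` parameter, and the example counts are illustrative assumptions, not taken from the chapter; Seurat and other packages implement their own variants that may differ in how zeros and margins are handled.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one vector of counts.

    Assumption: a pseudocount is added before taking logs so that
    zero counts are defined; this is one common convention, not
    necessarily the one used in the chapter.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical ADT counts for four antibodies in a single cell.
example = clr([0, 5, 120, 33])
print(example)
```

By construction the transformed values sum to zero (up to floating-point error), which is why CLR values are interpreted relative to the other tags in the same vector rather than as absolute abundances.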
Affiliation(s)
- Benjamin Lacar
- University of California, San Francisco, San Francisco, CA, USA.
- University of California, Berkeley, Berkeley, CA, USA.
| |
|
19
|
Afolayan AO, Bernal JF, Gayeta JM, Masim ML, Shamanna V, Abrudan M, Abudahab K, Argimón S, Carlos CC, Sia S, Ravikumar KL, Okeke IN, Donado-Godoy P, Aanensen DM, Underwood A. Overcoming Data Bottlenecks in Genomic Pathogen Surveillance. Clin Infect Dis 2021; 73:S267-S274. [PMID: 34850839 PMCID: PMC8634317 DOI: 10.1093/cid/ciab785] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Performing whole genome sequencing (WGS) for the surveillance of antimicrobial resistance offers the ability to determine not only the antimicrobials to which rates of resistance are increasing, but also the evolutionary mechanisms and transmission routes responsible for the increase at local, national, and global scales. To derive WGS-based outputs, a series of processes are required, beginning with sample and metadata collection, followed by nucleic acid extraction, library preparation, sequencing, and analysis. Throughout this pathway there are many data-related operations required (informatics) combined with more biologically focused procedures (bioinformatics). For a laboratory aiming to implement pathogen genomics, the informatics and bioinformatics activities can be a barrier to starting on the journey; for a laboratory that has already started, these activities may become overwhelming. Here we describe these data bottlenecks and how they have been addressed in laboratories in India, Colombia, Nigeria, and the Philippines, as part of the National Institute for Health Research Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance. The approaches taken include the use of reproducible data parsing pipelines and genome sequence analysis workflows, using technologies such as Data-flo, the Nextflow workflow manager, and containerization of software dependencies. By overcoming barriers to WGS implementation in countries where genome sampling for some species may be underrepresented, a body of evidence can be built to determine the concordance of antimicrobial sensitivity testing and genome-derived resistance, and novel high-risk clones and unknown mechanisms of resistance can be discovered.
Affiliation(s)
- Ayorinde O Afolayan
- Department of Pharmaceutical Microbiology, Faculty of Pharmacy, University of Ibadan, Oyo State, Nigeria
| | - Johan Fabian Bernal
- Colombian Integrated Program for Antimicrobial Resistance Surveillance, Centro de Investigación Tibaitatá, Corporación Colombiana de Investigación Agropecuaria, Tibaitatá, Mosquera, Cundinamarca, Colombia
| | - June M Gayeta
- Antimicrobial Resistance Surveillance Reference Laboratory, Research Institute for Tropical Medicine, Muntinlupa, Philippines
| | - Melissa L Masim
- Antimicrobial Resistance Surveillance Reference Laboratory, Research Institute for Tropical Medicine, Muntinlupa, Philippines
| | - Varun Shamanna
- Central Research Laboratory, Kempegowda Institute of Medical Sciences, Bengaluru, India
| | - Monica Abrudan
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Khalil Abudahab
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Silvia Argimón
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Celia C Carlos
- Antimicrobial Resistance Surveillance Reference Laboratory, Research Institute for Tropical Medicine, Muntinlupa, Philippines
| | - Sonia Sia
- Antimicrobial Resistance Surveillance Reference Laboratory, Research Institute for Tropical Medicine, Muntinlupa, Philippines
| | - Kadahalli L Ravikumar
- Central Research Laboratory, Kempegowda Institute of Medical Sciences, Bengaluru, India
| | - Iruka N Okeke
- The NIHR Global Health Research Unit for the Genomic Surveillance of Antimicrobial Resistance
| | - Pilar Donado-Godoy
- Colombian Integrated Program for Antimicrobial Resistance Surveillance, Centro de Investigación Tibaitatá, Corporación Colombiana de Investigación Agropecuaria, Tibaitatá, Mosquera, Cundinamarca, Colombia
| | - David M Aanensen
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Anthony Underwood
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| |
|
20
|
Barnes C, Bajracharya B, Cannalte M, Gowani Z, Haley W, Kass-Hout T, Hernandez K, Ingram M, Juvvala HP, Kuffel G, Martinov P, Maxwell JM, McCann J, Malhotra A, Metoki-Shlubsky N, Meyer C, Paredes A, Qureshi J, Ritter X, Schumm P, Shao M, Sheth U, Simmons T, VanTol A, Zhang Z, Grossman RL. The Biomedical Research Hub: a federated platform for patient research data. J Am Med Inform Assoc 2021; 29:619-625. [PMID: 35289369 PMCID: PMC8922179 DOI: 10.1093/jamia/ocab247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Revised: 09/24/2021] [Accepted: 10/27/2021] [Indexed: 11/17/2022] Open
Abstract
Objective The objective was to develop and operate a cloud-based federated system for managing, analyzing, and sharing patient data for research purposes, while allowing each resource sharing patient data to operate their component based upon their own governance rules. The federated system is called the Biomedical Research Hub (BRH). Materials and Methods The BRH is a cloud-based federated system built over a core set of software services called framework services. BRH framework services include authentication and authorization, services for generating and assessing findable, accessible, interoperable, and reusable (FAIR) data, and services for importing and exporting bulk clinical data. The BRH includes data resources providing data operated by different entities and workspaces that can access and analyze data from one or more of the data resources in the BRH. Results The BRH contains multiple data commons that in aggregate provide access to over 6 PB of research data from over 400 000 research participants. Discussion and conclusion With the growing acceptance of using public cloud computing platforms for biomedical research, and the growing use of opaque persistent digital identifiers for datasets, data objects, and other entities, there is now a foundation for systems that federate data from multiple independently operated data resources that expose FAIR application programming interfaces, each using a separate data model. Applications can be built that access data from one or more of the data resources.
Affiliation(s)
- Craig Barnes
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Binam Bajracharya
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Matthew Cannalte
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Zakir Gowani
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Will Haley
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | | | - Kyle Hernandez
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Michael Ingram
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Hara Prasad Juvvala
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Gina Kuffel
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | | | - J Montgomery Maxwell
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - John McCann
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | | | - Noah Metoki-Shlubsky
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Chris Meyer
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Andre Paredes
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Jawad Qureshi
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Xenia Ritter
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Philip Schumm
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA
| | - Mingfei Shao
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Urvi Sheth
- Open Commons Consortium, Chicago, Illinois, USA
| | - Trevar Simmons
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Alexander VanTol
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Zhenyu Zhang
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
| | - Robert L Grossman
- Center for Translational Data Science, University of Chicago, Chicago, Illinois, USA
- Department of Medicine and Computer Science, University of Chicago, Chicago, Illinois, USA
| |
|
21
|
Lim HGM, Hsiao SH, Lee YCG. Orchestrating an Optimized Next-Generation Sequencing-Based Cloud Workflow for Robust Viral Identification during Pandemics. BIOLOGY 2021; 10:biology10101023. [PMID: 34681121 PMCID: PMC8533344 DOI: 10.3390/biology10101023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/20/2021] [Revised: 09/24/2021] [Accepted: 10/06/2021] [Indexed: 10/24/2022]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has recently become a novel pandemic event following the swine flu that occurred in 2009, which was caused by the influenza A virus (H1N1 subtype). The accurate identification of the huge number of samples during a pandemic still remains a challenge. In this study, we integrate two technologies, next-generation sequencing and cloud computing, into an optimized workflow version that uses a specific identification algorithm on the designated cloud platform. We use 182 samples (92 for COVID-19 and 90 for swine flu) with short-read sequencing data from two open-access datasets to represent each pandemic and evaluate our workflow performance based on an index specifically created for SARS-CoV-2 or H1N1. Results show that our workflow could differentiate cases between the two pandemics with a higher accuracy depending on the index used, especially when the index that exclusively represented each dataset was used. Our workflow substantially outperforms the original complete identification workflow available on the same platform in terms of time and cost by preserving essential tools internally. Our workflow can serve as a powerful tool for the robust identification of cases and, thus, aid in controlling the current and future pandemics.
Affiliation(s)
- Hendrick Gao-Min Lim
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan;
| | - Shih-Hsin Hsiao
- Division of Pulmonary Medicine, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan;
- Division of Pulmonary Medicine, Department of Internal Medicine, Taipei Medical University Hospital, Taipei 11031, Taiwan
| | - Yuan-Chii Gladys Lee
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan;
- Correspondence:
| |
|
22
|
DiPalma J, Suriawinata AA, Tafe LJ, Torresani L, Hassanpour S. Resolution-based distillation for efficient histology image classification. Artif Intell Med 2021; 119:102136. [PMID: 34531005 PMCID: PMC8449014 DOI: 10.1016/j.artmed.2021.102136] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Revised: 07/07/2021] [Accepted: 08/02/2021] [Indexed: 12/14/2022]
Abstract
Developing deep learning models to analyze histology images has been computationally challenging, as the massive size of the images causes excessive strain on all parts of the computing pipeline. This paper proposes a novel deep learning-based methodology for improving the computational efficiency of histology image classification. The proposed approach is robust when used with images that have reduced input resolution, and it can be trained effectively with limited labeled data. Moreover, our approach operates at either the tissue- or slide-level, removing the need for laborious patch-level labeling. Our method uses knowledge distillation to transfer knowledge from a teacher model pre-trained at high resolution to a student model trained on the same images at a considerably lower resolution. Also, to address the lack of large-scale labeled histology image datasets, we perform the knowledge distillation in a self-supervised fashion. We evaluate our approach on three distinct histology image datasets associated with celiac disease, lung adenocarcinoma, and renal cell carcinoma. Our results on these datasets demonstrate that a combination of knowledge distillation and self-supervision allows the student model to approach and, in some cases, surpass the teacher model's classification accuracy while being much more computationally efficient. Additionally, we observe an increase in student classification performance as the size of the unlabeled dataset increases, indicating that there is potential for this method to scale further with additional unlabeled data. Our model outperforms the high-resolution teacher model for celiac disease in accuracy, F1-score, precision, and recall while requiring 4 times fewer computations. For lung adenocarcinoma, our results at 1.25× magnification are within 1.5% of the results for the teacher model at 10× magnification, with a reduction in computational cost by a factor of 64. Our model on renal cell carcinoma at 1.25× magnification performs within 1% of the teacher model at 5× magnification while requiring 16 times fewer computations. Furthermore, our celiac disease outcomes benefit from additional performance scaling with the use of more unlabeled data. In the case of 0.625× magnification, using unlabeled data improves accuracy by 4% over the tissue-level baseline. Therefore, our approach can improve the feasibility of deep learning solutions for digital pathology on standard computational hardware and infrastructures.
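The cost factors of 64 and 16 quoted in this abstract are consistent with a simple reading: if computational cost is assumed to scale with the number of pixels, and the pixel count scales with the square of the magnification, the reduction is the squared ratio of teacher to student magnification. The sketch below checks that arithmetic; the function name `pixel_reduction` and the pixel-proportional cost model are assumptions made for illustration, not something the abstract states.

```python
def pixel_reduction(teacher_mag, student_mag):
    """Ratio of pixel counts between two magnifications of the same slide.

    Assumption: image area (and hence cost) grows with the square of
    the linear magnification.
    """
    linear_ratio = teacher_mag / student_mag
    return linear_ratio ** 2

# Lung adenocarcinoma: teacher at 10x, student at 1.25x.
print(pixel_reduction(10, 1.25))  # 64.0
# Renal cell carcinoma: teacher at 5x, student at 1.25x.
print(pixel_reduction(5, 1.25))   # 16.0
```

Under the same assumption, the 4-fold reduction reported for celiac disease would correspond to halving the linear magnification, although the abstract does not give the teacher and student magnifications for that dataset.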
Affiliation(s)
- Joseph DiPalma
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
| | - Arief A Suriawinata
- Department of Pathology and Laboratory Medicine, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
| | - Laura J Tafe
- Department of Pathology and Laboratory Medicine, Dartmouth-Hitchcock Medical Center, Lebanon, NH 03756, USA
| | - Lorenzo Torresani
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
| | - Saeed Hassanpour
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA; Department of Epidemiology, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA.
| |
|
23
|
Zeng T, Yu X, Chen Z. Applying artificial intelligence in the microbiome for gastrointestinal diseases: A review. J Gastroenterol Hepatol 2021; 36:832-840. [PMID: 33880762 DOI: 10.1111/jgh.15503] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/09/2021] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/20/2022]
Abstract
For a long time, gut bacteria have been recognized for their important roles in the occurrence and progression of gastrointestinal diseases like colorectal cancer, and the ever-increasing amounts of microbiome data combined with other high-quality clinical and imaging datasets are leading the study of gastrointestinal diseases into an era of biomedical big data. The "omics" technologies used for microbiome analysis continuously evolve, and the machine learning or artificial intelligence technologies are key to extract the relevant information from microbiome data. This review intends to provide a focused summary of recent research and applications of microbiome big data and to discuss the use of artificial intelligence to combat gastrointestinal diseases.
Affiliation(s)
- Tao Zeng
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
| | - Xiangtian Yu
- Clinical Research Center, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Zhangran Chen
- Institute for Microbial Ecology, School of Medicine, Xiamen University, Xiamen, China
| |
|
24
|
Zayas-Cabán T, Chaney KJ, Rucker DW. National health information technology priorities for research: A policy and development agenda. J Am Med Inform Assoc 2021; 27:652-657. [PMID: 32090265 DOI: 10.1093/jamia/ocaa008] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2019] [Revised: 01/03/2020] [Accepted: 01/16/2020] [Indexed: 01/17/2023] Open
Abstract
The growth of digitized health data presents exciting opportunities to leverage the health information technology (IT) infrastructure for advancing biomedical and health services research. However, challenges impede use of those resources effectively and at scale to improve outcomes. The Office of the National Coordinator for Health Information Technology (ONC) led a collaborative effort to identify challenges, priorities, and actions to leverage health IT and electronic health data for research. Specifically, ONC led a review of relevant literature and programs, key informant interviews, and a stakeholder workshop to identify electronic health data and health IT infrastructure gaps. This effort resulted in the National Health IT Priorities for Research: A Policy and Development Agenda, which articulates an optimized health information ecosystem for scientific discovery. This article outlines 9 priorities and recommended actions to be implemented in collaboration with the research and informatics communities for realizing this vision.
Affiliation(s)
- Teresa Zayas-Cabán
- Office of the National Coordinator for Health Information Technology, U.S. Department of Health and Human Services, Washington, DC, USA
- Kevin J Chaney
- Office of the National Coordinator for Health Information Technology, U.S. Department of Health and Human Services, Washington, DC, USA
- Donald W Rucker
- Office of the National Coordinator for Health Information Technology, U.S. Department of Health and Human Services, Washington, DC, USA
|
25
|
Mulshine JL, Avila RS, Conley E, Devaraj A, Ambrose LF, Flanagan T, Henschke CI, Hirsch FR, Janz R, Kakinuma R, Lam S, McWilliams A, Van Ooijen PMA, Oudkerk M, Pastorino U, Reeves A, Rogalla P, Schmidt H, Sullivan DC, Wind HHJ, Wu N, Wynes M, Xueqian X, Yankelevitz DF, Field JK. The International Association for the Study of Lung Cancer Early Lung Imaging Confederation. JCO Clin Cancer Inform 2021; 4:89-99. [PMID: 32027538 PMCID: PMC7053806 DOI: 10.1200/cci.19.00099] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/18/2022]
Abstract
PURPOSE To improve outcomes for lung cancer through low-dose computed tomography (LDCT) early lung cancer detection. The International Association for the Study of Lung Cancer is developing the Early Lung Imaging Confederation (ELIC) to serve as an open-source, international, universally accessible environment to analyze large collections of quality-controlled LDCT images and associated biomedical data for research and routine screening care. METHODS ELIC is an international confederation that allows access to efficiently analyze large numbers of high-quality computed tomography (CT) images with associated de-identified clinical information without moving primary imaging/clinical or imaging data from its local or regional site of origin. Rather, ELIC uses a cloud-based infrastructure to distribute analysis tools to the local site of the stored imaging and clinical data, thereby allowing for research and quality studies to proceed in a vendor-neutral, collaborative environment. ELIC’s hub-and-spoke architecture will be deployed to permit analysis of CT images and associated data in a secure environment, without any requirement to reveal the data itself (ie, privacy protecting). Identifiable data remain under local control, so the resulting environment complies with national regulations and mitigates against privacy or data disclosure risk. RESULTS The goal of pilot experiments is to connect image collections of LDCT scans that can be accurately analyzed in a fashion to support a global network using methodologies that can be readily scaled to accrued databases of sufficient size to develop and validate robust quantitative imaging tools. CONCLUSION This initiative can rapidly accelerate improvements to the multidisciplinary management of early, curable lung cancer and other major thoracic diseases (eg, coronary artery disease and chronic obstructive pulmonary disease) visualized on a screening LDCT scan. The addition of a facile, quantitative CT scanner image quality conformance process is a unique step toward improving the reliability of clinical decision support with CT screening worldwide.
Affiliation(s)
- Ed Conley
- University of Liverpool, Liverpool, United Kingdom
- Robert Janz
- University of Groningen, Groningen, Netherlands
- Stephen Lam
- University of British Columbia, Vancouver, British Columbia, Canada
- Patrick Rogalla
- Toronto Joint Department of Medical Imaging, University of Toronto, Ontario, Canada
- Heidi Schmidt
- Toronto Joint Department of Medical Imaging, University of Toronto, Ontario, Canada
- Ning Wu
- National Cancer Center, Peking Union Medical College, Beijing, China
- Murry Wynes
- International Association for the Study of Lung Cancer, Denver, CO
- John K Field
- University of Liverpool, Liverpool, United Kingdom
|
26
|
Khan SR, Al Rijjal D, Piro A, Wheeler MB. Integration of AI and traditional medicine in drug discovery. Drug Discov Today 2021; 26:982-992. [PMID: 33476566 DOI: 10.1016/j.drudis.2021.01.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/14/2020] [Revised: 12/01/2020] [Accepted: 01/11/2021] [Indexed: 11/24/2022]
Abstract
AI integration in plant-based traditional medicine could be used to overcome drug discovery challenges.
Affiliation(s)
- Saifur R Khan
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada.
- Dana Al Rijjal
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada
- Anthony Piro
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada
- Michael B Wheeler
- Endocrine and Diabetes Platform, Department of Physiology, University of Toronto, Medical Sciences Building, Room 3352, 1 King's College Circle, Toronto, ON M5S 1A8, Canada; Advanced Diagnostics, Metabolism, Toronto General Hospital Research Institute, Toronto, ON, Canada
|
27
|
Wilton R, Szalay AS. Arioc: High-concurrency short-read alignment on multiple GPUs. PLoS Comput Biol 2020; 16:e1008383. [PMID: 33166275 PMCID: PMC7676696 DOI: 10.1371/journal.pcbi.1008383] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/30/2020] [Revised: 11/19/2020] [Accepted: 09/10/2020] [Indexed: 12/22/2022]
Abstract
In large DNA sequence repositories, archival data storage is often coupled with computers that provide 40 or more CPU threads and multiple GPU (general-purpose graphics processing unit) devices. This presents an opportunity for DNA sequence alignment software to exploit high-concurrency hardware to generate short-read alignments at high speed. Arioc, a GPU-accelerated short-read aligner, can compute WGS (whole-genome sequencing) alignments ten times faster than comparable CPU-only alignment software. When two or more GPUs are available, Arioc's speed increases proportionately because the software executes concurrently on each available GPU device. We have adapted Arioc to recent multi-GPU hardware architectures that support high-bandwidth peer-to-peer memory accesses among multiple GPUs. By modifying Arioc's implementation to exploit this GPU memory architecture we obtained a further 1.8x-2.9x increase in overall alignment speeds. With this additional acceleration, Arioc computes two million short-read alignments per second in a four-GPU system; it can align the reads from a human WGS sequencer run–over 500 million 150nt paired-end reads–in less than 15 minutes. As WGS data accumulates exponentially and high-concurrency computational resources become widespread, Arioc addresses a growing need for timely computation in the short-read data analysis toolchain.
Affiliation(s)
- Richard Wilton
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, Maryland, United States of America
- Alexander S. Szalay
- Department of Physics and Astronomy, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, United States of America
|
28
|
Srivastava A, Hanig JP. Quantitative neurotoxicology: Potential role of artificial intelligence/deep learning approach. J Appl Toxicol 2020; 41:996-1006. [PMID: 33140470 DOI: 10.1002/jat.4098] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/09/2020] [Accepted: 10/17/2020] [Indexed: 12/17/2022]
Abstract
Neurotoxicity studies are important in the preclinical stages of the drug development process, because exposure to certain compounds that may enter the brain across a permeable blood brain barrier damages neurons and other supporting cells such as astrocytes. This could, in turn, lead to various neurological disorders such as Parkinson's or Huntington's disease as well as various dementias. Toxicity assessment is often done by pathologists after these exposures by qualitatively or semiquantitatively grading the severity of neurotoxicity in histopathology slides. Quantification of the extent of neurotoxicity supports qualitative histopathological analysis and provides a better understanding of the global extent of brain damage. Stereological techniques such as the utilization of an optical fractionator provide an unbiased quantification of the neuronal damage; however, the process is time-consuming. The advent of whole slide imaging (WSI) introduced digital image analysis, which made quantification of neurotoxicity automated, faster and less biased, making statistical comparisons possible. Although automated to a certain level, simple digital image analysis requires manual effort from experts, which is time-consuming and limits analysis of large datasets. Digital image analysis coupled with a deep learning artificial intelligence model provides a good alternative to time-consuming stereological and simple digital analysis. Deep learning models could be trained to identify damaged or dead neurons in an automated fashion. This review discusses studies demonstrating the role of deep learning in segmentation of brain regions, toxicity detection and quantification of degenerated neurons as well as the estimation of area/volume of degeneration.
Affiliation(s)
- Anshul Srivastava
- Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
- Joseph P Hanig
- Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, Maryland, USA
|
29
|
Kanzi AM, San JE, Chimukangara B, Wilkinson E, Fish M, Ramsuran V, de Oliveira T. Next Generation Sequencing and Bioinformatics Analysis of Family Genetic Inheritance. Front Genet 2020; 11:544162. [PMID: 33193618 PMCID: PMC7649788 DOI: 10.3389/fgene.2020.544162] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/20/2020] [Accepted: 09/21/2020] [Indexed: 12/29/2022]
Abstract
Mendelian and complex genetic trait diseases continue to burden and affect society both socially and economically. The lack of effective tests has hampered diagnosis; thus, the affected lack proper prognosis. Mendelian diseases are caused by genetic mutations in a single gene, while complex trait diseases are caused by the accumulation of mutations in either linked or unlinked genomic regions. Significant advances have been made in identifying novel disease-associated mutations, especially with the introduction of next generation and third generation sequencing. Regardless, some diseases are still without diagnosis, as most tests rely on SNP genotyping panels developed from population-based genetic analyses. Analysis of family genetic inheritance using whole genomes, whole exomes or a panel of genes has been shown to be effective in identifying disease-causing mutations. In this review, we discuss next generation and third generation sequencing platforms, bioinformatic tools and genetic resources commonly used to analyze family-based genomic data with a focus on identifying inherited or novel disease-causing mutations. Additionally, we highlight the analytical, ethical and regulatory challenges associated with analyzing personal genomes, which constitute the data used for family genetic inheritance.
Affiliation(s)
- Aquillah M. Kanzi
- Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), School of Laboratory Medicine and Medical Sciences, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
|
30
|
Vignolo SM, Diray-Arce J, McEnaney K, Rao S, Shannon CP, Idoko OT, Cole F, Darboe A, Cessay F, Ben-Othman R, Tebbutt SJ, Kampmann B, Levy O, Ozonoff A. A cloud-based bioinformatic analytic infrastructure and Data Management Core for the Expanded Program on Immunization Consortium. J Clin Transl Sci 2020; 5:e52. [PMID: 33948273 PMCID: PMC8057481 DOI: 10.1017/cts.2020.546] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/21/2020] [Revised: 08/06/2020] [Accepted: 09/14/2020] [Indexed: 12/30/2022]
Abstract
The Expanded Program for Immunization Consortium - Human Immunology Project Consortium study aims to employ systems biology to identify and characterize vaccine-induced biomarkers that predict immunogenicity in newborns. Key to this effort is the establishment of the Data Management Core (DMC) to provide reliable data and bioinformatic infrastructure for centralized curation, storage, and analysis of multiple de-identified "omic" datasets. The DMC established a cloud-based architecture using Amazon Web Services to track, store, and share data according to National Institutes of Health standards. The DMC tracks biological samples during collection, shipping, and processing while capturing sample metadata and associated clinical data. Multi-omic datasets are stored in access-controlled Amazon Simple Storage Service (S3) for data security and file version control. All data undergo quality control processes at the generating site followed by DMC validation for quality assurance. The DMC maintains a controlled computing environment for data analysis and integration. Upon publication, the DMC deposits finalized datasets to public repositories. The DMC architecture provides resources and scientific expertise to accelerate translational discovery. Robust operations allow rapid sharing of results across the project team. Maintenance of data quality standards and public data deposition will further benefit the scientific community.
Affiliation(s)
- Sofia M. Vignolo
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, USA
- Division of Infectious Diseases, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
- Joann Diray-Arce
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, USA
- Division of Infectious Diseases, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Kerry McEnaney
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, USA
- Shun Rao
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, USA
- Olubukola T. Idoko
- Vaccines & Immunity Theme, Medical Research Council Unit, The Gambia at the London School of Hygiene and Tropical Medicine, Atlantic Boulevard, Banjul, The Gambia
- Vaccine Centre, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Fatoumata Cole
- Vaccines & Immunity Theme, Medical Research Council Unit, The Gambia at the London School of Hygiene and Tropical Medicine, Atlantic Boulevard, Banjul, The Gambia
- Alansana Darboe
- Vaccines & Immunity Theme, Medical Research Council Unit, The Gambia at the London School of Hygiene and Tropical Medicine, Atlantic Boulevard, Banjul, The Gambia
- Vaccine Centre, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Fatoumatta Cessay
- Vaccines & Immunity Theme, Medical Research Council Unit, The Gambia at the London School of Hygiene and Tropical Medicine, Atlantic Boulevard, Banjul, The Gambia
- Scott J. Tebbutt
- PROOF Centre of Excellence, Vancouver, BC, Canada
- Centre for Heart Lung Innovation, St Paul’s Hospital, University of British Columbia, Vancouver, BC, Canada
- Division of Respiratory Medicine, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
- Beate Kampmann
- Vaccines & Immunity Theme, Medical Research Council Unit, The Gambia at the London School of Hygiene and Tropical Medicine, Atlantic Boulevard, Banjul, The Gambia
- Vaccine Centre, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Ofer Levy
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, USA
- Division of Infectious Diseases, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Al Ozonoff
- Precision Vaccines Program, Boston Children’s Hospital, Boston, MA, USA
- Division of Infectious Diseases, Department of Pediatrics, Boston Children’s Hospital, Boston, MA, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, USA
|
31
|
Jespersgaard C, Syed A, Chmura P, Løngreen P. Supercomputing and Secure Cloud Infrastructures in Biology and Medicine. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012920-013357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The increasing amounts of healthcare data stored in health registries, in combination with genomic and other types of data, have the potential to enable better decision making and pave the path for personalized medicine. However, reaping the full benefits of big, sensitive data for the benefit of patients requires greater access to data across organizations and institutions in various regions. This overview first introduces cloud computing and takes stock of the challenges to enhancing data availability in the healthcare system. Four models for ensuring higher data accessibility are then discussed. Finally, several cases are discussed that explore how enhanced access to data would benefit the end user.
Affiliation(s)
- Ali Syed
- Danish National Genome Center, DK-2300 Copenhagen S, Denmark
- Piotr Chmura
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Peter Løngreen
- Danish National Genome Center, DK-2300 Copenhagen S, Denmark
|
32
|
|
33
|
Goh WWB, Wong L. The Birth of Bio-data Science: Trends, Expectations, and Applications. GENOMICS, PROTEOMICS & BIOINFORMATICS 2020; 18:5-15. [PMID: 32428604 PMCID: PMC7393550 DOI: 10.1016/j.gpb.2020.01.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/31/2019] [Revised: 12/02/2019] [Accepted: 02/26/2020] [Indexed: 12/23/2022]
Affiliation(s)
- Wilson Wen Bin Goh
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
|
34
|
Stevens L, Kao D, Hall J, Görg C, Abdo K, Linstead E. ML-MEDIC: A Preliminary Study of an Interactive Visual Analysis Tool Facilitating Clinical Applications of Machine Learning for Precision Medicine. APPLIED SCIENCES (BASEL, SWITZERLAND) 2020; 10:3309. [PMID: 33664984 PMCID: PMC7928533 DOI: 10.3390/app10093309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Accessible interactive tools that integrate machine learning methods with clinical research and reduce the programming experience required are needed to move science forward. Here, we present Machine Learning for Medical Exploration and Data-Inspired Care (ML-MEDIC), a point-and-click, interactive tool with a visual interface for facilitating machine learning and statistical analyses in clinical research. We deployed ML-MEDIC in the American Heart Association (AHA) Precision Medicine Platform to provide secure internet access and facilitate collaboration. ML-MEDIC's efficacy for facilitating the adoption of machine learning was evaluated through two case studies in collaboration with clinical domain experts. A domain expert review was also conducted to obtain an impression of the usability and potential limitations.
Affiliation(s)
- Laura Stevens
- Department of Cardiology, University of Colorado Medical School, Aurora, CO 80045, USA
- Cardiovascular Medicine, Institute for Precision Cardiovascular Medicine at the American Heart Association, Dallas, TX 75231, USA
- David Kao
- Department of Cardiology, University of Colorado Medical School, Aurora, CO 80045, USA
- Jennifer Hall
- Cardiovascular Medicine, Institute for Precision Cardiovascular Medicine at the American Heart Association, Dallas, TX 75231, USA
- Carsten Görg
- Department of Cardiology, University of Colorado Medical School, Aurora, CO 80045, USA
- Kaitlyn Abdo
- Electrical Engineering and Computer Science, Chapman University, Orange, CA 92866, USA
- Erik Linstead
- Electrical Engineering and Computer Science, Chapman University, Orange, CA 92866, USA
|
35
|
Ko G, Kim PG, Cho Y, Jeong S, Kim JY, Kim KH, Lee HY, Han J, Yu N, Ham S, Jang I, Kang B, Shin S, Kim L, Lee SW, Nam D, Kim JF, Kim N, Kim SY, Lee S, Roh TY, Lee B. Bioinformatics services for analyzing massive genomic datasets. Genomics Inform 2020; 18:e8. [PMID: 32224841 PMCID: PMC7120352 DOI: 10.5808/gi.2020.18.1.e8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/09/2020] [Accepted: 03/11/2020] [Indexed: 11/25/2022]
Abstract
The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.
Affiliation(s)
- Gunhwan Ko
- Korea Bioinformation Center (KOBIC), KRIBB, Daejeon 34141, Korea
- Pan-Gyu Kim
- Korea Bioinformation Center (KOBIC), KRIBB, Daejeon 34141, Korea
- Youngbum Cho
- Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Seongmun Jeong
- Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Jae-Yoon Kim
- Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Ho-Yeon Lee
- Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Jiyeon Han
- Department of BioInformation Science, Ewha Womans University, Seoul 03760, Korea
- Namhee Yu
- Department of BioInformation Science, Ewha Womans University, Seoul 03760, Korea
- Seokjin Ham
- Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea
- Insoon Jang
- Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea
- Byunghee Kang
- Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea
- Sunguk Shin
- Department of Systems Biology, Division of Life Sciences, and Institute for Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Lian Kim
- Bioposh Inc., Daejeon 34016, Korea
- Dougu Nam
- School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Korea
- Jihyun F Kim
- Department of Systems Biology, Division of Life Sciences, and Institute for Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea; Strategic Initiative for Microbiomes in Agriculture and Food, Yonsei University, Seoul 03722, Korea
- Namshin Kim
- Genome Editing Research Center, KRIBB, Daejeon 34141, Korea
- Seon-Young Kim
- Genome Structure Research Center, KRIBB, Daejeon 34141, Korea
- Sanghyuk Lee
- Department of BioInformation Science, Ewha Womans University, Seoul 03760, Korea
- Tae-Young Roh
- Department of Life Sciences and Division of Integrative Biosciences & Biotechnology, Pohang University of Science & Technology (POSTECH), Pohang 37673, Korea; SysGenLab Inc., Pohang 37613, Korea
- Byungwook Lee
- Korea Bioinformation Center (KOBIC), KRIBB, Daejeon 34141, Korea
|
36
|
Ziegler E, Urban T, Brown D, Petts J, Pieper SD, Lewis R, Hafey C, Harris GJ. Open Health Imaging Foundation Viewer: An Extensible Open-Source Framework for Building Web-Based Imaging Applications to Support Cancer Research. JCO Clin Cancer Inform 2020; 4:336-345. [PMID: 32324447 PMCID: PMC7259879 DOI: 10.1200/cci.19.00131] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Accepted: 03/16/2020] [Indexed: 11/20/2022]
Abstract
PURPOSE Zero-footprint Web architecture enables imaging applications to be deployed on premise or in the cloud without requiring installation of custom software on the user's computer. Benefits include decreased costs and information technology support requirements, as well as improved accessibility across sites. The Open Health Imaging Foundation (OHIF) Viewer is an extensible platform developed to leverage these benefits and address the demand for open-source Web-based imaging applications. The platform can be modified to support site-specific workflows and accommodate evolving research requirements. MATERIALS AND METHODS The OHIF Viewer provides basic image review functionality (eg, image manipulation and measurement) as well as advanced visualization (eg, multiplanar reformatting). It is written as a client-only, single-page Web application that can easily be embedded into third-party applications or hosted as a standalone Web site. The platform provides extension points for software developers to include custom tools and adapt the system for their workflows. It is standards compliant and relies on DICOMweb for data exchange and OpenID Connect for authentication, but it can be configured to use any data source or authentication flow. Additionally, the user interface components are provided in a standalone component library so that developers can create custom extensions. RESULTS The OHIF Viewer and its underlying components have been widely adopted and integrated into multiple clinical research platforms (eg, Precision Imaging Metrics, XNAT, LabCAS, ISB-CGC) and commercial applications (eg, Osirix). It has also been used to build custom imaging applications (eg, ProstateCancer.ai, Crowds Cure Cancer [presented as a case study]). CONCLUSION The OHIF Viewer provides a flexible framework for building applications to support imaging research. Its adoption could reduce redundancies in software development for National Cancer Institute-funded projects, including Informatics Technology for Cancer Research and the Quantitative Imaging Network.
Affiliation(s)
- Trinity Urban
- Open Health Imaging Foundation, Boston, MA
- Precision Imaging Metrics, Boston, MA
- Rob Lewis
- Open Health Imaging Foundation, Boston, MA
- Gordon J. Harris
- Open Health Imaging Foundation, Boston, MA
- Precision Imaging Metrics, Boston, MA
|
37
|
Danilevicz MF, Tay Fernandez CG, Marsh JI, Bayer PE, Edwards D. Plant pangenomics: approaches, applications and advancements. CURRENT OPINION IN PLANT BIOLOGY 2020; 54:18-25. [PMID: 31982844 DOI: 10.1016/j.pbi.2019.12.005] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 08/19/2019] [Revised: 12/15/2019] [Accepted: 12/18/2019] [Indexed: 05/05/2023]
Abstract
With the assembly of increasing numbers of plant genomes, it is becoming accepted that a single reference assembly does not reflect the gene diversity of a species. The production of pangenomes, which reflect the structural variation and polymorphisms in genomes, enables in depth comparisons of variation within species or higher taxonomic groups. In this review, we discuss the current and emerging approaches for pangenome assembly, analysis and visualisation. In addition, we consider the potential of pangenomes for applied crop improvement, evolutionary and biodiversity studies. To fully exploit the value of pangenomes it is important to integrate broad information such as phenotypic, environmental, and expression data to gain insights into the role of variable regions within genomes.
Affiliation(s)
- Monica Furaste Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacob Ian Marsh
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Philipp Emanuel Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia.
|
38
|
Aarestrup FM, Albeyatti A, Armitage WJ, Auffray C, Augello L, Balling R, Benhabiles N, Bertolini G, Bjaalie JG, Black M, Blomberg N, Bogaert P, Bubak M, Claerhout B, Clarke L, De Meulder B, D'Errico G, Di Meglio A, Forgo N, Gans-Combe C, Gray AE, Gut I, Gyllenberg A, Hemmrich-Stanisak G, Hjorth L, Ioannidis Y, Jarmalaite S, Kel A, Kherif F, Korbel JO, Larue C, Laszlo M, Maas A, Magalhaes L, Manneh-Vangramberen I, Morley-Fletcher E, Ohmann C, Oksvold P, Oxtoby NP, Perseil I, Pezoulas V, Riess O, Riper H, Roca J, Rosenstiel P, Sabatier P, Sanz F, Tayeb M, Thomassen G, Van Bussel J, Van den Bulcke M, Van Oyen H. Towards a European health research and innovation cloud (HRIC). Genome Med 2020; 12:18. [PMID: 32075696 PMCID: PMC7029532 DOI: 10.1186/s13073-020-0713-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/06/2019] [Accepted: 01/29/2020] [Indexed: 12/21/2022]
Abstract
The European Union (EU) initiative on the Digital Transformation of Health and Care (Digicare) aims to provide the conditions necessary for building a secure, flexible, and decentralized digital health infrastructure. Creating a European Health Research and Innovation Cloud (HRIC) within this environment should enable data sharing and analysis for health research across the EU, in compliance with data protection legislation while preserving the full trust of the participants. Such a HRIC should learn from and build on existing data infrastructures, integrate best practices, and focus on the concrete needs of the community in terms of technologies, governance, management, regulation, and ethics requirements. Here, we describe the vision and expected benefits of digital data sharing in health research activities and present a roadmap that fosters the opportunities while answering the challenges of implementing a HRIC. For this, we put forward five specific recommendations and action points to ensure that a European HRIC: i) is built on established standards and guidelines, providing cloud technologies through an open and decentralized infrastructure; ii) is developed and certified to the highest standards of interoperability and data security that can be trusted by all stakeholders; iii) is supported by a robust ethical and legal framework that is compliant with the EU General Data Protection Regulation (GDPR); iv) establishes a proper environment for the training of new generations of data and medical scientists; and v) stimulates research and innovation in transnational collaborations through public and private initiatives and partnerships funded by the EU through Horizon 2020 and Horizon Europe.
Affiliation(s)
- F M Aarestrup
  - Technical University of Denmark, Kongens Lyngby, Denmark
- A Albeyatti
  - Medicalchain, York Road, London, SQ1 7NQ, UK; National Health Service, London, UK
- W J Armitage
  - Translation Health Sciences, Bristol Medical School, Bristol, BS81UD, UK
- C Auffray
  - European Institute for Systems Biology and Medicine (EISBM), Vourles, France
- L Augello
  - Regional Agency for Innovation & Procurement (ARIA), Welfare Services Division, Lombardy, Milan, Italy
- R Balling
  - Luxembourg Centre for Systems Biomedicine, Campus Belval, University of Luxembourg, Luxembourg City, Luxembourg
- N Benhabiles
  - CEA, French Atomic Energy and Alternative Energy Commission, Direction de la Recherche Fondamentale, Université Paris-Saclay, F-91191, Gif-sur-Yvette, France
- G Bertolini
  - Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Bergamo, Italy
- J G Bjaalie
  - Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- M Black
  - Ulster University, Belfast, BT15 1ED, UK
- N Blomberg
  - ELIXIR, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- P Bogaert
  - Sciensano, Brussels, Belgium and Tilburg University, Tilburg, The Netherlands
- M Bubak
  - Department of Computer Science and Academic Computing Center Cyfronet, Akademia Górniczo-Hutnicza University of Science and Technology, Krakow, Poland
- L Clarke
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- B De Meulder
  - European Institute for Systems Biology and Medicine (EISBM), Vourles, France
- G D'Errico
  - Fondazione Toscana Life Sciences, 53100, Siena, Italy
- A Di Meglio
  - CERN, European Organization for Nuclear Research, Meyrin, Switzerland
- N Forgo
  - University of Vienna, Vienna, Austria
- C Gans-Combe
  - INSEEC School of Business & Economics, Paris, France
- A E Gray
  - PwC, Dronning Eufemiasgate, N-0191, Oslo, Norway
- I Gut
  - Center for Genomic Regulations, Barcelona, Spain
- A Gyllenberg
  - Neuroimmunology Unit, The Karolinska Neuroimmunology & Multiple Sclerosis Centre, Department of Clinical Neuroscience, Karolinska Institute, Stockholm, Sweden
- G Hemmrich-Stanisak
  - Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- L Hjorth
  - Department of Clinical Sciences, Pediatrics, Lund University, Skåne University Hospital, Lund, Sweden
- Y Ioannidis
  - Athena Research & Innovation Center and University of Athens, Athens, Greece
- A Kel
  - geneXplain GmbH, Wolfenbüttel, Germany
- F Kherif
  - Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- J O Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- C Larue
  - Integrated Biobank of Luxembourg, Rue Louis Rech, L-3555, Dudelange, Luxembourg
- A Maas
  - Antwerp University Hospital and University of Antwerp, Edegem, Belgium
- L Magalhaes
  - Clinerion Ltd, Elisabethenanlage, 4051, Basel, Switzerland
- I Manneh-Vangramberen
  - European Cancer Patient Coalition, Rue de Montoyer/Montoyerstraat, B-1000, Brussels, Belgium
- E Morley-Fletcher
  - Lynkeus, Via Livenza, 00198, Rome, Italy; Public Policy Consultant, Rome, Italy
- C Ohmann
  - European Clinical Research Infrastructure Network, Heinrich-Heine-Universität, Düsseldorf, Germany
- P Oksvold
  - Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- N P Oxtoby
  - Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- I Perseil
  - Information Technology Department, Institut National de la Santé et de la Recherche Médicale, Paris, France
- V Pezoulas
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
- O Riess
  - Institute of Medical Genetics and Applied Genomics, Rare Disease Center, Tübingen, Germany
- H Riper
  - Section Clinical, Neuro and Developmental Psychology, Department of Behavioural and Movement Sciences, Vrije Universiteit, Amsterdam, The Netherlands
- J Roca
  - Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- P Rosenstiel
  - Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- P Sabatier
  - French National Centre for Scientific Research, Grenoble, France
- F Sanz
  - Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- M Tayeb
  - Medicalchain, York Road, London, SQ1 7NQ, UK; National Health Service, London, UK
- J Van Bussel
  - Scientific Institute of Public Health, Brussels, Belgium
- H Van Oyen
  - Department of Computer Science and Academic Computing Center Cyfronet, Akademia Górniczo-Hutnicza University of Science and Technology, Krakow, Poland; Sciensano, Juliette Wytsmanstraat, 1050, Brussels, Belgium
39
Navale V, Ji M, Vovk O, Misquitta L, Gebremichael T, Garcia A, Fann Y, McAuliffe M. Development of an informatics system for accelerating biomedical research. F1000Res 2019; 8:1430. [PMID: 32760576 PMCID: PMC7376384 DOI: 10.12688/f1000research.19161.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 07/08/2020] [Indexed: 01/04/2023]
Abstract
The Biomedical Research Informatics Computing System (BRICS) was developed to support multiple disease-focused research programs. Seven service modules are integrated to provide a collaborative and extensible web-based environment. The modules (Data Dictionary, Account Management, Query Tool, Protocol and Form Research Management System, Meta Study, Data Repository, and Globally Unique Identifier) facilitate the management of research protocols and the submission, processing, curation, access, and storage of clinical, imaging, and derived genomics data within the associated data repositories. Multiple instances of BRICS are deployed to support various biomedical research communities focused on accelerating discoveries for rare diseases, Traumatic Brain Injury, Parkinson's Disease, inherited eye diseases and symptom science research. No Personally Identifiable Information is stored within the data repositories. Digital Object Identifiers are associated with the research studies. Reusability of biomedical data is enhanced by Common Data Elements (CDEs), which enable systematic collection, analysis and sharing of data. The use of CDEs with a service-oriented informatics architecture enabled the development of disease-specific repositories that support hypothesis-based biomedical research.
Affiliation(s)
- Vivek Navale
  - Office of Intramural Research, Center for Information Technology, National Institutes of Health, USA, Bethesda, Maryland, 20892, USA
- Michele Ji
  - Office of Intramural Research, Center for Information Technology, National Institutes of Health, USA, Bethesda, Maryland, 20892, USA
- Olga Vovk
  - General Dynamics Information Technology, Inc., Fairfax, Virginia, 22030, USA
- Alison Garcia
  - Sapient Government Services, Arlington, Virginia, 22201, USA
- Yang Fann
  - Intramural IT and Bioinformatics Program, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, 20892, USA
- Matthew McAuliffe
  - Office of Intramural Research, Center for Information Technology, National Institutes of Health, USA, Bethesda, Maryland, 20892, USA
40
Crawford DC, Cooke Bailey JN, Briggs FBS. Mind the gap: resources required to receive, process and interpret research-returned whole genome data. Hum Genet 2019; 138:691-701. [PMID: 31161416 PMCID: PMC6767905 DOI: 10.1007/s00439-019-02033-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 04/04/2019] [Accepted: 05/27/2019] [Indexed: 12/17/2022]
Abstract
Most genotype-phenotype studies have historically lacked population diversity, impacting the generalizability of findings and thereby limiting the ability to equitably implement precision medicine. This well-documented problem has generated much interest in the ascertainment of new cohorts with an emphasis on multiple dimensions of diversity, including race/ethnicity, gender, age, socioeconomic status, disability, and geography. The most well known of these new cohort efforts is arguably All of Us, formerly known as the Precision Medicine Cohort Initiative Program. All of Us intends to ascertain at least one million participants in the United States representative of the multiple dimensions of diversity. As an incentive to participate, All of Us is offering the return of research results, including whole genome sequencing data, as well as the opportunity to contribute to the scientific process as non-scientists. The scale and scope of the proposed return of research results are unprecedented. Here, we briefly review possible return of genetic data models, including the likely data file formats and modes of data transfer or access. We also review the resources required to access and interpret the genetic or genomic data once received by the average participant, highlighting the nuanced anticipated barriers that will challenge both the digitally, computationally literate and illiterate participant alike. This inventory of resources required to receive, process, and interpret return of research results exposes the potential for access disparities and warns the scientific community to mind the gap so that all participants have equal access and understanding of the benefits of human genetic research.
Affiliation(s)
- Dana C Crawford
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
  - Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
  - Cleveland Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road. Wolstein Research Building, Suite 2-527, Cleveland, OH, 44106, USA
- Jessica N Cooke Bailey
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
  - Cleveland Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road. Wolstein Research Building, Suite 2-527, Cleveland, OH, 44106, USA
- Farren B S Briggs
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA
  - Cleveland Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road. Wolstein Research Building, Suite 2-527, Cleveland, OH, 44106, USA
41
Bah SY, Morang'a CM, Kengne-Ouafo JA, Amenga-Etego L, Awandare GA. Highlights on the Application of Genomics and Bioinformatics in the Fight Against Infectious Diseases: Challenges and Opportunities in Africa. Front Genet 2018; 9:575. [PMID: 30538723 PMCID: PMC6277583 DOI: 10.3389/fgene.2018.00575] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/18/2018] [Accepted: 11/08/2018] [Indexed: 01/18/2023]
Abstract
Genomics and bioinformatics are increasingly contributing to our understanding of infectious diseases caused by bacterial pathogens such as Mycobacterium tuberculosis and parasites such as Plasmodium falciparum. This ranges from investigations of disease outbreaks and pathogenesis, host and pathogen genomic variation, and host immune evasion mechanisms to identification of potential diagnostic markers and vaccine targets. High throughput genomics data generated from pathogens and animal models can be combined with host genomics and patients’ health records to give advice on treatment options as well as potential drug and vaccine interactions. However, despite accounting for the highest burden of infectious diseases, Africa has the lowest research output on infectious disease genomics. Here we review the contributions of genomics and bioinformatics to the management of infectious diseases of serious public health concern in Africa including tuberculosis (TB), dengue fever, malaria and filariasis. Furthermore, we discuss how genomics and bioinformatics can be applied to identify drug and vaccine targets. We conclude by identifying challenges to genomics research in Africa and highlighting how these can be overcome where possible.
Affiliation(s)
- Saikou Y Bah
  - West African Centre for Cell Biology of Infectious Pathogens, University of Ghana, Accra, Ghana; Vaccine and Immunity Theme, MRC Unit The Gambia at London School of Hygiene & Tropical Medicine, Banjul, Gambia
- Collins Misita Morang'a
  - West African Centre for Cell Biology of Infectious Pathogens, University of Ghana, Accra, Ghana
- Jonas A Kengne-Ouafo
  - West African Centre for Cell Biology of Infectious Pathogens, University of Ghana, Accra, Ghana
- Lucas Amenga-Etego
  - West African Centre for Cell Biology of Infectious Pathogens, University of Ghana, Accra, Ghana
- Gordon A Awandare
  - West African Centre for Cell Biology of Infectious Pathogens, University of Ghana, Accra, Ghana
42
Abstract
Genomics and molecular imaging, along with clinical and translational research, have transformed biomedical science into a data-intensive scientific endeavor. For researchers to benefit from Big Data sets, developing a long-term biomedical digital data preservation strategy is very important. In this opinion article, we discuss specific actions that researchers and institutions can take to make research data a continued resource even after research projects have reached the end of their lifecycle. The actions involve utilizing an Open Archival Information System model comprised of six functional entities: Ingest, Access, Data Management, Archival Storage, Administration and Preservation Planning. We believe that involvement of data stewards early in the digital data life-cycle management process can significantly contribute towards long-term preservation of biomedical data. Developing data collection strategies consistent with institutional policies, and encouraging the use of common data elements in clinical research, patient registries and other human subject research, can be advantageous for data sharing and integration purposes. Specifically, data stewards at the onset of a research program should engage with established repositories and curators to develop data sustainability plans for research data. Placing equal importance on the requirements for initial activities (e.g., collection, processing, storage) and subsequent activities (data analysis, sharing) can improve data quality, provide traceability and support reproducibility. Preparing and tracking data provenance and using common data elements and biomedical ontologies are important for standardizing the data description, making the interpretation and reuse of data easier. The Big Data biomedical community requires a scalable platform that can support the diversity and complexity of data ingest modes (e.g., machine, software or human entry modes). Secure virtual workspaces to integrate and manipulate data, with shared software programs (e.g., bioinformatics tools), can facilitate the FAIR (Findable, Accessible, Interoperable and Reusable) use of data for near- and long-term research needs.
Affiliation(s)
- Vivek Navale
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland, 20892, USA
- Matthew McAuliffe
  - Center for Information Technology, National Institutes of Health, Bethesda, Maryland, 20892, USA
43
Mura C, Draizen EJ, Bourne PE. Structural biology meets data science: does anything change? Curr Opin Struct Biol 2018; 52:95-102. [PMID: 30267935 DOI: 10.1016/j.sbi.2018.09.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/24/2018] [Revised: 08/31/2018] [Accepted: 09/07/2018] [Indexed: 01/22/2023]
Abstract
Data science has emerged from the proliferation of digital data, coupled with advances in algorithms, software and hardware (e.g., GPU computing). Innovations in structural biology have been driven by similar factors, spurring us to ask: can these two fields impact one another in deep and hitherto unforeseen ways? We posit that the answer is yes. New biological knowledge lies in the relationships between sequence, structure, function and disease, all of which play out on the stage of evolution, and data science enables us to elucidate these relationships at scale. Here, we consider the above question from the five key pillars of data science: acquisition, engineering, analytics, visualization and policy, with an emphasis on machine learning as the premier analytics approach.
Affiliation(s)
- Cameron Mura
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Eli J Draizen
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Philip E Bourne
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA; Data Science Institute, University of Virginia, Charlottesville, VA 22904, USA