1. Welten S, Weber S, Holt A, Beyan O, Decker S. Will it run?-A proof of concept for smoke testing decentralized data analytics experiments. Front Med (Lausanne) 2024; 10:1305415. [PMID: 38259836 PMCID: PMC10801058 DOI: 10.3389/fmed.2023.1305415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 12/14/2023] [Indexed: 01/24/2024] Open
Abstract
The growing interest in data-driven medicine, in conjunction with the formation of initiatives such as the European Health Data Space (EHDS) has demonstrated the need for methodologies that are capable of facilitating privacy-preserving data analysis. Distributed Analytics (DA) as an enabler for privacy-preserving analysis across multiple data sources has shown its potential to support data-intensive research. However, the application of DA creates new challenges stemming from its distributed nature, such as identifying single points of failure (SPOFs) in DA tasks before their actual execution. Failing to detect such SPOFs can, for example, result in improper termination of the DA code, necessitating additional efforts from multiple stakeholders to resolve the malfunctions. Moreover, these malfunctions disrupt the seamless conduct of DA and entail several crucial consequences, including technical obstacles to resolve the issues, potential delays in research outcomes, and increased costs. In this study, we address this challenge by introducing a concept based on a method called Smoke Testing, an initial and foundational test run to ensure the operability of the analysis code. We review existing DA platforms and systematically extract six specific Smoke Testing criteria for DA applications. With these criteria in mind, we create an interactive environment called Development Environment for AuTomated and Holistic Smoke Testing of Analysis-Runs (DEATHSTAR), which allows researchers to perform Smoke Tests on their DA experiments. We conduct a user-study with 29 participants to assess our environment and additionally apply it to three real use cases. The results of our evaluation validate its effectiveness, revealing that 96.6% of the analyses created and (Smoke) tested by participants using our approach successfully terminated without any errors. 
Thus, by incorporating Smoke Testing as a fundamental method, our approach helps identify potential malfunctions early in the development process, ensuring smoother data-driven research within the scope of DA. Through its flexibility and adaptability to diverse real use cases, our solution enables more robust and efficient development of DA experiments, which contributes to their reliability.
Affiliation(s)
- Sascha Welten
- Chair of Computer Science 5, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Aachen, Germany
- Sven Weber
- Chair of Computer Science 5, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Aachen, Germany
- Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Adrian Holt
- Chair of Computer Science 5, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen University, Aachen, Germany
- Oya Beyan
- Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Fraunhofer Institute for Applied Information Technology FIT, St. Augustin, Germany
- Stefan Decker
- Institute for Biomedical Informatics, Faculty of Medicine and University Hospital Cologne, University of Cologne, Cologne, Germany
- Fraunhofer Institute for Applied Information Technology FIT, St. Augustin, Germany
2. A Review of the Role and Challenges of Big Data in Healthcare Informatics and Analytics. Computational Intelligence and Neuroscience 2022; 2022:5317760. [PMID: 36210978 PMCID: PMC9536942 DOI: 10.1155/2022/5317760] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/12/2022] [Revised: 07/02/2022] [Accepted: 07/05/2022] [Indexed: 11/17/2022]
Abstract
Healthcare has evolved with the development of technology to improve the quality of life and save lives. Today, big data is considered as one of the most essential and promising future technology areas and has been attracting the medical community's attention. As a result of big data, we can improve patient outcomes, personalize care, improve relationships between the patient and the provider, and decrease hospital costs. The effect of big data is very large since medical societies are known for their size, diversity of complexity, and a high degree of dynamism. Big data has been discussed from different viewpoints in recent years, protecting its involvement in many aspects, specifically those related to the healthcare system. Assembling health information, sharing data, and integrating health are essential in spreading health care. In addition, the security and privacy of data are critical since the data must be accessed from multiple locations within the distributed system. This paper review aims to understand the role of big data in healthcare issues aggregating data and the challenges associated with big data in healthcare. The papers that have been selected for review are from last year's research.
3. Yogesh MJ, Karthikeyan J. Health Informatics: Engaging Modern Healthcare Units: A Brief Overview. Front Public Health 2022; 10:854688. [PMID: 35570921 PMCID: PMC9099090 DOI: 10.3389/fpubh.2022.854688] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/14/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022] Open
Abstract
In the current scenario, with a large amount of unstructured data, Health Informatics is gaining traction, allowing Healthcare Units to leverage and make meaningful insights for doctors and decision-makers with relevant information to scale operations and predict the future view of treatments via Information Systems Communication. Now, around the world, massive amounts of data are being collected and analyzed for better patient diagnosis and treatment, improving public health systems and assisting government agencies in designing and implementing public health policies, instilling confidence in future generations who want to use better public health systems. This article provides an overview of the HL7 FHIR Architecture, including the workflow state, linkages, and various informatics approaches used in healthcare units. The article discusses future trends and directions in Health Informatics for successful application to provide public health safety. With the advancement of technology, healthcare units face new issues that must be addressed with appropriate adoption policies and standards.
Affiliation(s)
- M. J. Yogesh
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
4. Lin Y, Zhao X, Miao Z, Ling Z, Wei X, Pu J, Hou J, Shen B. Data-driven translational prostate cancer research: from biomarker discovery to clinical decision. J Transl Med 2020; 18:119. [PMID: 32143723 PMCID: PMC7060655 DOI: 10.1186/s12967-020-02281-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/27/2020] [Accepted: 02/26/2020] [Indexed: 02/08/2023] Open
Abstract
Prostate cancer (PCa) is a common malignant tumor with increasing incidence and high heterogeneity among males worldwide. In the era of big data and artificial intelligence, the paradigm of biomarker discovery is shifting from traditional experimental and small data-based identification toward big data-driven and systems-level screening. Complex interactions between genetic factors and environmental effects provide opportunities for systems modeling of PCa genesis and evolution. We hereby review the current research frontiers in informatics for PCa clinical translation. First, the heterogeneity and complexity in PCa development and clinical theranostics are introduced to raise the concern for PCa systems biology studies. Then biomarkers and risk factors ranging from molecular alternations to clinical phenotype and lifestyle changes are explicated for PCa personalized management. Methodologies and applications for multi-dimensional data integration and computational modeling are discussed. The future perspectives and challenges for PCa systems medicine and holistic healthcare are finally provided.
Affiliation(s)
- Yuxin Lin
- Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Xiaojun Zhao
- Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Zhijun Miao
- Department of Urology, Suzhou Dushuhu Public Hospital, Suzhou, 215123, China
- Zhixin Ling
- Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Xuedong Wei
- Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Jinxian Pu
- Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China
- Jianquan Hou
- Department of Urology, The First Affiliated Hospital of Soochow University, Suzhou, 215006, China.
- Bairong Shen
- Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu, 610041, China.
5. Aarestrup FM, Albeyatti A, Armitage WJ, Auffray C, Augello L, Balling R, Benhabiles N, Bertolini G, Bjaalie JG, Black M, Blomberg N, Bogaert P, Bubak M, Claerhout B, Clarke L, De Meulder B, D’Errico G, Di Meglio A, Forgo N, Gans-Combe C, Gray AE, Gut I, Gyllenberg A, Hemmrich-Stanisak G, Hjorth L, Ioannidis Y, Jarmalaite S, Kel A, Kherif F, Korbel JO, Larue C, Laszlo M, Maas A, Magalhaes L, Manneh-Vangramberen I, Morley-Fletcher E, Ohmann C, Oksvold P, Oxtoby NP, Perseil I, Pezoulas V, Riess O, Riper H, Roca J, Rosenstiel P, Sabatier P, Sanz F, Tayeb M, Thomassen G, Van Bussel J, Van den Bulcke M, Van Oyen H. Towards a European health research and innovation cloud (HRIC). Genome Med 2020; 12:18. [PMID: 32075696 PMCID: PMC7029532 DOI: 10.1186/s13073-020-0713-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/06/2019] [Accepted: 01/29/2020] [Indexed: 12/21/2022] Open
Abstract
The European Union (EU) initiative on the Digital Transformation of Health and Care (Digicare) aims to provide the conditions necessary for building a secure, flexible, and decentralized digital health infrastructure. Creating a European Health Research and Innovation Cloud (HRIC) within this environment should enable data sharing and analysis for health research across the EU, in compliance with data protection legislation while preserving the full trust of the participants. Such a HRIC should learn from and build on existing data infrastructures, integrate best practices, and focus on the concrete needs of the community in terms of technologies, governance, management, regulation, and ethics requirements. Here, we describe the vision and expected benefits of digital data sharing in health research activities and present a roadmap that fosters the opportunities while answering the challenges of implementing a HRIC. For this, we put forward five specific recommendations and action points to ensure that a European HRIC: i) is built on established standards and guidelines, providing cloud technologies through an open and decentralized infrastructure; ii) is developed and certified to the highest standards of interoperability and data security that can be trusted by all stakeholders; iii) is supported by a robust ethical and legal framework that is compliant with the EU General Data Protection Regulation (GDPR); iv) establishes a proper environment for the training of new generations of data and medical scientists; and v) stimulates research and innovation in transnational collaborations through public and private initiatives and partnerships funded by the EU through Horizon 2020 and Horizon Europe.
Affiliation(s)
- A. Albeyatti
- Medicalchain, York Road, London, SQ1 7NQ UK
- National Health Service, London, UK
- W. J. Armitage
- Translation Health Sciences, Bristol Medical School, Bristol, BS81UD UK
- C. Auffray
- European Institute for Systems Biology and Medicine (EISBM), Vourles, France
- L. Augello
- Regional Agency for Innovation & Procurement (ARIA), Welfare Services Division, Lombardy, Milan, Italy
- R. Balling
- Luxembourg Centre for Systems Biomedicine, Campus Belval, University of Luxembourg, Luxembourg City, Luxembourg
- N. Benhabiles
- CEA, French Atomic Energy and Alternative Energy Commission, Direction de la Recherche Fondamentale, Université Paris-Saclay, F-91191 Gif-sur-Yvette, France
- G. Bertolini
- Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Bergamo, Italy
- J. G. Bjaalie
- Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- M. Black
- Ulster University, Belfast, BT15 1ED UK
- N. Blomberg
- ELIXIR, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- P. Bogaert
- Sciensano, Brussels, Belgium and Tilburg University, Tilburg, The Netherlands
- M. Bubak
- Department of Computer Science and Academic Computing Center Cyfronet, Akademia Górniczo-Hutnicza University of Science and Technology, Krakow, Poland
- L. Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- B. De Meulder
- Translation Health Sciences, Bristol Medical School, Bristol, BS81UD UK
- G. D’Errico
- Fondazione Toscana Life Sciences, 53100 Siena, Italy
- A. Di Meglio
- CERN, European Organization for Nuclear Research, Meyrin, Switzerland
- N. Forgo
- University of Vienna, Vienna, Austria
- C. Gans-Combe
- INSEEC School of Business & Economics, Paris, France
- A. E. Gray
- PwC, Dronning Eufemiasgate, N-0191 Oslo, Norway
- I. Gut
- Center for Genomic Regulations, Barcelona, Spain
- A. Gyllenberg
- Neuroimmunology Unit, The Karolinska Neuroimmunology & Multiple Sclerosis Centre, Department of Clinical Neuroscience, Karolinska Institute, Stockholm, Sweden
- G. Hemmrich-Stanisak
- Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- L. Hjorth
- Department of Clinical Sciences, Pediatrics, Lund University, Skåne University Hospital, Lund, Sweden
- Y. Ioannidis
- Athena Research & Innovation Center and University of Athens, Athens, Greece
- A. Kel
- geneXplain GmbH, Wolfenbüttel, Germany
- F. Kherif
- Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
- J. O. Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- C. Larue
- Integrated Biobank of Luxembourg, Rue Louis Rech, L-3555 Dudelange, Luxembourg
- A. Maas
- Antwerp University Hospital and University of Antwerp, Edegem, Belgium
- L. Magalhaes
- Clinerion Ltd, Elisabethenanlage, 4051 Basel, Switzerland
- I. Manneh-Vangramberen
- European Cancer Patient Coalition, Rue de Montoyer/Montoyerstraat, B-1000 Brussels, Belgium
- E. Morley-Fletcher
- Lynkeus, Via Livenza, 00198 Rome, Italy
- Public Policy Consultant, Rome, Italy
- C. Ohmann
- European Clinical Research Infrastructure Network, Heinrich-Heine-Universität, Düsseldorf, Germany
- P. Oksvold
- Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden
- N. P. Oxtoby
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- I. Perseil
- Information Technology Department, Institut National de la Santé et de la Recherche Médicale, Paris, France
- V. Pezoulas
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
- O. Riess
- Institute of Medical Genetics and Applied Genomics, Rare Disease Center, Tübingen, Germany
- H. Riper
- Section Clinical, Neuro and Developmental Psychology, Department of Behavioural and Movement Sciences, Vrije Universiteit, Amsterdam, The Netherlands
- J. Roca
- Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- P. Rosenstiel
- Institute of Clinical Molecular Biology, Kiel University and University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- P. Sabatier
- French National Centre for Scientific Research, Grenoble, France
- F. Sanz
- Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Barcelona, Spain
- M. Tayeb
- Medicalchain, York Road, London, SQ1 7NQ UK
- National Health Service, London, UK
- J. Van Bussel
- Scientific Institute of Public Health, Brussels, Belgium
- H. Van Oyen
- Department of Computer Science and Academic Computing Center Cyfronet, Akademia Górniczo-Hutnicza University of Science and Technology, Krakow, Poland
- Sciensano, Juliette Wystmanstraat, 1050 Brussels, Belgium
6. Shen B, Lin Y, Bi C, Zhou S, Bai Z, Zheng G, Zhou J. Translational Informatics for Parkinson's Disease: from Big Biomedical Data to Small Actionable Alterations. Genomics, Proteomics & Bioinformatics 2019; 17:415-429. [PMID: 31786313 PMCID: PMC6943761 DOI: 10.1016/j.gpb.2018.10.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/21/2018] [Revised: 08/29/2018] [Accepted: 11/02/2018] [Indexed: 02/05/2023]
Abstract
Parkinson's disease (PD) is a common neurological disease in elderly people, and its morbidity and mortality are increasing with the advent of global ageing. The traditional paradigm of moving from small data to big data in biomedical research is shifting toward big data-based identification of small actionable alterations. To highlight the use of big data for precision PD medicine, we review PD big data and informatics for the translation of basic PD research to clinical applications. We emphasize some key findings in clinically actionable changes, such as susceptibility genetic variations for PD risk population screening, biomarkers for the diagnosis and stratification of PD patients, risk factors for PD, and lifestyles for the prevention of PD. The challenges associated with the collection, storage, and modelling of diverse big data for PD precision medicine and healthcare are also summarized. Future perspectives on systems modelling and intelligent medicine for PD monitoring, diagnosis, treatment, and healthcare are discussed in the end.
Affiliation(s)
- Bairong Shen
- Institutes for Systems Genetics, West China Hospital, Sichuan University, Chengdu 610041, China.
- Yuxin Lin
- Center for Systems Biology, Soochow University, Suzhou 215006, China
- Cheng Bi
- Center for Systems Biology, Soochow University, Suzhou 215006, China
- Shengrong Zhou
- Center for Systems Biology, Soochow University, Suzhou 215006, China
- Zhongchen Bai
- Center for Translational Biomedical Informatics, Guizhou University School of Medicine, Guiyang 550025, China
- Guangmin Zheng
- Center for Translational Biomedical Informatics, Guizhou University School of Medicine, Guiyang 550025, China
- Jing Zhou
- Center for Translational Biomedical Informatics, Guizhou University School of Medicine, Guiyang 550025, China
7. Parallel MapReduce: Maximizing Cloud Resource Utilization and Performance Improvement Using Parallel Execution Strategies. BioMed Research International 2018; 2018:7501042. [PMID: 30417014 PMCID: PMC6207866 DOI: 10.1155/2018/7501042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/12/2018] [Accepted: 09/30/2018] [Indexed: 01/05/2023]
Abstract
MapReduce is the preferred cloud computing framework used in large data analysis and application processing. MapReduce frameworks currently in place suffer performance degradation due to the adoption of sequential processing approaches with little modification and thus exhibit underutilization of cloud resources. To overcome this drawback and reduce costs, we introduce a Parallel MapReduce (PMR) framework in this paper. We design a novel parallel execution strategy of Map and Reduce worker nodes. Our strategy enables further performance improvement and efficient utilization of cloud resources execution of Map and Reduce functions to utilize multicore environments available with computing nodes. We explain in detail makespan modeling and working principle of the PMR framework in the paper. Performance of PMR is compared with Hadoop through experiments considering three biomedical applications. Experiments conducted for BLAST, CAP3, and DeepBind biomedical applications report makespan time reduction of 38.92%, 18.00%, and 34.62% considering the PMR framework against Hadoop framework. Experiments' results prove that the PMR cloud computing platform proposed is robust, cost-effective, and scalable, which sufficiently supports diverse applications on public and private cloud platforms. Consequently, overall presentation and results indicate that there is good matching between theoretical makespan modeling presented and experimental values investigated.
8. Textoris J, Taccone FS, Zafrani L, Guillon A, Gibot S, Uhel F, Azabou E, Monneret G, Pène F, de Prost N, Silva S. Data-driving methods: More than merely trendy buzzwords? Ann Intensive Care 2018; 8:58. [PMID: 29721786 PMCID: PMC5931952 DOI: 10.1186/s13613-018-0405-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/19/2018] [Accepted: 04/23/2018] [Indexed: 11/10/2022] Open
Affiliation(s)
- Julien Textoris
- Département d'Anesthésie-Réanimation, hôpital Édouard-Herriot, Hospices Civils de Lyon, CHU de Lyon, 69437, Lyon, France
- Lara Zafrani
- Service de Réanimation Médicale, APHP Hôpital Saint-Louis, Paris, France
- Antoine Guillon
- Service de Médecine Intensive - Réanimation, CHU de Tours, 37000, Tours, France
- Sébastien Gibot
- Service de Réanimation Médicale, Hôpital Central, CHU de Nancy, 54000, Nancy, France
- Fabrice Uhel
- Service de Réanimation Médicale et Maladies Infectieuses, Hôpital Pontchaillou, CHU de Rennes, Rennes, France
- Eric Azabou
- Service de Réanimation, APHP Hôpital Raymond Poincaré, Garches, 92380, Paris, France
- Guillaume Monneret
- Laboratoire d'immunologie, hôpital Edouard Herriot, Hospices Civils de Lyon, CHU de Lyon, 69437, Lyon, France
- Frédéric Pène
- Service de Réanimation Médicale, APHP, Hôpital Cochin, Paris, France
- Nicolas de Prost
- Service de Réanimation Médicale, Hôpital Henri Mondor, 51, Avenue du Maréchal de Lattre de Tassigny, 94010, Créteil Cedex, France.
- Stein Silva
- Service de Réanimation, CHU Purpan, 31300, Toulouse, France
9. Phillips KA, Trosman JR, Kelley RK, Pletcher MJ, Douglas MP, Weldon CB. Genomic sequencing: assessing the health care system, policy, and big-data implications. Health Aff (Millwood) 2014; 33:1246-53. [PMID: 25006153 DOI: 10.1377/hlthaff.2014.0020] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Indexed: 11/05/2022]
Abstract
New genomic sequencing technologies enable the high-speed analysis of multiple genes simultaneously, including all of those in a person's genome. Sequencing is a prominent example of a "big data" technology because of the massive amount of information it produces and its complexity, diversity, and timeliness. Our objective in this article is to provide a policy primer on sequencing and illustrate how it can affect health care system and policy issues. Toward this end, we developed an easily applied classification of sequencing based on inputs, methods, and outputs. We used it to examine the implications of sequencing for three health care system and policy issues: making care more patient-centered, developing coverage and reimbursement policies, and assessing economic value. We conclude that sequencing has great promise but that policy challenges include how to optimize patient engagement as well as privacy, develop coverage policies that distinguish research from clinical uses and account for bioinformatics costs, and determine the economic value of sequencing through complex economic models that take into account multiple findings and downstream costs.
Affiliation(s)
- Kathryn A Phillips
- Kathryn A. Phillips is a professor in the Center for Translational and Policy Research on Personalized Medicine (TRANSPERS), the Department of Clinical Pharmacy, the Philip R. Lee Institute for Health Policy, and the Helen Diller Family Comprehensive Cancer Center, all at the University of California, San Francisco (UCSF)
- Julia R Trosman
- Julia R. Trosman is codirector of the Center for Business Models in Healthcare, in Chicago, Illinois, and an adjunct faculty member in the Department of Clinical Pharmacy, UCSF
- Robin K Kelley
- Robin K. Kelley is an assistant clinical professor in the Department of Medicine, Division of Hematology/Oncology, UCSF
- Mark J Pletcher
- Mark J. Pletcher is an associate professor in the Department of Epidemiology and Biostatistics and the Department of Medicine, UCSF
- Michael P Douglas
- Michael P. Douglas is a program manager in TRANSPERS and the Department of Clinical Pharmacy, UCSF
- Christine B Weldon
- Christine B. Weldon is codirector of the Center for Business Models in Healthcare and an adjunct faculty member in the Feinberg School of Medicine, Northwestern University, in Chicago
10. Auffray C, Balling R, Barroso I, Bencze L, Benson M, Bergeron J, Bernal-Delgado E, Blomberg N, Bock C, Conesa A, Del Signore S, Delogne C, Devilee P, Di Meglio A, Eijkemans M, Flicek P, Graf N, Grimm V, Guchelaar HJ, Guo YK, Gut IG, Hanbury A, Hanif S, Hilgers RD, Honrado Á, Hose DR, Houwing-Duistermaat J, Hubbard T, Janacek SH, Karanikas H, Kievits T, Kohler M, Kremer A, Lanfear J, Lengauer T, Maes E, Meert T, Müller W, Nickel D, Oledzki P, Pedersen B, Petkovic M, Pliakos K, Rattray M, I Màs JR, Schneider R, Sengstag T, Serra-Picamal X, Spek W, Vaas LAI, van Batenburg O, Vandelaer M, Varnai P, Villoslada P, Vizcaíno JA, Wubbe JPM, Zanetti G. Making sense of big data in health research: Towards an EU action plan. Genome Med 2016; 8:71. [PMID: 27338147 PMCID: PMC4919856 DOI: 10.1186/s13073-016-0323-y] [Citation(s) in RCA: 124] [Impact Index Per Article: 15.5] [Indexed: 01/09/2023] Open
Abstract
Medicine and healthcare are undergoing profound changes. Whole-genome sequencing and high-resolution imaging technologies are key drivers of this rapid and crucial transformation. Technological innovation combined with automation and miniaturization has triggered an explosion in data production that will soon reach exabyte proportions. How are we going to deal with this exponential increase in data production? The potential of "big data" for improving health is enormous but, at the same time, we face a wide range of challenges to overcome urgently. Europe is very proud of its cultural diversity; however, exploitation of the data made available through advances in genomic medicine, imaging, and a wide range of mobile health applications or connected devices is hampered by numerous historical, technical, legal, and political barriers. European health systems and databases are diverse and fragmented. There is a lack of harmonization of data formats, processing, analysis, and data transfer, which leads to incompatibilities and lost opportunities. Legal frameworks for data sharing are evolving. Clinicians, researchers, and citizens need improved methods, tools, and training to generate, analyze, and query data effectively. Addressing these barriers will contribute to creating the European Single Market for health, which will improve health and healthcare for all Europeans.
Affiliation(s)
- Charles Auffray
- European Institute for Systems Biology and Medicine, 1 avenue Claude Vellefaux, 75010, Paris, France.
- CIRI-UMR5308, CNRS-ENS-INSERM-UCBL, Université de Lyon, 50 avenue Tony Garnier, 69007, Lyon, France.
- Rudi Balling
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7 Avenue des Hauts Fourneaux, 4362, Esch-sur-Alzette, Luxembourg.
- Inês Barroso
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- László Bencze
- Health Services Management Training Centre, Faculty of Health and Public Services, Semmelweis University, Kútvölgyi út 2, 1125, Budapest, Hungary
- Mikael Benson
- Centre for Personalised Medicine, Linköping University, 581 85, Linköping, Sweden
- Jay Bergeron
- Translational & Bioinformatics, Pfizer Inc., 300 Technology Square, Cambridge, MA, 02139, USA
- Enrique Bernal-Delgado
- Institute for Health Sciences, IACS - IIS Aragon, San Juan Bosco 13, 50009, Zaragoza, Spain
- Niklas Blomberg
- ELIXIR, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Lazarettgasse 14, AKH BT25.2, 1090, Vienna, Austria
- Department of Laboratory Medicine, Medical University of Vienna, Lazarettgasse 14, AKH BT25.2, 1090, Vienna, Austria
- Max Planck Institute for Informatics, Campus E1 4, 66123, Saarbrücken, Germany
- Ana Conesa
- Príncipe Felipe Research Center, C/ Eduardo Primo Yúfera 3, 46012, Valencia, Spain
- University of Florida, Institute of Food and Agricultural Sciences (IFAS), 2033 Mowry Road, Gainesville, FL, 32610, USA
- Christophe Delogne
- Technology, Data & Analytics, KPMG Luxembourg, Société Coopérative, 39 Avenue John F. Kennedy, 1855, Luxembourg, Luxembourg
- Peter Devilee
- Department of Human Genetics, Department of Pathology, Leiden University Medical Centre, Einthovenweg 20, 2333 ZC, Leiden, The Netherlands
- Alberto Di Meglio
- Information Technology Department, European Organization for Nuclear Research (CERN), 385 Route de Meyrin, 1211, Geneva 23, Switzerland
- Marinus Eijkemans
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, 3508 GA, Utrecht, The Netherlands
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Norbert Graf
- Department of Pediatric Oncology/Hematology, Saarland University, Campus Homburg, Building 9, 66421, Homburg, Germany
- Vera Grimm
- Project Management Jülich, Forschungszentrum Jülich GmbH, Wilhelm-Johnen-Straße, 52428, Jülich, Germany
- Henk-Jan Guchelaar
- Department of Clinical Pharmacy & Toxicology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, The Netherlands
- Yi-Ke Guo
- Data Science Institute, Imperial College London, South Kensington, London, SW7 2AZ, UK
- Ivo Glynne Gut
- CNAG-CRG, Center for Genomic Regulation, Barcelona Institute for Science and Technology (BIST), C/Baldiri Reixac 4, 08029, Barcelona, Spain
- Allan Hanbury
- Institute of Software Technology and Interactive Systems, TU Wien, Favoritenstrasse 9-11/188, 1040, Vienna, Austria
- Shahid Hanif
- The Association of the British Pharmaceutical Industry, 7th Floor, Southside, 105 Victoria Street, London, SW1E 6QT, UK
- Ralf-Dieter Hilgers
- Department of Medical Statistics, RWTH-Aachen University, Universitätsklinikum Aachen, Pauwelsstraße 30, 52074, Aachen, Germany
- Ángel Honrado
- SYNAPSE Research Management Partners, Diputació 237, Àtic 3ª, 08007, Barcelona, Spain
- D Rod Hose
- Department of Infection, Immunity and Cardiovascular Disease and Insigneo Institute for In-Silico Medicine, Medical School, University of Sheffield, Beech Hill Road, Sheffield, S10 2RX, UK
| | | | - Tim Hubbard
- Department of Medical & Molecular Genetics, King's College London, London, SE1 9RT, UK
- Genomics England, London, EC1M 6BQ, UK
| | - Sophie Helen Janacek
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Haralampos Karanikas
- National and Kapodistrian University of Athens, Medical School, Xristou Lada 6, 10561, Athens, Greece
| | - Tim Kievits
- Vitromics Healthcare Holding B.V., Onderwijsboulevard 225, 5223 DE, 's-Hertogenbosch, The Netherlands
- Manfred Kohler
- Fraunhofer Institute for Molecular Biology and Applied Ecology ScreeningPort, Schnackenburgallee 114, 22525, Hamburg, Germany
- Andreas Kremer
- ITTM S.A., 9 avenue des Hauts Fourneaux, 4362, Esch-sur-Alzette, Luxembourg
- Jerry Lanfear
- Research Business Technology, Pfizer Ltd, GP4 Building, Granta Park, Cambridge, CB21 6GP, UK
- Thomas Lengauer
- Max Planck Institute for Informatics, Campus E1 4, 66123, Saarbrücken, Germany
- Edith Maes
- Health Economics & Outcomes Research, Deloitte Belgium, Berkenlaan 8A, 1831, Diegem, Belgium
- Theo Meert
- Janssen Pharmaceutica N.V., R&D G3O, Turnhoutseweg 30, 2340, Beerse, Belgium
- Werner Müller
- Faculty of Life Sciences, University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9PT, UK
- Dörthe Nickel
- UMR3664 IC/CNRS, Institut Curie, Section Recherche, Pavillon Pasteur, 26 rue d'Ulm, 75248, Paris cedex 05, France
- Peter Oledzki
- Linguamatics Ltd, 324 Cambridge Science Park Milton Rd, Cambridge, CB4 0WG, UK
- Bertrand Pedersen
- PwC Luxembourg, 2 rue Gerhard Mercator, 2182, Luxembourg, Luxembourg
- Milan Petkovic
- Philips, HighTechCampus 36, 5656AE, Eindhoven, The Netherlands
- Konstantinos Pliakos
- Department of Public Health and Primary Care, KU Leuven Kulak, Etienne Sabbelaan 53, 8500, Kortrijk, Belgium
- Magnus Rattray
- Faculty of Life Sciences, University of Manchester, AV Hill Building, Oxford Road, Manchester, M13 9PT, UK
- Josep Redón I Màs
- INCLIVA Health Research Institute, University of Valencia, CIBERobn ISCIII, Avenida Menéndez Pelayo 4 accesorio, 46010, Valencia, Spain
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7 Avenue des Hauts Fourneaux, 4362, Esch-sur-Alzette, Luxembourg
- Thierry Sengstag
- Swiss Institute of Bioinformatics (SIB) and University of Basel, Klingelbergstrasse 50/70, 4056, Basel, Switzerland
- Xavier Serra-Picamal
- Agency for Health Quality and Assessment of Catalonia (AQuAS), Carrer de Roc Boronat 81-95, 08005, Barcelona, Spain
- Wouter Spek
- EuroBioForum Foundation, Chrysantstraat 10, 3135 HG, Vlaardingen, The Netherlands
- Lea A I Vaas
- Fraunhofer Institute for Molecular Biology and Applied Ecology ScreeningPort, Schnackenburgallee 114, 22525, Hamburg, Germany
- Okker van Batenburg
- EuroBioForum Foundation, Chrysantstraat 10, 3135 HG, Vlaardingen, The Netherlands
- Marc Vandelaer
- Integrated BioBank of Luxembourg, 6 rue Nicolas-Ernest Barblé, 1210, Luxembourg, Luxembourg
- Peter Varnai
- Technopolis Group, 3 Pavilion Buildings, Brighton, BN1 1EE, UK
- Pablo Villoslada
- Hospital Clinic of Barcelona, Institute d'Investigacions Biomediques August Pi Sunyer (IDIBAPS), Rosello 149, 08036, Barcelona, Spain
- Juan Antonio Vizcaíno
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- John Peter Mary Wubbe
- European Platform for Patients' Organisations, Science and Industry (Epposi), De Meeûs Square 38-40, 1000, Brussels, Belgium
- Gianluigi Zanetti
- CRS4, Ed.1 POLARIS, 09129, Pula, Italy
- BBMRI-ERIC, Neue Stiftingtalstrasse 2/B/6, 8010, Graz, Austria
|
11
|
Garazha A, Ivanova A, Suntsova M, Malakhova G, Roumiantsev S, Zhavoronkov A, Buzdin A. New bioinformatic tool for quick identification of functionally relevant endogenous retroviral inserts in human genome. Cell Cycle 2016; 14:1476-84. [PMID: 25853282 PMCID: PMC4612461 DOI: 10.1080/15384101.2015.1022696] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 02/03/2023] Open
Abstract
Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERVs/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of “domestication” of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
Affiliation(s)
- Andrew Garazha
- Group for Genomic Regulation of Cell Signaling Systems, Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
|
12
|
Sobeslav V, Maresova P, Krejcar O, Franca TC, Kuca K. Use of cloud computing in biomedicine. J Biomol Struct Dyn 2016; 34:2688-2697. [DOI: 10.1080/07391102.2015.1127182] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
|
13
|
Luo J, Wu M, Gopukumar D, Zhao Y. Big Data Application in Biomedical Research and Health Care: A Literature Review. Biomedical Informatics Insights 2016; 8:1-10. [PMID: 26843812 PMCID: PMC4720168 DOI: 10.4137/bii.s31559] [Citation(s) in RCA: 153] [Impact Index Per Article: 19.1] [Received: 08/27/2015] [Revised: 12/06/2015] [Accepted: 12/06/2015] [Indexed: 01/01/2023]
Abstract
Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care.
Affiliation(s)
- Jake Luo
- College of Health Science, Department of Health Informatics and Administration, Center for Biomedical Data and Language Processing, University of Wisconsin–Milwaukee, Milwaukee, WI, USA
- Min Wu
- College of Health Science, Department of Health Informatics and Administration, Center for Biomedical Data and Language Processing, University of Wisconsin–Milwaukee, Milwaukee, WI, USA
- Deepika Gopukumar
- College of Health Science, Department of Health Informatics and Administration, Center for Biomedical Data and Language Processing, University of Wisconsin–Milwaukee, Milwaukee, WI, USA
- Yiqing Zhao
- College of Health Science, Department of Health Informatics and Administration, Center for Biomedical Data and Language Processing, University of Wisconsin–Milwaukee, Milwaukee, WI, USA
|
14
|
Kanbar LJ, Shalish W, Robles-Rubio CA, Precup D, Brown K, Sant'Anna GM, Kearney RE. Organizational principles of cloud storage to support collaborative biomedical research. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2016; 2015:1231-4. [PMID: 26736489 DOI: 10.1109/embc.2015.7318589] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
This paper describes organizational guidelines and an anonymization protocol for the management of sensitive information in interdisciplinary, multi-institutional studies with multiple collaborators. This protocol is flexible, automated, and suitable for use in cloud-based projects as well as for publication of supplementary information in journal papers. A sample implementation of the anonymization protocol is illustrated for an ongoing study dealing with Automated Prediction of EXtubation readiness (APEX).
|
15
|
Wallace MAG, Kormos TM, Pleil JD. Blood-borne biomarkers and bioindicators for linking exposure to health effects in environmental health science. Journal of Toxicology and Environmental Health. Part B, Critical Reviews 2016; 19:380-409. [PMID: 27759495 PMCID: PMC6147038 DOI: 10.1080/10937404.2016.1215772] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 05/19/2023]
Abstract
Environmental health science aims to link environmental pollution sources to adverse health outcomes to develop effective exposure intervention strategies that reduce long-term disease risks. Over the past few decades, the public health community recognized that health risk is driven by interaction between the human genome and external environment. Now that the human genetic code has been sequenced, establishing this "G × E" (gene-environment) interaction requires a similar effort to decode the human exposome, which is the accumulation of an individual's environmental exposures and metabolic responses throughout the person's lifetime. The exposome is composed of endogenous and exogenous chemicals, many of which are measurable as biomarkers in blood, breath, and urine. Exposure to pollutants is assessed by analyzing biofluids for the pollutant itself or its metabolic products. New methods are being developed to use a subset of biomarkers, termed bioindicators, to demonstrate biological changes indicative of future adverse health effects. Typically, environmental biomarkers are assessed using noninvasive (excreted) media, such as breath and urine. Blood is often avoided for biomonitoring due to practical reasons such as medical personnel, infectious waste, or clinical setting, despite the fact that blood represents the central compartment that interacts with every living cell and is the most relevant biofluid for certain applications and analyses. The aims of this study were to (1) review the current use of blood samples in environmental health research, (2) briefly contrast blood with other biological media, and (3) propose additional applications for blood analysis in human exposure research.
Affiliation(s)
- M Ariel Geer Wallace
- Exposure Methods and Measurement Division, National Exposure Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Joachim D Pleil
- Exposure Methods and Measurement Division, National Exposure Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
|
16
|
Regan K, Payne PRO. From Molecules to Patients: The Clinical Applications of Translational Bioinformatics. Yearb Med Inform 2015; 10:164-9. [PMID: 26293863 PMCID: PMC4587059 DOI: 10.15265/iy-2015-005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVE: In order to realize the promise of personalized medicine, Translational Bioinformatics (TBI) research will need to continue to address implementation issues across the clinical spectrum. In this review, we aim to evaluate the expanding field of TBI towards clinical applications, and define common themes and current gaps in order to motivate future research. METHODS: Here we present the state of the art in clinical implementation of TBI-based tools and resources. Our thematic analyses of a targeted literature search of recent TBI-related articles ranged across topics in genomics, data management, hypothesis generation, molecular epidemiology, diagnostics, therapeutics and personalized medicine. RESULTS: Open areas of clinically relevant TBI research identified in this review include developing data standards and best practices, publicly available resources, integrative systems-level approaches, user-friendly tools for clinical support, cloud computing solutions, emerging technologies and means to address pressing legal, ethical and social issues. CONCLUSIONS: There is a need for further research bridging the gap from foundational TBI-based theories and methodologies to clinical implementation. We have organized the topic themes presented in this review into four conceptual foci (domain analyses, knowledge engineering, computational architectures and computation methods) alongside three stages of knowledge development in order to orient future TBI efforts to accelerate the goals of personalized medicine.
Affiliation(s)
- P R O Payne
- Philip R.O. Payne, PhD, FACMI, The Ohio State University, Department of Biomedical Informatics, 250 Lincoln Tower, 1800 Cannon Drive, Columbus, OH 43210, USA, Tel: +1 614 292 4778
|
17
|
Toward a Literature-Driven Definition of Big Data in Healthcare. BioMed Research International 2015; 2015:639021. [PMID: 26137488 PMCID: PMC4468280 DOI: 10.1155/2015/639021] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Received: 11/13/2014] [Accepted: 02/04/2015] [Indexed: 11/17/2022]
Abstract
Objective. The aim of this study was to provide a definition of big data in healthcare. Methods. A systematic search of PubMed literature published until May 9, 2014, was conducted. We noted the number of statistical individuals (n) and the number of variables (p) for all papers describing a dataset. These papers were classified into fields of study. Characteristics attributed to big data by authors were also considered. Based on this analysis, a definition of big data was proposed. Results. A total of 196 papers were included. Big data can be defined as datasets with Log(n∗p) ≥ 7. Properties of big data are its great variety and high velocity. Big data raises challenges on veracity, on all aspects of the workflow, on extracting meaningful information, and on sharing information. Big data requires new computational methods that optimize data management. Related concepts are data reuse, false knowledge discovery, and privacy issues. Conclusion. Big data is defined by volume. Big data should not be confused with data reuse: data can be big without being reused for another purpose, for example, in omics. Inversely, data can be reused without being necessarily big, for example, secondary use of Electronic Medical Records (EMR) data.
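The volume criterion quoted in the Results can be checked mechanically. The following is a minimal sketch, assuming that "Log" means the base-10 logarithm (the abstract does not state the base); the function names `big_data_score` and `is_big_data` and the example dataset sizes are hypothetical and not taken from the paper.

```python
import math


def big_data_score(n: int, p: int) -> float:
    """Return log10(n * p) for n statistical individuals and p variables.

    Base 10 is an assumption; the abstract only writes 'Log'.
    """
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    return math.log10(n * p)


def is_big_data(n: int, p: int, threshold: float = 7.0) -> bool:
    """Apply the Log(n*p) >= 7 criterion quoted in the abstract."""
    return big_data_score(n, p) >= threshold


# Hypothetical datasets, chosen only for illustration.
print(is_big_data(n=50_000, p=2_000))  # 1e8 data points -> True
print(is_big_data(n=500, p=40))        # 2e4 data points -> False
```

Under this reading, a dataset qualifies once the product of individuals and variables reaches roughly ten million values, regardless of how that product is split between n and p.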
|
18
|
Simonyan V, Mazumder R. High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis. Genes (Basel) 2014; 5:957-81. [PMID: 25271953 PMCID: PMC4276921 DOI: 10.3390/genes5040957] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 09/11/2014] [Revised: 09/22/2014] [Accepted: 09/22/2014] [Indexed: 12/30/2022] Open
Abstract
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
Affiliation(s)
- Vahan Simonyan
- Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, MD 20993, USA.
- Raja Mazumder
- Department of Biochemistry and Molecular Medicine, George Washington University, Washington, DC 20037, USA.
|
19
|
Xia J, Fang AC, Zhang X. A novel feature selection strategy for enhanced biomedical event extraction using the Turku system. BioMed Research International 2014; 2014:205239. [PMID: 24800214 PMCID: PMC3997098 DOI: 10.1155/2014/205239] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 01/14/2014] [Revised: 02/22/2014] [Accepted: 03/03/2014] [Indexed: 12/25/2022]
Abstract
Feature selection is of paramount importance for text-mining classifiers with high-dimensional features. The Turku Event Extraction System (TEES), the best performing tool in the GENIA BioNLP 2009/2011 shared tasks, relies heavily on high-dimensional features. This paper describes research which, based on an implementation of an accumulated effect evaluation (AEE) algorithm applying the greedy search strategy, analyses the contribution of every single feature class in TEES with a view to identifying important features and modifying the feature set accordingly. With the updated feature set, the resulting system achieves an F-score of 53.27%, up from 51.21%, for Task 1 under strict evaluation criteria, and 57.24% according to the approximate span and recursive criterion.
Affiliation(s)
- Jingbo Xia
- College of Science, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
- Alex Chengyu Fang
- Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
- The Halliday Centre for Intelligent Applications of Language Studies, City University of Hong Kong, Kowloon, Hong Kong
- Xing Zhang
- Department of Chinese, Translation and Linguistics, City University of Hong Kong, Kowloon, Hong Kong
- The Halliday Centre for Intelligent Applications of Language Studies, City University of Hong Kong, Kowloon, Hong Kong
|
20
|
Zhang W, Zang J, Jing X, Sun Z, Yan W, Yang D, Shen B, Guo F. Identification of candidate miRNA biomarkers from miRNA regulatory network with application to prostate cancer. J Transl Med 2014; 12:66. [PMID: 24618011 PMCID: PMC4007708 DOI: 10.1186/1479-5876-12-66] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 09/18/2013] [Accepted: 01/28/2014] [Indexed: 02/08/2023] Open
Abstract
Background: MicroRNAs (miRNAs) are a class of non-coding regulatory RNAs approximately 22 nucleotides in length that play a role in a wide range of biological processes. Abnormal miRNA function has been implicated in various human cancers including prostate cancer (PCa). Altered miRNA expression may serve as a biomarker for cancer diagnosis and treatment. However, limited data are available on the role of cancer-specific miRNAs. Integrative computational bioinformatics approaches are effective for the detection of potential outlier miRNAs in cancer. Methods: The human miRNA-mRNA target network was reconstructed by integrating multiple miRNA-mRNA interaction datasets. Paired miRNA and mRNA expression profiling data in PCa versus benign prostate tissue samples were used as another source of information. These datasets were analyzed with an integrated bioinformatics framework to identify potential PCa miRNA signatures. In vitro q-PCR experiments and further systematic analysis were used to validate these prediction results. Results: Using this bioinformatics framework, we identified 39 miRNAs as potential PCa miRNA signatures. Among these miRNAs, 20 had previously been identified as PCa aberrant miRNAs by low-throughput methods, and 16 were shown to be deregulated in other cancers. In vitro q-PCR experiments verified the accuracy of these predictions. miR-648 was identified as a novel candidate PCa miRNA biomarker. Further functional and pathway enrichment analysis confirmed the association of the identified miRNAs with PCa progression. Conclusions: Our analysis revealed the scale-free features of the human miRNA-mRNA interaction network and showed the distinctive topological features of existing cancer miRNA biomarkers from previously published studies. A novel cancer miRNA biomarker prediction framework was designed based on these observations and applied to a prostate cancer study. This method could be applied for miRNA biomarker prediction in other cancers.
Affiliation(s)
- Bairong Shen
- Center for Systems Biology, Soochow University, Suzhou 215006, China.
|
21
|
High-Throughput Translational Medicine: Challenges and Solutions. Advances in Experimental Medicine and Biology 2014; 799:39-67. [DOI: 10.1007/978-1-4614-8778-4_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
|
22
|
Faustino RS, Arrell DK, Folmes CDL, Terzic A, Perez-Terzic C. Stem cell systems informatics for advanced clinical biodiagnostics: tracing molecular signatures from bench to bedside. Croat Med J 2013; 54:319-29. [PMID: 23986272 PMCID: PMC3760656 DOI: 10.3325/cmj.2013.54.319] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022] Open
Abstract
Development of innovative high throughput technologies has enabled a variety of molecular landscapes to be interrogated with an unprecedented degree of detail. Emergence of next generation nucleotide sequencing methods, advanced proteomic techniques, and metabolic profiling approaches continues to produce a wealth of biological data that captures molecular frameworks underlying phenotype. The advent of these novel technologies has significant translational applications, as investigators can now explore molecular underpinnings of developmental states with a high degree of resolution. Application of these leading-edge techniques to patient samples has been successfully used to unmask nuanced molecular details of disease vs healthy tissue, which may provide novel targets for palliative intervention. To enhance such approaches, concomitant development of algorithms to reprogram differentiated cells in order to recapitulate pluripotent capacity offers a distinct advantage to advancing diagnostic methodology. Bioinformatic deconvolution of several “-omic” layers extracted from reprogrammed patient cells could, in principle, provide a means by which the evolution of individual pathology can be developmentally monitored. Significant logistic challenges face current implementation of this novel paradigm of patient treatment and care; however, several of these limitations have been successfully addressed through continuous development of cutting edge in silico archiving and processing methods. Comprehensive elucidation of genomic, transcriptomic, proteomic, and metabolomic networks that define normal and pathological states, in combination with reprogrammed patient cells, is thus poised to become a high value resource in modern diagnosis and prognosis of patient disease.
Affiliation(s)
- Randolph S Faustino
- C. Perez-Terzic, Mayo Clinic, 200 First Street SW, Rochester, MN, USA 55905
|
23
|
Lin YC, Yu CS, Lin YJ. Enabling large-scale biomedical analysis in the cloud. BioMed Research International 2013; 2013:185679. [PMID: 24288665 PMCID: PMC3832998 DOI: 10.1155/2013/185679] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/06/2013] [Accepted: 09/22/2013] [Indexed: 01/02/2023]
Abstract
Recent progress in high-throughput instrumentation has led to astonishing growth in both the volume and complexity of biomedical data collected from various sources. Data at this planetary scale poses serious challenges for storage and computing technologies. Cloud computing is one way to crack this nut because it addresses storage and high-performance computing on large-scale data together. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should help biomedical researchers make the vast amount of diverse data meaningful and usable.
Affiliation(s)
- Ying-Chih Lin
- Master's Program in Biomedical Informatics and Biomedical Engineering, Feng Chia University, No. 100 Wenhwa Road, Seatwen, Taichung 40724, Taiwan
- Department of Applied Mathematics, Feng Chia University, No. 100 Wenhwa Road, Seatwen, Taichung 40724, Taiwan
- Chin-Sheng Yu
- Master's Program in Biomedical Informatics and Biomedical Engineering, Feng Chia University, No. 100 Wenhwa Road, Seatwen, Taichung 40724, Taiwan
- Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Road, Seatwen, Taichung 40724, Taiwan
- Yen-Jen Lin
- Department of Computer Science, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 30013, Taiwan
|
24
|
Secure encapsulation and publication of biological services in the cloud computing environment. BioMed Research International 2013; 2013:170580. [PMID: 24078906 PMCID: PMC3773971 DOI: 10.1155/2013/170580] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/05/2013] [Accepted: 06/19/2013] [Indexed: 11/17/2022]
Abstract
Secure encapsulation and publication of bioinformatics software products based on web services are presented, and the basic function of biological information is realized in the cloud computing environment. In the encapsulation phase, the workflow and function of bioinformatics software are conducted, the encapsulation interfaces are designed, and the runtime interaction between users and computers is simulated. In the publication phase, the execution and management mechanisms and principles of the GRAM components are analyzed. Functions such as remote user job submission and job status query are implemented by using the GRAM components. The services of bioinformatics software are published to remote users. Finally, a basic prototype system of the biological cloud is achieved.
|
25
|
Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies. ISRN Bioinformatics 2013; 2013:481545. [PMID: 25937948 PMCID: PMC4393068 DOI: 10.1155/2013/481545] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/08/2013] [Accepted: 08/07/2013] [Indexed: 01/31/2023]
Abstract
RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, open-source tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of the box to process Illumina RNA-Seq datasets.
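The per-sample figures reported in this abstract lend themselves to a rough budget estimate. The sketch below is illustrative only: it assumes that cost and processing time scale linearly with the number of samples and that every sample is comparable to the 100-million-read sample described; the function names are hypothetical and are not part of Stormbow.

```python
def estimate_total_cost(samples: int, cost_per_sample: float = 3.50) -> float:
    """Total cost in US dollars, assuming the reported average cost per sample."""
    return samples * cost_per_sample


def estimate_compute_hours(
    samples: int, hours_low: float = 6.0, hours_high: float = 8.0
) -> tuple[float, float]:
    """Summed processing hours across all samples (low, high).

    This is total compute time, not wall-clock time: samples processed
    in parallel would finish sooner.
    """
    return samples * hours_low, samples * hours_high


# The 178-sample test run mentioned in the abstract.
print(estimate_total_cost(178))     # 623.0
print(estimate_compute_hours(178))  # (1068.0, 1424.0)
```

Because the samples are processed in parallel, the elapsed time for such a run depends on how many cloud instances are started at once, which the abstract does not report.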
|
26
|
Faustino RS, Arrell DK, Folmes CD, Terzic A, Perez-Terzic C. Stem cell systems informatics for advanced clinical biodiagnostics: tracing molecular signatures from bench to bedside. Croat Med J 2013; 54:319-29. [PMID: 23986272 PMCID: PMC3760656 DOI: 10.3325/cmj.2013.54.319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
Abstract
Development of innovative high-throughput technologies has enabled a variety of molecular landscapes to be interrogated with an unprecedented degree of detail. Emergence of next-generation nucleotide sequencing methods, advanced proteomic techniques, and metabolic profiling approaches continues to produce a wealth of biological data that captures the molecular frameworks underlying phenotype. The advent of these novel technologies has significant translational applications, as investigators can now explore the molecular underpinnings of developmental states with a high degree of resolution. Application of these leading-edge techniques to patient samples has been successfully used to unmask nuanced molecular details of diseased versus healthy tissue, which may provide novel targets for palliative intervention. To enhance such approaches, concomitant development of algorithms to reprogram differentiated cells in order to recapitulate pluripotent capacity offers a distinct advantage in advancing diagnostic methodology. Bioinformatic deconvolution of several "-omic" layers extracted from reprogrammed patient cells could, in principle, provide a means by which the evolution of individual pathology can be developmentally monitored. Significant logistic challenges face current implementation of this novel paradigm of patient treatment and care; however, several of these limitations have been successfully addressed through continuous development of cutting-edge in silico archiving and processing methods. Comprehensive elucidation of the genomic, transcriptomic, proteomic, and metabolomic networks that define normal and pathological states, in combination with reprogrammed patient cells, is thus poised to become a high-value resource in modern diagnosis and prognosis of patient disease.
Affiliation(s)
- Randolph S. Faustino
  - Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- D. Kent Arrell
  - Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Clifford D.L. Folmes
  - Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Andre Terzic
  - Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA
- Carmen Perez-Terzic
  - Division of Cardiovascular Diseases, Departments of Medicine, Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA; Physical Medicine and Rehabilitation, Mayo Clinic College of Medicine, Rochester, MN, USA
|