1. Zitnik M, Li MM, Wells A, Glass K, Morselli Gysi D, Krishnan A, Murali TM, Radivojac P, Roy S, Baudot A, Bozdag S, Chen DZ, Cowen L, Devkota K, Gitter A, Gosline SJC, Gu P, Guzzi PH, Huang H, Jiang M, Kesimoglu ZN, Koyuturk M, Ma J, Pico AR, Pržulj N, Przytycka TM, Raphael BJ, Ritz A, Sharan R, Shen Y, Singh M, Slonim DK, Tong H, Yang XH, Yoon BJ, Yu H, Milenković T. Current and future directions in network biology. Bioinformatics Advances 2024; 4:vbae099. [PMID: 39143982 PMCID: PMC11321866 DOI: 10.1093/bioadv/vbae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 05/31/2024] [Accepted: 07/08/2024] [Indexed: 08/16/2024]
Abstract
Summary: Network biology is an interdisciplinary field bridging computational and biological sciences that has proved pivotal in advancing the understanding of cellular functions and diseases across biological systems and scales. Although the field has been around for two decades, it remains nascent. It has witnessed rapid evolution, accompanied by emerging challenges. These stem from various factors, notably the growing complexity and volume of data together with the increased diversity of data types describing different tiers of biological organization. We discuss prevailing research directions in network biology, focusing on molecular/cellular networks but also on other biological network types such as biomedical knowledge graphs, patient similarity networks, brain networks, and social/contact networks relevant to disease spread. In more detail, we highlight areas of inference and comparison of biological networks, multimodal data integration and heterogeneous networks, higher-order network analysis, machine learning on networks, and network-based personalized medicine. Following the overview of recent breakthroughs across these five areas, we offer a perspective on future directions of network biology. Additionally, we discuss scientific communities, educational initiatives, and the importance of fostering diversity within the field. This article establishes a roadmap for an immediate and long-term vision for network biology. Availability and implementation: Not applicable.
Affiliation(s)
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Michelle M Li
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Aydin Wells
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
- Kimberly Glass
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
- Deisy Morselli Gysi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, United States
  - Department of Statistics, Federal University of Paraná, Curitiba, Paraná 81530-015, Brazil
  - Department of Physics, Northeastern University, Boston, MA 02115, United States
- Arjun Krishnan
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, United States
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States
- Sushmita Roy
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Wisconsin Institute for Discovery, Madison, WI 53715, United States
- Anaïs Baudot
  - Aix Marseille Université, INSERM, MMG, Marseille, France
- Serdar Bozdag
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - Department of Mathematics, University of North Texas, Denton, TX 76203, United States
- Danny Z Chen
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Lenore Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Kapil Devkota
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Anthony Gitter
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, United States
  - Morgridge Institute for Research, Madison, WI 53715, United States
- Sara J C Gosline
  - Biological Sciences Division, Pacific Northwest National Laboratory, Seattle, WA 98109, United States
- Pengfei Gu
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Pietro H Guzzi
  - Department of Medical and Surgical Sciences, University Magna Graecia of Catanzaro, Catanzaro, 88100, Italy
- Heng Huang
  - Department of Computer Science, University of Maryland College Park, College Park, MD 20742, United States
- Meng Jiang
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
- Ziynet Nesibe Kesimoglu
  - Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, United States
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Mehmet Koyuturk
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106, United States
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, United States
- Alexander R Pico
  - Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA 94158, United States
- Nataša Pržulj
  - Department of Computer Science, University College London, London, WC1E 6BT, England
  - ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, 08010, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, 08034, Spain
- Teresa M Przytycka
  - National Center of Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20814, United States
- Benjamin J Raphael
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
- Anna Ritz
  - Department of Biology, Reed College, Portland, OR 97202, United States
- Roded Sharan
  - School of Computer Science, Tel Aviv University, Tel Aviv, 69978, Israel
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, United States
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, United States
- Donna K Slonim
  - Department of Computer Science, Tufts University, Medford, MA 02155, United States
- Hanghang Tong
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Xinan Holly Yang
  - Department of Pediatrics, University of Chicago, Chicago, IL 60637, United States
- Byung-Jun Yoon
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
  - Computational Science Initiative, Brookhaven National Laboratory, Upton, NY 11973, United States
- Haiyuan Yu
  - Department of Computational Biology, Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, United States
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, United States
  - Lucy Family Institute for Data and Society, University of Notre Dame, Notre Dame, IN 46556, United States
  - Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, United States
2. Lobentanzer S, Rodriguez-Mier P, Bauer S, Saez-Rodriguez J. Molecular causality in the advent of foundation models. Mol Syst Biol 2024; 20:848-858. [PMID: 38890548 PMCID: PMC11297329 DOI: 10.1038/s44320-024-00041-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 03/18/2024] [Accepted: 03/21/2024] [Indexed: 06/20/2024]
Abstract
Correlation is not causation: this simple and uncontroversial statement has far-reaching implications. Defining and applying causality in biomedical research has posed significant challenges to the scientific community. In this perspective, we attempt to connect the partly disparate fields of systems biology, causal reasoning, and machine learning to inform future approaches in the field of systems biology and molecular medicine.
Affiliation(s)
- Sebastian Lobentanzer
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Pablo Rodriguez-Mier
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Institute for Computational Biomedicine, Heidelberg, Germany
3. Kepper MM, Fowler LA, Kusters IS, Davis JW, Baqer M, Sagui-Henson S, Xiao Y, Tarfa A, Yi JC, Gibson B, Heron KE, Alberts NM, Burgermaster M, Njie-Carr VP, Klesges LM. Expanding a Behavioral View on Digital Health Access: Drivers and Strategies to Promote Equity. J Med Internet Res 2024; 26:e51355. [PMID: 39088246 PMCID: PMC11327633 DOI: 10.2196/51355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 05/02/2024] [Accepted: 05/27/2024] [Indexed: 08/02/2024]
Abstract
The potential and threat of digital tools to achieve health equity has been highlighted for over a decade, but the success of achieving equitable access to health technologies remains challenging. Our paper addresses renewed concerns regarding equity in digital health access that were deepened during the COVID-19 pandemic. Our viewpoint is that (1) digital health tools have the potential to improve health equity if equitable access is achieved, and (2) improving access and equity in digital health can be strengthened by considering behavioral science-based strategies embedded in all phases of tool development. Using behavioral, equity, and access frameworks allowed for a unique and comprehensive exploration of current drivers of digital health inequities. This paper aims to present a compilation of strategies that can potentially have an actionable impact on digital health equity. Multilevel factors drive unequal access, so strategies require action from tool developers, individual delivery agents, organizations, and systems to effect change. Strategies were shaped with a behavioral medicine focus as the field has a unique role in improving digital health access; arguably, all digital tools require the user (individual, provider, and health system) to change behavior by engaging with the technology to generate impact. This paper presents a model that emphasizes using multilevel strategies across design, delivery, dissemination, and sustainment stages to advance digital health access and foster health equity.
Affiliation(s)
- Maura M Kepper
  - Prevention Research Center, Washington University in St. Louis, St. Louis, MO, United States
- Lauren A Fowler
  - Sexuality, Health, and Gender Center, Washington University in St. Louis School of Medicine, Saint Louis, MO, United States
- Isabelle S Kusters
  - Department of Health, Human, and Biomedical Sciences, University of Houston-Clear Lake, Houston, TX, United States
  - Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, TX, United States
- Jean W Davis
  - College of Nursing, University of Central Florida, Orlando, FL, United States
- Manal Baqer
  - Neamah Health Consulting, Boston, MA, United States
- Sara Sagui-Henson
  - Clinical Strategy and Research Team, Modern Health, San Francisco, CA, United States
- Yunyu Xiao
  - Department of Population Health Science, Weill Cornell Medicine, Cornell University, New York, NY, United States
- Adati Tarfa
  - School of Medicine, Yale University, New Haven, CT, United States
- Jean C Yi
  - Fred Hutchinson Cancer Center, Seattle, WA, United States
- Bryan Gibson
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Kristin E Heron
  - Psychology Department, Old Dominion University, Norfolk, VA, United States
  - Virginia Consortium Program in Clinical Psychology, Norfolk, VA, United States
- Nicole M Alberts
  - Department of Psychology, Concordia University, Montreal, QC, Canada
- Marissa Burgermaster
  - Department of Nutritional Sciences, University of Texas at Austin, Austin, TX, United States
  - Department of Population Health, Dell Medical School, University of Texas at Austin, Austin, TX, United States
- Veronica Ps Njie-Carr
  - Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD, United States
- Lisa M Klesges
  - Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine, St. Louis, MO, United States
4. Bornkamp B, Zaoli S, Azzarito M, Martin R, Müller CP, Moloney C, Capestro G, Ohlssen D, Baillie M. Predicting subgroup treatment effects for a new study: Motivations, results and learnings from running a data challenge in a pharmaceutical corporation. Pharm Stat 2024; 23:495-510. [PMID: 38326967 DOI: 10.1002/pst.2368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 12/01/2023] [Accepted: 01/21/2024] [Indexed: 02/09/2024]
Abstract
We present the motivation, experience, and learnings from a data challenge conducted at a large pharmaceutical corporation on the topic of subgroup identification. The data challenge aimed at exploring approaches to subgroup identification for future clinical trials. To mimic a realistic setting, participants had access to 4 Phase III clinical trials to derive a subgroup and predict its treatment effect on a future study not accessible to challenge participants. A total of 30 teams registered for the challenge with around 100 participants, primarily from the Biostatistics organization. We outline the motivation for running the challenge, the challenge rules, and logistics. Finally, we present the results of the challenge, the participant feedback, and the learnings. We also present our view on the implications of the results for exploratory analyses related to treatment effect heterogeneity.
Affiliation(s)
- Björn Bornkamp
  - Global Drug Development, Novartis Pharma AG, Basel, Switzerland
- Silvia Zaoli
  - Global Drug Development, Novartis Pharma AG, Basel, Switzerland
- Ruvie Martin
  - Global Drug Development, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Conor Moloney
  - Global Drug Development, Novartis Pharma AG, Dublin, Ireland
- Giulia Capestro
  - Global Drug Development, Novartis Pharma AG, Basel, Switzerland
- David Ohlssen
  - Global Drug Development, Novartis Pharmaceuticals Corporation, East Hanover, New Jersey, USA
- Mark Baillie
  - Global Drug Development, Novartis Pharma AG, Basel, Switzerland
5. Ahsen ME, Vogel R, Stolovitzky G. Optimal linear ensemble of binary classifiers. Bioinformatics Advances 2024; 4:vbae093. [PMID: 39011276 PMCID: PMC11249386 DOI: 10.1093/bioadv/vbae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 05/03/2024] [Accepted: 06/13/2024] [Indexed: 07/17/2024]
Abstract
Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data. Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data. We developed both an unsupervised (uMOCA) and a supervised (sMOCA) variant of MOCA. For uMOCA, we show how to infer the MOCA weights in an unsupervised way, which are optimal under the assumption of class-conditioned independent classifier predictions. When it is possible to use labels, sMOCA uses empirically computed MOCA weights. We demonstrate the performance of uMOCA and sMOCA using simulated data as well as actual data previously used in Dialogue on Reverse Engineering and Methods (DREAM) challenges. We also propose an application of sMOCA for transfer learning where we use pre-trained computational models from a domain where labeled data are abundant and apply them to a different domain with less abundant labeled data. Availability and implementation: GitHub repository, https://github.com/robert-vogel/moca.
Affiliation(s)
- Mehmet Eren Ahsen
  - Department of Business Administration, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, United States
  - Department of Biomedical and Translational Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, United States
- Robert Vogel
  - Thomas J. Watson Research Center, IBM, New York, NY 10598, United States
  - Department of Integrated Structural and Computational Biology, Scripps Research, La Jolla, CA 92037, United States
6. Bergquist T, Schaffter T, Yan Y, Yu T, Prosser J, Gao J, Chen G, Charzewski Ł, Nawalany Z, Brugere I, Retkute R, Prusokiene A, Prusokas A, Choi Y, Lee S, Choe J, Lee I, Kim S, Kang J, Mooney SD, Guinney J. Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine. J Am Med Inform Assoc 2023; 31:35-44. [PMID: 37604111 PMCID: PMC10746301 DOI: 10.1093/jamia/ocad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/05/2023] [Accepted: 08/08/2023] [Indexed: 08/23/2023]
Abstract
OBJECTIVE: Applications of machine learning in healthcare are of high interest and have the potential to improve patient care. Yet, the real-world accuracy of these models in clinical practice and on different patient subpopulations remains unclear. To address these important questions, we hosted a community challenge to evaluate methods that predict healthcare outcomes. We focused on the prediction of all-cause mortality as the community challenge question. MATERIALS AND METHODS: Using a Model-to-Data framework, 345 registered participants, coalescing into 25 independent teams, spread over 3 continents and 10 countries, generated 25 accurate models all trained on a dataset of over 1.1 million patients and evaluated on patients prospectively collected over a 1-year observation of a large health system. RESULTS: The top performing team achieved a final area under the receiver operator curve of 0.947 (95% CI, 0.942-0.951) and an area under the precision-recall curve of 0.487 (95% CI, 0.458-0.499) on a prospectively collected patient cohort. DISCUSSION: Post hoc analysis after the challenge revealed that models differ in accuracy on subpopulations, delineated by race or gender, even when they are trained on the same data. CONCLUSION: This is the largest community challenge focused on the evaluation of state-of-the-art machine learning methods in a healthcare system performed to date, revealing both opportunities and pitfalls of clinical AI.
Affiliation(s)
- Timothy Bergquist
  - Sage Bionetworks, Seattle, WA, United States
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Yao Yan
  - Sage Bionetworks, Seattle, WA, United States
  - Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States
- Thomas Yu
  - Sage Bionetworks, Seattle, WA, United States
- Justin Prosser
  - Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States
- Jifan Gao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Guanhua Chen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
- Łukasz Charzewski
  - Proacta, Warsaw, Poland
  - Division of Biophysics, University of Warsaw, Warsaw, Poland
- Ivan Brugere
  - Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Renata Retkute
  - Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Alisa Prusokiene
  - Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
- Augustinas Prusokas
  - Department of Life Sciences, Imperial College London, London, United Kingdom
- Yonghwa Choi
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Sanghoon Lee
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Junseok Choe
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Inggeol Lee
  - Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sunkyu Kim
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
- Jaewoo Kang
  - Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
  - Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Justin Guinney
  - Sage Bionetworks, Seattle, WA, United States
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
7. Armato SG, Drukker K, Hadjiiski L. AI in medical imaging grand challenges: translation from competition to research benefit and patient care. Br J Radiol 2023; 96:20221152. [PMID: 37698542 PMCID: PMC10546459 DOI: 10.1259/bjr.20221152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Revised: 05/24/2023] [Accepted: 07/11/2023] [Indexed: 09/13/2023]
Abstract
Artificial intelligence (AI), in one form or another, has been a part of medical imaging for decades. The recent evolution of AI into approaches such as deep learning has dramatically accelerated the application of AI across a wide range of radiologic settings. Despite the promises of AI, developers and users of AI technology must be fully aware of its potential biases and pitfalls, and this knowledge must be incorporated throughout the AI system development pipeline that involves training, validation, and testing. Grand challenges offer an opportunity to advance the development of AI methods for targeted applications and provide a mechanism for both directing and facilitating the development of AI systems. In the process, a grand challenge centralizes (with the challenge organizers) the burden of providing a valid benchmark test set to assess performance and generalizability of participants' models and the collection and curation of image metadata, clinical/demographic information, and the required reference standard. The most relevant grand challenges are those designed to maximize the open-science nature of the competition, with code and trained models deposited for future public access. The ultimate goal of AI grand challenges is to foster the translation of AI systems from competition to research benefit and patient care. Rather than reference the many medical imaging grand challenges that have been organized by groups such as MICCAI, RSNA, AAPM, and grand-challenge.org, this review assesses the role of grand challenges in promoting AI technologies for research advancement and for eventual clinical implementation, including their promises and limitations.
Affiliation(s)
- Samuel G Armato
  - Department of Radiology, The University of Chicago, Chicago, Illinois, USA
- Karen Drukker
  - Department of Radiology, The University of Chicago, Chicago, Illinois, USA
- Lubomir Hadjiiski
  - Department of Radiology, University of Michigan, Ann Arbor, Michigan, USA
8. Flanary VL, Fisher JL, Wilk EJ, Howton TC, Lasseigne BN. Computational Advancements in Cancer Combination Therapy Prediction. JCO Precis Oncol 2023; 7:e2300261. [PMID: 37824797 DOI: 10.1200/po.23.00261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 07/20/2023] [Accepted: 08/15/2023] [Indexed: 10/14/2023]
Abstract
Given the high attrition rate of de novo drug discovery and limited efficacy of single-agent therapies in cancer treatment, combination therapy prediction through in silico drug repurposing has risen as a time- and cost-effective alternative for identifying novel and potentially efficacious therapies for cancer. The purpose of this review is to provide an introduction to computational methods for cancer combination therapy prediction and to summarize recent studies that implement each of these methods. A systematic search of the PubMed database was performed, focusing on studies published within the past 10 years. Our search included reviews and articles of ongoing and retrospective studies. We prioritized articles with findings that suggest considerations for improving combination therapy prediction methods over providing a meta-analysis of all currently available cancer combination therapy prediction methods. Computational methods used for drug combination therapy prediction in cancer research include networks, regression-based machine learning, classifier machine learning models, and deep learning approaches. Each method class has its own advantages and disadvantages, so careful consideration is needed to determine the most suitable class when designing a combination therapy prediction method. Future directions to improve current combination therapy prediction technology include incorporation of disease pathobiology, drug characteristics, patient multiomics data, and drug-drug interactions to determine maximally efficacious and tolerable drug regimens for cancer. As computational methods improve in their capability to integrate patient, drug, and disease data, more comprehensive models can be developed to more accurately predict safe and efficacious combination drug therapies for cancer and other complex diseases.
Affiliation(s)
- Victoria L Flanary
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
- Jennifer L Fisher
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
- Elizabeth J Wilk
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
- Timothy C Howton
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
- Brittany N Lasseigne
  - Department of Cell, Developmental and Integrative Biology, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL
9. Clunie DA, Flanders A, Taylor A, Erickson B, Bialecki B, Brundage D, Gutman D, Prior F, Seibert JA, Perry J, Gichoya JW, Kirby J, Andriole K, Geneslaw L, Moore S, Fitzgerald TJ, Tellis W, Xiao Y, Farahani K, Luo J, Rosenthal A, Kandarpa K, Rosen R, Goetz K, Babcock D, Xu B, Hsiao J. Report of the Medical Image De-Identification (MIDI) Task Group - Best Practices and Recommendations. arXiv 2023:arXiv:2303.10473v2. [PMID: 37033463 PMCID: PMC10081345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/11/2023]
Affiliation(s)
- Fred Prior
  - University of Arkansas for Medical Sciences
- Justin Kirby
  - Frederick National Laboratory for Cancer Research
- Ying Xiao
  - University of Pennsylvania Health System
- James Luo
  - National Heart, Lung, and Blood Institute (NHLBI)
- Alex Rosenthal
  - National Institute of Allergy and Infectious Diseases (NIAID)
- Kris Kandarpa
  - National Institute of Biomedical Imaging and Bioengineering (NIBIB)
- Rebecca Rosen
  - Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)
- Debra Babcock
  - National Institute of Neurological Disorders and Stroke (NINDS)
- Ben Xu
  - National Institute on Alcohol Abuse and Alcoholism (NIAAA)
10. Steyaert S, Pizurica M, Nagaraj D, Khandelwal P, Hernandez-Boussard T, Gentles AJ, Gevaert O. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell 2023; 5:351-362. [PMID: 37693852 PMCID: PMC10484010 DOI: 10.1038/s42256-023-00633-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 07/28/2022] [Accepted: 02/17/2023] [Indexed: 09/12/2023]
Abstract
Technological advances now make it possible to study a patient from multiple angles with high-dimensional, high-throughput multi-scale biomedical data. In oncology, massive amounts of data are being generated ranging from molecular, histopathology, radiology to clinical records. The introduction of deep learning has significantly advanced the analysis of biomedical data. However, most approaches focus on single data modalities leading to slow progress in methods to integrate complementary data types. Development of effective multimodal fusion approaches is becoming increasingly important as a single modality might not be consistent and sufficient to capture the heterogeneity of complex diseases to tailor medical care and improve personalised medicine. Many initiatives now focus on integrating these disparate modalities to unravel the biological processes involved in multifactorial diseases such as cancer. However, many obstacles remain, including lack of usable data as well as methods for clinical validation and interpretation. Here, we cover these current challenges and reflect on opportunities through deep learning to tackle data sparsity and scarcity, multimodal interpretability, and standardisation of datasets.
Affiliation(s)
- Sandra Steyaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Marija Pizurica
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
- Tina Hernandez-Boussard
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
  - Department of Biomedical Data Science, Stanford University
- Andrew J Gentles
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
  - Department of Biomedical Data Science, Stanford University
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
  - Department of Biomedical Data Science, Stanford University
11
Mooney SD. Technology Platforms and Approaches for Building and Evaluating Machine Learning Methods in Healthcare. J Appl Lab Med 2023; 8:194-202. [PMID: 36610427 PMCID: PMC10729736 DOI: 10.1093/jalm/jfac113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Accepted: 10/18/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND: Artificial intelligence (AI) methods are becoming increasingly commonly implemented in healthcare as decision support, business intelligence tools, or, in some cases, Food and Drug Administration-approved clinical decision-makers. Advanced lab-based diagnostic tools are increasingly becoming AI driven. The path from data to machine learning methods is an active area for research and quality improvement, and there are few established best practices. With data being generated at an unprecedented rate, there is a need for processes that enable data science investigation that protect patient privacy and minimize other business risks. New approaches for data sharing are being utilized that lower these risks.
CONTENT: In this short review, clinical and translational AI governance is introduced along with approaches for securely building, sharing, and validating accurate and fair models. This is a constantly evolving field, and there is much interest in collecting data using standards, sharing data, building new models, evaluating models, sharing models, and, of course, implementing models into practice.
SUMMARY: AI is an active area of research and development broadly for healthcare and laboratory testing. Robust data governance and machine learning methodological governance are required. New approaches for data sharing are enabling the development of models and their evaluation. Evaluation of methods is difficult, particularly when the evaluation is performed by the team developing the method, and should ideally be prospective. New technologies have enabled standardization of platforms for moving analytics and data science methods.
Affiliation(s)
- Sean D Mooney
  - Institute for Medical Data Science and Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
12
Upshaw SJ, Jensen JD, Giorgi EA, Pokharel M, Lillie HM, Adams DR, John KK, Wu YP, Grossman D. Developing skin cancer education materials for darker skin populations: crowdsourced design, message targeting, and acral lentiginous melanoma. J Behav Med 2022:10.1007/s10865-022-00362-x. [PMID: 36125669 DOI: 10.1007/s10865-022-00362-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Accepted: 09/28/2021] [Indexed: 11/26/2022]
Abstract
Despite decreased susceptibility, darker skin individuals who develop melanoma have worse survival. This disparity in melanoma mortality is the largest for any cancer, and partly driven by a lack of patient education materials targeted to darker skin populations in whom acral lentiginous melanoma (ALM) is the most common subtype. To address this communication disparity, the current study reports a multi-phase design process that leverages crowdsourcing and message testing to develop ALM-focused patient education materials for darker skin populations. Crowdsourced design was utilized to develop a pool of designs (phase 1), the pool was narrowed and thematically analyzed (phase 2), and select designs were evaluated via a message experiment (N = 1877). For darker skin populations, designs that depicted people enhanced knowledge of ALM through message memorability. The current study engages melanoma disparities by providing ALM patient education materials for darker skin populations vetted via a multi-phase process.
Affiliation(s)
- Sean J Upshaw
  - Moody College of Communication, University of Texas-Austin, Austin, TX, USA
- Jakob D Jensen
  - Department of Communication, University of Utah, Salt Lake City, UT, USA
- Elizabeth A Giorgi
  - Department of Communication, University of Utah, Salt Lake City, UT, USA
- Manusheela Pokharel
  - Department of Communication Studies, Texas State University, San Marcos, TX, USA
- Helen M Lillie
  - Department of Communication Studies, University of Iowa, Iowa City, IA, USA
- Dallin R Adams
  - Department of Communication, University of Utah, Salt Lake City, UT, USA
- Kevin K John
  - School of Communications, Brigham Young University, Provo, UT, USA
- Yelena P Wu
  - Department of Dermatology, University of Utah, Salt Lake City, UT, USA
  - Huntsman Cancer Institute, Salt Lake City, UT, USA
- Douglas Grossman
  - Department of Dermatology, University of Utah, Salt Lake City, UT, USA
  - Huntsman Cancer Institute, Salt Lake City, UT, USA
13
Pan X, Lin X, Cao D, Zeng X, Yu PS, He L, Nussinov R, Cheng F. Deep learning for drug repurposing: Methods, databases, and applications. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1597] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/15/2022]
Affiliation(s)
- Xiaoqin Pan
  - School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
- Xuan Lin
  - School of Computer Science, Xiangtan University, Xiangtan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, China
- Xiangxiang Zeng
  - School of Computer Science and Engineering, Hunan University, Changsha, Hunan, China
- Philip S. Yu
  - Department of Computer Science, University of Illinois at Chicago, Chicago, Illinois, USA
- Lifang He
  - Department of Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
- Ruth Nussinov
  - Computational Structural Biology Section, Basic Science Program, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, USA
  - Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Feixiong Cheng
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
  - Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
14
Kanduri C, Pavlović M, Scheffer L, Motwani K, Chernigovskaya M, Greiff V, Sandve GK. Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification. Gigascience 2022; 11:giac046. [PMID: 35639633 PMCID: PMC9154052 DOI: 10.1093/gigascience/giac046] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/17/2021] [Revised: 12/23/2021] [Accepted: 04/08/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Machine learning (ML) methodology development for the classification of immune states in adaptive immune receptor repertoires (AIRRs) has seen a recent surge of interest. However, so far, there does not exist a systematic evaluation of scenarios where classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This hinders investigative reorientation to those scenarios where method development of more sophisticated ML approaches may be required.
RESULTS: To identify those scenarios where a baseline ML method is able to perform well for AIRR classification, we generated a collection of synthetic AIRR benchmark data sets encompassing a wide range of data set architecture-associated and immune state-associated sequence patterns (signal) complexity. We trained ≈1,700 ML models with varying assumptions regarding immune signal on ≈1,000 data sets with a total of ≈250,000 AIRRs containing ≈46 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR-ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs only in 1 out of 50,000 AIR sequences.
CONCLUSIONS: We provide a reference benchmark to guide new AIRR-ML classification methodology by (i) identifying those scenarios characterized by immune signal and data set complexity, where baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR-ML models given training data set properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark data sets for comprehensive benchmarking of AIRR-ML methods.
Affiliation(s)
- Chakravarthi Kanduri
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
- Milena Pavlović
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
- Lonneke Scheffer
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
- Keshav Motwani
  - Department of Pathology, Immunology and Laboratory Medicine, University of Florida, FL 32610, USA
- Maria Chernigovskaya
  - Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, 0372, Norway
- Victor Greiff
  - Department of Immunology and Oslo University Hospital, University of Oslo, Oslo, 0372, Norway
- Geir K Sandve
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, Oslo 0373, Norway
15
Dey S, Chakraborty P, Kwon BC, Dhurandhar A, Ghalwash M, Suarez Saiz FJ, Ng K, Sow D, Varshney KR, Meyer P. Human-centered explainability for life sciences, healthcare, and medical informatics. Patterns (New York, N.Y.) 2022; 3:100493. [PMID: 35607616 PMCID: PMC9122967 DOI: 10.1016/j.patter.2022.100493] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Rapid advances in artificial intelligence (AI) and availability of biological, medical, and healthcare data have enabled the development of a wide variety of models. Significant success has been achieved in a wide range of fields, such as genomics, protein folding, disease diagnosis, imaging, and clinical tasks. Although widely used, the inherent opacity of deep AI models has brought criticism from the research field and little adoption in clinical practice. Concurrently, there has been a significant amount of research focused on making such methods more interpretable, reviewed here, but inherent critiques of such explainability in AI (XAI), its requirements, and concerns with fairness/robustness have hampered their real-world adoption. We here discuss how user-driven XAI can be made more useful for different healthcare stakeholders through the definition of three key personas (data scientists, clinical researchers, and clinicians) and present an overview of how different XAI approaches can address their needs. For illustration, we also walk through several research and clinical examples that take advantage of XAI open-source tools, including those that help enhance the explanation of the results through visualization. This perspective thus aims to provide a guidance tool for developing explainability solutions for healthcare by empowering both subject matter experts, providing them with a survey of available tools, and explainability developers, by providing examples of how such methods can influence in practice adoption of solutions.
Affiliation(s)
- Sanjoy Dey
  - Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Prithwish Chakraborty
  - Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Bum Chul Kwon
  - Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Amit Dhurandhar
  - IBM Research AI, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Mohamed Ghalwash
  - Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
  - Ain Shams University, Cairo, Egypt
- Kenney Ng
  - Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Daby Sow
  - IBM Research Security and Compliance, AI Industries, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Kush R. Varshney
  - IBM Research AI, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Pablo Meyer
  - Center for Computational Health, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
16
Morales-Alvarez P, Ruiz P, Coughlin S, Molina R, Katsaggelos AK. Scalable Variational Gaussian Processes for Crowdsourcing: Glitch Detection in LIGO. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:1534-1551. [PMID: 32956038 DOI: 10.1109/tpami.2020.3025390] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
In the last years, crowdsourcing is transforming the way classification training sets are obtained. Instead of relying on a single expert annotator, crowdsourcing shares the labelling effort among a large number of collaborators. For instance, this is being applied in the laureate laser interferometer gravitational waves observatory (LIGO), in order to detect glitches which might hinder the identification of true gravitational-waves. The crowdsourcing scenario poses new challenging difficulties, as it has to deal with different opinions from a heterogeneous group of annotators with unknown degrees of expertise. Probabilistic methods, such as Gaussian processes (GP), have proven successful in modeling this setting. However, GPs do not scale up well to large data sets, which hampers their broad adoption in real-world problems (in particular LIGO). This has led to the very recent introduction of deep learning based crowdsourcing methods, which have become the state-of-the-art for this type of problems. However, the accurate uncertainty quantification provided by GPs has been partially sacrificed. This is an important aspect for astrophysicists in LIGO, since a glitch detection system should provide very accurate probability distributions of its predictions. In this work, we first leverage a standard sparse GP approximation (SVGP) to develop a GP-based crowdsourcing method that factorizes into mini-batches. This makes it able to cope with previously-prohibitive data sets. This first approach, which we refer to as scalable variational Gaussian processes for crowdsourcing (SVGPCR), brings back GP-based methods to a state-of-the-art level, and excels at uncertainty quantification. SVGPCR is shown to outperform deep learning based methods and previous probabilistic ones when applied to the LIGO data. Its behavior and main properties are carefully analyzed in a controlled experiment based on the MNIST data set. Moreover, recent GP inference techniques are also adapted to crowdsourcing and evaluated experimentally.
17
Open Innovation in Times of Crisis: An Overview of the Healthcare Sector in Response to the COVID-19 Pandemic. Journal of Open Innovation: Technology, Market, and Complexity 2022; 8. [PMCID: PMC9906727 DOI: 10.3390/joitmc8010021] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 05/07/2023]
Abstract
The COVID-19 pandemic has caused huge and disruptive technological changes in the healthcare sector, transforming the way businesses and societies function. To respond to the global health crisis, there have been numerous innovation projects in the healthcare sector, including the fast design and manufacturing of personal protective equipment (PPE) and medical devices, and testing, treatment, and vaccine technologies. Many of these innovative activities happen beyond organizational boundaries with collaboration and open innovation. In this paper, we review the current literature on open innovation strategy during the pandemic and adopt the co-evolution view of business ecosystems to address the context of change. Based on a detailed exploration of the COVID-19-related technologies in the UK and global healthcare sectors, we identify the key emerging themes of open innovation in crisis. Further discussions are conducted in relation to each theme. Our results and analysis can help provide policy recommendations for the healthcare sector, businesses, and society to recover from the crisis.
18
The ability to classify patients based on gene-expression data varies by algorithm and performance metric. PLoS Comput Biol 2022; 18:e1009926. [PMID: 35275931 PMCID: PMC8942277 DOI: 10.1371/journal.pcbi.1009926] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/10/2021] [Revised: 03/23/2022] [Accepted: 02/15/2022] [Indexed: 01/02/2023] Open
Abstract
By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist (and most support diverse hyperparameters), so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
19
Douglass EF, Allaway RJ, Szalai B, Wang W, Tian T, Fernández-Torras A, Realubit R, Karan C, Zheng S, Pessia A, Tanoli Z, Jafari M, Wan F, Li S, Xiong Y, Duran-Frigola M, Bertoni M, Badia-i-Mompel P, Mateo L, Guitart-Pla O, Chung V, Tang J, Zeng J, Aloy P, Saez-Rodriguez J, Guinney J, Gerhard DS, Califano A. A community challenge for a pancancer drug mechanism of action inference from perturbational profile data. Cell Rep Med 2022; 3:100492. [PMID: 35106508 PMCID: PMC8784774 DOI: 10.1016/j.xcrm.2021.100492] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 02/05/2021] [Revised: 08/08/2021] [Accepted: 12/15/2021] [Indexed: 12/14/2022]
Abstract
The Columbia Cancer Target Discovery and Development (CTD2) Center is developing PANACEA, a resource comprising dose-responses and RNA sequencing (RNA-seq) profiles of 25 cell lines perturbed with ∼400 clinical oncology drugs, to study a tumor-specific drug mechanism of action. Here, this resource serves as the basis for a DREAM Challenge assessing the accuracy and sensitivity of computational algorithms for de novo drug polypharmacology predictions. Dose-response and perturbational profiles for 32 kinase inhibitors are provided to 21 teams who are blind to the identity of the compounds. The teams are asked to predict high-affinity binding targets of each compound among ∼1,300 targets cataloged in DrugBank. The best performing methods leverage gene expression profile similarity analysis as well as deep-learning methodologies trained on individual datasets. This study lays the foundation for future integrative analyses of pharmacogenomic data, reconciliation of polypharmacology effects in different tumor contexts, and insights into network-based assessments of drug mechanisms of action.
- Drug-perturbed RNA sequencing data can be used to identify drug targets
- Technology-based drug-target definitions often subsume literature definitions
- Literature and screening datasets provide complementary information on drug mechanisms
Affiliation(s)
- Eugene F. Douglass
  - Department of Systems Biology, Columbia University Irving Medical Center, 1130 Saint Nicholas Ave., New York, NY 10032, USA
  - Pharmaceutical and Biomedical Sciences, University of Georgia, 250 W. Green Street, Athens, GA 30602, USA
- Robert J. Allaway
  - Computational Oncology Group, Sage Bionetworks, 2901 Third Ave., Ste 330, Seattle, WA 98121, USA
- Bence Szalai
  - Semmelweis University, Faculty of Medicine, Department of Physiology, Budapest, Hungary
- Wenyu Wang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Tingzhong Tian
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Adrià Fernández-Torras
  - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Ron Realubit
  - Department of Systems Biology, Columbia University Irving Medical Center, 1130 Saint Nicholas Ave., New York, NY 10032, USA
- Charles Karan
  - Department of Systems Biology, Columbia University Irving Medical Center, 1130 Saint Nicholas Ave., New York, NY 10032, USA
- Shuyu Zheng
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Alberto Pessia
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Ziaurrehman Tanoli
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Mohieddin Jafari
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Fangping Wan
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Shuya Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Yuanpeng Xiong
  - Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Miquel Duran-Frigola
  - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Martino Bertoni
  - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Pau Badia-i-Mompel
  - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Lídia Mateo
  - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Oriol Guitart-Pla
  - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Verena Chung
  - Computational Oncology Group, Sage Bionetworks, 2901 Third Ave., Ste 330, Seattle, WA 98121, USA
- Jing Tang
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jianyang Zeng
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
  - MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Patrick Aloy
  - Joint IRB-BSC-CRG Program in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Justin Guinney
  - Computational Oncology Group, Sage Bionetworks, 2901 Third Ave., Ste 330, Seattle, WA 98121, USA
- Daniela S. Gerhard
  - Office of Cancer Genomics, National Cancer Institute, NIH, Bethesda, MD 20892, USA
- Andrea Califano
  - Department of Systems Biology, Columbia University Irving Medical Center, 1130 Saint Nicholas Ave., New York, NY 10032, USA
  - Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, 1130 Saint Nicholas Ave., New York, NY 10032, USA
  - Department of Medicine, Columbia University Irving Medical Center, 630 W 168th Street, New York, NY 10032, USA
  - Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, 701 W 168th Street, New York, NY 10032, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, 622 W 168th Street, New York, NY 10032, USA
  - Corresponding author
20
Aktı Ş, Kamar D, Özlü ÖA, Soydemir I, Akcan M, Kul A, Rekik I. A comparative study of machine learning methods for predicting the evolution of brain connectivity from a baseline timepoint. J Neurosci Methods 2022; 368:109475. [PMID: 34995648 DOI: 10.1016/j.jneumeth.2022.109475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/26/2021] [Revised: 12/27/2021] [Accepted: 01/02/2022] [Indexed: 01/21/2023]
Abstract
BACKGROUND: Predicting the evolution of the brain network, also called connectome, by foreseeing changes in the connectivity weights linking pairs of anatomical regions makes it possible to spot connectivity-related neurological disorders in earlier stages and detect the development of potential connectomic anomalies. Remarkably, such a challenging prediction problem remains least explored in the predictive connectomics literature. It is a known fact that machine learning (ML) methods have proven their predictive abilities in a wide variety of computer vision problems. However, ML techniques specifically tailored for the prediction of brain connectivity evolution trajectory from a single timepoint are almost absent.
NEW METHOD: To fill this gap, we organized a Kaggle competition where 20 competing teams designed advanced machine learning pipelines for predicting the brain connectivity evolution from a single timepoint. The teams developed their ML pipelines with combination of data pre-processing, dimensionality reduction and learning methods. Each ML framework inputs a baseline brain connectivity matrix observed at baseline timepoint t0 and outputs the brain connectivity map at a follow-up timepoint t1. The longitudinal OASIS-2 dataset was used for model training and evaluation. Both random data split and 5-fold cross-validation strategies were used for ranking and evaluating the generalizability and scalability of each competing ML pipeline.
RESULTS: Utilizing an inclusive approach, we ranked the methods based on two complementary evaluation metrics (mean absolute error (MAE) and Pearson Correlation Coefficient (PCC)) and their performances using different training and testing data perturbation strategies (single random split and cross-validation). The final rank was calculated using the rank product for each competing team across all evaluation measures and validation strategies. Furthermore, we added statistical significance values to each proposed pipeline.
CONCLUSION: In support of open science, the developed 20 ML pipelines along with the connectomic dataset are made available on GitHub (https://github.com/basiralab/Kaggle-BrainNetPrediction-Toolbox). The outcomes of this competition are anticipated to lead the further development of predictive models that can foresee the evolution of the brain connectivity over time, as well as other types of networks (e.g., genetic networks).
Affiliation(s)
- Şeymanur Aktı
  - Faculty of Computer and Informatics, Istanbul Technical University, Turkey
- Doğay Kamar
  - Faculty of Computer and Informatics, Istanbul Technical University, Turkey
- Özgür Anıl Özlü
  - Faculty of Computer and Informatics, Istanbul Technical University, Turkey
- Ihsan Soydemir
  - Faculty of Computer and Informatics, Istanbul Technical University, Turkey
- Muhammet Akcan
  - Faculty of Computer and Informatics, Istanbul Technical University, Turkey
- Abdullah Kul
  - Faculty of Computer and Informatics, Istanbul Technical University, Turkey
- Islem Rekik
  - BASIRA lab, Faculty of Computer and Informatics, Istanbul Technical University, Istanbul, Turkey; School of Science and Engineering, Computing, University of Dundee, UK
21
Kong W, Midena G, Chen Y, Athanasiadis P, Wang T, Rousu J, He L, Aittokallio T. Systematic review of computational methods for drug combination prediction. Comput Struct Biotechnol J 2022; 20:2807-2814. [PMID: 35685365 PMCID: PMC9168078 DOI: 10.1016/j.csbj.2022.05.055] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/14/2022] [Revised: 05/27/2022] [Accepted: 05/27/2022] [Indexed: 12/26/2022] Open
Abstract
Synergistic effects between drugs are rare, highly context-dependent, and patient-specific. Hence, there is a need to develop novel approaches to stratify patients for optimal therapy regimens, especially in the context of personalized design of combinatorial treatments. Computational methods enable systematic in-silico screening of combination effects, and can thereby prioritize the most potent combinations for further testing among the massive number of potential combinations. To help researchers choose the prediction method that best fits a given real-world application, we carried out a systematic literature review of 117 computational methods developed to date for drug combination prediction, and classified the methods in terms of their combination prediction tasks and input data requirements. Most current methods focus on prediction or classification of combination synergy, and only a few methods consider the efficacy and potential toxicity of the combinations, which are the key determinants of therapeutic success of drug treatments. Furthermore, there is a need to further develop methods that enable dose-specific predictions of combination effects across multiple doses, which is important for clinical translation of the predictions, as well as model-based identification of biomarkers predictive of heterogeneous drug combination responses. Even if most of the computational methods reviewed focus on anticancer applications, many of the modelling approaches are also applicable to antiviral applications and other diseases or indications.
|
22
|
Daradkeh M. The Relationship Between Persuasion Cues and Idea Adoption in Virtual Crowdsourcing Communities. INTERNATIONAL JOURNAL OF KNOWLEDGE MANAGEMENT 2022. [DOI: 10.4018/ijkm.291708] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Building on the elaboration likelihood model (ELM) and absorptive capacity, this study develops a four-dimensional model of idea adoption in Virtual Crowdsourcing Communities (VCCs) and examines the influence of different persuasion cues on idea adoption. The research model was tested using hierarchical logistic regression based on a dataset from the Tableau community. The results show that both community recognition of users and community recognition of ideas are positively related to idea adoption. Proactive user engagement has a significant positive impact on idea adoption, while reactive user engagement has no significant impact. Idea content quality, represented by idea length and supporting arguments, has an inverted U-shaped relationship with idea adoption. Community absorptive capacity positively moderates the curvilinear relationship between idea content quality and idea adoption. These results contribute to a better elucidation of the persuasion mechanisms underlying idea adoption in VCCs, and thus provide important implications for open innovation research and practice.
Affiliation(s)
- Mohammad Daradkeh
- University of Dubai, United Arab Emirates & Yarmouk University, Jordan
|
23
|
Innovation in business Intelligence systems. INTERNATIONAL JOURNAL OF INFORMATION SYSTEMS IN THE SERVICE SECTOR 2022. [DOI: 10.4018/ijisss.302885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Innovation crowdsourcing communities play a central role for companies to advance their innovation capabilities and portfolio by leveraging crowd intelligence and knowledge. However, it remains unclear how the mechanisms and structure of innovation crowdsourcing communities affect firms' innovation performance. Based on the open innovation theory and knowledge-based view (KBV), this study develops a research model to investigate how the structure and mechanisms of innovation crowdsourcing influence firms' knowledge management and innovation performance. The model was tested using structural equation modeling based on a dataset from the Microsoft community for business intelligence tools. The results show that both organizational and technical mechanisms of the community positively influence the community structure. The community structure positively influences knowledge acquisition, knowledge transformation, and the size and diversity of crowd participation. In turn, innovation crowdsourcing mechanisms and knowledge transformation have a strong influence on innovation performance.
|
24
|
Fitzpatrick R, Stefan MI. Validation Through Collaboration: Encouraging Team Efforts to Ensure Internal and External Validity of Computational Models of Biochemical Pathways. Neuroinformatics 2022; 20:277-284. [PMID: 35543917 PMCID: PMC9537119 DOI: 10.1007/s12021-022-09584-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/17/2022] [Indexed: 01/09/2023]
Abstract
Computational modelling of biochemical reaction pathways is an increasingly important part of neuroscience research. In order to be useful, computational models need to be valid in two senses: First, they need to be consistent with experimental data and able to make testable predictions (external validity). Second, they need to be internally consistent and independently reproducible (internal validity). Here, we discuss both types of validity and provide a brief overview of tools and technologies used to ensure they are met. We also suggest the introduction of new collaborative technologies to ensure model validity: an incentivised experimental database for external validity and reproducibility audits for internal validity. Both rely on FAIR principles and on collaborative science practices.
Affiliation(s)
- Richard Fitzpatrick
- Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK; School of Biological Sciences, University of Edinburgh, Edinburgh, UK
| | - Melanie I. Stefan
- Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK; ZJU-UoE Institute, Zhejiang University, Haining, China
|
25
|
Gabor A, Tognetti M, Driessen A, Tanevski J, Guo B, Cao W, Shen H, Yu T, Chung V, Bodenmiller B, Saez‐Rodriguez J. Cell-to-cell and type-to-type heterogeneity of signaling networks: insights from the crowd. Mol Syst Biol 2021; 17:e10402. [PMID: 34661974 PMCID: PMC8522707 DOI: 10.15252/msb.202110402] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Revised: 09/27/2021] [Accepted: 09/28/2021] [Indexed: 12/30/2022] Open
Abstract
Recent technological developments allow us to measure the status of dozens of proteins in individual cells. This opens the way to understanding the heterogeneity of complex multi-signaling networks across cells and cell types, with important implications for understanding and treating diseases such as cancer. These technologies are, however, limited to proteins for which antibodies are available and are fairly costly, making predictions of new markers and of existing markers under new conditions a valuable alternative. To assess our capacity to make such predictions and boost further methodological development, we organized the Single Cell Signaling in Breast Cancer DREAM challenge. We used a mass cytometry dataset, covering 36 markers in over 4,000 conditions totaling 80 million single cells across 67 breast cancer cell lines. Through four increasingly difficult subchallenges, the participants predicted missing markers, new conditions, and the time-course response of single cells to stimuli in the presence and absence of kinase inhibitors. The challenge results show that despite the stochastic nature of signal transduction in single cells, the signaling events are tightly controlled and machine learning methods can accurately predict new experimental data.
Affiliation(s)
- Attila Gabor
- Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Faculty of Medicine, Bioquant, Heidelberg, Germany
| | - Marco Tognetti
- Department of Quantitative Biomedicine & Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Molecular Life Science PhD Program, Life Science Zurich Graduate School, ETH Zurich and University of Zurich, Zurich, Switzerland
| | - Alice Driessen
- Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Faculty of Medicine, Bioquant, Heidelberg, Germany
| | - Jovan Tanevski
- Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Faculty of Medicine, Bioquant, Heidelberg, Germany
| | - Baosen Guo
- Division of AI & Bioinformatics, Shenzhen Digital Life Institute, Shenzhen, China
| | - Wencai Cao
- Division of AI & Bioinformatics, Shenzhen Digital Life Institute, Shenzhen, China
| | - He Shen
- Division of AI & Bioinformatics, Shenzhen Digital Life Institute, Shenzhen, China
| | | | | | | | - Bernd Bodenmiller
- Department of Quantitative Biomedicine & Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
| | - Julio Saez‐Rodriguez
- Institute for Computational Biomedicine, Heidelberg University and Heidelberg University Hospital, Faculty of Medicine, Bioquant, Heidelberg, Germany
|
26
|
Gong W, Granados AA, Hu J, Jones MG, Raz O, Salvador-Martínez I, Zhang H, Chow KHK, Kwak IY, Retkute R, Prusokiene A, Prusokas A, Khodaverdian A, Zhang R, Rao S, Wang R, Rennert P, Saipradeep VG, Sivadasan N, Rao A, Joseph T, Srinivasan R, Peng J, Han L, Shang X, Garry DJ, Yu T, Chung V, Mason M, Liu Z, Guan Y, Yosef N, Shendure J, Telford MJ, Shapiro E, Elowitz MB, Meyer P. Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees. Cell Syst 2021; 12:810-826.e4. [PMID: 34146472 DOI: 10.1016/j.cels.2021.05.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2020] [Revised: 02/01/2021] [Accepted: 05/11/2021] [Indexed: 12/20/2022]
Abstract
The recent advent of CRISPR and other molecular tools enabled the reconstruction of cell lineages based on induced DNA mutations and promises to solve the lineages of more complex organisms. To date, no lineage reconstruction algorithms have been rigorously examined for their performance and robustness across dataset types and numbers of cells. To benchmark such methods, we organized a DREAM challenge using in vitro experimental intMEMOIR recordings and in silico data for a C. elegans lineage tree of about 1,000 cells and a Mus musculus tree of 10,000 cells. Some of the 22 approaches submitted had excellent performance, but structural features of the trees prevented optimal reconstructions. Using smaller sub-trees as training sets proved to be a good approach for tuning algorithms to reconstruct larger trees. The simulation and reconstruction methods generated here delineate a potential way forward for solving larger cell lineage trees, such as in mouse.
Affiliation(s)
- Wuming Gong
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114, USA
| | | | - Jingyuan Hu
- Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX 77030, USA
| | - Matthew G Jones
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, CA, USA; Integrative Program of Quantitative Biology, University of California, San Francisco, San Francisco, CA, USA
| | - Ofir Raz
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Irepan Salvador-Martínez
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Hanrui Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Ke-Huan K Chow
- California Institute of Technology, Pasadena, CA 91125, USA
| | - Il-Youp Kwak
- Department of Applied Statistics, College of Business & Economics, Chung-Ang University, 84, Heukseok-ro, Dongjak-gu, Seoul, Republic of Korea
| | - Renata Retkute
- Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
| | - Alisa Prusokiene
- School of Natural and Environmental Sciences, Newcastle University, Newcastle NE1 7RU, UK
| | | | - Alex Khodaverdian
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, CA, USA
| | - Richard Zhang
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, CA, USA
| | - Suhas Rao
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, CA, USA
| | - Robert Wang
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, CA, USA
| | - Phil Rennert
- EC Wise Inc., 1299 4th St #505, San Rafael, CA 94901, USA
| | | | - Naveen Sivadasan
- TCS Research and Innovation, Tata Consultancy Services, Hyderabad 500019, India
| | - Aditya Rao
- TCS Research and Innovation, Tata Consultancy Services, Hyderabad 500019, India
| | - Thomas Joseph
- TCS Research and Innovation, Tata Consultancy Services, Hyderabad 500019, India
| | - Rajgopal Srinivasan
- TCS Research and Innovation, Tata Consultancy Services, Hyderabad 500019, India
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Lu Han
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Daniel J Garry
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114, USA
| | - Thomas Yu
- Sage Bionetworks, 2901 3rd Ave #330, Seattle, WA 98121, USA
| | - Verena Chung
- Sage Bionetworks, 2901 3rd Ave #330, Seattle, WA 98121, USA
| | - Michael Mason
- Sage Bionetworks, 2901 3rd Ave #330, Seattle, WA 98121, USA
| | - Zhandong Liu
- Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Nir Yosef
- Department of Electrical Engineering & Computer Science, University of California, Berkeley, Berkeley, CA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, Seattle, WA, USA
| | - Maximilian J Telford
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
| | - Ehud Shapiro
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 761001, Israel
| | | | - Pablo Meyer
- T.J. Watson Research Center, IBM, Healthcare & Life Sciences, 1101 Kitchawan Rd 10598, Yorktown Heights, NY 10598, USA.
|
27
|
Banda JM, Tekumalla R, Wang G, Yu J, Liu T, Ding Y, Artemova E, Tutubalina E, Chowell G. A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research-An International Collaboration. EPIDEMIOLOGIA 2021; 2:315-324. [PMID: 36417228 PMCID: PMC9620940 DOI: 10.3390/epidemiologia2030024] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2021] [Revised: 07/28/2021] [Accepted: 07/29/2021] [Indexed: 12/14/2022] Open
Abstract
As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 at the time of writing. This dataset provides a freely available additional data source for researchers worldwide to conduct a wide range of research projects, such as epidemiological analyses, studies of emotional and mental responses to social distancing measures, the identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.
Affiliation(s)
- Juan M. Banda
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA;
| | - Ramya Tekumalla
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA;
| | - Guanyu Wang
- Missouri School of Journalism, University of Missouri, Columbia, MO 65201, USA;
| | - Jingyuan Yu
- Department of Social Psychology, Universitat Autònoma de Barcelona, 08035 Barcelona, Spain;
| | - Tuo Liu
- Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany;
| | - Yuning Ding
- Language Technology Lab, Universität Duisburg-Essen, 47057 Duisburg, Germany;
| | - Ekaterina Artemova
- Faculty of Computer Science, Higher School of Economics—National Research University, 101000 Moscow, Russia;
| | - Elena Tutubalina
- Faculty of Chemistry, Kazan Federal University, 420008 Kazan, Russia;
| | - Gerardo Chowell
- Department of Population Health Sciences, Georgia State University, Atlanta, GA 30303, USA;
|
28
|
Cricelli L, Grimaldi M, Vermicelli S. Crowdsourcing and open innovation: a systematic literature review, an integrated framework and a research agenda. REVIEW OF MANAGERIAL SCIENCE 2021. [DOI: 10.1007/s11846-021-00482-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
In recent years, Open Innovation (OI) and crowdsourcing have been very popular topics in the innovation management literature, attracting significant interest and attention, and inspiring a rich production of publications. Although these two topics share common themes and address similar managerial challenges, to the best of our knowledge, there is no systematic literature review that digs deep into the intersection of both fields. To fill this gap, a joint review of crowdsourcing and OI topics is both timely and of interest. Therefore, the main objective of this study is to carry out a comprehensive, systematic, and objective review of academic research to help shed light on the relationship between OI and crowdsourcing. For this purpose, we reviewed the literature published on these two topics between 2008 and 2019, applying two bibliometric techniques, co-citation and co-word analysis. We obtained the following results: (i) we provide a qualitative analysis of the emerging and trending themes, (ii) we discuss a characterization of the intersection between OI and crowdsourcing, identifying four dimensions (strategic, managerial, behavioral, and technological), (iii) we present a schematic reconceptualization of the thematic clusters, proposing an integrated view. We conclude by suggesting promising opportunities for future research.
|
29
|
Seaby EG, Ennis S. Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies. Brief Funct Genomics 2021; 19:243-258. [PMID: 32393978 DOI: 10.1093/bfgp/elaa009] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Next generation sequencing (NGS) has revolutionised rare disease diagnostics. Concomitant with advancing technologies has been a rise in the number of new gene disorders discovered and diagnoses made for patients and their families. However, despite the trend towards whole exome and whole genome sequencing, diagnostic rates remain suboptimal. On average, only ~30% of patients receive a molecular diagnosis. National sequencing projects launched in the last 5 years are integrating clinical diagnostic testing with research avenues to widen the spectrum of known genetic disorders. Consequently, efforts to diagnose genetic disorders in a clinical setting are now often shared with efforts to prioritise candidate variants for the detection of new disease genes. Herein we discuss some of the biggest obstacles precluding molecular diagnosis and discovery of new gene disorders. We consider bioinformatic and analytical challenges faced when interpreting next generation sequencing data and showcase some of the newest tools available to mitigate these issues. We consider how incomplete penetrance, non-coding variation and structural variants are likely to impact diagnostic rates, and we further discuss methods for uplifting novel gene discovery by adopting a gene-to-patient-based approach.
|
30
|
Tarca AL, Pataki BÁ, Romero R, Sirota M, Guan Y, Kutum R, Gomez-Lopez N, Done B, Bhatti G, Yu T, Andreoletti G, Chaiworapongsa T, Hassan SS, Hsu CD, Aghaeepour N, Stolovitzky G, Csabai I, Costello JC. Crowdsourcing assessment of maternal blood multi-omics for predicting gestational age and preterm birth. Cell Rep Med 2021; 2:100323. [PMID: 34195686 PMCID: PMC8233692 DOI: 10.1016/j.xcrm.2021.100323] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 01/18/2021] [Accepted: 05/20/2021] [Indexed: 12/15/2022]
Abstract
Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. The findings indicate that whole-blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r = 0.83) and, using data collected before 37 weeks of gestation, also predicts the delivery date in both normal pregnancies (r = 0.86) and those with spontaneous preterm birth (r = 0.75). Based on samples collected before 33 weeks in asymptomatic women, our analysis suggests that expression changes preceding preterm prelabor rupture of the membranes are consistent across time points and cohorts and involve leukocyte-mediated immunity. Models built from plasma proteomic data predict spontaneous preterm delivery with intact membranes with higher accuracy and earlier in pregnancy than transcriptomic models (AUROC = 0.76 versus AUROC = 0.6 at 27-33 weeks of gestation).
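For readers unfamiliar with the AUROC values quoted above, the metric can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the labels and scores are hypothetical and unrelated to the study's data, and this is not the challenge's scoring code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a positive case (e.g. preterm delivery), 0 for a negative.
    scores: model outputs, where higher means more likely positive.
    Tied scores count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: three of the four positive/negative pairs are ordered correctly.
example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.76 and 0.6 should be read.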
Affiliation(s)
- Adi L. Tarca
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Computer Science, Wayne State University College of Engineering, Detroit, MI 48202, USA
| | - Bálint Ármin Pataki
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
| | - Roberto Romero
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
- Detroit Medical Center, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Florida International University, Miami, FL 33199, USA
| | - Marina Sirota
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Rintu Kutum
- Informatics and Big Data Unit, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
| | - Nardhy Gomez-Lopez
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Biochemistry, Microbiology, and Immunology, Wayne State University School of Medicine, Detroit, MI 48201 USA
| | - Bogdan Done
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
| | - Gaurav Bhatti
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
| | | | - Gaia Andreoletti
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Tinnakorn Chaiworapongsa
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
| | - The DREAM Preterm Birth Prediction Challenge Consortium
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Computer Science, Wayne State University College of Engineering, Detroit, MI 48202, USA
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
- Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
- Detroit Medical Center, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Florida International University, Miami, FL 33199, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Informatics and Big Data Unit, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Department of Biochemistry, Microbiology, and Immunology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Sage Bionetworks, Seattle, WA, USA
- Office of Women’s Health, Integrative Biosciences Center, Wayne State University, Detroit, MI 48202, USA
- Department of Physiology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Department of Pediatrics, and Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Sonia S. Hassan
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Office of Women’s Health, Integrative Biosciences Center, Wayne State University, Detroit, MI 48202, USA
- Department of Physiology, Wayne State University School of Medicine, Detroit, MI 48201, USA
| | - Chaur-Dong Hsu
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Physiology, Wayne State University School of Medicine, Detroit, MI 48201, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, Department of Pediatrics, and Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Gustavo Stolovitzky
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
| | - Istvan Csabai
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
| | - James C. Costello
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|
31
|
Cichońska A, Ravikumar B, Allaway RJ, Wan F, Park S, Isayev O, Li S, Mason M, Lamb A, Tanoli Z, Jeon M, Kim S, Popova M, Capuzzi S, Zeng J, Dang K, Koytiger G, Kang J, Wells CI, Willson TM, Oprea TI, Schlessinger A, Drewry DH, Stolovitzky G, Wennerberg K, Guinney J, Aittokallio T. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat Commun 2021; 12:3307. [PMID: 34083538 PMCID: PMC8175708 DOI: 10.1038/s41467-021-23165-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2020] [Accepted: 04/15/2021] [Indexed: 12/31/2022] Open
Abstract
Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.
Affiliation(s)
- Anna Cichońska
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Computer Science, Helsinki Institute for Information Technology (HIIT), Aalto University, Espoo, Finland
- Department of Computing, University of Turku, Turku, Finland
| | - Balaguru Ravikumar
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | | | - Fangping Wan
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Sungjoon Park
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Michael Mason
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
| | - Andrew Lamb
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
| | - Ziaurrehman Tanoli
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - Minji Jeon
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Sunkyu Kim
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Mariya Popova
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Stephen Capuzzi
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Kristen Dang
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
| | | | - Jaewoo Kang
- Department of Computer Science and Engineering, Korea University, Seoul, Republic of Korea
| | - Carrow I Wells
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
| | - Timothy M Willson
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
| | - Tudor I Oprea
- Translational Informatics Division and Comprehensive Cancer Center, University of New Mexico School of Medicine, Albuquerque, NM, USA
| | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - David H Drewry
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
| | | | - Krister Wennerberg
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark.
| | - Justin Guinney
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA.
| | - Tero Aittokallio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
- Department of Computer Science, Helsinki Institute for Information Technology (HIIT), Aalto University, Espoo, Finland.
- Department of Mathematics and Statistics, University of Turku, Turku, Finland.
- Institute for Cancer Research, Oslo University Hospital, Oslo, Norway.
- Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway.
|
32
|
Meyer P, Saez-Rodriguez J. Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges. Cell Syst 2021; 12:636-653. [PMID: 34139170 DOI: 10.1016/j.cels.2021.05.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/17/2020] [Revised: 03/29/2021] [Accepted: 05/18/2021] [Indexed: 02/07/2023]
Abstract
Computational and mathematical models are key to obtain a system-level understanding of biological processes, but their limitations have to be clearly defined to allow their proper application and interpretation. Crowdsourced benchmarks in the form of challenges provide an unbiased assessment of methods, and for the past decade, the Dialogue for Reverse Engineering Assessment and Methods (DREAM) organized more than 15 systems biology challenges. From transcription factor binding to dynamical network models, from signaling networks to gene regulation, from whole-cell models to cell-lineage reconstruction, and from single-cell positioning in a tissue to drug combinations and cell survival, the breadth is broad. To celebrate the 5-year anniversary of Cell Systems, we review the genesis of these systems biology challenges and discuss how interlocking the forward- and reverse-modeling paradigms allows to push the rim of systems biology. This approach will persist for systems levels approaches in biology and medicine.
Affiliation(s)
- Pablo Meyer
- IBM T.J. Watson Research Center, Yorktown Heights, NY, USA.
| | - Julio Saez-Rodriguez
- Institute for Computational Biomedicine, Heidelberg University Hospital and Heidelberg University, Faculty of Medicine, Bioquant, Heidelberg 69120, Germany
|
33
|
Bülow RD, Dimitrov D, Boor P, Saez-Rodriguez J. How will artificial intelligence and bioinformatics change our understanding of IgA Nephropathy in the next decade? Semin Immunopathol 2021; 43:739-752. [PMID: 33835214 PMCID: PMC8551101 DOI: 10.1007/s00281-021-00847-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/27/2021] [Accepted: 02/17/2021] [Indexed: 01/16/2023]
Abstract
IgA nephropathy (IgAN) is the most common glomerulonephritis. It is characterized by the deposition of immune complexes containing immunoglobulin A (IgA) in the kidney’s glomeruli, triggering an inflammatory process. In many patients, the disease has a progressive course, eventually leading to end-stage kidney disease. The current understanding of IgAN’s pathophysiology is incomplete, with the involvement of several potential players, including the mucosal immune system, the complement system, and the microbiome. Dissecting this complex pathophysiology requires an integrated analysis across molecular, cellular, and organ scales. Such data can be obtained by employing emerging technologies, including single-cell sequencing, next-generation sequencing, proteomics, and complex imaging approaches. These techniques generate complex “big data,” requiring advanced computational methods for their analyses and interpretation. Here, we introduce such methods, focusing on the broad areas of bioinformatics and artificial intelligence and discuss how they can advance our understanding of IgAN and ultimately improve patient care. The close integration of advanced experimental and computational technologies with medical and clinical expertise is essential to improve our understanding of human diseases. We argue that IgAN is a paradigmatic disease to demonstrate the value of such a multidisciplinary approach.
Affiliation(s)
- Roman David Bülow
- University Hospital RWTH Aachen, Institute of Pathology, Aachen, Germany
| | - Daniel Dimitrov
- Faculty of Medicine, Heidelberg University, Heidelberg, Germany
- Institute for Computational Biomedicine, Heidelberg University Hospital, Bioquant, Heidelberg, Germany
| | - Peter Boor
- University Hospital RWTH Aachen, Institute of Pathology, Aachen, Germany.
- Department of Nephrology and Immunology, University Hospital RWTH Aachen, Aachen, Germany.
| | - Julio Saez-Rodriguez
- Faculty of Medicine, Heidelberg University, Heidelberg, Germany.
- Institute for Computational Biomedicine, Heidelberg University Hospital, Bioquant, Heidelberg, Germany.
- Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE), 52074, RWTH Aachen University, Aachen, Germany.
- Molecular Medicine Partnership Unit, European Molecular Biology Laboratory and Heidelberg University, Heidelberg, Germany.
|
34
|
Mohr SE, Tattikota SG, Xu J, Zirin J, Hu Y, Perrimon N. Methods and tools for spatial mapping of single-cell RNAseq clusters in Drosophila. Genetics 2021; 217:6156631. [PMID: 33713129 DOI: 10.1093/genetics/iyab019] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/16/2020] [Accepted: 02/02/2021] [Indexed: 01/26/2023]
Abstract
Single-cell RNA sequencing (scRNAseq) experiments provide a powerful means to identify clusters of cells that share common gene expression signatures. A major challenge in scRNAseq studies is to map the clusters to specific anatomical regions along the body and within tissues. Existing data, such as information obtained from large-scale in situ RNA hybridization studies, cell type specific transcriptomics, gene expression reporters, antibody stainings, and fluorescent tagged proteins, can help to map clusters to anatomy. However, in many cases, additional validation is needed to precisely map the spatial location of cells in clusters. Several approaches are available for spatial resolution in Drosophila, including mining of existing datasets, and use of existing or new tools for direct or indirect detection of RNA, or direct detection of proteins. Here, we review available resources and emerging technologies that will facilitate spatial mapping of scRNAseq clusters at high resolution in Drosophila. Importantly, we discuss the need, available approaches, and reagents for multiplexing gene expression detection in situ, as in most cases scRNAseq clusters are defined by the unique coexpression of sets of genes.
Affiliation(s)
- Stephanie E Mohr
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Sudhir Gopal Tattikota
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Jun Xu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Jonathan Zirin
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Yanhui Hu
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Norbert Perrimon
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
- Howard Hughes Medical Institute, Boston, MA 02115, USA
|
35
|
Vermicelli S, Cricelli L, Grimaldi M. How can crowdsourcing help tackle the COVID‐19 pandemic? An explorative overview of innovative collaborative practices. R&D MANAGEMENT 2021; 51:183-194. [PMCID: PMC7753275 DOI: 10.1111/radm.12443] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/15/2020] [Revised: 08/27/2020] [Accepted: 10/22/2020] [Indexed: 05/23/2023]
Abstract
The COVID‐19 pandemic has caused unprecedented public health and economic crises. As a response to face the current emergency, science and innovation communities are realizing a fundamental contribution to tackle the crisis. During the past few months, we have witnessed an impressive number of initiatives to encourage networking opportunities, to foster interactions between the different stakeholders involved (health care, industry, governments, academics, ordinary people), and to develop innovative solutions and collaborative infrastructures in support of the health sector. Adopting an open and collaborative approach and joining forces is essential in the fight against the COVID‐19 crisis. Also, the involvement of crowds as innovation partners can be of great support. Therefore, our work aims to review and classify those initiatives, based on the crowdsourcing model, that have been put into place to face the emergency generated by the novel coronavirus pandemic. We illustrate the 16 crowdsourcing initiatives devoted to the SARS‐CoV‐2 outbreak that we identified, detailing their development and implementation. Then, we propose a classification of them, along two dimensions: type of crowdsourcing configuration and kind of tasks, being able to find a relationship between these two aspects. Evidence from the analyzed projects suggests that across disparate domains, crowdsourcing can be an effective strategy in the response to the COVID‐19 pandemic. To conclude, we suggest some important implications for innovation best practices and lessons that can be learned for the future: crowdsourcing, harnessing the power of crowds and online communities, can help tackle the COVID‐19 pandemic, by providing original, actionable, quick, and low‐cost solutions to the challenges of the current health and economic crisis.
Affiliation(s)
- Silvia Vermicelli
- Department of Enterprise Engineering, University of Rome ‘Tor Vergata’, Viale del Politecnico, 1 – 00133 Rome, Italy
| | - Livio Cricelli
- Department of Industrial Engineering, University of Naples “Federico II”, Piazzale Tecchio 80, Naples, Italy
| | - Michele Grimaldi
- Department of Civil and Mechanical Engineering, University of Cassino and Southern Lazio, Via G. Di Biasio 43, Cassino, FR, Italy
|
36
|
Manco L, Maffei N, Strolin S, Vichi S, Bottazzi L, Strigari L. Basic of machine learning and deep learning in imaging for medical physicists. Phys Med 2021; 83:194-205. [DOI: 10.1016/j.ejmp.2021.03.026] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 12/02/2020] [Revised: 03/07/2021] [Accepted: 03/16/2021] [Indexed: 02/08/2023]
|
37
|
Microbiome Metadata Standards: Report of the National Microbiome Data Collaborative's Workshop and Follow-On Activities. mSystems 2021; 6:6/1/e01194-20. [PMID: 33622857 PMCID: PMC8573954 DOI: 10.1128/msystems.01194-20] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/20/2022]
Abstract
Microbiome samples are inherently defined by the environment in which they are found. Therefore, data that provide context and enable interpretation of measurements produced from biological samples, often referred to as metadata, are critical. Important contributions have been made in the development of community-driven metadata standards; however, these standards have not been uniformly embraced by the microbiome research community. To understand how these standards are being adopted, or the barriers to adoption, across research domains, institutions, and funding agencies, the National Microbiome Data Collaborative (NMDC) hosted a workshop in October 2019. This report provides a summary of discussions that took place throughout the workshop, as well as outcomes of the working groups initiated at the workshop.
|
38
|
Moreno-Indias I, Lahti L, Nedyalkova M, Elbere I, Roshchupkin G, Adilovic M, Aydemir O, Bakir-Gungor B, Santa Pau ECD, D’Elia D, Desai MS, Falquet L, Gundogdu A, Hron K, Klammsteiner T, Lopes MB, Marcos-Zambrano LJ, Marques C, Mason M, May P, Pašić L, Pio G, Pongor S, Promponas VJ, Przymus P, Saez-Rodriguez J, Sampri A, Shigdel R, Stres B, Suharoschi R, Truu J, Truică CO, Vilne B, Vlachakis D, Yilmaz E, Zeller G, Zomer AL, Gómez-Cabrero D, Claesson MJ. Statistical and Machine Learning Techniques in Human Microbiome Studies: Contemporary Challenges and Solutions. Front Microbiol 2021; 12:635781. [PMID: 33692771 PMCID: PMC7937616 DOI: 10.3389/fmicb.2021.635781] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 11/30/2020] [Accepted: 01/28/2021] [Indexed: 12/23/2022]
Abstract
The human microbiome has emerged as a central research topic in human biology and biomedicine. Current microbiome studies generate high-throughput omics data across different body sites, populations, and life stages. Many of the challenges in microbiome research are similar to other high-throughput studies, the quantitative analyses need to address the heterogeneity of data, specific statistical properties, and the remarkable variation in microbiome composition across individuals and body sites. This has led to a broad spectrum of statistical and machine learning challenges that range from study design, data processing, and standardization to analysis, modeling, cross-study comparison, prediction, data science ecosystems, and reproducible reporting. Nevertheless, although many statistics and machine learning approaches and tools have been developed, new techniques are needed to deal with emerging applications and the vast heterogeneity of microbiome data. We review and discuss emerging applications of statistical and machine learning techniques in human microbiome studies and introduce the COST Action CA18131 "ML4Microbiome" that brings together microbiome researchers and machine learning experts to address current challenges such as standardization of analysis pipelines for reproducibility of data analysis results, benchmarking, improvement, or development of existing and new tools and ontologies.
Affiliation(s)
- Isabel Moreno-Indias
- Instituto de Investigación Biomédica de Málaga (IBIMA), Unidad de Gestión Clínica de Endocrinología y Nutrición, Hospital Clínico Universitario Virgen de la Victoria, Universidad de Málaga, Málaga, Spain
- Centro de Investigación Biomédica en Red de Fisiopatología de la Obesidad y la Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, Spain
| | - Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
| | - Miroslava Nedyalkova
- Human Genetics and Disease Mechanisms, Latvian Biomedical Research and Study Centre, Riga, Latvia
| | - Ilze Elbere
- Latvian Biomedical Research and Study Centre, Riga, Latvia
| | | | - Muhamed Adilovic
- Department of Genetics and Bioengineering, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Onder Aydemir
- Department of Electrical and Electronics Engineering, Karadeniz Technical University, Trabzon, Turkey
| | - Burcu Bakir-Gungor
- Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
| | | | - Domenica D’Elia
- Department for Biomedical Sciences, Institute for Biomedical Technologies, National Research Council, Bari, Italy
| | - Mahesh S. Desai
- Department of Infection and Immunity, Luxembourg Institute of Health, Esch-sur-Alzette, Luxembourg
- Odense Research Center for Anaphylaxis, Department of Dermatology and Allergy Center, Odense University Hospital, University of Southern Denmark, Odense, Denmark
| | - Laurent Falquet
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Aycan Gundogdu
- Department of Microbiology and Clinical Microbiology, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Metagenomics Laboratory, Genome and Stem Cell Center (GenKök), Erciyes University, Kayseri, Turkey
| | - Karel Hron
- Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czechia
| | | | - Marta B. Lopes
- NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), FCT, UNL, Caparica, Portugal
- Centro de Matemática e Aplicações (CMA), FCT, UNL, Caparica, Portugal
| | - Laura Judith Marcos-Zambrano
- Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain
| | - Cláudia Marques
- CINTESIS, NOVA Medical School, NMS, Universidade Nova de Lisboa, Lisbon, Portugal
| | - Michael Mason
- Computational Oncology, Sage Bionetworks, Seattle, WA, United States
| | - Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Lejla Pašić
- Sarajevo Medical School, University Sarajevo School of Science and Technology, Sarajevo, Bosnia and Herzegovina
| | - Gianvito Pio
- Department of Computer Science, University of Bari Aldo Moro, Bari, Italy
| | - Sándor Pongor
- Faculty of Information Technology and Bionics, Pázmány University, Budapest, Hungary
| | - Vasilis J. Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Piotr Przymus
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
| | - Julio Saez-Rodriguez
- Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine and Heidelberg University Hospital, Heidelberg, Germany
| | - Alexia Sampri
- Division of Informatics, Imaging and Data Sciences, School of Health Sciences, University of Manchester, Manchester, United Kingdom
| | - Rajesh Shigdel
- Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Blaz Stres
- Jozef Stefan Institute, Ljubljana, Slovenia
- Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Faculty of Civil and Geodetic Engineering, University of Ljubljana, Ljubljana, Slovenia
| | - Ramona Suharoschi
- Molecular Nutrition and Proteomics Lab, Faculty of the Food Science and Technology, Institute of Life Sciences, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania
| | - Jaak Truu
- Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Ciprian-Octavian Truică
- Department of Computer Science and Engineering, Faculty of Automatic Control and Computers, University Politehnica of Bucharest, Bucharest, Romania
| | - Baiba Vilne
- Bioinformatics Research Unit, Riga Stradins University, Riga, Latvia
| | - Dimitrios Vlachakis
- Laboratory of Genetics, Department of Biotechnology, School of Applied Biology and Biotechnology, Agricultural University of Athens, Athens, Greece
| | - Ercument Yilmaz
- Department of Computer Technologies, Karadeniz Technical University, Trabzon, Turkey
| | - Georg Zeller
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg, Germany
| | - Aldert L. Zomer
- Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
| | - David Gómez-Cabrero
- Navarrabiomed, Complejo Hospitalario de Navarra (CHN), IdiSNA, Universidad Pública de Navarra (UPNA), Pamplona, Spain
| | - Marcus J. Claesson
- School of Microbiology and APC Microbiome Ireland, University College Cork, Cork, Ireland
|
39
|
Vincent BG, Szustakowski JD, Doshi P, Mason M, Guinney J, Carbone DP. Pursuing Better Biomarkers for Immunotherapy Response in Cancer Through a Crowdsourced Data Challenge. JCO Precis Oncol 2021; 5:51-54. [PMID: 34994587 PMCID: PMC9848594 DOI: 10.1200/po.20.00371] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/21/2023]
Affiliation(s)
- Benjamin G. Vincent
- Department of Medicine, Division of Hematology/Oncology, Department of Microbiology and Immunology, Curriculum in Bioinformatics and Computational Biology, Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, NC
- Benjamin G. Vincent, MD, University of North Carolina at Chapel Hill, 5206 Marsico Hall, Chapel Hill, NC 27599; Twitter: @BenjaminGVincen; @UNC_Lineberger; @CompMedUNC
| | | | | | | | | | - David P. Carbone
- The Ohio State University Comprehensive Cancer Center, Columbus, OH
|
40
|
Ulahannan JP, Narayanan N, Thalhath N, Prabhakaran P, Chaliyeduth S, Suresh SP, Mohammed M, Rajeevan E, Joseph S, Balakrishnan A, Uthaman J, Karingamadathil M, Thomas ST, Sureshkumar U, Balan S, Vellichirammal NN. A citizen science initiative for open data and visualization of COVID-19 outbreak in Kerala, India. J Am Med Inform Assoc 2020; 27:1913-1920. [PMID: 32761211 PMCID: PMC7454688 DOI: 10.1093/jamia/ocaa203] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/18/2020] [Accepted: 08/05/2020] [Indexed: 01/10/2023]
Abstract
Objective: India reported its first coronavirus disease 2019 (COVID-19) case in the state of Kerala and an outbreak initiated subsequently. The Department of Health Services, Government of Kerala, initially released daily updates through daily textual bulletins for public awareness to control the spread of the disease. However, these unstructured data limit upstream applications, such as visualization, and analysis, thus demanding refinement to generate open and reusable datasets. Materials and Methods: Through a citizen science initiative, we leveraged publicly available and crowd-verified data on COVID-19 outbreak in Kerala from the government bulletins and media outlets to generate reusable datasets. This was further visualized as a dashboard through a front-end Web application and a JSON (JavaScript Object Notation) repository, which serves as an application programming interface for the front end. Results: From the sourced data, we provided real-time analysis, and daily updates of COVID-19 cases in Kerala, through a user-friendly bilingual dashboard (https://covid19kerala.info/) for nonspecialists. To ensure longevity and reusability, the dataset was deposited in an open-access public repository for future analysis. Finally, we provide outbreak trends and demographic characteristics of the individuals affected with COVID-19 in Kerala during the first 138 days of the outbreak. Discussion: We anticipate that our dataset can form the basis for future studies, supplemented with clinical and epidemiological data from the individuals affected with COVID-19 in Kerala. Conclusions: We reported a citizen science initiative on the COVID-19 outbreak in Kerala to collect and deposit data in a structured format, which was utilized for visualizing the outbreak trend and describing demographic characteristics of affected individuals.
Affiliation(s)
| | | | - Nishad Thalhath
- School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Japan
| | - Prem Prabhakaran
- Department of Advanced Materials and Chemical Engineering, Hannam University, Daejeon, South Korea
| | - Sreekanth Chaliyeduth
- Centre for Cognitive and Brain Sciences, Indian Institute of Technology Gandhinagar, Gandhinagar, India
| | - Sooraj P Suresh
- Department of Humanities and Social Sciences, National Institute of Technology Tiruchirappalli, Tiruchirappalli, India
| | - Musfir Mohammed
- Embedded Analytics, ML and Data Sciences, Experion Technologies, Thiruvananthapuram, India
| | - E Rajeevan
- Department of Philosophy, Government Brennen College, Kannur University, Kannur, India
| | - Sindhu Joseph
- Department of Travel and Tourism Management, Govinda Pai Memorial Government College, Kannur University, Kannur, India
| | | | - Jeevan Uthaman
- Department of Marine Geophysics, Cochin University of Science and Technology, Kochi, India
| | | | - Sunil Thonikkuzhiyil Thomas
- Department of Electronics, College of Engineering Attingal, APJ Abdul Kalam Technical University, Thiruvananthapuram, India
| | - Unnikrishnan Sureshkumar
- Astronomical Observatory of the Jagiellonian University, Faculty of Physics, Astronomy and Applied Science, Jagiellonian University, Kraków, Poland
| | - Shabeesh Balan
- Laboratory for Molecular Psychiatry, RIKEN Center for Brain Science, Wako, Japan
|
41
|
Thompson DC, Bentzien J. Crowdsourcing and open innovation in drug discovery: recent contributions and future directions. Drug Discov Today 2020; 25:2284-2293. [PMID: 33011343 PMCID: PMC7529695 DOI: 10.1016/j.drudis.2020.09.020] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/15/2020] [Revised: 08/27/2020] [Accepted: 09/17/2020] [Indexed: 01/03/2023]
Abstract
The past decade has seen significant growth in the use of 'crowdsourcing' and open innovation approaches to engage 'citizen scientists' to perform novel scientific research. Here, we quantify and summarize the current state of adoption of open innovation by major pharmaceutical companies. We also highlight recent crowdsourcing and open innovation research contributions to the field of drug discovery, and interesting future directions.
Affiliation(s)
| | - Jörg Bentzien
- Alkermes, Inc. 852 Winter Street, Waltham, MA 02451-1420, USA
|
42
|
Maier-Hein L, Reinke A, Kozubek M, Martel AL, Arbel T, Eisenmann M, Hanbury A, Jannin P, Müller H, Onogur S, Saez-Rodriguez J, van Ginneken B, Kopp-Schneider A, Landman BA. BIAS: Transparent reporting of biomedical image analysis challenges. Med Image Anal 2020; 66:101796. [PMID: 32911207 PMCID: PMC7441980 DOI: 10.1016/j.media.2020.101796] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 04/08/2019] [Revised: 06/12/2020] [Accepted: 07/27/2020] [Indexed: 12/12/2022]
Abstract
The number of biomedical image analysis challenges organized per year is steadily increasing. These international competitions have the purpose of benchmarking algorithms on common data sets, typically to identify the best method for a given problem. Recent research, however, revealed that common practice related to challenge reporting does not allow for adequate interpretation and reproducibility of results. To address the discrepancy between the impact of challenges and the quality (control), the Biomedical Image Analysis ChallengeS (BIAS) initiative developed a set of recommendations for the reporting of challenges. The BIAS statement aims to improve the transparency of the reporting of a biomedical image analysis challenge regardless of field of application, image modality or task category assessed. This article describes how the BIAS statement was developed and presents a checklist which authors of biomedical image analysis challenges are encouraged to include in their submission when giving a paper on a challenge into review. The purpose of the checklist is to standardize and facilitate the review process and raise interpretability and reproducibility of challenge results by making relevant information explicit.
Affiliation(s)
- Lena Maier-Hein
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, Heidelberg 69120, Germany.
| | - Annika Reinke
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, Heidelberg 69120, Germany
| | - Michal Kozubek
- Centre for Biomedical Image Analysis, Masaryk University, Botanická 68a, Brno 60200, Czech Republic
| | - Anne L Martel
- Physical Sciences, Sunnybrook Research Institute, 2075 Bayview Avenue, Rm M6-609, Toronto ON M4N 3M5, Canada; Department Medical Biophysics, University of Toronto, 101 College St Suite 15-701, Toronto, ON M5G 1L7, Canada
| | - Tal Arbel
- Centre for Intelligent Machines, McGill University, 3480 University Street, McConnell Engineering Building, Room 425, Montreal QC H3A 0E9, Canada
| | - Matthias Eisenmann
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, Heidelberg 69120, Germany
| | - Allan Hanbury
- Institute of Information Systems Engineering, Technische Universität (TU) Wien, Favoritenstraße 9-11/194-04, Vienna 1040, Austria; Complexity Science Hub Vienna, Josefstädter Straße 39, Vienna 1080, Austria
| | - Pierre Jannin
- Laboratoire Traitement du Signal et de l'Image (LTSI) - UMR_S 1099, Université de Rennes 1, Inserm, Rennes, Cedex 35043, France
| | - Henning Müller
- University of Applied Sciences Western Switzerland (HES-SO), Rue du Technopole 3, Sierre 3960, Switzerland; Medical Faculty, University of Geneva, Rue Gabrielle-Perret-Gentil 4, Geneva 1211, Switzerland
| | - Sinan Onogur
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, Heidelberg 69120, Germany
| | - Julio Saez-Rodriguez
- Institute of Computational Biomedicine, Heidelberg University, Faculty of Medicine, Im Neuenheimer Feld 267, Heidelberg 69120, Germany; Heidelberg University Hospital, Im Neuenheimer Feld 267, Heidelberg 69120, Germany; Joint Research Centre for Computational Biomedicine, Rheinisch-Westfälische Technische Hochschule (RWTH) Aachen, Faculty of Medicine, Aachen 52074, Germany
| | - Bram van Ginneken
- Department of Radiology and Nuclear Medicine, Medical Image Analysis, Radboud University Center, Nijmegen 6525 GA, The Netherlands
| | - Annette Kopp-Schneider
- Division of Biostatistics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 581, Heidelberg, 69120, Germany
| | - Bennett A Landman
- Electrical Engineering, Vanderbilt University, Nashville, Tennessee TN 37235-1679, USA
|
43
|
Tanevski J, Nguyen T, Truong B, Karaiskos N, Ahsen ME, Zhang X, Shu C, Xu K, Liang X, Hu Y, Pham HV, Xiaomei L, Le TD, Tarca AL, Bhatti G, Romero R, Karathanasis N, Loher P, Chen Y, Ouyang Z, Mao D, Zhang Y, Zand M, Ruan J, Hafemeister C, Qiu P, Tran D, Nguyen T, Gabor A, Yu T, Guinney J, Glaab E, Krause R, Banda P, Stolovitzky G, Rajewsky N, Saez-Rodriguez J, Meyer P. Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data. Life Sci Alliance 2020; 3:e202000867. [PMID: 32972997 PMCID: PMC7536825 DOI: 10.26508/lsa.202000867] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/31/2020] [Revised: 08/26/2020] [Accepted: 08/31/2020] [Indexed: 11/24/2022]
Abstract
Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although standard scRNAseq experiments are very informative, they lose the spatial organization of the cells in the tissue of origin. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging, as a silver standard, genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, while being able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy, high spatial clustering and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.
Affiliation(s)
- Jovan Tanevski
- Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University Hospital and Heidelberg University, Heidelberg, Germany
- Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
| | | | - Buu Truong
- University of South Australia, Mawson Lakes, Australia
| | - Nikos Karaiskos
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Mehmet Eren Ahsen
- Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- University of Illinois, Urbana-Champaign, Champaign, IL, USA
| | - Xinyu Zhang
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
- Department of Pediatrics, Columbia University Irving Medical Center, New York, NY, USA
| | - Chang Shu
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
| | - Ke Xu
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
| | - Xiaoyu Liang
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, USA
| | - Ying Hu
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, Bethesda, MD, USA
| | - Hoang Vv Pham
- University of South Australia, Mawson Lakes, Australia
| | - Li Xiaomei
- University of South Australia, Mawson Lakes, Australia
| | - Thuc D Le
- University of South Australia, Mawson Lakes, Australia
| | - Adi L Tarca
- Department of Obstetrics and Gynecology and Department of Computer Science, Wayne State University, Detroit, MI, USA
| | - Gaurav Bhatti
- Perinatology Research Branch, National Institute of Child Health and Human Development (NICHD)/National Institutes of Health (NIH)/Department of Health & Human Services (DHHS), Bethesda, MD, USA
- Perinatology Research Branch, NICHD/NIH/DHHS, Detroit, MI, USA
| | - Roberto Romero
- Perinatology Research Branch, National Institute of Child Health and Human Development (NICHD)/National Institutes of Health (NIH)/Department of Health & Human Services (DHHS), Bethesda, MD, USA
- Perinatology Research Branch, NICHD/NIH/DHHS, Detroit, MI, USA
| | | | - Phillipe Loher
- Computational Medicine Center, Thomas Jefferson University, Philadelphia, PA, USA
| | - Yang Chen
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | | | - Maryam Zand
- University of Texas at San Antonio, San Antonio, TX, USA
| | - Jianhua Ruan
- University of Texas at San Antonio, San Antonio, TX, USA
| | | | - Peng Qiu
- Georgia Institute of Technology, Atlanta, GA, USA
- Emory University, Atlanta, GA, USA
| | - Duc Tran
- University of Nevada, Reno, NV, USA
| | | | - Attila Gabor
- Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University Hospital and Heidelberg University, Heidelberg, Germany
| | | | | | - Enrico Glaab
- Biomedical Data Science Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Roland Krause
- Bioinformatics Core Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Peter Banda
- Bioinformatics Core Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Gustavo Stolovitzky
- International Business Machines (IBM) T.J. Watson Research Center, Yorktown Heights, NY, USA
| | - Nikolaus Rajewsky
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Julio Saez-Rodriguez
- Institute for Computational Biomedicine, Faculty of Medicine, Heidelberg University Hospital and Heidelberg University, Heidelberg, Germany
- Joint Research Centre for Computational Biomedicine, Faculty of Medicine, RWTH Aachen University, Aachen, Germany
| | - Pablo Meyer
- International Business Machines (IBM) T.J. Watson Research Center, Yorktown Heights, NY, USA
|
44
|
Stumpf MPH. Multi-model and network inference based on ensemble estimates: avoiding the madness of crowds. J R Soc Interface 2020; 17:20200419. [PMID: 33081645 PMCID: PMC7653378 DOI: 10.1098/rsif.2020.0419] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/20/2022]
Abstract
Recent progress in theoretical systems biology, applied mathematics and computational statistics allows us to compare the performance of different candidate models at describing a particular biological system quantitatively. Model selection has been applied with great success to problems where a small number of models (typically fewer than 10) are compared, but recent studies have started to consider thousands and even millions of candidate models. Often, however, we are left with sets of models that are compatible with the data, and then we can use ensembles of models to make predictions. These ensembles can have very desirable characteristics but, as I show here, are not guaranteed to improve on individual estimators or predictors. I will show in the cases of model selection and network inference when we can trust ensembles, and when we should be cautious. The analyses suggest that the careful construction of an ensemble, that is, choosing good predictors, is of paramount importance, more than had perhaps been realized before: merely adding different methods does not suffice. The success of ensemble network inference methods is also shown to rest on their ability to suppress false-positive results. A Jupyter notebook which allows carrying out an assessment of ensemble estimators is provided.
Affiliation(s)
- Michael P H Stumpf
- School of BioSciences and School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia; Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
|
45
|
Wells DK, van Buuren MM, Dang KK, Hubbard-Lucey VM, Sheehan KCF, Campbell KM, Lamb A, Ward JP, Sidney J, Blazquez AB, Rech AJ, Zaretsky JM, Comin-Anduix B, Ng AHC, Chour W, Yu TV, Rizvi H, Chen JM, Manning P, Steiner GM, Doan XC, Merghoub T, Guinney J, Kolom A, Selinsky C, Ribas A, Hellmann MD, Hacohen N, Sette A, Heath JR, Bhardwaj N, Ramsdell F, Schreiber RD, Schumacher TN, Kvistborg P, Defranoux NA. Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction. Cell 2020; 183:818-834.e13. [PMID: 33038342 DOI: 10.1016/j.cell.2020.09.015] [Citation(s) in RCA: 261] [Impact Index Per Article: 65.3] [Received: 03/30/2020] [Revised: 07/08/2020] [Accepted: 09/03/2020] [Indexed: 12/15/2022]
Abstract
Many approaches to identify therapeutically relevant neoantigens couple tumor sequencing with bioinformatic algorithms and inferred rules of tumor epitope immunogenicity. However, there are no reference data to compare these approaches, and the parameters governing tumor epitope immunogenicity remain unclear. Here, we assembled a global consortium wherein each participant predicted immunogenic epitopes from shared tumor sequencing data. 608 epitopes were subsequently assessed for T cell binding in patient-matched samples. By integrating peptide features associated with presentation and recognition, we developed a model of tumor epitope immunogenicity that filtered out 98% of non-immunogenic peptides with a precision above 0.70. Pipelines prioritizing model features had superior performance, and pipeline alterations leveraging them improved prediction performance. These findings were validated in an independent cohort of 310 epitopes prioritized from tumor sequencing data and assessed for T cell binding. This data resource enables identification of parameters underlying effective anti-tumor immunity and is available to the research community.
Affiliation(s)
- Daniel K Wells
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA.
| | - Marit M van Buuren
- Division of Molecular Oncology and Immunology, the Netherlands Cancer Institute, Amsterdam, the Netherlands; T Cell Immunology, Biopharmaceutical New Technologies (BioNTech) Corporation, BioNTech US, Cambridge, MA, USA
| | - Kristen K Dang
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
| | | | - Kathleen C F Sheehan
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, St. Louis, MO, USA; The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St. Louis, MO, USA
| | - Katie M Campbell
- Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Andrew Lamb
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
| | - Jeffrey P Ward
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - John Sidney
- Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA
| | - Ana B Blazquez
- Division of Hematology and Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Andrew J Rech
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
| | - Jesse M Zaretsky
- Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Begonya Comin-Anduix
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Surgery, David Geffen School of Medicine, Johnson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA
| | | | - William Chour
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Thomas V Yu
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
| | - Hira Rizvi
- Druckenmiller Center for Lung Cancer Research, MSKCC, New York, NY, USA
| | - Jia M Chen
- Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - Patrice Manning
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
| | | | - Xengie C Doan
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
| | - Taha Merghoub
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Medicine, MSKCC, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Justin Guinney
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA; Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Adam Kolom
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Anna-Maria Kellen Clinical Accelerator, Cancer Research Institute, New York, NY, USA
| | - Cheryl Selinsky
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
| | - Antoni Ribas
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Matthew D Hellmann
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Druckenmiller Center for Lung Cancer Research, MSKCC, New York, NY, USA; Department of Medicine, MSKCC, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
| | - Nir Hacohen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Massachusetts General Hospital Cancer Center, Boston, MA, USA
| | - Alessandro Sette
- Division of Hematology and Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
| | - James R Heath
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Institute for Systems Biology, Seattle, WA, USA
| | - Nina Bhardwaj
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Division of Hematology and Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Fred Ramsdell
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
| | - Robert D Schreiber
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, St. Louis, MO, USA; The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St. Louis, MO, USA
| | - Ton N Schumacher
- Division of Molecular Oncology and Immunology, Oncode Institute, the Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - Pia Kvistborg
- Division of Molecular Oncology and Immunology, the Netherlands Cancer Institute, Amsterdam, the Netherlands
|
46
|
Diaz JE, Ahsen ME, Schaffter T, Chen X, Realubit RB, Karan C, Califano A, Losic B, Stolovitzky G. The transcriptomic response of cells to a drug combination is more than the sum of the responses to the monotherapies. eLife 2020; 9:52707. [PMID: 32945258 PMCID: PMC7546737 DOI: 10.7554/elife.52707] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/13/2019] [Accepted: 08/17/2020] [Indexed: 12/13/2022]
Abstract
Our ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors in time in the gene regulatory network was implicated. The nature of this transcriptional cascade suggests that drug synergy may ensue when the transcriptional responses elicited by two unrelated individual drugs are correlated. We used these results as the basis of a simple prediction algorithm attaining an AUROC of 0.77 in the prediction of synergistic drug combinations in an independent dataset.
Affiliation(s)
- Jennifer El Diaz
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, United States; Department of Cell, Developmental, and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, United States; IBM Computational Biology Center, IBM Research, Yorktown Heights, United States
| | - Mehmet Eren Ahsen
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, United States; IBM Computational Biology Center, IBM Research, Yorktown Heights, United States; Department of Business Administration, University of Illinois at Urbana-Champaign, Champaign, United States
| | - Thomas Schaffter
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, United States; IBM Computational Biology Center, IBM Research, Yorktown Heights, United States
| | - Xintong Chen
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Ronald B Realubit
- Department of Systems Biology, Columbia University, New York, United States; Sulzberger Columbia Genome Center, High Throughput Screening Facility, Columbia University Medical Center, New York, United States
| | - Charles Karan
- Department of Systems Biology, Columbia University, New York, United States; Sulzberger Columbia Genome Center, High Throughput Screening Facility, Columbia University Medical Center, New York, United States
| | - Andrea Califano
- Department of Systems Biology, Columbia University, New York, United States; Department of Biomedical Informatics, Columbia University, New York, United States; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, United States; Department of Medicine, Columbia University, New York, United States
| | - Bojan Losic
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, United States; Tisch Cancer Institute, Cancer Immunology, Icahn School of Medicine at Mount Sinai, New York, United States; Diabetes, Obesity and Metabolism Institute, Icahn School of Medicine at Mount Sinai, New York, United States; Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, United States
| | - Gustavo Stolovitzky
- Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai, New York, United States; IBM Computational Biology Center, IBM Research, Yorktown Heights, United States; Department of Systems Biology, Columbia University, New York, United States; Department of Biomedical Informatics, Columbia University, New York, United States
|
47
|
Yang M, Petralia F, Li Z, Li H, Ma W, Song X, Kim S, Lee H, Yu H, Lee B, Bae S, Heo E, Kaczmarczyk J, Stępniak P, Warchoł M, Yu T, Calinawan AP, Boutros PC, Payne SH, Reva B, Boja E, Rodriguez H, Stolovitzky G, Guan Y, Kang J, Wang P, Fenyö D, Saez-Rodriguez J, Aderinwale T, Afyounian E, Agrawal P, Ali M, Amadoz A, Azuaje F, Bachman J, Bae S, Bhalla S, Carbonell-Caballero J, Chakraborty P, Chaudhary K, Choi Y, Choi Y, Çubuk C, Dhanda SK, Dopazo J, Elo LL, Fóthi Á, Gevaert O, Granberg K, Greiner R, Heo E, Hidalgo MR, Jayaswal V, Jeon H, Jeon M, Kalmady SV, Kambara Y, Kang J, Kang K, Kaoma T, Kaur H, Kazan H, Kesar D, Kesseli J, Kim D, Kim K, Kim SY, Kim S, Kumar S, Lee B, Lee H, Liu Y, Luethy R, Mahajan S, Mahmoudian M, Muller A, Nazarov PV, Nguyen H, Nykter M, Okuda S, Park S, Pal Singh Raghava G, Rajapakse JC, Rantapero T, Ryu H, Salavert F, Saraei S, Sharma R, Siitonen A, Sokolov A, Subramanian K, Suni V, Suomi T, Tranchevent LC, Usmani SS, Välikangas T, Vega R, Zhong H. Community Assessment of the Predictability of Cancer Protein and Phosphoprotein Levels from Genomics and Transcriptomics. Cell Syst 2020; 11:186-195.e9. [DOI: 10.1016/j.cels.2020.06.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/22/2019] [Revised: 03/12/2020] [Accepted: 06/29/2020] [Indexed: 10/23/2022]
|
48
|
Bergquist T, Yan Y, Schaffter T, Yu T, Pejaver V, Hammarlund N, Prosser J, Guinney J, Mooney S. Piloting a model-to-data approach to enable predictive analytics in health care through patient mortality prediction. J Am Med Inform Assoc 2020; 27:1393-1400. [PMID: 32638010 PMCID: PMC7526463 DOI: 10.1093/jamia/ocaa083] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/11/2019] [Revised: 04/16/2020] [Accepted: 05/06/2020] [Indexed: 02/06/2023]
Abstract
OBJECTIVE: The development of predictive models for clinical application requires the availability of electronic health record (EHR) data, which is complicated by patient privacy concerns. We showcase the "Model to Data" (MTD) approach as a new mechanism to make private clinical data available for the development of predictive models. Under this framework, we eliminate researchers' direct interaction with patient data by delivering containerized models to the EHR data. MATERIALS AND METHODS: We operationalize the MTD framework using the Synapse collaboration platform and an on-premises secure computing environment at the University of Washington hosting EHR data. Containerized mortality prediction models developed by a model developer were delivered to the University of Washington via Synapse, where the models were trained and evaluated. Model performance metrics were returned to the model developer. RESULTS: The model developer was able to develop 3 mortality prediction models under the MTD framework using simple demographic features (area under the receiver-operating characteristic curve [AUROC], 0.693), demographics and 5 common chronic diseases (AUROC, 0.861), and the 1000 most common features from the EHR's condition/procedure/drug domains (AUROC, 0.921). DISCUSSION: We demonstrate the feasibility of the MTD framework to facilitate the development of predictive models on private EHR data, enabled by common data models and containerization software. We identify challenges that both the model developer and the health system information technology group encountered and propose future efforts to improve implementation. CONCLUSIONS: The MTD framework lowers the barrier of access to EHR data and can accelerate the development and evaluation of clinical prediction models.
Affiliation(s)
- Timothy Bergquist
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
| | - Yao Yan
- Molecular Engineering and Sciences Institute, University of Washington, Seattle, Washington, USA
| | | | - Thomas Yu
- Sage Bionetworks, Seattle, Washington, USA
| | - Vikas Pejaver
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
| | - Noah Hammarlund
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
| | - Justin Prosser
- Institute for Translational Health Sciences, University of Washington, Seattle, Washington, USA
| | - Justin Guinney
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA; Sage Bionetworks, Seattle, Washington, USA
| | - Sean Mooney
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
|
49
|
Agany DD, Pietri JE, Gnimpieba EZ. Assessment of vector-host-pathogen relationships using data mining and machine learning. Comput Struct Biotechnol J 2020; 18:1704-1721. [PMID: 32670510 PMCID: PMC7340972 DOI: 10.1016/j.csbj.2020.06.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2020] [Revised: 06/19/2020] [Accepted: 06/19/2020] [Indexed: 12/15/2022]
Abstract
Infectious diseases, including vector-borne diseases transmitted by arthropods, are a leading cause of morbidity and mortality worldwide. In the era of big data, addressing broad-scale, fundamental questions regarding the complex dynamics of these diseases will increasingly require the integration of diverse datasets to produce new biological knowledge. This review provides a current snapshot of the systematic assessment of the relationships between microbial pathogens, arthropod vectors and mammalian hosts using data mining and machine learning. We employ PRISMA to identify 32 key papers relevant to this topic. Our analysis shows an increasing use of data mining and machine learning tasks and techniques, including prediction, classification, clustering, association rules mining, and deep learning, over the last decade. However, it also reveals a number of critical challenges in applying these to the study of vector-host-pathogen interactions at various systems biology levels. Here, relevant studies, current limitations and future directions are discussed. Furthermore, the quality of data in relevant papers was assessed using the FAIR (Findable, Accessible, Interoperable, Reusable) compliance criteria to evaluate and encourage reproducibility and shareability of research outcomes. Although shortcomings in their application remain, data mining and machine learning have significant potential to break new ground in understanding fundamental aspects of vector-host-pathogen relationships and their application in this field should be encouraged. In particular, while predictive modeling, feature engineering and supervised machine learning are already being used in the field, other data mining and machine learning methods such as deep learning and association rules analysis lag behind and should be implemented in combination with established methods to accelerate hypothesis and knowledge generation in the domain.
Affiliation(s)
- Diing D.M. Agany
- University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
- 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
| | - Jose E. Pietri
- University of South Dakota, Sanford School of Medicine, Division of Basic Biomedical Sciences, Vermillion, SD, United States
| | - Etienne Z. Gnimpieba
- University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
- 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
|
50
|
Kaminuma E, Baba Y, Mochizuki M, Matsumoto H, Ozaki H, Okayama T, Kato T, Oki S, Fujisawa T, Nakamura Y, Arita M, Ogasawara O, Kashima H, Takagi T. DDBJ Data Analysis Challenge: a machine learning competition to predict Arabidopsis chromatin feature annotations from DNA sequences. Genes Genet Syst 2020; 95:43-50. [PMID: 32213716 DOI: 10.1266/ggs.19-00034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Recently, the prospect of applying machine learning tools for automating the process of annotation analysis of large-scale sequences from next-generation sequencers has raised the interest of researchers. However, finding research collaborators with knowledge of machine learning techniques is difficult for many experimental life scientists. One solution to this problem is to utilise the power of crowdsourcing. In this report, we describe how we investigated the potential of crowdsourced modelling for a life science task by conducting a machine learning competition, the DNA Data Bank of Japan (DDBJ) Data Analysis Challenge. In the challenge, participants predicted chromatin feature annotations from DNA sequences with competing models. The challenge engaged 38 participants, with a cumulative total of 360 model submissions. The performance of the top model resulted in an area under the curve (AUC) score of 0.95. Over the course of the competition, the overall performance of the submitted models improved by an AUC score of 0.30 from the first submitted model. Furthermore, the 1st- and 2nd-ranking models utilised external data such as genomic location and gene annotation information with specific domain knowledge. The effect of incorporating this domain knowledge led to improvements of approximately 5%-9%, as measured by the AUC scores. This report suggests that machine learning competitions will lead to the development of highly accurate machine learning models for use by experimental scientists unfamiliar with the complexities of data science.
Affiliation(s)
- Eli Kaminuma
- Center for Information Biology, National Institute of Genetics
| | - Yukino Baba
- Graduate School of Informatics, Kyoto University
| | | | - Hirotaka Matsumoto
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research
| | - Haruka Ozaki
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research
| | | | - Takuya Kato
- Graduate School of Information Science and Technology, The University of Tokyo
| | - Shinya Oki
- Graduate School of Medical Sciences, Kyushu University
| | | | | | - Masanori Arita
- Center for Information Biology, National Institute of Genetics
| | - Osamu Ogasawara
- Center for Information Biology, National Institute of Genetics
| | | | - Toshihisa Takagi
- Center for Information Biology, National Institute of Genetics; Graduate School of Science, The University of Tokyo
|