1
Momin MM, Wray NR, Lee SH. R2ROC: an efficient method of comparing two or more correlated AUC from out-of-sample prediction using polygenic scores. Hum Genet 2024; 143:1193-1205. [PMID: 38902498 DOI: 10.1007/s00439-024-02682-1] [Received: 11/23/2023] [Accepted: 05/29/2024] [Indexed: 06/22/2024]
Abstract
Polygenic risk scores (PRSs) enable early prediction of disease risk. Evaluating PRS performance for binary traits commonly relies on the area under the receiver operating characteristic curve (AUC). However, the widely used DeLong method for comparative significance tests suffers from limitations, including long computation time and the lack of a one-to-one mapping between test statistics based on AUC and R². To overcome these limitations, we propose a novel approach that leverages the Delta method to derive the variance and covariance of AUC values, enabling a comprehensive and efficient comparative significance test. Our approach offers notable advantages over DeLong's method, including up to 150-fold shorter computation time, making it suitable for large-scale analyses and for integration into machine learning frameworks. Furthermore, our method provides a direct one-to-one mapping between AUC and R² values for comparative significance tests, clarifying the relationship between these measures and facilitating their interpretation. We validated the proposed approach through simulations and applied it to real data, comparing PRSs for diabetes and coronary artery disease (CAD) prediction in a cohort of 28,880 European individuals. The PRSs were derived using genome-wide association study summary statistics from two distinct sources. Our approach enabled a comprehensive and informative comparison of the PRSs, shedding light on their respective predictive abilities for diabetes and CAD. This advancement contributes to the assessment of genetic risk factors and personalized disease prediction, supporting better healthcare decision-making.
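The abstract positions R2ROC against DeLong's method. For readers unfamiliar with that baseline, the following is a minimal sketch of the classic DeLong comparison of two correlated AUCs (two scores evaluated on the same cases and controls). It is not the authors' Delta-method approach and does not implement their AUC-to-R² mapping; the function names and the simulated data are hypothetical. This naive version builds an m × n pairwise matrix per score, which illustrates why the cost of the baseline grows quickly with sample size.

```python
import math

import numpy as np


def _psi(case_scores, control_scores):
    """Mann-Whitney kernel for every (case, control) pair:
    1 if the case scores higher, 0.5 on a tie, 0 otherwise."""
    diff = case_scores[:, None] - control_scores[None, :]
    return (diff > 0).astype(float) + 0.5 * (diff == 0)


def delong_compare(scores_a, scores_b, labels):
    """Test H0: AUC_a == AUC_b for two scores measured on the same individuals,
    using DeLong's structural-components estimate of the AUC covariance.

    Returns (auc_a, auc_b, z).
    """
    labels = np.asarray(labels, dtype=bool)
    m, n = int(labels.sum()), int((~labels).sum())  # number of cases, controls
    v10, v01, aucs = [], [], []
    for scores in (np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)):
        psi = _psi(scores[labels], scores[~labels])
        v10.append(psi.mean(axis=1))  # structural component for each case
        v01.append(psi.mean(axis=0))  # structural component for each control
        aucs.append(float(psi.mean()))  # empirical (Mann-Whitney) AUC
    # 2x2 covariance matrices across the two scores (np.cov uses ddof=1 by default).
    s = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    return aucs[0], aucs[1], z


if __name__ == "__main__":
    # Hypothetical simulated data: two noisy scores for the same 1,000 people.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=1000)
    prs_a = 1.0 * y + rng.normal(size=1000)
    prs_b = 0.5 * y + rng.normal(size=1000)
    auc_a, auc_b, z = delong_compare(prs_a, prs_b, y)
    print(f"AUC_a={auc_a:.3f}  AUC_b={auc_b:.3f}  z={z:.2f}")
```

A two-sided p-value follows from the standard normal distribution of z. Because both scores share the same individuals, the covariance term s[0, 1] is what distinguishes this correlated test from comparing two independent AUCs.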
Affiliation(s)
- Md Moksedul Momin
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
- Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University (CVASU), Khulshi, Chattogram, 4225, Bangladesh.
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia.
- Naomi R Wray
- Department of Psychiatry, Medical Sciences Division, University of Oxford, Oxford, OX3 7JX, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, OX3 7LF, UK
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- S Hong Lee
- Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia.
- UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia.
- South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia.
2
Souter NE, Bhagwat N, Racey C, Wilkinson R, Duncan NW, Samuel G, Lannelongue L, Selvan R, Rae CL. Measuring and reducing the carbon footprint of fMRI preprocessing in fMRIPrep. Hum Brain Mapp 2024; 45:e70003. [PMID: 39185668 PMCID: PMC11345634 DOI: 10.1002/hbm.70003] [Received: 04/15/2024] [Revised: 07/18/2024] [Accepted: 08/06/2024] [Indexed: 08/27/2024]
Abstract
Computationally expensive data processing in neuroimaging research places demands on energy consumption, and the resulting carbon emissions contribute to the climate crisis. We measured the carbon footprint of the functional magnetic resonance imaging (fMRI) preprocessing tool fMRIPrep, testing the effect of varying parameters on estimated carbon emissions and preprocessing performance. Performance was quantified using (a) statistical individual-level task activation in regions of interest and (b) mean smoothness of preprocessed data. Eight variants of fMRIPrep were run with 257 participants who had completed an fMRI stop signal task (the same data used in the original validation of fMRIPrep). Some variants led to substantial reductions in carbon emissions without sacrificing data quality: for instance, disabling FreeSurfer surface reconstruction reduced carbon emissions by 48%. We provide six recommendations for minimising emissions without compromising performance. By varying parameters and computational resources, neuroimagers can substantially reduce the carbon footprint of their preprocessing. This is one aspect of our research carbon footprint over which neuroimagers have control and the agency to act.
Affiliation(s)
- Nikhil Bhagwat
- McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute – Hospital), McGill University, Montreal, Quebec, Canada
- Chris Racey
- School of Psychology, University of Sussex, Brighton, UK
- Sussex Neuroscience, University of Sussex, Brighton, UK
- Reese Wilkinson
- Department of Physics and Astronomy, University of Sussex, Brighton, UK
- Niall W. Duncan
- Graduate Institute of Mind, Brain and Consciousness, Taipei Medical University, Taipei, Taiwan
- Gabrielle Samuel
- Department of Global Health and Social Medicine, King's College London, London, UK
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Raghavendra Selvan
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Department of Neuroscience, University of Copenhagen, Copenhagen, Denmark
3
Li R, Romano JD, Chen Y, Moore JH. Centralized and Federated Models for the Analysis of Clinical Data. Annu Rev Biomed Data Sci 2024; 7:179-199. [PMID: 38723657 DOI: 10.1146/annurev-biodatasci-122220-115746] [Indexed: 08/25/2024]
Abstract
The progress of precision medicine research hinges on the gathering and analysis of extensive and diverse clinical datasets. With the continued expansion of modalities, scales, and sources of clinical datasets, it becomes imperative to devise methods for aggregating information from these varied sources to achieve a comprehensive understanding of diseases. In this review, we describe two important approaches for the analysis of diverse clinical datasets, namely the centralized model and federated model. We compare and contrast the strengths and weaknesses inherent in each model and present recent progress in methodologies and their associated challenges. Finally, we present an outlook on the opportunities that both models hold for the future analysis of clinical data.
Affiliation(s)
- Ruowang Li
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Joseph D Romano
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H Moore
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, California, USA
4
Srivastava P, Benegas Coll M, Götz S, Nueda MJ, Conesa A. scMaSigPro: Differential Expression Analysis along Single-Cell Trajectories. Bioinformatics 2024; 40:btae443. [PMID: 38976653 PMCID: PMC11269465 DOI: 10.1093/bioinformatics/btae443] [Received: 03/02/2024] [Revised: 06/27/2024] [Accepted: 07/04/2024] [Indexed: 07/10/2024]
Abstract
MOTIVATION Understanding the dynamics of gene expression across different cellular states is crucial for discerning the mechanisms underlying cellular differentiation. Genes whose mean expression varies as a function of pseudotime and between branching trajectories are expected to govern cell fate decisions. We introduce scMaSigPro, a method for identifying differential gene expression patterns along pseudotime and branching paths simultaneously.
RESULTS We assessed the performance of scMaSigPro using synthetic and public datasets. Our evaluation shows that scMaSigPro outperforms existing methods in controlling the false positive rate and is computationally efficient.
AVAILABILITY AND IMPLEMENTATION scMaSigPro is available as a free R package (requiring R version 4.0 or higher) under the GPL(≥2) license on GitHub at 'github.com/BioBam/scMaSigPro' and archived with version 0.03 on Zenodo at 'zenodo.org/records/12568922'.
Affiliation(s)
- Priyansh Srivastava
- BioBam Bioinformatics S.L., Valencia, 46024, Spain
- Department of Computer Science, University of Valencia, Valencia, 46100, Spain
- Stefan Götz
- BioBam Bioinformatics S.L., Valencia, 46024, Spain
- María José Nueda
- Mathematics Department, University of Alicante, Alicante, 03690, Spain
- Ana Conesa
- Institute for Integrative Systems Biology (I2SysBio), Consejo Superior de Investigaciones Científicas (CSIC), Paterna, 46980, Spain
5
Bayarri G, Andrio P, Gelpí JL, Hospital A, Orozco M. Using interactive Jupyter Notebooks and BioConda for FAIR and reproducible biomolecular simulation workflows. PLoS Comput Biol 2024; 20:e1012173. [PMID: 38900779 PMCID: PMC11189206 DOI: 10.1371/journal.pcbi.1012173] [Indexed: 06/22/2024]
Abstract
Interactive Jupyter Notebooks in combination with Conda environments can be used to generate FAIR (Findable, Accessible, Interoperable and Reusable/Reproducible) biomolecular simulation workflows. The interactive programming code accompanied by documentation and the possibility to inspect intermediate results with versatile graphical charts and data visualization is very helpful, especially in iterative processes, where parameters might be adjusted to a particular system of interest. This work presents a collection of FAIR notebooks covering various areas of the biomolecular simulation field, such as molecular dynamics (MD), protein-ligand docking, molecular checking/modeling, molecular interactions, and free energy perturbations. Workflows can be launched with myBinder or easily installed in a local system. The collection of notebooks aims to provide a compilation of demonstration workflows, and it is continuously updated and expanded with examples using new methodologies and tools.
Affiliation(s)
- Genís Bayarri
- Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
- Pau Andrio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Josep Lluís Gelpí
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Department of Biochemistry and Biomedicine, University of Barcelona, Barcelona, Spain
- Adam Hospital
- Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
- Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute of Science and Technology, Barcelona, Spain
- Department of Biochemistry and Biomedicine, University of Barcelona, Barcelona, Spain
6
Freese T, Elzinga N, Heinemann M, Lerch MM, Feringa BL. The relevance of sustainable laboratory practices. RSC Sustainability 2024; 2:1300-1336. [PMID: 38725867 PMCID: PMC11078267 DOI: 10.1039/d4su00056k] [Received: 02/02/2024] [Accepted: 03/15/2024] [Indexed: 05/12/2024]
Abstract
Scientists play a key role in society in raising awareness of the climate crisis and its underlying scientific evidence, and in providing solutions for a sustainable future. Although scientific research has led to great achievements and benefits, traditional laboratory practices come with unintended environmental consequences. Scientists, while providing solutions to climate problems and educating the young innovators of the future, are also part of the problem: excessive energy consumption, (hazardous) waste generation, and resource depletion. Through their research operations, science, research and laboratories have a significant carbon footprint and contribute to the climate crisis. Climate change requires a rapid response across all sectors of society, modeled by inspiring leaders. A broader scientific community that takes concrete action would be an important step in convincing the general public to take similar action. Over the past years, grassroots movements across the sciences have recognized the overlooked impact of the scientific enterprise, and so-called Green Lab initiatives have emerged to address the environmental footprint of research. Driven by the voluntary efforts of researchers and staff, they educate peers, develop sustainability guidelines, write scientific publications and maintain accreditation frameworks. With this perspective, we want to advocate for and spark leadership to promote systemic change in laboratory practices and in the approach to research. Comprehensive evidence for the environmental impact of laboratories and its root causes is presented, expanded with data from a current case study at the University of Groningen showing annual savings of €398,763 and 477.1 tons of CO2e. This is followed by guidelines for sustainable lab practices and hands-on advice on how to achieve systemic change at research institutions and in industry.
How can we expect industry, politics, and society to change, if we as scientists are not changing either? Scientists should lead by example and practice the change they want to see.
Affiliation(s)
- Thomas Freese
- Stratingh Institute for Chemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Nils Elzinga
- Green Office, University of Groningen, Broerstraat 5, 9712 CP Groningen, The Netherlands
- Matthias Heinemann
- Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Michael M Lerch
- Stratingh Institute for Chemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- Ben L Feringa
- Stratingh Institute for Chemistry, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
7
Janoš J, Figueira Nunes JP, Hollas D, Slavíček P, Curchod BFE. Predicting the photodynamics of cyclobutanone triggered by a laser pulse at 200 nm and its MeV-UED signals: A trajectory surface hopping and XMS-CASPT2 perspective. J Chem Phys 2024; 160:144305. [PMID: 38591685 DOI: 10.1063/5.0203105] [Received: 02/09/2024] [Accepted: 03/14/2024] [Indexed: 04/10/2024]
Abstract
This work is part of a prediction challenge that invited theoretical/computational chemists to predict the photochemistry of cyclobutanone in the gas phase, excited at 200 nm by a laser pulse, and the expected signal that will be recorded during a time-resolved megaelectronvolt ultrafast electron diffraction (MeV-UED) experiment. We present here our theoretical predictions based on a combination of trajectory surface hopping with XMS-CASPT2 (for the nonadiabatic molecular dynamics) and Born-Oppenheimer molecular dynamics with MP2 (for the athermal ground-state dynamics following internal conversion), coined (NA+BO)MD. The initial conditions were sampled from Born-Oppenheimer molecular dynamics coupled to a quantum thermostat. Our simulations indicate that the main photoproducts after 2 ps of dynamics are CO + cyclopropane (50%), CO + propene (10%), and ethene + ketene (34%). The photoexcited cyclobutanone in its second excited electronic state S2 can follow two pathways for its nonradiative decay: (i) a ring-opening in S2 and a subsequent rapid decay to the ground electronic state, where the photoproducts are formed, or (ii) a transfer through a closed-ring conical intersection to S1, where cyclobutanone ring opens and then funnels to the ground state. Lifetimes for the photoproduct and electronic populations were determined. We calculated a stationary MeV-UED signal [difference pair distribution function, ΔPDF(r)] for each (interpolated) pathway as well as a time-resolved signal [ΔPDF(r,t) and ΔI/I(s,t)] for the full swarm of (NA+BO)MD trajectories. Furthermore, our analysis provides time-independent basis functions that can be used to fit the time-dependent experimental UED signals [both ΔPDF(r,t) and ΔI/I(s,t)] and potentially recover the population of photoproducts. We also offer a detailed analysis of the limitations of our model and their potential impact on the predicted experimental signals.
Affiliation(s)
- Jiří Janoš
- Department of Physical Chemistry, University of Chemistry and Technology, Technická 5, Prague 6 166 28, Czech Republic
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
- Daniel Hollas
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
- Petr Slavíček
- Department of Physical Chemistry, University of Chemistry and Technology, Technická 5, Prague 6 166 28, Czech Republic
- Basile F E Curchod
- Centre for Computational Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, United Kingdom
8
Serrador L, Villani FP, Moccia S, Santos CP. Knowledge distillation on individual vertebrae segmentation exploiting 3D U-Net. Comput Med Imaging Graph 2024; 113:102350. [PMID: 38340574 DOI: 10.1016/j.compmedimag.2024.102350] [Received: 09/26/2023] [Revised: 02/01/2024] [Accepted: 02/01/2024] [Indexed: 02/12/2024]
Abstract
Recent advances in medical imaging have highlighted the importance of developing algorithms for individual vertebra segmentation on computed tomography (CT) scans. Although essential for diagnostic accuracy and treatment planning in orthopaedics, neurosurgery and oncology, these algorithms face challenges in clinical implementation, including integration into healthcare systems. We therefore explore the application of knowledge distillation (KD) methods to train shallower networks capable of efficiently segmenting vertebrae in CT scans. This approach aims to reduce segmentation time, enhance suitability for emergency cases, and optimize computational and memory efficiency. Building upon prior research in the field, a two-step segmentation approach was employed. First, the spine's location was determined by predicting a heatmap indicating the probability of each voxel belonging to the spine. Subsequently, vertebrae were segmented iteratively from the top to the bottom of the CT volume over the located spine, using a memory instance to record the vertebrae already segmented. KD was implemented by training a teacher network with performance similar to that reported in the literature and distilling its knowledge to a shallower network (student). Two KD methods were applied: (1) using the soft outputs of both networks and (2) matching logits. Two publicly available datasets, comprising 319 CT scans from 300 patients and a total of 611 cervical, 2387 thoracic, and 1507 lumbar vertebrae, were used. To ensure dataset balance and robustness, effective data augmentation methods were applied, including clearing the memory instance to replicate the segmentation of the first vertebra. The teacher network achieved an average Dice similarity coefficient (DSC) of 88.22% and a Hausdorff distance (HD) of 7.71 mm, comparable to other approaches in the literature.
Through knowledge distillation from the teacher network, the student network's performance improved: its average DSC increased from 75.78% to 84.70% and its HD decreased from 15.17 mm to 8.08 mm. Compared with other methods, our teacher network had up to 99.09% fewer parameters, 90.02% faster inference, 88.46% shorter total segmentation time, and an 89.36% lower carbon (CO2) emission rate. Our student network had 75.00% fewer parameters than the teacher, resulting in a 36.15% reduction in inference time, a 33.33% decrease in total segmentation time, and a 42.96% reduction in CO2 emissions. This study is the first to apply KD to individual vertebra segmentation in CT, demonstrating that performance comparable to existing methods can be achieved with smaller neural networks.
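The abstract names two KD variants and reports results as Dice scores. As a rough illustration, the sketch below gives the common textbook forms of these quantities, assuming per-voxel class logits with the class dimension on axis 0: the Dice similarity coefficient, a Hinton-style soft-output loss (KL divergence between temperature-softened class probabilities, scaled by T²), and logit matching as a mean squared error. The abstract does not state the authors' temperature, loss weighting or framework, so the default temperature of 2.0 and all function names are assumptions rather than the paper's implementation; NumPy is used only to keep the example framework-free.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks.
    eps keeps the ratio defined (and equal to 1) when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def _softmax(logits, temperature, axis):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def soft_output_kd_loss(student_logits, teacher_logits, temperature=2.0, axis=0):
    """Soft-output distillation: KL(teacher || student) between temperature-
    softened class probabilities, averaged over voxels and scaled by T**2."""
    p_t = _softmax(teacher_logits, temperature, axis)
    p_s = _softmax(student_logits, temperature, axis)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=axis)
    return float(kl.mean() * temperature ** 2)


def logit_matching_loss(student_logits, teacher_logits):
    """Logit matching: mean squared error between raw student and teacher logits."""
    diff = np.asarray(student_logits, dtype=float) - np.asarray(teacher_logits, dtype=float)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    # Hypothetical 2-class logits (background, vertebra) for a 4x4x4 patch.
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(2, 4, 4, 4))
    student = teacher + 0.3 * rng.normal(size=(2, 4, 4, 4))
    print("soft-output KD loss:", soft_output_kd_loss(student, teacher))
    print("logit matching loss:", logit_matching_loss(student, teacher))
    print("student vs teacher DSC:", dice_coefficient(student.argmax(0), teacher.argmax(0)))
```

In an actual training loop these losses would be computed in a differentiable framework and typically added to the ordinary segmentation loss against the ground-truth masks; how the two terms are weighted is a design choice the abstract does not specify.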
Affiliation(s)
- Luís Serrador
- Center for MicroElectroMechanical Systems (CMEMS), University of Minho, Guimaraes, Portugal; Clinical Academic Center of Braga (2CA-Braga), Hospital of Braga, Braga, Portugal.
- Sara Moccia
- The BioRobotics Institute and Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Italy
- Cristina P Santos
- Center for MicroElectroMechanical Systems (CMEMS), University of Minho, Guimaraes, Portugal; Clinical Academic Center of Braga (2CA-Braga), Hospital of Braga, Braga, Portugal
9
Moyano-Fernández C, Rueda J, Delgado J, Ausín T. May Artificial Intelligence take health and sustainability on a honeymoon? Towards green technologies for multidimensional health and environmental justice. Glob Bioeth 2024; 35:2322208. [PMID: 38476503 PMCID: PMC10930144 DOI: 10.1080/11287462.2024.2322208] [Received: 10/31/2022] [Accepted: 02/19/2024] [Indexed: 03/14/2024]
Abstract
The application of Artificial Intelligence (AI) in healthcare and epidemiology undoubtedly has many benefits for the population. However, because of its environmental impact, the use of AI can produce social inequalities and long-term environmental damage that may not be thoroughly considered. In this paper, we propose to consider the impacts of AI applications in medical care from the One Health paradigm and the perspective of long-term global health. From the standpoint of health and environmental justice, rather than settling for a short and fleeting green honeymoon between health and sustainability brought about by AI, we should aim for a lasting marriage. To this end, we conclude by proposing that, in the coming years, it could be valuable and necessary to promote more interconnected health, call for environmental cost transparency, and increase green responsibility.
Highlights
- Using AI in medicine and epidemiology has some benefits in the short term.
- AI usage may cause social inequalities and environmental damage in the long term.
- Health justice should be rethought from the One Health perspective.
- Going beyond anthropocentric and myopic cost-benefit analysis would expand health justice to include an environmental dimension.
- Greening AI would help to reconcile public and global health measures.
Affiliation(s)
- Jon Rueda
- FiloLab Scientific Unit of Excellence, University of Granada, Granada, Spain
- Janet Delgado
- Department of Philosophy 1, Faculty of Philosophy, University of Granada, Granada, Spain
- Txetxu Ausín
- Institute of Philosophy, Spanish National Research Council, Madrid, Spain
10
Lannelongue L, Inouye M. Pitfalls of machine learning models for protein-protein interaction networks. Bioinformatics 2024; 40:btae012. [PMID: 38200587 PMCID: PMC10868344 DOI: 10.1093/bioinformatics/btae012] [Received: 08/16/2023] [Revised: 11/24/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024]
Abstract
MOTIVATION Protein-protein interactions (PPIs) are essential to understanding biological pathways and their roles in development and disease. Computational tools based on classic machine learning have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and to unexplained discrepancies between algorithms.
RESULTS To better understand the inference mechanisms underpinning these models, we designed an open-source benchmarking framework that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use it to shed light on the impact of network topology and on how different algorithms deal with highly connected proteins. By studying functional genomics-based and sequence-based models on human PPIs, we show that they are complementary: the former perform best on lone proteins, while the latter specialize in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results in both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a principled foundation for the future construction, comparison, and application of PPI networks.
AVAILABILITY AND IMPLEMENTATION The code and data are available on GitHub: https://github.com/Llannelongue/B4PPI.
Affiliation(s)
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, 3004 Victoria, Australia
- British Heart Foundation Centre of Research Excellence, University of Cambridge, CB2 0BB Cambridge, United Kingdom
11
Doo FX, Vosshenrich J, Cook TS, Moy L, Almeida EP, Woolen SA, Gichoya JW, Heye T, Hanneman K. Environmental Sustainability and AI in Radiology: A Double-Edged Sword. Radiology 2024; 310:e232030. [PMID: 38411520 PMCID: PMC10902597 DOI: 10.1148/radiol.232030] [Received: 08/04/2023] [Revised: 10/21/2023] [Accepted: 11/17/2023] [Indexed: 02/28/2024]
Abstract
According to the World Health Organization, climate change is the single biggest health threat facing humanity. The global health care system, including medical imaging, must manage the health effects of climate change while at the same time addressing the large amount of greenhouse gas (GHG) emissions generated in the delivery of care. Data centers and computational efforts are increasingly large contributors to GHG emissions in radiology. This is due to the explosive increase in big data and artificial intelligence (AI) applications, which have resulted in large energy requirements for developing and deploying AI models. However, AI also has the potential to improve environmental sustainability in medical imaging. For example, AI can shorten MRI scan times through accelerated acquisition, improve the scheduling efficiency of scanners, and optimize the use of decision-support tools to reduce low-value imaging. The purpose of this Radiology in Focus article is to discuss this duality at the intersection of environmental sustainability and AI in radiology. Further discussed are strategies and opportunities to decrease AI-related emissions and to leverage AI to improve sustainability in radiology, with a focus on health equity. Co-benefits of these strategies are explored, including lower cost and improved patient outcomes. Finally, knowledge gaps and areas for future research are highlighted.
Affiliation(s)
- Florence X. Doo
- From the University of Maryland Medical Intelligent Imaging (UM2ii) Center, Department of Radiology and Nuclear Medicine, University of Maryland, Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel, Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University, New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department of Medical Imaging, University Health Network, Toronto, Ontario, Canada (E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, Calif (S.A.W.); Department of Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto General Hospital Research Institute, University Health Network, University of Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Canada M5G 2N2 (K.H.); and Department of Medical Imaging, University Medical Imaging Toronto, University of Toronto, Toronto, Ontario, Canada (K.H.)
- Jan Vosshenrich
| | - Tessa S. Cook
- From the University of Maryland Medical Intelligent Imaging (UM2ii)
Center, Department of Radiology and Nuclear Medicine, University of Maryland,
Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel,
Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University,
New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department
of Medical Imaging, University Health Network, Toronto, Ontario, Canada
(E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University
of California San Francisco, San Francisco, Calif (S.A.W.); Department of
Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto
General Hospital Research Institute, University Health Network, University of
Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Cananda M5G 2N2 (K.H.); and
Department of Medical Imaging, University Medical Imaging Toronto, University of
Toronto, Toronto, Ontario, Canada (K.H.)
| | - Linda Moy
- From the University of Maryland Medical Intelligent Imaging (UM2ii)
Center, Department of Radiology and Nuclear Medicine, University of Maryland,
Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel,
Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University,
New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department
of Medical Imaging, University Health Network, Toronto, Ontario, Canada
(E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University
of California San Francisco, San Francisco, Calif (S.A.W.); Department of
Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto
General Hospital Research Institute, University Health Network, University of
Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Cananda M5G 2N2 (K.H.); and
Department of Medical Imaging, University Medical Imaging Toronto, University of
Toronto, Toronto, Ontario, Canada (K.H.)
| | - Eduardo P.R.P. Almeida
- From the University of Maryland Medical Intelligent Imaging (UM2ii)
Center, Department of Radiology and Nuclear Medicine, University of Maryland,
Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel,
Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University,
New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department
of Medical Imaging, University Health Network, Toronto, Ontario, Canada
(E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University
of California San Francisco, San Francisco, Calif (S.A.W.); Department of
Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto
General Hospital Research Institute, University Health Network, University of
Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Cananda M5G 2N2 (K.H.); and
Department of Medical Imaging, University Medical Imaging Toronto, University of
Toronto, Toronto, Ontario, Canada (K.H.)
| | - Sean A. Woolen
- From the University of Maryland Medical Intelligent Imaging (UM2ii)
Center, Department of Radiology and Nuclear Medicine, University of Maryland,
Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel,
Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University,
New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department
of Medical Imaging, University Health Network, Toronto, Ontario, Canada
(E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University
of California San Francisco, San Francisco, Calif (S.A.W.); Department of
Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto
General Hospital Research Institute, University Health Network, University of
Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Cananda M5G 2N2 (K.H.); and
Department of Medical Imaging, University Medical Imaging Toronto, University of
Toronto, Toronto, Ontario, Canada (K.H.)
| | - Judy Wawira Gichoya
- From the University of Maryland Medical Intelligent Imaging (UM2ii)
Center, Department of Radiology and Nuclear Medicine, University of Maryland,
Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel,
Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University,
New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department
of Medical Imaging, University Health Network, Toronto, Ontario, Canada
(E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University
of California San Francisco, San Francisco, Calif (S.A.W.); Department of
Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto
General Hospital Research Institute, University Health Network, University of
Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Cananda M5G 2N2 (K.H.); and
Department of Medical Imaging, University Medical Imaging Toronto, University of
Toronto, Toronto, Ontario, Canada (K.H.)
| | - Tobias Heye
- From the University of Maryland Medical Intelligent Imaging (UM2ii)
Center, Department of Radiology and Nuclear Medicine, University of Maryland,
Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel,
Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University,
New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department
of Medical Imaging, University Health Network, Toronto, Ontario, Canada
(E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University
of California San Francisco, San Francisco, Calif (S.A.W.); Department of
Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto
General Hospital Research Institute, University Health Network, University of
Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Cananda M5G 2N2 (K.H.); and
Department of Medical Imaging, University Medical Imaging Toronto, University of
Toronto, Toronto, Ontario, Canada (K.H.)
| | - Kate Hanneman
- From the University of Maryland Medical Intelligent Imaging (UM2ii)
Center, Department of Radiology and Nuclear Medicine, University of Maryland,
Baltimore, MD (F.X.D.); Department of Radiology, University Hospital Basel,
Basel, Switzerland (J.V., T.H.); Department of Radiology, New York University,
New York, NY (J.V., L.M.); Department of Radiology, Perelman School of Medicine
at the University of Pennsylvania, Philadelphia, Pa (T.S.C.); Joint Department
of Medical Imaging, University Health Network, Toronto, Ontario, Canada
(E.P.R.P.A., K.H.); Department of Radiology and Biomedical Imaging, University
of California San Francisco, San Francisco, Calif (S.A.W.); Department of
Radiology and Imaging Sciences, Emory University, Atlanta, Ga (J.W.G.); Toronto
General Hospital Research Institute, University Health Network, University of
Toronto, 585 University Ave, 1 PMB-298, Toronto, ON, Cananda M5G 2N2 (K.H.); and
Department of Medical Imaging, University Medical Imaging Toronto, University of
Toronto, Toronto, Ontario, Canada (K.H.)
| |
|
12
|
Maier-Hein L, Reinke A, Godau P, Tizabi MD, Buettner F, Christodoulou E, Glocker B, Isensee F, Kleesiek J, Kozubek M, Reyes M, Riegler MA, Wiesenfarth M, Kavur AE, Sudre CH, Baumgartner M, Eisenmann M, Heckmann-Nötzel D, Rädsch T, Acion L, Antonelli M, Arbel T, Bakas S, Benis A, Blaschko MB, Cardoso MJ, Cheplygina V, Cimini BA, Collins GS, Farahani K, Ferrer L, Galdran A, van Ginneken B, Haase R, Hashimoto DA, Hoffman MM, Huisman M, Jannin P, Kahn CE, Kainmueller D, Kainz B, Karargyris A, Karthikesalingam A, Kofler F, Kopp-Schneider A, Kreshuk A, Kurc T, Landman BA, Litjens G, Madani A, Maier-Hein K, Martel AL, Mattson P, Meijering E, Menze B, Moons KGM, Müller H, Nichyporuk B, Nickel F, Petersen J, Rajpoot N, Rieke N, Saez-Rodriguez J, Sánchez CI, Shetty S, van Smeden M, Summers RM, Taha AA, Tiulpin A, Tsaftaris SA, Van Calster B, Varoquaux G, Jäger PF. Metrics reloaded: recommendations for image analysis validation. Nat Methods 2024; 21:195-212. [PMID: 38347141 PMCID: PMC11182665 DOI: 10.1038/s41592-023-02151-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 02/09/2023] [Accepted: 12/12/2023] [Indexed: 02/15/2024]
Abstract
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, the chosen performance metrics often do not reflect the domain interest. They therefore fail to adequately measure scientific progress and hinder the translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework that guides researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant to metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at the image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
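The abstract does not list individual metrics. The Python below is a minimal sketch of two pixel-level overlap metrics commonly used for semantic segmentation, the Dice similarity coefficient and intersection over union (IoU), computed on binary masks. It also shows one pitfall that a problem-aware metric choice has to settle: how to score a case where both masks are empty. The function names, the nested-list mask representation and the decision to return `None` for two empty masks are illustrative assumptions. They are not recommendations taken from Metrics Reloaded.

```python
from typing import List, Optional, Tuple

# A 2D binary mask: 1 marks foreground (target structure), 0 marks background.
Mask = List[List[int]]


def overlap_counts(pred: Mask, ref: Mask) -> Tuple[int, int, int]:
    """Return (intersection, predicted foreground, reference foreground) pixel counts."""
    if len(pred) != len(ref) or any(len(a) != len(b) for a, b in zip(pred, ref)):
        raise ValueError("prediction and reference masks must have the same shape")
    inter = n_pred = n_ref = 0
    for row_p, row_r in zip(pred, ref):
        for p, r in zip(row_p, row_r):
            n_pred += p
            n_ref += r
            inter += p * r  # 1 only where both masks mark foreground
    return inter, n_pred, n_ref


def dice(pred: Mask, ref: Mask) -> Optional[float]:
    """Dice = 2|A and B| / (|A| + |B|); None when both masks are empty."""
    inter, n_pred, n_ref = overlap_counts(pred, ref)
    if n_pred + n_ref == 0:
        return None  # undefined: how to score this case must be decided explicitly
    return 2 * inter / (n_pred + n_ref)


def iou(pred: Mask, ref: Mask) -> Optional[float]:
    """IoU (Jaccard) = |A and B| / |A or B|; None when both masks are empty."""
    inter, n_pred, n_ref = overlap_counts(pred, ref)
    union = n_pred + n_ref - inter
    if union == 0:
        return None
    return inter / union


# Hypothetical 2 x 3 masks: 3 foreground pixels each, 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
ref = [[0, 1, 0],
       [0, 1, 1]]
print(dice(pred, ref), iou(pred, ref))  # prints 0.6666666666666666 0.5
print(dice([[0, 0]], [[0, 0]]))         # prints None: both masks empty
```

For a single case the two metrics are monotonically related (Dice = 2 * IoU / (1 + IoU)), so they rank predictions identically. The empty-mask case has no single correct score and needs an explicit convention.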
Affiliation(s)
- Lena Maier-Hein
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany.
- German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany.
- Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany.
- Medical Faculty, Heidelberg University, Heidelberg, Germany.
- National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and University Medical Center Heidelberg, Heidelberg, Germany.
| | - Annika Reinke
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany.
- German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany.
- Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany.
| | - Patrick Godau
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany
- Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany
- National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and University Medical Center Heidelberg, Heidelberg, Germany
| | - Minu D Tizabi
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany
- National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and University Medical Center Heidelberg, Heidelberg, Germany
| | - Florian Buettner
- German Cancer Consortium (DKTK), partner site Frankfurt/Mainz, a partnership between DKFZ and UCT Frankfurt-Marburg, Frankfurt am Main, Germany
- German Cancer Research Center (DKFZ) Heidelberg, Heidelberg, Germany
- Department of Medicine, Goethe University Frankfurt, Frankfurt am Main, Germany
- Department of Informatics, Goethe University Frankfurt, Frankfurt am Main, Germany
- Frankfurt Cancer Institute, Frankfurt am Main, Germany
| | - Evangelia Christodoulou
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany
| | - Ben Glocker
- Department of Computing, Imperial College London, South Kensington Campus, London, UK
| | - Fabian Isensee
- German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Heidelberg, Germany
- German Cancer Research Center (DKFZ) Heidelberg, HI Applied Computer Vision Lab, Heidelberg, Germany
| | - Jens Kleesiek
- Institute for AI in Medicine, University Medicine Essen, Essen, Germany
| | - Michal Kozubek
- Centre for Biomedical Image Analysis and Faculty of Informatics, Masaryk University, Brno, Czech Republic
| | - Mauricio Reyes
- ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland
- Department of Radiation Oncology, University Hospital Bern, University of Bern, Bern, Switzerland
| | - Michael A Riegler
- Simula Metropolitan Center for Digital Engineering, Oslo, Norway
- Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
| | - Manuel Wiesenfarth
- German Cancer Research Center (DKFZ) Heidelberg, Division of Biostatistics, Heidelberg, Germany
| | - A Emre Kavur
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany
- German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Heidelberg, Germany
- German Cancer Research Center (DKFZ) Heidelberg, HI Applied Computer Vision Lab, Heidelberg, Germany
| | - Carole H Sudre
- MRC Unit for Lifelong Health and Ageing at UCL and Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- School of Biomedical Engineering and Imaging Science, King's College London, London, UK
| | - Michael Baumgartner
- German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Heidelberg, Germany
| | - Matthias Eisenmann
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany
| | - Doreen Heckmann-Nötzel
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany
- National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and University Medical Center Heidelberg, Heidelberg, Germany
| | - Tim Rädsch
- German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems, Heidelberg, Germany
- German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany
| | - Laura Acion
- Instituto de Cálculo, CONICET - Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Michela Antonelli
- School of Biomedical Engineering and Imaging Science, King's College London, London, UK
- Centre for Medical Image Computing, University College London, London, UK
| | - Tal Arbel
- Centre for Intelligent Machines and MILA (Québec Artificial Intelligence Institute), McGill University, Montréal, Quebec, Canada
| | - Spyridon Bakas
- Division of Computational Pathology, Department of Pathology & Laboratory Medicine, Indiana University School of Medicine, IU Health Information and Translational Sciences Building, Indianapolis, IN, USA
- Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania, Philadelphia, PA, USA
| | - Arriel Benis
- Department of Digital Medical Technologies, Holon Institute of Technology, Holon, Israel
- European Federation for Medical Informatics, Le Mont-sur-Lausanne, Switzerland
| | - Matthew B Blaschko
- Center for Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
| | - M Jorge Cardoso
- School of Biomedical Engineering and Imaging Science, King's College London, London, UK
| | - Veronika Cheplygina
- Department of Computer Science, IT University of Copenhagen, Copenhagen, Denmark
| | - Beth A Cimini
- Imaging Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Gary S Collins
- Centre for Statistics in Medicine, University of Oxford, Nuffield Orthopaedic Centre, Oxford, UK
| | - Keyvan Farahani
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, Bethesda, MD, USA
| | - Luciana Ferrer
- Instituto de Investigación en Ciencias de la Computación (ICC), CONICET-UBA, Ciudad Autónoma de Buenos Aires, Buenos Aires, Argentina
| | - Adrian Galdran
- BCN Medtech, Universitat Pompeu Fabra, Barcelona, Spain
- Australian Institute for Machine Learning AIML, University of Adelaide, Adelaide, South Australia, Australia
| | - Bram van Ginneken
- Fraunhofer MEVIS, Bremen, Germany
- Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Robert Haase
- Technische Universität (TU) Dresden, DFG Cluster of Excellence 'Physics of Life', Dresden, Germany
- Center for Systems Biology, Dresden, Germany
- Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Leipzig University, Leipzig, Germany
| | - Daniel A Hashimoto
- Department of Surgery, Perelman School of Medicine, Philadelphia, PA, USA
- General Robotics Automation Sensing and Perception Laboratory, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
| | - Michael M Hoffman
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Merel Huisman
- Department of Radiology and Nuclear Medicine, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Pierre Jannin
- Laboratoire Traitement du Signal et de l'Image - UMR_S 1099, Université de Rennes 1, Rennes, France
- INSERM, Paris, France
| | - Charles E Kahn
- Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Dagmar Kainmueller
- Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Biomedical Image Analysis and HI Helmholtz Imaging, Berlin, Germany
- Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | - Bernhard Kainz
- Department of Computing, Faculty of Engineering, Imperial College London, London, UK
- Department AIBE, Friedrich-Alexander-Universität (FAU), Erlangen-Nürnberg, Germany
| | - Annette Kopp-Schneider
- German Cancer Research Center (DKFZ) Heidelberg, Division of Biostatistics, Heidelberg, Germany
| | - Anna Kreshuk
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
| | - Tahsin Kurc
- Department of Biomedical Informatics, Stony Brook University, Health Science Center, Stony Brook, NY, USA
| | - Geert Litjens
- Department of Pathology, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Amin Madani
- Department of Surgery, University Health Network, Philadelphia, PA, USA
| | - Klaus Maier-Hein
- German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Heidelberg, Germany
- Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany
| | - Anne L Martel
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Physical Sciences, Sunnybrook Research Institute, Toronto, Ontario, Canada
| | - Peter Mattson
- Google, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Erik Meijering
- School of Computer Science and Engineering, University of New South Wales, UNSW Sydney, Kensington, New South Wales, Australia
| | - Bjoern Menze
- Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
| | - Karel G M Moons
- Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, the Netherlands
| | - Henning Müller
- Information Systems Institute, University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland
- Medical Faculty, University of Geneva, Geneva, Switzerland
| | - Brennan Nichyporuk
- MILA (Québec Artificial Intelligence Institute), Montréal, Quebec, Canada
| | - Felix Nickel
- Department of General, Visceral and Thoracic Surgery, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Jens Petersen
- German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Heidelberg, Germany
| | - Nasir Rajpoot
- Tissue Image Analytics Laboratory, Department of Computer Science, University of Warwick, Coventry, UK
| | - Julio Saez-Rodriguez
- Institute for Computational Biomedicine, Heidelberg University, Heidelberg, Germany
- Faculty of Medicine, Heidelberg University Hospital, Heidelberg, Germany
| | - Clara I Sánchez
- Informatics Institute, Faculty of Science, University of Amsterdam, Amsterdam, the Netherlands
| | - Maarten van Smeden
- Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, the Netherlands
| | - Ronald M Summers
- National Institutes of Health Clinical Center, Bethesda, MD, USA
| | - Abdel A Taha
- Institute of Information Systems Engineering, TU Wien, Vienna, Austria
| | - Aleksei Tiulpin
- Research Unit of Health Sciences and Technology, Faculty of Medicine, University of Oulu, Oulu, Finland
- Neurocenter Oulu, Oulu University Hospital, Oulu, Finland
| | - Ben Van Calster
- Department of Development and Regeneration and EPI-centre, KU Leuven, Leuven, Belgium
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
| | - Gaël Varoquaux
- Parietal project team, INRIA Saclay-Île de France, Palaiseau, France
| | - Paul F Jäger
- German Cancer Research Center (DKFZ) Heidelberg, HI Helmholtz Imaging, Heidelberg, Germany.
- German Cancer Research Center (DKFZ) Heidelberg, Interactive Machine Learning Group, Heidelberg, Germany.
| |
|
13
|
Wilkinson R, Mleczko MM, Brewin RJW, Gaston KJ, Mueller M, Shutler JD, Yan X, Anderson K. Environmental impacts of earth observation data in the constellation and cloud computing era. Sci Total Environ 2024; 909:168584. [PMID: 37979853 DOI: 10.1016/j.scitotenv.2023.168584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 11/10/2023] [Accepted: 11/12/2023] [Indexed: 11/20/2023]
Abstract
The number of Earth Observation (EO) satellites has increased exponentially over the past decade, reaching the current population of 1193 (January 2023). Consequently, EO data volumes have mushroomed, and data storage and processing have migrated to the cloud. Whilst attention has been given to the launch and in-orbit environmental impacts of satellites, the environmental footprint of EO data has been overlooked. These issues require urgent attention given data centre water and energy consumption, high carbon emissions from computer component manufacture, and the difficulty of recycling computer components. Addressing them is essential if the environmental good of EO is to withstand scrutiny. We provide the first assessment of the EO data life-cycle and estimate that the current size of the global EO data collection is ~807 PB, increasing by ~100 PB/year. Storage of this data volume generates annual CO2 equivalent (CO2e) emissions of 4101 t. Major state-funded EO providers use 57 of their own data centres globally, and a further 178 private cloud services, with considerable duplication of datasets across repositories. We explore scenarios for the environmental cost of performing EO functions on the cloud compared with desktop machines. We applied a simple band arithmetic function to a Landsat 9 scene, comparing local processing with Google Earth Engine (GEE). It generated emissions of 0.042-0.69 g CO2e when run locally and 0.13-0.45 g CO2e in a European data centre; the values are nine times higher for an Australian data centre. Computation-based emissions scale rapidly for more intense processes and when testing code. When using cloud services such as GEE, users have no choice about the data centre used. We therefore push for EO providers to be more transparent about the location-specific impacts of EO work and to provide tools for measuring the environmental cost of cloud computation. The EO community as a whole needs to critically consider the broad suite of EO data life-cycle impacts.
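The storage and per-run figures above can be turned into a few back-of-the-envelope quantities. The sketch below makes simplifying assumptions that the abstract does not state. It assumes storage emissions scale linearly with archive volume. It also assumes that both the ~100 PB/year growth and the per-petabyte intensity stay constant. The constant and function names are hypothetical.

```python
from typing import Tuple

# Figures reported in the abstract above.
ARCHIVE_PB = 807.0                # approximate size of the global EO data collection (PB)
GROWTH_PB_PER_YEAR = 100.0        # approximate annual growth (PB/year)
STORAGE_T_CO2E_PER_YEAR = 4101.0  # annual storage emissions for the current archive (t CO2e)
EU_RUN_G_CO2E = (0.13, 0.45)      # one band-arithmetic run, European data centre (g CO2e)
AUSTRALIA_FACTOR = 9              # reported: values are nine times higher in Australia


def storage_intensity() -> float:
    """Tonnes CO2e per petabyte per year, assuming emissions scale linearly with volume."""
    return STORAGE_T_CO2E_PER_YEAR / ARCHIVE_PB


def projected_storage_emissions(years: int) -> float:
    """Annual storage emissions after `years` of linear growth.

    Assumes constant growth and unchanged per-PB intensity; both are simplifications.
    """
    volume_pb = ARCHIVE_PB + GROWTH_PB_PER_YEAR * years
    return volume_pb * storage_intensity()


def australian_run_range() -> Tuple[float, float]:
    """Scale the European per-run range by the reported factor of nine."""
    low, high = EU_RUN_G_CO2E
    return low * AUSTRALIA_FACTOR, high * AUSTRALIA_FACTOR


print(f"storage intensity: {storage_intensity():.2f} t CO2e per PB per year")
print(f"projected after 5 years: {projected_storage_emissions(5):.0f} t CO2e per year")
low, high = australian_run_range()
print(f"one run in an Australian data centre: {low:.2f}-{high:.2f} g CO2e")
```

Run as written, it reports three values:

- about 5.08 t CO2e per PB per year;
- about 6642 t CO2e per year after five more years of growth;
- 1.17-4.05 g CO2e for one run in an Australian data centre.

The projection is only as good as the linearity assumption. Efficiency gains, or removing the duplicated datasets mentioned above, would lower it.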
Affiliation(s)
- R Wilkinson
- Environment and Sustainability Institute, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom
| | - M M Mleczko
- Environment and Sustainability Institute, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom
| | - R J W Brewin
- Department of Earth and Environmental Science, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom
| | - K J Gaston
- Environment and Sustainability Institute, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom
| | - M Mueller
- Environment and Sustainability Institute, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom
| | - J D Shutler
- Department of Earth and Environmental Science, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom
| | - X Yan
- Environment and Sustainability Institute, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom
| | - K Anderson
- Environment and Sustainability Institute, University of Exeter, Penryn Campus, Cornwall TR10 9FE, United Kingdom.
| |
|
14
|
Samuel S, Mietchen D. Computational reproducibility of Jupyter notebooks from biomedical publications. Gigascience 2024; 13:giad113. [PMID: 38206590 PMCID: PMC10783158 DOI: 10.1093/gigascience/giad113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Revised: 08/09/2023] [Accepted: 12/08/2023] [Indexed: 01/12/2024]
Abstract
BACKGROUND Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows, including for research publications. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. APPROACH We address computational reproducibility at 2 levels: (i) using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining the articles' full text, trying to locate them on GitHub, and attempting to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central grew in a highly dynamic fashion. RESULTS Of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 publications, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to rerun automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we reran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. CONCLUSIONS We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
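The RESULTS counts form a funnel, and the stage-by-stage success rates are easier to compare once they are computed explicitly. Below is a minimal sketch that uses only the numbers reported above. The stage labels and the helper name are hypothetical.

```python
from typing import List, Tuple

# Counts reported in the abstract above, in the order of the reproduction pipeline.
FUNNEL: List[Tuple[str, int]] = [
    ("notebooks found", 27_271),
    ("written in Python", 22_578),
    ("dependencies declared in requirement files", 15_817),
    ("all declared dependencies installed", 10_388),
    ("ran without errors", 1_203),
    ("results identical to the original", 879),
]
DIFFERENT_RESULTS = 324  # ran without errors but produced different results

# Consistency check: error-free runs split into identical and different results.
assert FUNNEL[-2][1] == FUNNEL[-1][1] + DIFFERENT_RESULTS


def stage_rates(stages: List[Tuple[str, int]]) -> List[Tuple[str, int, float, float]]:
    """For each stage return (name, count, share of previous stage, share of first stage)."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count, count / previous, count / first))
        previous = count
    return rows


for name, count, of_prev, of_all in stage_rates(FUNNEL):
    print(f"{name:<45}{count:>7,}  {of_prev:6.1%} of previous  {of_all:6.1%} of all")
```

The funnel gives three headline rates:

- About 11.6% of the 10,388 notebooks whose dependencies installed ran without errors.
- About 73.1% of those error-free runs reproduced identical results.
- The 879 identical reproductions amount to about 3.2% of all 27,271 notebooks.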
Affiliation(s)
- Sheeba Samuel
- Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena 07743, Germany
- Michael Stifel Center Jena, Jena 07743, Germany
| | - Daniel Mietchen
- Ronin Institute, Montclair 07043-2314, NJ, United States
- Institute for Globally Distributed Open Research and Education (IGDORE)
- FIZ Karlsruhe—Leibniz Institute for Information Infrastructure, Berlin 76344, Germany
| |
|
15
|
Mehmood Y, Bajwa UI. Brain tumor grade classification using the ConvNext architecture. Digit Health 2024; 10:20552076241284920. [PMID: 39372816 PMCID: PMC11452878 DOI: 10.1177/20552076241284920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Accepted: 09/02/2024] [Indexed: 10/08/2024]
Abstract
Objective Brain tumor grade is an important aspect of brain tumor diagnosis and helps in treatment planning. Traditional methods of diagnosis, including biopsy and manual examination of medical images, are either invasive or prone to inaccurate diagnoses. This study proposes a brain tumor grade classification technique that uses a modern convolutional neural network (CNN) architecture called ConvNext, with magnetic resonance imaging (MRI) data as input. Methods Deep learning-based techniques are replacing invasive procedures for consistent, accurate, and non-invasive diagnosis of brain tumors. A well-known challenge of using deep learning architectures in medical imaging is data scarcity. Modern architectures have huge numbers of trainable parameters and require massive datasets to achieve the desired accuracy and avoid overfitting. Therefore, transfer learning is popular among researchers using medical imaging data. Transformer-based architectures have recently surpassed CNNs on image data. However, recently proposed CNNs have achieved superior accuracy by introducing tweaks inspired by vision transformers. This study proposes a technique that extracts features with the ConvNext architecture and feeds them to a fully connected neural network for final classification. Results The proposed technique achieved state-of-the-art performance on the BraTS 2019 dataset using a pre-trained ConvNext. The best accuracy, 99.5%, was achieved when three MRI sequences were input as the three channels of the pre-trained CNN. Conclusion The study demonstrated the efficacy of the representations learned by a modern CNN architecture for brain tumor grade classification. This architecture has a stronger inductive bias for image data than vision transformers.
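The abstract reports the best accuracy when three MRI sequences were fed as the three input channels of the pre-trained CNN. It does not say which sequences were used or how they were preprocessed. The sketch below shows only the channel-stacking idea, under three stated assumptions:

- each slice is min-max normalized on its own;
- the slices are co-registered and of equal size;
- the usage example uses three of the sequences BraTS provides (T1Gd, T2 and FLAIR), chosen hypothetically.

Resizing, model-specific normalization, ConvNext feature extraction and the fully connected classifier are not shown.

```python
from typing import List

Slice = List[List[float]]          # one 2D MRI slice, rows x columns
Image3 = List[List[List[float]]]   # rows x columns x 3 channels


def min_max_normalize(img: Slice) -> Slice:
    """Rescale intensities to [0, 1]; a constant slice maps to all zeros (assumption)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in img]
    return [[(v - lo) / span for v in row] for row in img]


def stack_as_channels(seq_1: Slice, seq_2: Slice, seq_3: Slice) -> Image3:
    """Place three co-registered MRI sequences into the three channels of one image."""
    slices = [min_max_normalize(s) for s in (seq_1, seq_2, seq_3)]
    rows, cols = len(seq_1), len(seq_1[0])
    for s in slices:
        if len(s) != rows or any(len(r) != cols for r in s):
            raise ValueError("sequences must be co-registered slices of equal size")
    # Pixel (i, j) becomes [seq_1, seq_2, seq_3] intensities, like the RGB values of a photo.
    return [[[s[i][j] for s in slices] for j in range(cols)] for i in range(rows)]


# Hypothetical 2 x 2 slices standing in for T1Gd, T2 and FLAIR.
t1gd = [[0, 50], [100, 200]]
t2 = [[10, 10], [10, 30]]
flair = [[5, 5], [5, 5]]
image = stack_as_channels(t1gd, t2, flair)
print(image[1][1])  # prints [1.0, 1.0, 0.0]
```

Mapping three sequences onto three channels lets a CNN pre-trained on three-channel (RGB) natural images be reused for transfer learning without modifying its first convolutional layer.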
Affiliation(s)
- Yasar Mehmood
- Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Punjab, Pakistan
- Usama Ijaz Bajwa
- Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Punjab, Pakistan
16
Fay CD, Corcoran B, Diamond D. Green IoT Event Detection for Carbon-Emission Monitoring in Sensor Networks. SENSORS (BASEL, SWITZERLAND) 2023; 24:162. [PMID: 38203023 PMCID: PMC10781252 DOI: 10.3390/s24010162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 12/23/2023] [Accepted: 12/26/2023] [Indexed: 01/12/2024]
Abstract
This research addresses the intersection of low-power microcontroller technology and binary classification of events in the context of carbon-emission reduction. The study introduces an innovative approach that uses microcontrollers for real-time event detection in a homogeneous hardware/firmware design under limited resources. This showcases their efficiency in processing sensor data and reducing power consumption without the need for extensive training sets. Two case studies focusing on landfill CO2 emissions and home energy usage demonstrate the feasibility and effectiveness of this approach. The findings highlight significant power savings (94.8-99.8%) achieved by minimizing data transmission during non-event periods. The approach also offers a sustainable alternative to traditional resource-intensive AI/ML platforms, which draw 20,000 times the power and produce 20,000 times the carbon emissions.
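The abstract does not specify the detection rule. The following is therefore only a minimal sketch of the general idea of transmitting sensor readings only during events. It assumes a hypothetical fixed threshold on each reading, such as a CO2 concentration. The function name `gate_transmissions` and the example readings are illustrative and are not taken from the authors' firmware or classifier.

```python
from typing import List, Sequence, Tuple


def gate_transmissions(readings: Sequence[float], threshold: float) -> Tuple[List[int], float]:
    """Decide which readings to transmit under a simple threshold event rule.

    Hypothetical rule (not the paper's method): a reading counts as an event, and is
    transmitted, only when it exceeds `threshold`; all other readings stay on the
    device. Returns the indices of transmitted readings and the fraction of
    transmissions avoided relative to sending every reading.
    """
    if not readings:
        return [], 0.0
    sent = [i for i, value in enumerate(readings) if value > threshold]
    avoided_fraction = 1.0 - len(sent) / len(readings)
    return sent, avoided_fraction


if __name__ == "__main__":
    # Made-up CO2 readings in ppm containing one short spike.
    co2_ppm = [410, 415, 412, 900, 950, 420, 418, 411, 409, 413]
    sent, avoided = gate_transmissions(co2_ppm, threshold=600)
    print(sent, f"{avoided:.0%} of transmissions avoided")
```

With these made-up readings, only the two spike samples are sent. Any real power saving also depends on how radio power compares with sensing and processing power, and this sketch does not model that.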
Affiliation(s)
- Cormac D. Fay
- SMART Infrastructure Facility, Engineering and Information Sciences, University of Wollongong, Wollongong, NSW 2522, Australia
- Brian Corcoran
- School of Mechanical and Manufacturing Engineering, Faculty of Engineering and Computing, Dublin City University, Glasnevin, D09 V209 Dublin, Ireland
- Dermot Diamond
- Insight Centre for Data Analytics, Dublin City University, Glasnevin, D09 V209 Dublin, Ireland
17
Lannelongue L, Inouye M. Environmental Impacts of Machine Learning Applications in Protein Science. Cold Spring Harb Perspect Biol 2023; 15:a041473. [PMID: 38040454 PMCID: PMC10691472 DOI: 10.1101/cshperspect.a041473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2023]
Abstract
Computing tools and machine learning models play an increasingly important role in biology and are now an essential part of discoveries in protein science. The growing energy needs of modern algorithms have raised concerns in the computational science community in light of the climate emergency. In this work, we summarize the different ways in which protein science can negatively impact the environment and we present the carbon footprint of some popular protein algorithms: molecular simulations, inference of protein-protein interactions, and protein structure prediction. We show that large deep learning models such as AlphaFold and ESMFold can have carbon footprints reaching over 100 tonnes of CO2e in some cases. The magnitude of these impacts highlights the importance of monitoring and mitigating them, and we list actions scientists can take to achieve more sustainable protein computational science.
Affiliation(s)
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge CB2 0BB, United Kingdom
18
Bouza L, Bugeau A, Lannelongue L. How to estimate carbon footprint when training deep learning models? A guide and review. ENVIRONMENTAL RESEARCH COMMUNICATIONS 2023; 5:115014. [PMID: 38022395 PMCID: PMC10661046 DOI: 10.1088/2515-7620/acf81b] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/27/2023] [Revised: 08/23/2023] [Accepted: 09/08/2023] [Indexed: 12/01/2023]
Abstract
Machine learning and deep learning models have become essential to the recent rapid development of artificial intelligence in many sectors of society. It is now widely acknowledged that developing these models has an environmental cost, which has been analyzed in many studies. Several online and software tools have been developed to track energy consumption while training machine learning models. In this paper, we provide a comprehensive introduction to, and comparison of, these tools for AI practitioners wishing to start estimating the environmental impact of their work. We review the specific vocabulary and the technical requirements of each tool. We compare the energy consumption estimated by each tool on two deep neural networks for image processing and on different types of servers. From these experiments, we offer advice on choosing the right tool and infrastructure.
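The tools compared in this review differ in how they measure or model power draw. A common starting point, however, is to multiply the energy consumed by the carbon intensity of the electricity supply. The sketch below assumes a constant average power draw and folds data-centre overhead into a power usage effectiveness (PUE) factor. The function name and all input values are illustrative assumptions, not figures from the paper or the API of any reviewed tool.

```python
def estimate_emissions_kg(runtime_h: float, avg_power_w: float, pue: float,
                          carbon_intensity_g_per_kwh: float) -> float:
    """Rough CO2e estimate in kg for one computing job (illustrative only).

    energy (kWh)   = runtime (h) * average power (kW) * PUE
    emissions (kg) = energy (kWh) * carbon intensity (gCO2e/kWh) / 1000
    Assumes a constant average power draw over the whole run.
    """
    if runtime_h < 0 or avg_power_w < 0 or carbon_intensity_g_per_kwh < 0:
        raise ValueError("runtime, power and carbon intensity must be non-negative")
    if pue < 1:
        raise ValueError("PUE is total facility energy / IT energy, so it is at least 1")
    energy_kwh = runtime_h * (avg_power_w / 1000.0) * pue
    return energy_kwh * carbon_intensity_g_per_kwh / 1000.0


# Hypothetical job: 24 h on hardware averaging 300 W, PUE 1.5, grid at 200 gCO2e/kWh.
print(round(estimate_emissions_kg(24, 300, 1.5, 200), 2))
```

Real tracking tools replace the constant-power assumption with measured or modelled draw for components such as CPUs, GPUs and memory.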
Affiliation(s)
- Lucía Bouza
- Université Paris Cité, CNRS, MAP5 UMR 8145, 75006, Paris, France
- Aurélie Bugeau
- Univ. Bordeaux, Bordeaux INP, CNRS, LaBRI, Talence, France
- IUF, France
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
19
Seydel C. Greening the lab. Nat Methods 2023; 20:1449-1453. [PMID: 37730893 DOI: 10.1038/s41592-023-02024-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
20
Zhu Z, Tang R, Li C, An X, He L. Promises of Plasmonic Antenna-Reactor Systems in Gas-Phase CO2 Photocatalysis. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2302568. [PMID: 37338243 PMCID: PMC10460874 DOI: 10.1002/advs.202302568] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/22/2023] [Revised: 05/26/2023] [Indexed: 06/21/2023]
Abstract
Sunlight-driven photocatalytic CO2 reduction offers intriguing opportunities for addressing the energy and environmental crises facing humanity. The rational combination of plasmonic antennas and active transition-metal-based catalysts, known as "antenna-reactor" (AR) nanostructures, allows simultaneous optimization of the optical and catalytic performance of photocatalysts and thus holds great promise for CO2 photocatalysis. Such a design combines the favorable absorption, radiative, and photochemical properties of the plasmonic components with the high catalytic potential and conductivity of the reactor components. This review summarizes recent developments in photocatalysts based on plasmonic AR systems for various gas-phase CO2 reduction reactions, with emphasis on the electronic structure of plasmonic and catalytic metals, plasmon-driven catalytic pathways, and the role of the AR complex in photocatalytic processes. Perspectives on challenges and future research in this area are also highlighted.
Affiliation(s)
- Zhijie Zhu
- Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, 215123, P. R. China
- Rui Tang
- Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, 215123, P. R. China
- Chaoran Li
- Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, 215123, P. R. China
- Jiangsu Key Laboratory for Carbon-Based Functional Materials & Devices, Soochow University, Suzhou, Jiangsu, 215123, P. R. China
- Xingda An
- Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, 215123, P. R. China
- Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Soochow University, Suzhou, Jiangsu, 215123, P. R. China
- Le He
- Institute of Functional Nano & Soft Materials (FUNSOM), Soochow University, Suzhou, 215123, P. R. China
- Jiangsu Key Laboratory of Advanced Negative Carbon Technologies, Soochow University, Suzhou, Jiangsu, 215123, P. R. China
21
The carbon footprint of computational research. NATURE COMPUTATIONAL SCIENCE 2023; 3:659. [PMID: 38177321 DOI: 10.1038/s43588-023-00506-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2024]
22
Lannelongue L, Aronson HEG, Bateman A, Birney E, Caplan T, Juckes M, McEntyre J, Morris AD, Reilly G, Inouye M. GREENER principles for environmentally sustainable computational science. NATURE COMPUTATIONAL SCIENCE 2023; 3:514-521. [PMID: 38177425 DOI: 10.1038/s43588-023-00461-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/06/2022] [Accepted: 05/09/2023] [Indexed: 01/06/2024]
Abstract
The carbon footprint of scientific computing is substantial, but environmentally sustainable computational science (ESCS) is a nascent field with many opportunities to thrive. To realize the immense green opportunities and continued, yet sustainable, growth of computer science, we must take a coordinated approach to our current challenges, including greater awareness and transparency, improved estimation and wider reporting of environmental impacts. Here, we present a snapshot of where ESCS stands today and introduce the GREENER set of principles, as well as guidance for best practices moving forward.
Affiliation(s)
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Martin Juckes
- RAL Space, Science and Technology Facilities Council, Harwell Campus, Didcot, UK
- Johanna McEntyre
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- The Alan Turing Institute, London, UK
23
Xu Y, Ritchie SC, Liang Y, Timmers PRHJ, Pietzner M, Lannelongue L, Lambert SA, Tahir UA, May-Wilson S, Foguet C, Johansson Å, Surendran P, Nath AP, Persyn E, Peters JE, Oliver-Williams C, Deng S, Prins B, Luan J, Bomba L, Soranzo N, Di Angelantonio E, Pirastu N, Tai ES, van Dam RM, Parkinson H, Davenport EE, Paul DS, Yau C, Gerszten RE, Mälarstig A, Danesh J, Sim X, Langenberg C, Wilson JF, Butterworth AS, Inouye M. An atlas of genetic scores to predict multi-omic traits. Nature 2023; 616:123-131. [PMID: 36991119 PMCID: PMC10323211 DOI: 10.1038/s41586-023-05844-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 04/29/2022] [Accepted: 02/15/2023] [Indexed: 03/30/2023]
Abstract
The use of omic modalities to dissect the molecular underpinnings of common diseases and traits is becoming increasingly common. But multi-omic traits can be genetically predicted, which enables highly cost-effective and powerful analyses for studies that do not have multi-omics[1]. Here we examine a large cohort (the INTERVAL study[2]; n = 50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, n = 3,175; Olink, n = 4,822), plasma metabolomics (Metabolon HD4, n = 8,153), serum metabolomics (Nightingale, n = 37,359) and whole-blood Illumina RNA sequencing (n = 4,136), and use machine learning to train genetic scores for 17,227 molecular traits, including 10,521 that reach Bonferroni-adjusted significance. We evaluate the performance of genetic scores through external validation across cohorts of individuals of European, Asian and African American ancestries. In addition, we show the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of the UK Biobank[3] to identify disease associations using a phenome-wide scan. We highlight a series of biological insights with regard to genetic mechanisms in metabolism and canonical pathway associations with disease; for example, JAK-STAT signalling and coronary atherosclerosis. Finally, we develop a portal (https://www.omicspred.org/) to facilitate public access to all genetic scores and validation results, as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
Affiliation(s)
- Yu Xu
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK.
- Scott C Ritchie
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Yujian Liang
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Paul R H J Timmers
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK
- Maik Pietzner
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Computational Medicine, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Samuel A Lambert
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Usman A Tahir
- Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Sebastian May-Wilson
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK
- Carles Foguet
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Åsa Johansson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Praveen Surendran
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Artika P Nath
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Elodie Persyn
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- James E Peters
- Department of Immunology and Inflammation, Faculty of Medicine, Imperial College London, London, UK
- Clare Oliver-Williams
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Shuliang Deng
- Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Bram Prins
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Jian'an Luan
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Lorenzo Bomba
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- BioMarin Pharmaceutical, Novato, CA, USA
- Nicole Soranzo
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Haematology, University of Cambridge, Cambridge, UK
- NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Genomics Research Centre, Human Technopole, Milan, Italy
- Emanuele Di Angelantonio
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Science Research Centre, Human Technopole, Milan, Italy
- Nicola Pirastu
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK
- Genomics Research Centre, Human Technopole, Milan, Italy
- E Shyong Tai
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Department of Medicine, National University of Singapore and National University Health System, Singapore, Singapore
- Rob M van Dam
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Departments of Exercise and Nutrition Sciences and Epidemiology, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
- Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Dirk S Paul
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Christopher Yau
- Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK
- Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Health Data Research UK, London, UK
- Robert E Gerszten
- Division of Cardiovascular Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Broad Institute of Harvard University and Massachusetts Institute of Technology, Cambridge, MA, USA
- Anders Mälarstig
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Pfizer Worldwide Research, Development and Medical, Stockholm, Sweden
- John Danesh
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Xueling Sim
- Saw Swee Hock School of Public Health, National University of Singapore and National University Health System, Singapore, Singapore
- Claudia Langenberg
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Computational Medicine, Berlin Institute of Health (BIH) at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Precision Healthcare University Research Institute, Queen Mary University of London, London, UK
- James F Wilson
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, UK
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK
- Adam S Butterworth
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- NIHR Blood and Transplant Research Unit in Donor Health and Behaviour, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK.
- British Heart Foundation Centre of Research Excellence, School of Clinical Medicine, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK.
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
- The Alan Turing Institute, London, UK.
24
Borges RM. A Braver New World? Of chatbots and other cognoscenti. J Biosci 2023. [DOI: 10.1007/s12038-023-00334-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
25
Eminaga O, Abbas M, Shen J, Laurie M, Brooks JD, Liao JC, Rubin DL. PlexusNet: A neural network architectural concept for medical image classification. Comput Biol Med 2023; 154:106594. [PMID: 36753979 DOI: 10.1016/j.compbiomed.2023.106594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Revised: 01/12/2023] [Accepted: 01/22/2023] [Indexed: 01/27/2023]
Abstract
State-of-the-art (SOTA) convolutional neural network models have been widely adopted in medical imaging and applied to different clinical problems. However, the complexity and scale of such models may not be justified in medical imaging, given the available resource budget. Further increasing the number of representative feature maps for the classification task decreases model explainability. Current data normalization practice is fixed before model development and disregards the specifics of the data domain. To address these issues, the current work proposes a new scalable model family called PlexusNet, whose architecture is governed by its block design and by model scaling along the network's depth, width, and branch dimensions. Computational efficiency guided the choice of these scaling dimensions and the design of PlexusNet. PlexusNet includes a new learnable data normalization algorithm for better data generalization. We applied a simple yet effective neural architecture search to tailor PlexusNet to five clinical classification problems, achieving performance noninferior to the SOTA models ResNet-18 and EfficientNet B0/1. It does so with parameter capacity and numbers of representative feature maps on the order of ten-fold lower than those of the smallest SOTA models with comparable performance. Visualization of the representative features revealed distinguishable clusters associated with categories, based on latent features generated by PlexusNet. The package and source code are available at https://github.com/oeminaga/PlexusNet.git.
Affiliation(s)
- Okyaz Eminaga
- Center for Artificial Intelligence in Medicine & Imaging and Department of Urology, Stanford School of Medicine, Stanford, CA, 94305, USA; Department of Urology, Stanford School of Medicine, Stanford, CA, 94305, USA
- Mahmoud Abbas
- Department of Pathology, University of Muenster, Muenster, Germany.
- Jeanne Shen
- Department of Pathology, Stanford School of Medicine, Stanford, CA, 94305, USA.
- Mark Laurie
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA.
- James D Brooks
- Department of Urology, Stanford School of Medicine, Stanford, CA, 94305, USA.
- Joseph C Liao
- Department of Urology, Stanford School of Medicine, Stanford, CA, 94305, USA.
- Daniel L Rubin
- Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, 94305, USA.
26
Käser S, Vazquez-Salazar LI, Meuwly M, Töpfer K. Neural network potentials for chemistry: concepts, applications and prospects. DIGITAL DISCOVERY 2023; 2:28-58. [PMID: 36798879 PMCID: PMC9923808 DOI: 10.1039/d2dd00102k] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/23/2022] [Accepted: 12/20/2022] [Indexed: 12/24/2022]
Abstract
Artificial Neural Networks (NN) are already heavily involved in methods and applications for common tasks in computational chemistry, such as the representation of potential energy surfaces (PES) and spectroscopic predictions. This perspective provides an overview of the foundations of neural network-based full-dimensional potential energy surfaces, their architectures, underlying concepts, their representation and applications to chemical systems. Methods for data generation and training procedures for PES construction are discussed, and means for error assessment and refinement through transfer learning are presented. A selection of recent results illustrates the latest improvements in the accuracy of PES representations and in system-size limitations in dynamics simulations, as well as NN applications that enable direct prediction of physical results without dynamics simulations. The aim is to provide an overview of current state-of-the-art NN approaches in computational chemistry and to point out the current challenges in enhancing the reliability and applicability of NN methods on a larger scale.
Affiliation(s)
- Silvan Käser
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Markus Meuwly
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
- Kai Töpfer
- Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
27
Schweizer L, Seegerer P, Kim HY, Saitenmacher R, Muench A, Barnick L, Osterloh A, Dittmayer C, Jödicke R, Pehl D, Reinhardt A, Ruprecht K, Stenzel W, Wefers AK, Harter PN, Schüller U, Heppner FL, Alber M, Müller KR, Klauschen F. Analysing cerebrospinal fluid with explainable deep learning: From diagnostics to insights. Neuropathol Appl Neurobiol 2023; 49:e12866. [PMID: 36519297 DOI: 10.1111/nan.12866] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/24/2022] [Revised: 11/14/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
AIM Analysis of cerebrospinal fluid (CSF) is essential for the diagnostic workup of patients with neurological diseases and includes differential cell typing. The current gold standard is based on microscopic examination by specialised technicians and neuropathologists, which is time-consuming, labour-intensive and subjective. METHODS We therefore developed an image analysis approach based on expert annotations of 123,181 digitised CSF objects from 78 patients corresponding to 15 clinically relevant categories and trained a multiclass convolutional neural network (CNN). RESULTS The CNN classified the 15 categories with high accuracy (mean AUC 97.3%). By using explainable artificial intelligence (XAI), we demonstrate that the CNN identified meaningful cellular substructures in CSF cells, recapitulating human pattern recognition. Based on the evaluation of 511 cells selected from 12 different CSF samples, we validated the CNN by comparing it with seven board-certified neuropathologists blinded to clinical information. Inter-rater agreement between the CNN and the ground truth was non-inferior (Krippendorff's alpha 0.79) compared with the agreement between seven human raters and the ground truth (mean Krippendorff's alpha 0.72, range 0.56-0.81). The CNN assigned the correct diagnostic label (inflammatory, haemorrhagic or neoplastic) in 10 out of 11 clinical samples, compared with 7-11 out of 11 by human raters. CONCLUSIONS Our approach provides the basis to overcome current limitations in automated cell classification for routine diagnostics and demonstrates how a visual explanation framework can connect machine decision-making with cell properties and thus provide a novel, versatile and quantitative method for investigating CSF manifestations of various neurological diseases.
Affiliation(s)
- Leonille Schweizer
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Philipp Seegerer
- Machine-Learning Group, Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin, Germany
- Aignostics GmbH, Berlin, Germany
- Hee-Yeong Kim
- Systems Medicine of Infectious Disease, Robert Koch Institute, Berlin, Germany
- René Saitenmacher
- Machine-Learning Group, Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin, Germany
- Amos Muench
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Liane Barnick
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Anja Osterloh
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Carsten Dittmayer
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Ruben Jödicke
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Debora Pehl
- Department of Pathology, Vivantes Hospitals Berlin, Berlin, Germany
- Klemens Ruprecht
- Department of Neurology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Werner Stenzel
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Annika K Wefers
- Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Patrick N Harter
- Neurological Institute (Edinger Institute), Goethe University, Frankfurt am Main, Germany
- Frankfurt Cancer Institute, Goethe University, Frankfurt am Main, Germany
- German Cancer Consortium (DKTK), Partner Site Frankfurt/Mainz, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ulrich Schüller
- Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Department of Pediatric Hematology and Oncology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Research Institute Children's Cancer Center Hamburg, Hamburg, Germany
- Frank L Heppner
- Department of Neuropathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Cluster of Excellence, NeuroCure, Berlin, Germany
- German Center for Neurodegenerative Diseases (DZNE) Berlin, Berlin, Germany
- Maximilian Alber
- Aignostics GmbH, Berlin, Germany
- Institute of Pathology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Klaus-Robert Müller
- Machine-Learning Group, Department of Software Engineering and Theoretical Computer Science, Technische Universität Berlin, Berlin, Germany
- Max Planck Institut für Informatik, Saarbrücken, Germany
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- Department of Artificial Intelligence, Korea University, Seoul, South Korea
- Frederick Klauschen
- Berlin Institute for the Foundations of Learning and Data (BIFOLD), Berlin, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Pathology, Ludwig-Maximilians-Universität München, Munich, Germany
28
Yang L, Chen J. A comprehensive evaluation of microbial differential abundance analysis methods: current status and potential solutions. Microbiome 2022; 10:130. [PMID: 35986393 PMCID: PMC9392415 DOI: 10.1186/s40168-022-01320-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 09/09/2021] [Accepted: 07/04/2022] [Indexed: 06/12/2023]
Abstract
BACKGROUND Differential abundance analysis (DAA) is a central statistical task in microbiome data analysis. A robust and powerful DAA tool can help identify highly confident microbial candidates for further biological validation. Numerous DAA tools have been proposed in the past decade to address the special characteristics of microbiome data, such as zero inflation and compositional effects. Disturbingly, different DAA tools can sometimes produce quite discordant results, opening up the possibility of cherry-picking the tool that favours one's own hypothesis. To recommend the best DAA tool or practice to the field, a comprehensive evaluation that covers as many biologically relevant scenarios as possible is critically needed. RESULTS We performed by far the most comprehensive evaluation of existing DAA tools using real data-based simulations. We found that DAA methods explicitly addressing compositional effects, such as ANCOM-BC, ALDEx2, metagenomeSeq (fitFeatureModel), and DACOMP, did have improved false-positive control. However, they are still not optimal: type I error inflation or low statistical power was observed in many settings. The recent LDM method generally had the best power, but its false-positive control in the presence of strong compositional effects was not satisfactory. Overall, none of the evaluated methods is simultaneously robust, powerful, and flexible, which makes selecting the best DAA tool difficult. To meet these analysis needs, we designed an optimized procedure, ZicoSeq, drawing on the strengths of the existing DAA methods. We show that ZicoSeq generally controlled false positives across settings and that its power was among the highest. Application of DAA methods to a large collection of real datasets revealed a pattern similar to that observed in the simulation studies.
CONCLUSIONS Based on the benchmarking study, we conclude that none of the existing DAA methods evaluated can be applied blindly to any real microbiome dataset. The applicability of an existing DAA method depends on specific settings, which are usually unknown a priori. To circumvent the difficulty of selecting the best DAA tool in practice, we designed ZicoSeq, which addresses the major challenges in DAA and remedies the drawbacks of existing DAA methods. ZicoSeq can be applied to microbiome datasets from diverse settings and is a useful DAA tool for robust microbiome biomarker discovery.
Affiliation(s)
- Lu Yang
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, 55905, USA
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
- Jun Chen
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, 55905, USA.
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA.
29
Thiele L, Cranmer M, Coulton W, Ho S, Spergel DN. Predicting the Thermal Sunyaev-Zel'dovich Field using Modular and Equivariant Set-Based Neural Networks. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac78c2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Theoretical uncertainty limits our ability to extract cosmological information from baryonic fields such as the thermal Sunyaev-Zel’dovich (tSZ) effect. Being sourced by the electron pressure field, the tSZ effect depends on baryonic physics that is usually modeled by expensive hydrodynamic simulations. We train neural networks on the IllustrisTNG-300 cosmological simulation to predict the continuous electron pressure field in galaxy clusters from gravity-only simulations. Modeling clusters is challenging for neural networks, as most of the gas pressure is concentrated in a handful of voxels and even the largest hydrodynamical simulations contain only a few hundred clusters that can be used for training. Instead of conventional convolutional neural network (CNN) architectures, we employ a rotationally equivariant DeepSets architecture that operates directly on the set of dark matter particles. We argue that set-based architectures provide distinct advantages over CNNs. For example, we can enforce exact rotational and permutation equivariance, incorporate existing knowledge of the tSZ field, and work with sparse fields, as is standard in cosmology. We compose our architecture from separate, physically meaningful modules, making it amenable to interpretation. For example, we can separately study the influence of the local and cluster-scale environment, determine that cluster triaxiality has negligible impact, and train a module that corrects for mis-centering. Our model improves on analytic profiles fit to the same simulation data by 70%. We argue that the electron pressure field, viewed as a function of a gravity-only simulation, has inherent stochasticity, and we model this property through a conditional-VAE extension to the network. This modification yields a further 7% improvement, although it is limited by our small training set. We envision that our method will prove useful in problems beyond the specific one considered here.
30
A Survey on Sustainable Surrogate-Based Optimisation. Sustainability 2022. [DOI: 10.3390/su14073867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
Surrogate-based optimisation (SBO) algorithms are powerful techniques that combine machine learning and optimisation to solve expensive optimisation problems. Such problems arise when dealing with computationally expensive simulators or algorithms. By approximating the expensive part of the optimisation problem with a surrogate, the number of expensive function evaluations can be reduced. This paper defines sustainable SBO, which consists of three aspects: applying SBO to a sustainable application, reducing the number of expensive function evaluations, and considering the computational effort of the machine learning and optimisation parts of SBO. The paper reviews sustainable applications that have successfully applied SBO over the past years and analyses the frameworks used, the types of surrogate used, the sustainable SBO aspects, and open questions. This leads to recommendations for researchers working on sustainability-related applications who want to apply SBO, as well as recommendations for SBO researchers. It is argued that being transparent about the computational resources used in the SBO framework, as well as developing SBO techniques that can deal with large numbers of variables and objectives, can lead to more sustainable SBO.
31
Abstract
Molecular evolutionary analyses require computationally intensive steps such as aligning multiple sequences, optimizing substitution models, inferring evolutionary trees, testing phylogenies by bootstrap analysis, and estimating divergence times. With the rise of large genomic data sets, phylogenomics is imposing a large carbon footprint on the environment, with consequences for the planet's health. Electronic waste and energy usage are major environmental issues. Fortunately, innovative methods and heuristics are available to shrink the carbon footprint, presenting researchers with opportunities to lower environmental costs and make evolutionary computing greener. Green computing will also enable greater scientific rigor and encourage broader participation in big data analytics.
Affiliation(s)
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Department of Biology, Temple University, Philadelphia, PA, USA
32
Grealey J, Lannelongue L, Saw WY, Marten J, Méric G, Ruiz-Carmona S, Inouye M. The carbon footprint of bioinformatics. Mol Biol Evol 2022; 39:6526403. [PMID: 35143670 PMCID: PMC8892942 DOI: 10.1093/molbev/msac034] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 11/14/2022]
Abstract
Bioinformatic research relies on large-scale computational infrastructures, which have a nonzero carbon footprint, but so far no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and that simple software upgrades could make it greener; for example, upgrading from BOLT-LMM v1 to v2.3 reduced the carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce the carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm’s greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can lead to a greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions that empower a move toward greener research.
Affiliation(s)
- Jason Grealey
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Woei-Yuh Saw
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Jonathan Marten
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Guillaume Méric
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Australia
- Sergio Ruiz-Carmona
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- The Alan Turing Institute, London, UK
33
Qin Y, Havulinna AS, Liu Y, Jousilahti P, Ritchie SC, Tokolyi A, Sanders JG, Valsta L, Brożyńska M, Zhu Q, Tripathi A, Vázquez-Baeza Y, Loomba R, Cheng S, Jain M, Niiranen T, Lahti L, Knight R, Salomaa V, Inouye M, Méric G. Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. Nat Genet 2022; 54:134-142. [PMID: 35115689 PMCID: PMC9883041 DOI: 10.1038/s41588-021-00991-z] [Citation(s) in RCA: 178] [Impact Index Per Article: 89.0] [Received: 10/21/2020] [Accepted: 11/19/2021] [Indexed: 01/31/2023]
Abstract
Human genetic variation affects the gut microbiota through a complex combination of environmental and host factors. Here we characterize genetic variations associated with microbial abundances in a single large-scale population-based cohort of 5,959 genotyped individuals with matched gut microbial metagenomes and dietary and health records (prevalent and follow-up). We identified 567 independent SNP-taxon associations. Variants at the LCT locus were associated with Bifidobacterium and other taxa, but the associations differed according to dairy intake. Furthermore, levels of Faecalicatena lactaris were associated with ABO, suggesting preferential utilization of secreted blood antigens as an energy source in the gut. Enterococcus faecalis levels were associated with variants in the MED13L locus, which has been linked to colorectal cancer. Mendelian randomization analysis indicated a potential causal effect of Morganella on major depressive disorder, consistent with observational incident disease analysis. Overall, we identify and characterize the intricate nature of host-microbiota interactions and their association with disease.
Affiliation(s)
- Youwen Qin
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
- Aki S Havulinna
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Institute for Molecular Medicine Finland, FIMM-HiLIFE, Helsinki, Finland
- Yang Liu
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Department of Clinical Pathology, Melbourne Medical School, The University of Melbourne, Melbourne, Victoria, Australia
- Pekka Jousilahti
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Scott C Ritchie
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Alex Tokolyi
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Jon G Sanders
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
- Cornell Institute for Host-Microbe Interaction and Disease, Cornell University, Ithaca, NY, USA
- Liisa Valsta
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Marta Brożyńska
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Qiyun Zhu
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Anupriya Tripathi
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Yoshiki Vázquez-Baeza
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science & Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Rohit Loomba
- NAFLD Research Center, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Susan Cheng
- Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Mohit Jain
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- Teemu Niiranen
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Department of Medicine, Turku University Hospital and University of Turku, Turku, Finland
- Leo Lahti
- Department of Computing, University of Turku, Turku, Finland
- Rob Knight
- Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science & Engineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
- Veikko Salomaa
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki, Finland
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia.
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK.
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.
- Health Data Research UK Cambridge, Wellcome Genome Campus & University of Cambridge, Cambridge, UK.
- The Alan Turing Institute, London, UK.
- Guillaume Méric
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, Australia.
34
Samuel G, Lucassen A. The environmental sustainability of data-driven health research: a scoping review. Digit Health 2022; 8:20552076221111297. [PMID: 35847526 PMCID: PMC9277423 DOI: 10.1177/20552076221111297] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/21/2022] [Revised: 06/14/2022] [Accepted: 06/15/2022] [Indexed: 11/15/2022]
Abstract
Data-Driven and Artificial Intelligence technologies are rapidly changing the way that health research is conducted and are offering new opportunities. This will inevitably have adverse environmental impacts. These include carbon dioxide emissions linked to the energy required to generate and process large amounts of data; the impact on the material environment (in the form of data centres); the unsustainable extraction of minerals for technological components; and e-waste (discarded electronic appliances) disposal. The growth of Data-Driven and Artificial Intelligence technologies means there is now a compelling need to consider these environmental impacts and develop means to mitigate them. Here, we offer a scoping review of how the environmental impacts of data storage and processing during Data-Driven and Artificial Intelligence health-related research are being discussed in the academic literature. Using the UK as a case study, we also review policies and initiatives that consider the environmental impacts of data storage and processing during Data-Driven and Artificial Intelligence health-related research in the UK. Our findings suggest little engagement with these issues to date. We discuss the implications of this and suggest how the Data-Driven and Artificial Intelligence health research sector needs to change to become more environmentally sustainable.
Affiliation(s)
- Gabrielle Samuel
- Department of Global Health and Social Medicine, King's College London, London, UK
- Wellcome Centre for Human Genetics, Oxford University, Oxford, UK
- A.M. Lucassen
- Wellcome Centre for Human Genetics, Oxford University, Oxford, UK
- Clinical Ethics, Law and Society (CELS), Faculty of Medicine, University of Southampton, Southampton, UK
35
Armstrong G, Cantrell K, Huang S, McDonald D, Haiminen N, Carrieri AP, Zhu Q, Gonzalez A, McGrath I, Beck KL, Hakim D, Havulinna AS, Méric G, Niiranen T, Lahti L, Salomaa V, Jain M, Inouye M, Swafford AD, Kim HC, Parida L, Vázquez-Baeza Y, Knight R. Efficient computation of Faith's phylogenetic diversity with applications in characterizing microbiomes. Genome Res 2021; 31:2131-2137. [PMID: 34479875 PMCID: PMC8559715 DOI: 10.1101/gr.275777.121] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/18/2021] [Accepted: 09/01/2021] [Indexed: 02/01/2023]
Abstract
The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a widely used phylogenetic alpha diversity metric that has thus far failed to scale effectively to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required, resulting in more accessible software with a reduced carbon footprint compared with previous approaches. The new algorithm produces results identical to those of the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.
Affiliation(s)
- George Armstrong
- Department of Pediatrics, School of Medicine, University of California, San Diego, California 92093, USA
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Bioinformatics and Systems Biology Program, University of California, San Diego, California 92093, USA
- Kalen Cantrell
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Shi Huang
- Department of Pediatrics, School of Medicine, University of California, San Diego, California 92093, USA
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Daniel McDonald
- Department of Pediatrics, School of Medicine, University of California, San Diego, California 92093, USA
- Niina Haiminen
- IBM T. J. Watson Research Center, Yorktown Heights, New York 10562, USA
- Qiyun Zhu
- School of Life Sciences, Arizona State University, Tempe, Arizona 85281, USA
- Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, Arizona 85281, USA
- Antonio Gonzalez
- Department of Pediatrics, School of Medicine, University of California, San Diego, California 92093, USA
- Imran McGrath
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Division of Biological Sciences, University of California San Diego, La Jolla, California 92093, USA
- Kristen L Beck
- IBM Almaden Research Center, San Jose, California 95120, USA
- Daniel Hakim
- Department of Pediatrics, School of Medicine, University of California, San Diego, California 92093, USA
- Bioinformatics and Systems Biology Program, University of California, San Diego, California 92093, USA
- Aki S Havulinna
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki 00271, Finland
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki 00014, Finland
- Guillaume Méric
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria 3800, Australia
- Teemu Niiranen
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki 00271, Finland
- Department of Internal Medicine, University of Turku, Turku 20014, Finland
- Division of Medicine, Turku University Hospital, Turku 20014, Finland
- Leo Lahti
- Department of Computing, University of Turku, Turku 20014, Finland
- Veikko Salomaa
- Department of Public Health and Welfare, Finnish Institute for Health and Welfare, Helsinki 00271, Finland
- Mohit Jain
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Department of Medicine, University of California, San Diego, California 92093, USA
- Department of Pharmacology, University of California, San Diego, California 92093, USA
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- Department of Public Health and Primary Care, Cambridge University, Cambridge CB2 1TN, United Kingdom
- Austin D Swafford
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Ho-Cheol Kim
- IBM Almaden Research Center, San Jose, California 95120, USA
- Laxmi Parida
- IBM T. J. Watson Research Center, Yorktown Heights, New York 10562, USA
- Yoshiki Vázquez-Baeza
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Rob Knight
- Department of Pediatrics, School of Medicine, University of California, San Diego, California 92093, USA
- Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, California 92093, USA
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, California 92093, USA
36
Lee K, Kayaalp M, Henry S, Uzuner Ö. A Context-Enhanced De-identification System. ACM Transactions on Computing for Healthcare 2021; 3. [PMID: 34676376 DOI: 10.1145/3470980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Many modern entity recognition systems, including the current state-of-the-art de-identification systems, are based on bidirectional long short-term memory (biLSTM) units augmented by a conditional random field (CRF) sequence optimizer. These systems process the input sentence by sentence. This approach prevents them from capturing dependencies across sentence boundaries and makes accurate sentence boundary detection a prerequisite. Because sentence boundary detection can be problematic, especially in clinical reports where dependencies and co-references across sentence boundaries are abundant, these systems have clear limitations. In this study, we built a new system on the framework of one of the current state-of-the-art de-identification systems, NeuroNER, to overcome these limitations. The new system incorporates context embeddings through forward and backward n-grams without using sentence boundaries. Our context-enhanced de-identification (CEDI) system captures dependencies across sentence boundaries and bypasses the sentence boundary detection problem altogether. We enhanced this system with deep affix features and an attention mechanism to capture the pertinent parts of the input. The CEDI system outperforms NeuroNER on the 2006 i2b2 de-identification challenge dataset, the 2014 i2b2 shared task de-identification dataset, and the 2016 CEGS N-GRID de-identification dataset (p < 0.01). All datasets comprise narrative clinical reports in English but contain different note types, ranging from discharge summaries to psychiatric notes. Enhancing CEDI with deep affix features and the attention mechanism further increased performance.
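To make "context without sentence boundaries" concrete, here is a hypothetical sketch. It extracts the n tokens before and after every token from a document-level token stream, so a window can run across a sentence boundary.

It only illustrates the windowing idea. The abstract does not say how CEDI turns these n-grams into context embeddings, and the function name, padding token, and tokenization are assumptions.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token for windows that run off the document edge


def context_ngrams(tokens: List[str], n: int) -> List[Tuple[List[str], List[str]]]:
    """For every token, return (left n-gram, right n-gram) taken from the whole document.

    The token stream is not split into sentences, so a window may span a
    sentence boundary (e.g. include the previous sentence's final words).
    """
    padded = [PAD] * n + tokens + [PAD] * n
    windows = []
    for i in range(len(tokens)):
        j = i + n  # position of tokens[i] inside `padded`
        left = padded[j - n:j]           # n tokens preceding the current token
        right = padded[j + 1:j + 1 + n]  # n tokens following the current token
        windows.append((left, right))
    return windows


tokens = ["Seen", "by", "Smith", ".", "He", "left"]
print(context_ngrams(tokens, 2)[4])  # (['Smith', '.'], ['left', '<pad>'])
```

With n = 2, the token "He" sees "Smith" and "." on its left. That is content from the previous sentence, which a tagger that processes text sentence by sentence would not see.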
Affiliation(s)
- Kahyun Lee
- George Mason University, Fairfax, VA, USA
- Sam Henry
- George Mason University, Fairfax, VA, USA
37
Craig MJ, García-Melchor M. Applying Active Learning to the Screening of Molecular Oxygen Evolution Catalysts. Molecules 2021; 26:molecules26216362. [PMID: 34770771 PMCID: PMC8588390 DOI: 10.3390/molecules26216362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/02/2021] [Revised: 10/01/2021] [Accepted: 10/19/2021] [Indexed: 11/16/2022] [Open Access]
Abstract
The oxygen evolution reaction (OER) can enable green hydrogen production. However, state-of-the-art catalysts for this reaction are made of prohibitively expensive materials, and cheap catalysts have overpotentials that render the reaction inefficient. This motivates a computational search for novel catalysts for this reaction. In this communication, we present machine learning algorithms to enhance the hypothetical screening of molecular OER catalysts. We predict calculated binding energies with Gaussian process regression (GPR) models and apply active learning schemes. This provides evidence that our algorithm can improve computational efficiency by guiding simulations towards candidates with promising OER descriptor values. Furthermore, we derive an acquisition function that, when maximized, can identify catalysts whose theoretical overpotentials circumvent the constraints imposed by linear scaling relations, by attempting to enforce a specific mechanism. Finally, we provide a brief perspective on the appropriate sets of molecules to consider when screening complexes that could be stable and active for this reaction.
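A minimal sketch of one step of a GPR active-learning loop: fit a Gaussian process to the binding energies simulated so far, then choose the next candidate to simulate.

The acquisition function derived in the abstract, which targets mechanisms that escape linear scaling relations, is not given there. This sketch therefore swaps in a generic one: the GP's probability that a candidate's descriptor falls inside a target window.

Several choices are illustrative assumptions: the RBF kernel, the fixed length scale and noise level, the feature vectors, and the window bounds. The code is written in pure Python for self-containment, not speed.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf(x: Vector, z: Vector, length_scale: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_sub(low: List[List[float]], b: Vector) -> List[float]:
    """Solve L x = b for lower-triangular L."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def backward_sub(low: List[List[float]], b: Vector) -> List[float]:
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(X: Sequence[Vector], y: Sequence[float], queries: Sequence[Vector],
               noise: float = 1e-6, length_scale: float = 1.0) -> List[Tuple[float, float]]:
    """Posterior (mean, std) of a zero-mean GP fitted to mean-centred targets."""
    y_mean = sum(y) / len(y)
    centred = [v - y_mean for v in y]
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    low = cholesky(K)
    alpha = backward_sub(low, forward_sub(low, centred))  # K^{-1} (y - mean)
    preds = []
    for q in queries:
        k = [rbf(q, x, length_scale) for x in X]
        mean = y_mean + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward_sub(low, k)  # v^T v = k^T K^{-1} k
        var = max(rbf(q, q, length_scale) - sum(vi * vi for vi in v), 1e-12)
        preds.append((mean, math.sqrt(var)))
    return preds


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def next_candidate(X: Sequence[Vector], y: Sequence[float], pool: Sequence[Vector],
                   target_lo: float, target_hi: float) -> int:
    """Index of the unsimulated candidate most likely to land in [target_lo, target_hi]."""
    scores = [normal_cdf((target_hi - m) / s) - normal_cdf((target_lo - m) / s)
              for m, s in gp_predict(X, y, pool)]
    return max(range(len(pool)), key=scores.__getitem__)


X_seen = [[0.0], [1.0], [2.0]]  # toy 1-D descriptors of already-simulated complexes
y_seen = [0.0, 1.0, 2.0]        # toy binding energies (arbitrary units)
pool = [[5.0], [0.5]]           # unsimulated candidates
print(next_candidate(X_seen, y_seen, pool, 0.4, 0.6))  # 1
```

In a full loop, the selected candidate would be simulated, its computed binding energy appended to the training data, and the step repeated. A practical version would also fit the kernel hyperparameters rather than fix them.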
38
Lannelongue L, Grealey J, Bateman A, Inouye M. Ten simple rules to make your computing more environmentally sustainable. PLoS Comput Biol 2021; 17:e1009324. [PMID: 34543272 PMCID: PMC8452068 DOI: 10.1371/journal.pcbi.1009324] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/27/2022] [Open Access]
Affiliation(s)
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- Jason Grealey
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, United Kingdom
- The Alan Turing Institute, London, United Kingdom
39
Diwan GD, Carlos Gonzalez-Sanchez J, Apic G, Russell RB. Next generation protein structure predictions and genetic variant interpretation. J Mol Biol 2021; 433:167180. [PMID: 34358547 DOI: 10.1016/j.jmb.2021.167180] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/21/2021] [Revised: 07/24/2021] [Accepted: 07/26/2021] [Indexed: 10/20/2022]
Abstract
The need to make sense of the thousands of genetic variants uncovered every day, in terms of pathology or biological mechanism, is acute. Many insights into how genetic changes affect protein function can be gleaned if three-dimensional structures of the associated proteins are available. A highly accurate method for predicting structures from amino acid sequences is therefore potentially a great boost to those seeking to understand genetic changes. In this paper, we discuss the current state of known protein structures for the human and other proteomes and how better structure predictions might affect variant interpretation efforts. For the human proteome in particular, the state of the available structural data suggests that the impact on variant interpretation might be smaller than anticipated. We also discuss additional efforts in structure prediction that could further aid the understanding of genetic variants.
Affiliation(s)
- Gaurav D Diwan
- BioQuant, Heidelberg University, Im Neuenheimer Feld 267, Heidelberg, Germany; Heidelberg University Biochemistry Center (BZH), Im Neuenheimer Feld
- Juan Carlos Gonzalez-Sanchez
- BioQuant, Heidelberg University, Im Neuenheimer Feld 267, Heidelberg, Germany; Heidelberg University Biochemistry Center (BZH), Im Neuenheimer Feld
- Gordana Apic
- BioQuant, Heidelberg University, Im Neuenheimer Feld 267, Heidelberg, Germany; Heidelberg University Biochemistry Center (BZH), Im Neuenheimer Feld
- Robert B Russell
- BioQuant, Heidelberg University, Im Neuenheimer Feld 267, Heidelberg, Germany; Heidelberg University Biochemistry Center (BZH), Im Neuenheimer Feld.