1
Lancaster AK, Single RM, Mack SJ, Sochat V, Mariani MP, Webster GD. PyPop: a mature open-source software pipeline for population genomics. Front Immunol 2024; 15:1378512. [PMID: 38629078; PMCID: PMC11019567; DOI: 10.3389/fimmu.2024.1378512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/08/2024] [Indexed: 04/19/2024] Open
Abstract
Python for Population Genomics (PyPop) is a software package that processes genotype and allele data and performs large-scale population genetic analyses on highly polymorphic multi-locus genotype data. In particular, PyPop tests data conformity to Hardy-Weinberg equilibrium expectations, performs Ewens-Watterson tests for selection, estimates haplotype frequencies, measures linkage disequilibrium, and tests significance. Standardized means of performing these tests is key for contemporary studies of evolutionary biology and population genetics, and these tests are central to genetic studies of disease association as well. Here, we present PyPop 1.0.0, a new major release of the package, which implements new features using the more robust infrastructure of GitHub, and is distributed via the industry-standard Python Package Index. New features include implementation of the asymmetric linkage disequilibrium measures and, of particular interest to the immunogenetics research communities, support for modern nomenclature, including colon-delimited allele names, and improvements to meta-analysis features for aggregating outputs for multiple populations. Code available at: https://zenodo.org/records/10080668 and https://github.com/alexlancaster/pypop.
Affiliation(s)
- Alexander K. Lancaster
  - Amber Biology LLC, Cambridge, MA, United States
  - Ronin Institute, Montclair, NJ, United States
  - Institute for Globally Distributed Open Research and Education (IGDORE), Cambridge, MA, United States
- Richard M. Single
  - Department of Mathematics and Statistics, University of Vermont, Burlington, VT, United States
- Steven J. Mack
  - Department of Pediatrics, University of California, San Francisco, Oakland, CA, United States
- Vanessa Sochat
  - Livermore Computing, Lawrence Livermore National Laboratory, Livermore, CA, United States
- Michael P. Mariani
  - Department of Mathematics and Statistics, University of Vermont, Burlington, VT, United States
  - Mariani Systems LLC, Hanover, NH, United States
- Gordon D. Webster
  - Amber Biology LLC, Cambridge, MA, United States
  - Ronin Institute, Montclair, NJ, United States

2
Torabian S, Vélez N, Sochat V, Halchenko YO, Grossman ED. The PyMVPA BIDS-App: a robust multivariate pattern analysis pipeline for fMRI data. Front Neurosci 2023; 17:1233416. [PMID: 37694123; PMCID: PMC10483824; DOI: 10.3389/fnins.2023.1233416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 08/04/2023] [Indexed: 09/12/2023] Open
Abstract
With the advent of multivariate pattern analysis (MVPA) as an important analytic approach to fMRI, new insights into the functional organization of the brain have emerged. Several software packages have been developed to perform MVPA, but deploying them comes with the cost of adjusting data to the idiosyncrasies of each package. Here we describe PyMVPA BIDS-App, a fast and robust pipeline based on the data organization of the BIDS standard that performs multivariate analyses using powerful functionality of PyMVPA. The app runs flexibly with blocked and event-related fMRI experimental designs, is capable of performing classification as well as representational similarity analysis, and works both within regions of interest and on the whole brain through searchlights. In addition, the app accepts as input both volumetric and surface-based data. Inspections into the intermediate stages of the analyses are available, and the readability of final results is facilitated through visualizations. The PyMVPA BIDS-App is designed to be accessible to novice users, while also offering more control to experts through command-line arguments in a highly reproducible environment.
Affiliation(s)
- Sajjad Torabian
  - Visual Perception and Neuroimaging Lab, Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, United States
- Natalia Vélez
  - Computational Cognitive Neuroscience Lab, Department of Psychology, Harvard University, Cambridge, MA, United States
- Vanessa Sochat
  - Lawrence Livermore National Laboratory, Livermore, CA, United States
- Yaroslav O. Halchenko
  - Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, United States
- Emily D. Grossman
  - Visual Perception and Neuroimaging Lab, Department of Cognitive Sciences, University of California, Irvine, Irvine, CA, United States

3
Turco G, Chang C, Wang RY, Kim G, Stoops EH, Richardson B, Sochat V, Rust J, Oughtred R, Thayer N, Kang F, Livstone MS, Heinicke S, Schroeder M, Dolinski KJ, Botstein D, Baryshnikova A. Global analysis of the yeast knockout phenome. Sci Adv 2023; 9:eadg5702. [PMID: 37235661; DOI: 10.1126/sciadv.adg5702] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/06/2023] [Accepted: 04/20/2023] [Indexed: 05/28/2023]
Abstract
Genome-wide phenotypic screens in the budding yeast Saccharomyces cerevisiae, enabled by its knockout collection, have produced the largest, richest, and most systematic phenotypic description of any organism. However, integrative analyses of this rich data source have been virtually impossible because of the lack of a central data repository and consistent metadata annotations. Here, we describe the aggregation, harmonization, and analysis of ~14,500 yeast knockout screens, which we call Yeast Phenome. Using this unique dataset, we characterized two unknown genes (YHR045W and YGL117W) and showed that tryptophan starvation is a by-product of many chemical treatments. Furthermore, we uncovered an exponential relationship between phenotypic similarity and intergenic distance, which suggests that gene positions in both yeast and human genomes are optimized for function.
Affiliation(s)
- Gina Turco
  - Calico Life Sciences LLC, South San Francisco, CA, USA
- Christie Chang
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Griffin Kim
  - Calico Life Sciences LLC, South San Francisco, CA, USA
- Brianna Richardson
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Vanessa Sochat
  - Lawrence Livermore National Laboratory, Livermore, CA, USA
- Jennifer Rust
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Rose Oughtred
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Fan Kang
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Michael S Livstone
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Sven Heinicke
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Mark Schroeder
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Kara J Dolinski
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA

4
Mundt MR, Beattie K, Bisila J, Ferenbaugh CR, Godoy WF, Gupta R, Guyer JE, Kiran M, Malviya-Thakur A, Milewicz R, Sims BH, Sochat V, Teves JB. For the Public Good: Connecting, Retaining, and Recognizing Current and Future RSEs at National Organizations. Comput Sci Eng 2023. [DOI: 10.1109/mcse.2023.3256759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Affiliation(s)
- M. R. Mundt
  - Sandia National Laboratories (SNL), Albuquerque, NM
- K. Beattie
  - Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA
- J. Bisila
  - Sandia National Laboratories (SNL), Albuquerque, NM
- W. F. Godoy
  - Oak Ridge National Laboratory (ORNL), Oak Ridge, TN
- R. Gupta
  - Argonne National Laboratory (ANL), Lemont, IL
- J. E. Guyer
  - National Institute of Standards and Technology (NIST), Gaithersburg, MD
- M. Kiran
  - Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA
- R. Milewicz
  - Sandia National Laboratories (SNL), Albuquerque, NM
- B. H. Sims
  - Los Alamos National Laboratory (LANL), Los Alamos, NM
- V. Sochat
  - Lawrence Livermore National Laboratory (LLNL), Livermore, CA
- J. B. Teves
  - National Institutes of Health (NIH), Bethesda, MD

5
Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, Köster J. Sustainable data analysis with Snakemake. F1000Res 2021; 10:33. [PMID: 34035898; PMCID: PMC8114187; DOI: 10.12688/f1000research.29032.2] [Citation(s) in RCA: 387] [Impact Index Per Article: 129.0] [Accepted: 04/08/2021] [Indexed: 01/22/2023] Open
Abstract
Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Affiliation(s)
- Felix Mölder
  - Algorithms for reproducible bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Kim Philipp Jablonski
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Christopher H Tomkins-Tinch
  - Broad Institute of MIT and Harvard, Cambridge, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, USA
- Vanessa Sochat
  - Stanford University Research Computing Center, Stanford University, Stanford, USA
- Jan Forster
  - Algorithms for reproducible bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - German Cancer Consortium (DKTK, partner site Essen) and German Cancer Research Center, DKFZ, Heidelberg, Germany
- Soohyun Lee
  - Biomedical Informatics, Harvard Medical School, Harvard University, Boston, USA
- Sven O Twardziok
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Alexander Kanitz
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics / ELIXIR Switzerland, Lausanne, Switzerland
- Manuel Holtgrewe
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
  - CUBI - Core Unit Bioinformatics, Berlin Institute of Health, Berlin, Germany
- Sven Rahmann
  - Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Sven Nahnsen
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Johannes Köster
  - Algorithms for reproducible bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Medical Oncology, Harvard Medical School, Harvard University, Boston, USA

6
Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, Köster J. Sustainable data analysis with Snakemake. F1000Res 2021; 10:33. [PMID: 34035898; DOI: 10.12688/f1000research.29032.1] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Accepted: 01/07/2021] [Indexed: 01/22/2023] Open
Abstract
Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
Affiliation(s)
- Felix Mölder
  - Algorithms for reproducible bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Kim Philipp Jablonski
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Basel, Switzerland
- Christopher H Tomkins-Tinch
  - Broad Institute of MIT and Harvard, Cambridge, USA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, USA
- Vanessa Sochat
  - Stanford University Research Computing Center, Stanford University, Stanford, USA
- Jan Forster
  - Algorithms for reproducible bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - German Cancer Consortium (DKTK, partner site Essen) and German Cancer Research Center, DKFZ, Heidelberg, Germany
- Soohyun Lee
  - Biomedical Informatics, Harvard Medical School, Harvard University, Boston, USA
- Sven O Twardziok
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Alexander Kanitz
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics / ELIXIR Switzerland, Lausanne, Switzerland
- Manuel Holtgrewe
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
  - CUBI - Core Unit Bioinformatics, Berlin Institute of Health, Berlin, Germany
- Sven Rahmann
  - Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Sven Nahnsen
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Johannes Köster
  - Algorithms for reproducible bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Medical Oncology, Harvard Medical School, Harvard University, Boston, USA

7
Abstract
Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow's reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container's image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
Affiliation(s)
- Daniel Nüst
  - Institute for Geoinformatics, University of Münster, Münster, Germany
- Vanessa Sochat
  - Stanford Research Computing Center, Stanford University, Stanford, California, United States of America
- Ben Marwick
  - Department of Anthropology, University of Washington, Seattle, Washington, United States of America
- Stephen J. Eglen
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, Cambridgeshire, Great Britain
- Tim Head
  - Wild Tree Tech, Zurich, Switzerland
- Tony Hirst
  - Department of Computing and Communications, The Open University, Great Britain
- Benjamin D. Evans
  - School of Psychological Science, University of Bristol, Bristol, Great Britain

8
Abstract
Background: Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules (“apps”) that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves.
Results: We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists’ work to be self-documenting and programmatically parseable for maximum reproducibility.
SCIF offers an abstraction from underlying programming languages and packaging logic to work with scientific applications, opening up new opportunities for scientific software development.
Affiliation(s)
- Vanessa Sochat
  - Stanford Research Computing Center
  - Stanford University School of Medicine, Stanford, CA 94025

9
Abstract
Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game-changing development for computational science.
Affiliation(s)
- Gregory M. Kurtzer
  - High Performance Computing Services, Lawrence Berkeley National Lab, Berkeley, CA, United States of America
- Vanessa Sochat
  - Stanford Research Computing Center and School of Medicine, Stanford University, Stanford, CA, United States of America
- Michael W. Bauer
  - High Performance Computing Services, Lawrence Berkeley National Lab, Berkeley, CA, United States of America
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, United States of America
  - Experimental Systems, GSI Helmholtzzentrum für Schwerionenforschung, Darmstadt, Germany

10
Jeffers A, Sochat V, Kattan MW, Yu C, Melcon E, Yamoah K, Rebbeck TR, Whittemore AS. Predicting Prostate Cancer Recurrence After Radical Prostatectomy. Prostate 2017; 77:291-298. [PMID: 27775165; PMCID: PMC5877452; DOI: 10.1002/pros.23268] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/23/2016] [Accepted: 10/05/2016] [Indexed: 11/06/2022]
Abstract
BACKGROUND: Prostate cancer prognosis is variable, and management decisions involve balancing patients' risks of recurrence and recurrence-free death. Moreover, the roles of body mass index (BMI) and race in risk of recurrence are controversial [1,2]. To address these issues, we developed and cross-validated RAPS (Risks After Prostate Surgery), a personal prediction model for biochemical recurrence (BCR) within 10 years of radical prostatectomy (RP) that includes BMI and race as possible predictors, and recurrence-free death as a competing risk.
METHODS: RAPS uses a patient's risk factors at surgery to assign him a recurrence probability based on statistical learning methods applied to a cohort of 1,276 patients undergoing RP at the University of Pennsylvania. We compared the performance of RAPS to that of an existing model with respect to calibration (by comparing observed and predicted outcomes), and discrimination (using the area under the receiver operating characteristic curve (AUC)).
RESULTS: RAPS' cross-validated BCR predictions provided better calibration than those of an existing model that underestimated patients' risks. Discrimination was similar for the two models, with BCR AUCs of 0.793, 95% confidence interval (0.766-0.820) for RAPS, and 0.780 (0.745-0.815) for the existing model. RAPS' most important BCR predictors were tumor grade, preoperative prostate-specific antigen (PSA) level and BMI; race was less important [3]. RAPS' predictions can be obtained online at https://predict.shinyapps.io/raps.
CONCLUSION: RAPS' cross-validated BCR predictions were better calibrated than those of an existing model, and BMI information contributed substantially to these predictions. RAPS predictions for recurrence-free death were limited by lack of co-morbidity data; however the model provides a simple framework for extension to include such data. Its use and extension should facilitate decision strategies for post-RP prostate cancer management.
Prostate 77:291-298, 2017. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Vanessa Sochat
  - Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, California
- Michael W Kattan
  - Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Changhong Yu
  - Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
- Erin Melcon
  - Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California
- Kosj Yamoah
  - Department of Urology, University of Pennsylvania, Philadelphia, Pennsylvania
- Timothy R Rebbeck
  - Department of Epidemiology, Harvard University School of Public Health, Boston, Massachusetts
- Alice S Whittemore
  - Department of Health Research and Policy, Stanford University School of Medicine, Stanford, California

11
Maumet C, Auer T, Bowring A, Chen G, Das S, Flandin G, Ghosh S, Glatard T, Gorgolewski KJ, Helmer KG, Jenkinson M, Keator DB, Nichols BN, Poline JB, Reynolds R, Sochat V, Turner J, Nichols TE. Sharing brain mapping statistical results with the neuroimaging data model. Sci Data 2016; 3:160102. [PMID: 27922621; PMCID: PMC5139675; DOI: 10.1038/sdata.2016.102] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 06/21/2016] [Accepted: 09/21/2016] [Indexed: 11/16/2022] Open
Abstract
Only a tiny fraction of the data and metadata produced by an fMRI study is finally conveyed to the community. This lack of transparency not only hinders the reproducibility of neuroimaging results but also impairs future meta-analyses. In this work we introduce NIDM-Results, a format specification providing a machine-readable description of neuroimaging statistical results along with key image data summarising the experiment. NIDM-Results provides a unified representation of mass univariate analyses including a level of detail consistent with available best practices. This standardized representation allows authors to relay methods and results in a platform-independent regularized format that is not tied to a particular neuroimaging software package. Tools are available to export NIDM-Results graphs and associated files from the widely used SPM and FSL software packages, and the NeuroVault repository can import NIDM-Results archives. The specification is publicly available at: http://nidm.nidash.org/specs/nidm-results.html.
Affiliation(s)
- Tibor Auer
  - MRC Cognition and Brain Sciences Unit, Cambridge CB2 7EF, UK
- Gang Chen
  - Scientific and Statistical Computing Core, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892, USA
- Samir Das
  - McGill Centre for Integrative Neuroscience, Ludmer Centre, Montreal Neurological Institute, Montreal, Quebec, Canada H3A 2B4
- Guillaume Flandin
  - Wellcome Trust Centre for Neuroimaging, UCL Institute of Neurology, London WC1N 3BG, UK
- Satrajit Ghosh
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Tristan Glatard
  - McGill Centre for Integrative Neuroscience, Ludmer Centre, Montreal Neurological Institute, Montreal, Quebec, Canada H3A 2B4
  - Université de Lyon, CREATIS; CNRS UMR5220; Inserm U1044; INSA-Lyon; Université Claude Bernard Lyon 1, Villeurbanne 69100, France
- Karl G. Helmer
  - Martinos Center for Biomedical Imaging, Massachusetts General Hospital; Department of Radiology, Boston, Massachusetts 02129, USA
- David B. Keator
  - Department of Psychiatry and Human Behavior, Department of Computer Science, Department of Neurology, University of California, Irvine, California 92697, USA
- B. Nolan Nichols
  - Center for Health Sciences, SRI International, Menlo Park, California 94025, USA
- Jean-Baptiste Poline
  - Helen Wills Neuroscience Institute, H. Wheeler Jr. Brain Imaging Center, University of California, Berkeley, California 94720, USA
- Richard Reynolds
  - Scientific and Statistical Computing Core, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland 20892, USA
- Vanessa Sochat
  - Department of Psychology, Stanford University, Stanford, California 94305, USA
- Jessica Turner
  - Psychology and Neuroscience, Georgia State University, Atlanta, Georgia 30302, USA
- Thomas E. Nichols
  - WMG, University of Warwick, Coventry CV4 7AL, UK
  - Department of Statistics, University of Warwick, Coventry CV4 7AL, UK

12
Craddock RC, Bellec P, Margules DS, Nichols BN, Pfannmöller JP, Badhwar A, Kennedy D, Poline JB, Toro R, Cipollini B, Rokem A, Clark D, Gorgolewski KJ, Craddock RC, Craddock RC, Clark DJ, Das S, Madjar C, Sengupta A, Mohades Z, Dery S, Deng W, Earl E, Demeter DV, Mills K, Mihai G, Ruzic L, Ketz N, Reineberg A, Reddan MC, Goddings AL, Gonzalez-Castillo J, Gorgolewski KJ, Froehlich C, Dekel G, Margulies DS, Craddock RC, Fulcher BD, Glatard T, Das S, Adalat R, Beck N, Bernard R, Khalili-Mahani N, Rioux P, Rousseau MÉ, Evans AC, Halchenko YO, Castello MVDO, Hernández-Pérez R, Morales EA, Cuaya LV, Ito KL, Liew SL, Johnson HJ, Kan E, Anglin J, Borich M, Jahanshad N, Thompson P, Liew SL, Margulies DS, Falkiewicz M, Huntenburg JM, O’Connor D, Clark DJ, Milham MP, Craddock RC, Pereira RF, Heinsfeld AS, Franco AR, Buchweitz A, Meneguzzi F, Pfannmöller JP, Mesquita R, Herrera LCT, Dentico D, Sochat V, Nichols BN, Heinsfeld AS, Franco AR, Buchweitz A, Meneguzzi F, Villalon-Reina JE, Garyfallidis E. 2015 Brainhack Proceedings. Gigascience 2016. [PMCID: PMC5103253; DOI: 10.1186/s13742-016-0147-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022] Open
Abstract
- I1 Introduction to the 2015 Brainhack Proceedings. R. Cameron Craddock, Pierre Bellec, Daniel S. Margules, B. Nolan Nichols, Jörg P. Pfannmöller
- A1 Distributed collaboration: the case for the enhancement of Brainspell’s interface. AmanPreet Badhwar, David Kennedy, Jean-Baptiste Poline, Roberto Toro
- A2 Advancing open science through NiData. Ben Cipollini, Ariel Rokem
- A3 Integrating the Brain Imaging Data Structure (BIDS) standard into C-PAC. Daniel Clark, Krzysztof J. Gorgolewski, R. Cameron Craddock
- A4 Optimized implementations of voxel-wise degree centrality and local functional connectivity density mapping in AFNI. R. Cameron Craddock, Daniel J. Clark
- A5 LORIS: DICOM anonymizer. Samir Das, Cécile Madjar, Ayan Sengupta, Zia Mohades
- A6 Automatic extraction of academic collaborations in neuroimaging. Sebastien Dery
- A7 NiftyView: a zero-footprint web application for viewing DICOM and NIfTI files. Weiran Deng
- A8 Human Connectome Project Minimal Preprocessing Pipelines to Nipype. Eric Earl, Damion V. Demeter, Kate Mills, Glad Mihai, Luka Ruzic, Nick Ketz, Andrew Reineberg, Marianne C. Reddan, Anne-Lise Goddings, Javier Gonzalez-Castillo, Krzysztof J. Gorgolewski
- A9 Generating music with resting-state fMRI data. Caroline Froehlich, Gil Dekel, Daniel S. Margulies, R. Cameron Craddock
- A10 Highly comparable time-series analysis in Nitime. Ben D. Fulcher
- A11 Nipype interfaces in CBRAIN. Tristan Glatard, Samir Das, Reza Adalat, Natacha Beck, Rémi Bernard, Najmeh Khalili-Mahani, Pierre Rioux, Marc-Étienne Rousseau, Alan C. Evans
- A12 DueCredit: automated collection of citations for software, methods, and data. Yaroslav O. Halchenko, Matteo Visconti di Oleggio Castello
- A13 Open source low-cost device to register dog’s heart rate and tail movement. Raúl Hernández-Pérez, Edgar A. Morales, Laura V. Cuaya
- A14 Calculating the Laterality Index Using FSL for Stroke Neuroimaging Data. Kaori L. Ito, Sook-Lei Liew
- A15 Wrapping FreeSurfer 6 for use in high-performance computing environments. Hans J. Johnson
- A16 Facilitating big data meta-analyses for clinical neuroimaging through ENIGMA wrapper scripts. Erik Kan, Julia Anglin, Michael Borich, Neda Jahanshad, Paul Thompson, Sook-Lei Liew
- A17 A cortical surface-based geodesic distance package for Python. Daniel S Margulies, Marcel Falkiewicz, Julia M Huntenburg
- A18 Sharing data in the cloud. David O’Connor, Daniel J. Clark, Michael P. Milham, R. Cameron Craddock
- A19 Detecting task-based fMRI compliance using plan abandonment techniques. Ramon Fraga Pereira, Anibal Sólon Heinsfeld, Alexandre Rosa Franco, Augusto Buchweitz, Felipe Meneguzzi
- A20 Self-organization and brain function. Jörg P. Pfannmöller, Rickson Mesquita, Luis C.T. Herrera, Daniela Dentico
- A21 The Neuroimaging Data Model (NIDM) API. Vanessa Sochat, B Nolan Nichols
- A22 NeuroView: a customizable browser-base utility. Anibal Sólon Heinsfeld, Alexandre Rosa Franco, Augusto Buchweitz, Felipe Meneguzzi
- A23 DIPY: Brain tissue classification. Julio E. Villalon-Reina, Eleftherios Garyfallidis
|
13
|
Affiliation(s)
- Vanessa Sochat
- Program in Biomedical Informatics, Stanford University, Stanford, California, USA
| | - B. Nolan Nichols
- SRI International, Menlo Park, CA, USA
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
| |
|
14
|
Sochat V, David M, Wall DP. Translational Meta-analytical Methods to Localize the Regulatory Patterns of Neurological Disorders in the Human Brain. AMIA Annu Symp Proc 2015; 2015:2073-2082. [PMID: 26958307 PMCID: PMC4765688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The task of mapping neurological disorders in the human brain must be informed by multiple measurements of an individual's phenotype: neuroimaging, genomics, and behavior. We developed a novel meta-analytical approach to integrate disparate resources and generated transcriptional maps of neurological disorders in the human brain, yielding a purely computational procedure to pinpoint the brain location of transcribed genes likely to be involved in either onset or maintenance of the neurological condition.
Affiliation(s)
- Vanessa Sochat
- Stanford Graduate Fellow, Graduate Program in Biomedical Informatics
| | - Maude David
- Department of Pediatrics, Systems Medicine Division Stanford University School of Medicine Stanford, CA 94305
| | - Dennis P Wall
- Stanford Graduate Fellow, Graduate Program in Biomedical Informatics
| |
|
15
|
Sochat V, Supekar K, Bustillo J, Calhoun V, Turner JA, Rubin DL. A robust classifier to distinguish noise from fMRI independent components. PLoS One 2014; 9:e95493. [PMID: 24748378 PMCID: PMC3991682 DOI: 10.1371/journal.pone.0095493] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/08/2014] [Accepted: 03/27/2014] [Indexed: 12/14/2022] Open
Abstract
Analyzing Functional Magnetic Resonance Imaging (fMRI) of resting brains to determine the spatial location and activity of intrinsic brain networks, a novel and burgeoning research field, is limited by the lack of ground truth and the tendency of analyses to overfit the data. Independent Component Analysis (ICA) is commonly used to separate the data into signal and Gaussian noise components, and then map these components onto spatial networks. Identifying noise in these data, however, is a tedious process that has proven hard to automate, particularly when data from different institutions, subjects, and scanners are used. Here we present an automated method to delineate noisy independent components in ICA using a data-driven infrastructure that queries a database of 246 spatial and temporal features to discover a computational signature of different types of noise. We evaluated the performance of our method in detecting noisy components from healthy control fMRI (sensitivity = 0.91, specificity = 0.82, cross validation accuracy (CVA) = 0.87, area under the curve (AUC) = 0.93), and demonstrated its generalizability by showing equivalent performance on (1) an age- and scanner-matched cohort of schizophrenia patients from the same institution (sensitivity = 0.89, specificity = 0.83, CVA = 0.86), (2) an age-matched cohort on an equivalent scanner from a different institution (sensitivity = 0.88, specificity = 0.88, CVA = 0.88), and (3) an age-matched cohort on a different scanner from a different institution (sensitivity = 0.72, specificity = 0.92, CVA = 0.79). We additionally compare our approach with a recently published method. Our results suggest that our method is robust to noise variations due to population as well as scanner differences, making it well suited to the goal of automatically distinguishing noise from functional networks to enable investigation of human brain function.
Affiliation(s)
- Vanessa Sochat
- Stanford Graduate Fellow, Graduate Program in Biomedical Informatics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Kaustubh Supekar
- Department of Psychiatry & Behavioral Sciences, Stanford University School of Medicine, Stanford, California, United States of America
| | - Juan Bustillo
- The Mind Research Network, Albuquerque, New Mexico, United States of America
- Department of Psychiatry, University of New Mexico, Albuquerque, New Mexico, United States of America
| | - Vince Calhoun
- The Mind Research Network, Albuquerque, New Mexico, United States of America
| | - Jessica A. Turner
- The Mind Research Network, Albuquerque, New Mexico, United States of America
- Georgia State University, Department of Psychology and Neuroscience Institute, Atlanta, Georgia, United States of America
| | - Daniel L. Rubin
- Department of Radiology, Stanford University School of Medicine, Stanford, California, United States of America
| |
|