1
Clements HD, Flynn AR, Nicholls BT, Grosheva D, Lefave SJ, Merriman MT, Hyster TK, Sigman MS. Using Data Science for Mechanistic Insights and Selectivity Predictions in a Non-Natural Biocatalytic Reaction. J Am Chem Soc 2023; 145:17656-17664. [PMID: 37530568 PMCID: PMC10602048 DOI: 10.1021/jacs.3c03639] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 08/03/2023]
Abstract
The study of non-natural biocatalytic transformations relies heavily on empirical methods, such as directed evolution, for identifying improved variants. Although exceptionally effective, this approach provides limited insight into the molecular mechanisms behind the transformations and necessitates multiple protein engineering campaigns for new reactants. To address this limitation, we disclose a strategy to explore the biocatalytic reaction space and garner insight into the molecular mechanisms driving enzymatic transformations. Specifically, we explored the selectivity of an "ene"-reductase, GluER-T36A, to create a data-driven toolset that explores reaction space and rationalizes the observed and predicted selectivities of substrate/mutant combinations. The resultant statistical models related structural features of the enzyme and substrate to selectivity and were used to effectively predict selectivity in reactions with out-of-sample substrates and mutants. Our approach provided a deeper understanding of enantioinduction by GluER-T36A and holds the potential to enhance the virtual screening of enzyme mutants.
Affiliation(s)
- Hanna D Clements
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Autumn R Flynn
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Bryce T Nicholls
  - Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
- Daria Grosheva
  - Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
- Sarah J Lefave
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Morgan T Merriman
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
- Todd K Hyster
  - Department of Chemistry and Chemical Biology, Cornell University, 122 Baker Laboratory, Ithaca, New York 14853, United States
- Matthew S Sigman
  - Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112, United States
2
Banerjee S, Bishop TRP. dsSurvival 2.0: privacy enhancing survival curves for survival models in the federated DataSHIELD analysis system. BMC Res Notes 2023; 16:98. [PMID: 37280717 PMCID: PMC10243006 DOI: 10.1186/s13104-023-06372-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 05/25/2023] [Indexed: 06/08/2023] Open
Abstract
OBJECTIVE: Survival models are used extensively in biomedical sciences, where they allow the investigation of the effect of exposures on health outcomes. It is desirable to use diverse data sets in survival analyses, because this offers increased statistical power and generalisability of results. However, there are often challenges with bringing data together in one location or following an analysis plan and sharing results. DataSHIELD is an analysis platform that helps users to overcome these ethical, governance and process difficulties. It allows users to analyse data remotely, using functions that are built to restrict access to the detailed data items (federated analysis). Previous works have provided survival modelling functionality in DataSHIELD (dsSurvival package), but there is a requirement to provide functions that offer privacy enhancing survival curves that retain useful information.
RESULTS: We introduce an enhanced version of the dsSurvival package which offers privacy enhancing survival curves for DataSHIELD. Different methods for enhancing privacy were evaluated for their effectiveness in enhancing privacy while maintaining utility. We demonstrated how our selected method could enhance privacy in different scenarios using real survival data. The details of how DataSHIELD can be used to generate survival curves can be found in the associated tutorial.
Affiliation(s)
- Soumya Banerjee
  - Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Tom R P Bishop
  - Medical Research Council Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
3
Vega-Villalobos A, Almanza-Ortega NN, Torres-Poveda K, Pérez-Ortega J, Barahona I. Correlation between mobility in mass transport and mortality due to COVID-19: A comparison of Mexico City, New York, and Madrid from a data science perspective. PLoS One 2022; 17:e0264713. [PMID: 35298483 PMCID: PMC8929656 DOI: 10.1371/journal.pone.0264713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2021] [Accepted: 02/16/2022] [Indexed: 11/21/2022] Open
Abstract
In most big cities, public transport vehicles are enclosed and crowded spaces and are therefore considered one of the most important triggers of COVID-19 spread. Most existing research on the mobility of people and COVID-19 spread focuses on highly frequented paths, analyzing data collected from mobile devices, mainly geo-positioning records. In contrast, this paper tackles the problem by studying mass mobility. The relations between daily mobility on public transport (subway or metro) in three big cities and mortality due to COVID-19 are investigated. The data come from official sources, such as the web pages of the cities’ local governments. To provide a systematic framework, we applied the IBM Foundational Methodology for Data Science to the epidemiological domain of this paper. Our analysis uses moving averages with a seven-day window to avoid bias due to weekly tendencies. Among the main findings of this work are: a) New York City and Madrid show similar distributions of the studied variables, which resemble a Gaussian bell curve, in contrast to Mexico City, and b) non-pharmaceutical interventions do not bring immediate results; reductions in the number of deaths due to COVID-19 are observed only after a certain number of days. This paper yields partial evidence for assessing the effectiveness of public policies in mitigating the COVID-19 pandemic.
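The seven-day moving average mentioned in the abstract can be illustrated with a minimal sketch. It assumes a trailing window (the abstract does not say whether the window is trailing or centered), and the function name `trailing_moving_average` and the sample counts are hypothetical, not taken from the paper.

```python
from collections import deque


def trailing_moving_average(values, window=7):
    """Return the mean of the last `window` values at each position.

    Positions that do not yet have a full window are reported as None.
    Assumption: a trailing window; the paper may use a centered one.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque(maxlen=window)  # holds the most recent `window` values
    running = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            running -= buf[0]  # value about to fall out of the window
        buf.append(v)
        running += v
        out.append(running / window if len(buf) == window else None)
    return out


# Hypothetical daily counts with a dip on two days of every week.
daily = [10, 10, 10, 10, 10, 3, 3] * 3
smoothed = trailing_moving_average(daily)
```

Because every full window here covers exactly one week, the weekly dip contributes equally to each average, which is the sense in which a seven-day window removes weekly tendencies.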
Affiliation(s)
- Kirvis Torres-Poveda
  - Centro de Investigación Sobre Enfermedades Infecciosas, Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
  - CONACyT-Instituto Nacional de Salud Pública, Cuernavaca, Morelos, México
- Igor Barahona
  - Universidad Nacional Autónoma de México, Instituto de Matemáticas, Laboratorio de Aplicaciones de las Matemáticas, Cuernavaca, Morelos, México
4
Abstract
Psychiatric disease is one of the greatest health challenges of our time. The pipeline for conceptually novel therapeutics remains low, in part because uncovering the biological mechanisms of psychiatric disease has been difficult. We asked experts researching different aspects of psychiatric disease: what do you see as the major urgent questions that need to be addressed? Where are the next frontiers, and what are the current hurdles to understanding the biological basis of psychiatric disease?
5
Goel M, Bagler G. Computational gastronomy: A data science approach to food. J Biosci 2022; 47:12. [PMID: 35092414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Cooking forms the core of our cultural identity, in addition to being the basis of nutrition and health. The increasing availability of culinary data and the advent of computational methods for their scrutiny are dramatically changing the artistic outlook towards gastronomy. Starting with a seemingly simple question, 'Why do we eat what we eat?', data-driven research conducted in our lab has led to interesting explorations of traditional recipes, their flavor composition, and health associations. Our investigations have revealed 'culinary fingerprints' of regional cuisines across the world. Application of data-driven strategies for investigating the gastronomic data has opened up exciting avenues, giving rise to an all-new field of 'computational gastronomy'. This emerging interdisciplinary science asks questions of culinary origin and seeks their answers via the compilation of culinary data and their analysis using methods of complex systems, statistics, computer science, and artificial intelligence. Along with complementary experimental studies, these endeavors have the potential to transform the food landscape by effectively leveraging data-driven food innovations for better health and nutrition.
Affiliation(s)
- Mansi Goel
  - Center for Computational Biology, Indraprastha Institute of Information Technology Delhi (IIIT-Delhi), New Delhi 110 020, India
6
Wallach JD, Zhang AD, Skydel JJ, Bartlett VL, Dhruva SS, Shah ND, Ross JS. Feasibility of Using Real-world Data to Emulate Postapproval Confirmatory Clinical Trials of Therapeutic Agents Granted US Food and Drug Administration Accelerated Approval. JAMA Netw Open 2021; 4:e2133667. [PMID: 34751763 PMCID: PMC8579227 DOI: 10.1001/jamanetworkopen.2021.33667] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/14/2022] Open
Abstract
This cross-sectional study examines the feasibility of using real-world data, such as billing, claims, and electronic health records, to emulate US Food and Drug Administration–required confirmatory clinical trials for the 50 new therapeutic agents that received accelerated approval between 2009 and 2018.
Affiliation(s)
- Joshua D. Wallach
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, Connecticut
- Sanket S. Dhruva
  - Section of Cardiology, San Francisco Veterans Affairs Health Care System, San Francisco, California
  - Department of Medicine, School of Medicine, University of California, San Francisco
- Nilay D. Shah
  - Division of Health Care Policy and Research, Mayo Clinic, Rochester, Minnesota
- Joseph S. Ross
  - Section of General Internal Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
  - National Clinician Scholars Program, Yale School of Medicine, Department of Internal Medicine, New Haven, Connecticut
  - Department of Health Policy and Management, Yale School of Public Health, New Haven, Connecticut
7
Martinez-Soto CE, Cucić S, Lin JT, Kirst S, Mahmoud ES, Khursigara CM, Anany H. PHIDA: A High Throughput Turbidimetric Data Analytic Tool to Compare Host Range Profiles of Bacteriophages Isolated Using Different Enrichment Methods. Viruses 2021; 13:2120. [PMID: 34834927 PMCID: PMC8623551 DOI: 10.3390/v13112120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/18/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 02/07/2023] Open
Abstract
Bacteriophages are viruses that infect bacteria and are present in niches where bacteria thrive. In recent years, the suggested application areas of lytic bacteriophage have been expanded to include therapy, biocontrol, detection, sanitation, and remediation. However, phage application is constrained by the phage's host range, that is, the range of bacterial hosts sensitive to the phage and the degree of infection. Even though phage isolation and enrichment techniques are straightforward protocols, the correlation between the enrichment technique and host range profile has not been evaluated. Agar-based methods such as spotting assay and efficiency of plaquing (EOP) are the most used methods to determine the phage host range. These methods, aside from being labor intensive, can lead to subjective and incomplete results as they rely on qualitative observations of the lysis/plaques, do not reflect the lytic activity in liquid culture, and can overestimate the host range. In this study, phages against three bacterial genera were isolated using three different enrichment methods. Host range profiles of the isolated phages were quantitatively determined using a high throughput turbidimetric protocol, and the data were analyzed with an accessible analytic tool, "PHIDA". Using this tool, the host ranges of 9 Listeria, 14 Salmonella, and 20 Pseudomonas phages isolated with different enrichment methods were quantitatively compared. A high variability in the host range index (HRi), ranging from 0.63-0.86, 0.07-0.24, and 0.00-0.67 for Listeria, Salmonella, and Pseudomonas phages, respectively, was observed. Overall, no direct correlation was found between the phage host range breadth and the enrichment method in any of the three target bacterial genera. The high throughput method and analytics tool developed in this study can be easily adapted to any phage study and can provide a consensus for phage host range determination.
Affiliation(s)
- Carlos E. Martinez-Soto
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON N1G 5C9, Canada
  - Department of Molecular and Cellular Biology, College of Biological Science, University of Guelph, Guelph, ON N1G 2W1, Canada
- Stevan Cucić
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON N1G 5C9, Canada
  - Department of Molecular and Cellular Biology, College of Biological Science, University of Guelph, Guelph, ON N1G 2W1, Canada
- Janet T. Lin
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON N1G 5C9, Canada
- Sarah Kirst
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON N1G 5C9, Canada
- El Sayed Mahmoud
  - Faculty of Applied Science and Technology, The Sheridan College Institute of Technology and Advanced Learning, Oakville, ON L6H 2L1, Canada
- Cezar M. Khursigara
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON N1G 5C9, Canada
- Hany Anany
  - Guelph Research and Development Centre, Agriculture and Agri-Food Canada, Guelph, ON N1G 5C9, Canada
  - Department of Molecular and Cellular Biology, College of Biological Science, University of Guelph, Guelph, ON N1G 2W1, Canada
8
Bahmani A, Alavi A, Buergel T, Upadhyayula S, Wang Q, Ananthakrishnan SK, Alavi A, Celis D, Gillespie D, Young G, Xing Z, Nguyen MHH, Haque A, Mathur A, Payne J, Mazaheri G, Li JK, Kotipalli P, Liao L, Bhasin R, Cha K, Rolnik B, Celli A, Dagan-Rosenfeld O, Higgs E, Zhou W, Berry CL, Van Winkle KG, Contrepois K, Ray U, Bettinger K, Datta S, Li X, Snyder MP. A scalable, secure, and interoperable platform for deep data-driven health management. Nat Commun 2021; 12:5757. [PMID: 34599181 PMCID: PMC8486823 DOI: 10.1038/s41467-021-26040-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/24/2020] [Accepted: 08/23/2021] [Indexed: 11/08/2022] Open
Abstract
The large amount of biomedical data derived from wearable sensors, electronic health records, and molecular profiling (e.g., genomics data) is rapidly transforming our healthcare systems. The increasing scale and scope of biomedical data not only is generating enormous opportunities for improving health outcomes but also raises new challenges ranging from data acquisition and storage to data analysis and utilization. To meet these challenges, we developed the Personal Health Dashboard (PHD), which utilizes state-of-the-art security and scalability technologies to provide an end-to-end solution for big biomedical data analytics. The PHD platform is an open-source software framework that can be easily configured and deployed to any big data health project to store, organize, and process complex biomedical data sets, support real-time data analysis at both the individual level and the cohort level, and ensure participant privacy at every step. In addition to presenting the system, we illustrate the use of the PHD framework for large-scale applications in emerging multi-omics disease studies, such as collecting and visualization of diverse data types (wearable, clinical, omics) at a personal level, investigation of insulin resistance, and an infrastructure for the detection of presymptomatic COVID-19.
Affiliation(s)
- Amir Bahmani
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Arash Alavi
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Thore Buergel
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Sushil Upadhyayula
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Qiwen Wang
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Amir Alavi
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Diego Celis
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Dan Gillespie
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Gregory Young
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Ziye Xing
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
- Minh Hoang Huynh Nguyen
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
- Audrey Haque
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
- Ankit Mathur
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Josh Payne
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Ghazal Mazaheri
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Jason Kenichi Li
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Pramod Kotipalli
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Lisa Liao
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Rajat Bhasin
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Kexin Cha
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Benjamin Rolnik
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Emily Higgs
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Wenyu Zhou
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
- Camille Lauren Berry
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Katherine Grace Van Winkle
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Utsab Ray
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
- Keith Bettinger
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
- Somalee Datta
  - Technology and Digital Solutions, Stanford Medicine, Stanford, CA, USA
- Xiao Li
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Biochemistry, The Center for RNA Science and Therapeutics, Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA
- Michael P Snyder
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA, USA
  - Stanford Healthcare Innovation Lab, Stanford University, Stanford, CA, USA
9
Metz M, Smith R, Mitchell R, Duong YT, Brown K, Kinchen S, Lee K, Ogollah FM, Dzinamarira T, Maliwa V, Moore C, Patel H, Chung H, Mtengo H, Saito S. Data Architecture to Support Real-Time Data Analytics for the Population-Based HIV Impact Assessments. J Acquir Immune Defic Syndr 2021; 87:S28-S35. [PMID: 34166310 PMCID: PMC10897861 DOI: 10.1097/qai.0000000000002703] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2021] [Accepted: 04/07/2021] [Indexed: 11/25/2022]
Abstract
BACKGROUND AND SETTING: Electronic data capture facilitates timely use of data. Population-based HIV impact assessments (PHIAs) were led by host governments, with funding from the President's Emergency Plan for AIDS Relief, technical assistance from the Centers for Disease Control, and implementation support from ICAP at Columbia University. We described data architectures, code-based processes, and resulting data volume and quality for 14 national PHIA surveys with concurrent timelines and varied country-level data governance (2015-2020).
METHODS: PHIA project data were collected through tablets, point-of-care and laboratory testing instruments, and inventory management systems, using open-source software, vendor solutions, and custom-built software. Data were securely uploaded to the PHIA data warehouse daily or weekly and then used to populate survey-monitoring dashboards and return timely laboratory-based test results on an ongoing basis. Automated data processing allowed timely reporting of survey results.
RESULTS: Fourteen data architectures were successfully established, and data from more than 450,000 participants in 30,000 files across 13 countries with completed PHIAs, and blood draws producing approximately 6000 aliquots each week per country, were securely collected, transmitted, and processed by 17 full-time equivalent staff. More than 25,600 viral load results were returned to clinics of participants' choice. Data cleaning was not needed for 98.5% of household and 99.2% of individual questionnaires.
CONCLUSION: The PHIA data architecture permitted secure, simultaneous collection and transmission of high-quality interview and biomarker data across multiple countries, quick turnaround time of laboratory-based biomarker results, and rapid dissemination of survey outcomes to guide President's Emergency Plan for AIDS Relief epidemic control.
Affiliation(s)
- Rick Mitchell
  - ICAP at Columbia University, New York, NY
  - Clinical Trials Unit, Westat, Rockville, MD
- Kristin Brown
  - Division of Global HIV and TB, Center for Global Health, U.S. Centers for Disease Control and Prevention, Atlanta, GA
- Steve Kinchen
  - Division of Global HIV and TB, Center for Global Health, U.S. Centers for Disease Control and Prevention, Atlanta, GA
- Kiwon Lee
  - ICAP at Columbia University, New York, NY
- Carole Moore
  - Division of Global HIV and TB, Center for Global Health, U.S. Centers for Disease Control and Prevention, Atlanta, GA
- Hetal Patel
  - Division of Global HIV and TB, Center for Global Health, U.S. Centers for Disease Control and Prevention, Atlanta, GA
- Suzue Saito
  - ICAP at Columbia University, New York, NY
  - Department of Epidemiology, Mailman School of Public Health at Columbia University, New York, NY
10
Garrison L, Muller J, Schreiber S, Oeltze-Jafra S, Hauser H, Bruckner S. DimLift: Interactive Hierarchical Data Exploration Through Dimensional Bundling. IEEE Trans Vis Comput Graph 2021; 27:2908-2922. [PMID: 33544674 DOI: 10.1109/tvcg.2021.3057519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
The identification of interesting patterns and relationships is essential to exploratory data analysis. This becomes increasingly difficult in high dimensional datasets. While dimensionality reduction techniques can be utilized to reduce the analysis space, these may unintentionally bury key dimensions within a larger grouping and obfuscate meaningful patterns. With this work we introduce DimLift, a novel visual analysis method for creating and interacting with dimensional bundles. Generated through an iterative dimensionality reduction or user-driven approach, dimensional bundles are expressive groups of dimensions that contribute similarly to the variance of a dataset. Interactive exploration and reconstruction methods via a layered parallel coordinates plot allow users to lift interesting and subtle relationships to the surface, even in complex scenarios of missing and mixed data types. We exemplify the power of this technique in an expert case study on clinical cohort data alongside two additional case examples from nutrition and ecology.
11
Tilves C, Peddada S, Miljkovic I. Body Composition Analyses Require Compositional Data Analytic (CoDA) Methods. Obesity (Silver Spring) 2021; 29:783-785. [PMID: 33759398 PMCID: PMC8340562 DOI: 10.1002/oby.23132] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/14/2020] [Revised: 01/04/2021] [Accepted: 01/19/2021] [Indexed: 11/10/2022]
Affiliation(s)
- Curtis Tilves
  - Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA, USA
- Shyamal Peddada
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Iva Miljkovic
  - Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA, USA
12
Affiliation(s)
- Philip E. Bourne
- University of Virginia, Charlottesville, Virginia, United States of America
13
Pluchino A, Biondo AE, Giuffrida N, Inturri G, Latora V, Le Moli R, Rapisarda A, Russo G, Zappalà C. A novel methodology for epidemic risk assessment of COVID-19 outbreak. Sci Rep 2021; 11:5304. [PMID: 33674627 PMCID: PMC7935987 DOI: 10.1038/s41598-021-82310-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 05/06/2020] [Accepted: 01/19/2021] [Indexed: 12/24/2022] Open
Abstract
We propose a novel data-driven framework for assessing the a-priori epidemic risk of a geographical area and for identifying high-risk areas within a country. Our risk index is evaluated as a function of three different components: the hazard of the disease, the exposure of the area and the vulnerability of its inhabitants. As an application, we discuss the case of the COVID-19 outbreak in Italy. We characterize each of the twenty Italian regions by using available historical data on air pollution, human mobility, winter temperature, housing concentration, health care density, population size and age. We find that the epidemic risk is higher in some of the Northern regions than in Central and Southern Italy. The corresponding risk index shows correlations with the available official data on the number of infected individuals, patients in intensive care and deceased patients, and can help explain why regions such as Lombardia, Emilia-Romagna, Piemonte and Veneto have suffered much more than the rest of the country. Although the COVID-19 outbreak started in both North (Lombardia) and Central Italy (Lazio) almost at the same time, when the first cases were officially certified at the beginning of 2020, the disease has spread faster and with heavier consequences in regions with higher epidemic risk. Our framework can be extended and tested on other epidemic data, such as those on seasonal flu, and applied to other countries. We also present a policy model connected with our methodology, which might help policy-makers to take informed decisions.
Affiliation(s)
- A Pluchino
- Dipartimento di Fisica e Astronomia "Ettore Majorana", INFN Sezione di Catania, Università di Catania, Catania, Italy.
| | - A E Biondo
- Dipartimento di Economia e Impresa, Università di Catania, Catania, Italy
| | - N Giuffrida
- Dipartimento di Ingegneria Civile e Architettura, Università di Catania, Catania, Italy
| | - G Inturri
- Dipartimento di Ingegneria Elettrica Elettronica e Informatica, Università di Catania, Catania, Italy
| | - V Latora
- Dipartimento di Fisica e Astronomia "Ettore Majorana", INFN Sezione di Catania, Università di Catania, Catania, Italy
- Complexity Science Hub Vienna, Vienna, Austria
- School of Mathematical Sciences, Queen Mary University of London, London, E1 4NS, UK
- The Alan Turing Institute, The British Library, London, NW1 2DB, UK
| | - R Le Moli
- Dipartimento di Medicina Clinica e Sperimentale - UO di Endocrinologia - Ospedale Garibaldi Nesima, Università di Catania, Catania, Italy
| | - A Rapisarda
- Dipartimento di Fisica e Astronomia "Ettore Majorana", INFN Sezione di Catania, Università di Catania, Catania, Italy
- Complexity Science Hub Vienna, Vienna, Austria
| | - G Russo
- Dipartimento di Matematica e Informatica, Università di Catania, Catania, Italy
| | - C Zappalà
- Dipartimento di Fisica e Astronomia "Ettore Majorana", INFN Sezione di Catania, Università di Catania, Catania, Italy
|
14
|
Abstract
Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose a hands-on training for overfitting that is suitable for introductory level courses and can be carried out on its own or embedded within any data science course. We use workflow-based design of machine learning pipelines, experimentation-based teaching, and hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis. Every teacher strives for an a-ha moment, a sudden revelation by the student who gained a fundamental insight she will always remember. In the past years, authors of this paper have been tailoring their courses in machine learning to include material that could lead students to such discoveries. We aim to expose machine learning to practitioners–not only computer scientists but also molecular biologists and students of biomedicine, that is, the end-users of bioinformatics’ computational approaches. In this article, we lay out a course that aims to teach about overfitting, one of the key concepts in machine learning that needs to be understood, mastered, and avoided in data science applications. We propose a hands-on approach that uses an open-source workflow-based data science toolbox that combines data visualization and machine learning. In the proposed training about overfitting, we first deceive the students, then expose the problem, and finally challenge them to find the solution. 
In the paper, we present three lessons in overfitting and associated data analysis workflows and motivate the use of introduced computation methods by relating them to concepts conveyed by instructors.
Affiliation(s)
- Janez Demšar
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
| | - Blaž Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
|
15
|
Kincaid MW, Peters ZJ, Curry JC. Data-driven approaches to care delivery: Actionable informatics in the DoD's primary care behavioral health program. Fam Syst Health 2021; 39:66-76. [PMID: 34014731 DOI: 10.1037/fsh0000583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
INTRODUCTION Transforming administrative health care data into meaningful metrics has been critical to the implementation of the Department of Defense's Primary Care Behavioral Health (PCBH) program. METHODS Data from clinical encounters with PCBH providers are used to develop metrics of program performance collaboratively. Metrics focus on describing the PCBH program and patients, provider fidelity to the model, and provider performance. These metrics form two key deliverables: a monitoring dashboard for program managers and a training dashboard for expert trainers conducting site visits. RESULTS Behavioral health consultants (BHCs) conducted nearly 200,000 encounters with more than 100,000 unique patients in fiscal year 2019 at more than 170 locations in 6 countries and 37 states. Administrative data derived from these encounters were used to create a variety of metrics that describe practice and performance at both the provider and program levels. These metrics are delivered through a variety of analytic products to stakeholders who use that information to make data-driven decisions about program direction and provider training. DISCUSSION We discuss examples of program management decisions and expert trainer actions based on these dashboards, highlighting the benefits of continued collaboration between analysts and program managers. Specifically, excerpts from several dashboards illustrate how penetration and productivity metrics yield specific, tailored action plans to improve care delivery and provider performance. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
Affiliation(s)
- Melissa W Kincaid
- Psychological Health Center of Excellence, Research and Development Directorate (J-9), Defense Health Agency
| | - Zachary J Peters
- Psychological Health Center of Excellence, Research and Development Directorate (J-9), Defense Health Agency
| | - Justin C Curry
- Psychological Health Center of Excellence, Research and Development Directorate (J-9), Defense Health Agency
|
16
|
Abstract
BACKGROUND The rapid integration of Artificial Intelligence (AI) into the healthcare field has occurred with little communication between computer scientists and doctors. The impact of AI on health outcomes and inequalities calls for health professionals and data scientists to make a collaborative effort to ensure historic health disparities are not encoded into the future. We present a study that evaluates bias in existing Natural Language Processing (NLP) models used in psychiatry and discuss how these biases may widen health inequalities. Our approach systematically evaluates each stage of model development to explore how biases arise from a clinical, data science and linguistic perspective. DESIGN/METHODS A literature review of the uses of NLP in mental health was carried out across multiple disciplinary databases with defined Mesh terms and keywords. Our primary analysis evaluated biases within 'GloVe' and 'Word2Vec' word embeddings. Euclidean distances were measured to assess relationships between psychiatric terms and demographic labels, and vector similarity functions were used to solve analogy questions relating to mental health. RESULTS Our primary analysis of mental health terminology in GloVe and Word2Vec embeddings demonstrated significant biases with respect to religion, race, gender, nationality, sexuality and age. Our literature review returned 52 papers, of which none addressed all the areas of possible bias that we identify in model development. In addition, only one article existed on more than one research database, demonstrating the isolation of research within disciplinary silos and inhibiting cross-disciplinary collaboration or communication. CONCLUSION Our findings are relevant to professionals who wish to minimize the health inequalities that may arise as a result of AI and data-driven algorithms. We offer primary research identifying biases within these technologies and provide recommendations for avoiding these harms in the future.
Affiliation(s)
- Isabel Straw
- Department of Public Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Chris Callison-Burch
- Computer and Information Science Department, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
17
|
Patel JA, Sharma P. Online Analytical Processing for Business Intelligence in Big Data. Big Data 2020; 8:501-518. [PMID: 33347370 DOI: 10.1089/big.2020.0045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Online analytical processing (OLAP) approach is widely used in business intelligence to cater the multidimensional queries for decades. In this era of cutting-edge technology and the internet, data generation rates have been rising exponentially. Internet of things sensors and social media platforms are some of the major contributors, leading toward the absolute data boom. Storage and speed are the crucial parameters and undoubtedly the burning issues in efficient data handling. The key idea here is to address these two challenges of big data computing in OLAP. In this article, the authors have proposed and implemented OLAP on Hadoop by Indexing (OOHI). OOHI offers a simplified multidimensional model that stores dimensions in the schema server and measures on the Hadoop cluster. Overall setup is divided into various modules, namely: data storage module (DSM), dimension encoding module (DEM), cube segmentation module, segment selection module (SSM), and block selection and process (BSAP) module. Serialization and deserialization concept applied by DSM for storage and retrieval of the data for efficient space utilization. Integer encoding adopted by DEM in dimension hierarchy is selected to escape sparsity problem in multidimensional big data. To reduce search space by chunks of the cube from the queried chunks, SSM plays an important role. Map reduce-based indexing approach and series of seek operations of BSAP module were integrated to achieve parallelism and fault tolerance. Real-time oceanography data and supermarket data sets are applied to demonstrate that OOHI model is data independent. Various test cases are designed to cover the scope of each dimension and volume of data set. Comparative results and performance analytics portray that OOHI outperforms in data storage, dice, slice, and roll-up operations compared with Hadoop based OLAP.
Affiliation(s)
- Jigna Ashish Patel
- Department of CSE, Institute of Technology, Nirma University, Ahmedabad, India
|
18
|
Abstract
Amid public health concerns over climate change, "precision public health" (PPH) is emerging in next generation approaches to practice. These novel methods promise to augment public health operations by using ever larger and more robust health datasets combined with new tools for collecting and analyzing data. Precision strategies to protecting the public health could more effectively or efficiently address the systemic threats of climate change, but may also propagate or exacerbate health disparities for the populations most vulnerable in a changing climate. How PPH interventions collect and aggregate data, decide what to measure, and analyze data pose potential issues around privacy, neglecting social determinants of health, and introducing algorithmic bias into climate responses. Adopting a health justice framework, guided by broader social and climate justice tenets, can reveal principles and policy actions which may guide more responsible implementation of PPH in climate responses.
Affiliation(s)
- Walter G Johnson
- Walter G. Johnson, J.D. M.S.T.P., is a research fellow at the Sandra Day O'Connor College of Law, Arizona State University. He received a J.D. from the Sandra Day O'Connor College of Law in 2020 and a Master of Science and Technology Policy (M.S.T.P.) from Arizona State University in 2017
|
19
|
Vaca Jacome AS, Peckner R, Shulman N, Krug K, DeRuff KC, Officer A, Christianson KE, MacLean B, MacCoss MJ, Carr SA, Jaffe JD. Avant-garde: an automated data-driven DIA data curation tool. Nat Methods 2020; 17:1237-1244. [PMID: 33199889 PMCID: PMC7723322 DOI: 10.1038/s41592-020-00986-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/18/2019] [Accepted: 09/25/2020] [Indexed: 12/03/2022]
Abstract
Several challenges remain in data-independent acquisition (DIA) data analysis, such as to confidently identify peptides, define integration boundaries, remove interferences, and control false discovery rates. In practice, a visual inspection of the signals is still required, which is impractical with large datasets. We present Avant-garde as a tool to refine DIA (and parallel reaction monitoring) data. Avant-garde uses a novel data-driven scoring strategy: signals are refined by learning from the dataset itself, using all measurements in all samples to achieve the best optimization. We evaluate the performance of Avant-garde using benchmark DIA datasets and show that it can determine the quantitative suitability of a peptide peak, and reach the same levels of selectivity, accuracy, and reproducibility as manual validation. Avant-garde is complementary to existing DIA analysis engines and aims to establish a strong foundation for subsequent analysis of quantitative mass spectrometry data.
Affiliation(s)
| | - Ryan Peckner
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cogen Therapeutics, Cambridge, MA, USA
| | | | - Karsten Krug
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Adam Officer
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | | | | | - Steven A Carr
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Jacob D Jaffe
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Inzen Therapeutics, Cambridge, MA, USA.
|
20
|
Tarka P, Jędrych E. On the Unstructured Big Data Analytical Methods in Firms: Conceptual Model, Measurement, and Perception. Big Data 2020; 8:478-500. [PMID: 33202160 DOI: 10.1089/big.2020.0123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Firms face challenging analytical tasks at the advent of a growing amount of unstructured big data (BD). These data lead to radical shifts in their analytical strategies and market insights. Yet, the particular types of analytical methods remain in the literature still loosely scattered. This work stresses the unstructured BD analytics, first by capturing their unique characteristics and then by proposing a model for diagnosis of the analytical methods related to unstructured data (UD) inside the firms. We focus on five interrelated research aspects, by: explaining the essence of UD with the firms' environment; identifying and classifying the most important analytical methods in organizations to better understand UD; developing a conceptual model along with measures; and diagnosing the extent to which the unstructured analytical methods, beside the structured analytics, relate with firm performance (FP). Finally, this model is investigated from perspective of the two-communities theory in reference to data scientists and marketing researchers within the organizational environment. A model is tested on the basis of complementary analytical strategies: confirmatory and multigroup factor analyses and structural equation modeling, for which data (N = 356) were collected from international online survey. Results confirm a high level of adequacy of the conceptual model and superiority of unstructured over the structured analytics leading to FP, while the scalar invariance testing proves minor differences between groups in reference to two of the analytical methods.
Affiliation(s)
- Piotr Tarka
- Department of Market Research, Poznan University of Economics and Business, Poznan, Poland
| | - Elżbieta Jędrych
- Department of Business and International Relations, Vistula University, Warszawa, Poland
|
21
|
Marino S, Zhao Y, Zhou N, Zhou Y, Toga AW, Zhao L, Jian Y, Yang Y, Chen Y, Wu Q, Wild J, Cummings B, Dinov ID. Compressive Big Data Analytics: An ensemble meta-algorithm for high-dimensional multisource datasets. PLoS One 2020; 15:e0228520. [PMID: 32857775 PMCID: PMC7455041 DOI: 10.1371/journal.pone.0228520] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/16/2020] [Accepted: 08/11/2020] [Indexed: 11/18/2022] Open
Abstract
Health advances are contingent on continuous development of new methods and approaches to foster data-driven discovery in the biomedical and clinical sciences. Open-science and team-based scientific discovery offer hope for tackling some of the difficult challenges associated with managing, modeling, and interpreting of large, complex, and multisource data. Translating raw observations into useful information and actionable knowledge depends on effective domain-independent reproducibility, area-specific replicability, data curation, analysis protocols, organization, management and sharing of health-related digital objects. This study expands the functionality and utility of an ensemble semi-supervised machine learning technique called Compressive Big Data Analytics (CBDA). Applied to high-dimensional data, CBDA (1) identifies salient features and key biomarkers enabling reliable and reproducible forecasting of binary, multinomial and continuous outcomes (i.e., feature mining); and (2) suggests the most accurate algorithms/models for predictive analytics of the observed data (i.e., model mining). The method relies on iterative subsampling, combines function optimization and statistical inference, and generates ensemble predictions for observed univariate outcomes. The novelty of this study is highlighted by a new and expanded set of CBDA features including (1) efficiently handling extremely large datasets (>100,000 cases and >1,000 features); (2) generalizing the internal and external validation steps; (3) expanding the set of base-learners for joint ensemble prediction; (4) introducing an automated selection of CBDA specifications; and (5) providing mechanisms to assess CBDA convergence, evaluate the prediction accuracy, and measure result consistency. To ground the mathematical model and the corresponding computational algorithm, CBDA 2.0 validation utilizes synthetic datasets as well as a population-wide census-like study. 
Specifically, an empirical validation of the CBDA technique is based on a translational health research using a large-scale clinical study (UK Biobank), which includes imaging, cognitive, and clinical assessment data. The UK Biobank archive presents several difficult challenges related to the aggregation, harmonization, modeling, and interrogation of the information. These problems are related to the complex longitudinal structure, variable heterogeneity, feature multicollinearity, incongruency, and missingness, as well as violations of classical parametric assumptions. Our results show the scalability, efficiency, and usability of CBDA to interrogate complex data into structural information leading to derived knowledge and translational action. Applying CBDA 2.0 to the UK Biobank case-study allows predicting various outcomes of interest, e.g., mood disorders and irritability, and suggests new and exciting avenues of evidence-based research in the context of identifying, tracking, and treating mental health and aging-related diseases. Following open-science principles, we share the entire end-to-end protocol, source-code, and results. This facilitates independent validation, result reproducibility, and team-based collaborative discovery.
Affiliation(s)
- Simeone Marino
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yi Zhao
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Nina Zhou
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yiwang Zhou
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Arthur W. Toga
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, United States of America
| | - Lu Zhao
- Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, United States of America
| | - Yingsi Jian
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yichen Yang
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yehu Chen
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Qiucheng Wu
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Jessica Wild
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Brandon Cummings
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Michigan Center for Integrative Research in Critical Care, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Ivo D. Dinov
- Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, United States of America
- Neuroscience Graduate Program, University of Michigan, Ann Arbor, Michigan, United States of America
|
22
|
Abstract
PURPOSE OF REVIEW New single-cell technologies developed over the past decade have considerably reshaped the biomedical research landscape, and more recently have found their way into studies probing the pathogenesis of type 1 diabetes (T1D). In this context, the emergence of mass cytometry in 2009 revolutionized immunological research in two fundamental ways that also affect the T1D world: first, its ready embrace by the community and rapid dissemination across academic and private science centers alike established a new standard of analytical complexity for the high-dimensional proteomic stratification of single-cell populations; and second, the somewhat unexpected arrival of mass cytometry awoke the flow cytometry field from its seeming sleeping beauty stupor and precipitated substantial technological advances that by now approach a degree of analytical dimensionality comparable to mass cytometry. RECENT FINDINGS Here, we summarize in detail how mass cytometry has thus far been harnessed for the pursuit of discovery studies in T1D science; we provide a succinct overview of other single-cell analysis platforms that already have been or soon will be integrated into various T1D investigations; and we briefly consider how effective adoption of these technologies requires an adjusted model for expense allocation, prioritization of experimental questions, division of labor, and recognition of scientific contributions. SUMMARY The introduction of contemporary single-cell technologies in general, and of mass cytometry, in particular, provides important new opportunities for current and future T1D research; the necessary reconfiguration of research strategies to accommodate implementation of these technologies, however, may both broaden research endeavors by fostering genuine team science, and constrain their actual practice because of the need for considerable investments into infrastructure and technical expertise.
Affiliation(s)
| | - Dirk Homann
- Precision Immunology Institute
- Diabetes, Obesity & Metabolism Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
|
23
|
Zhan C, Tse CK, Lai Z, Hao T, Su J. Prediction of COVID-19 spreading profiles in South Korea, Italy and Iran by data-driven coding. PLoS One 2020; 15:e0234763. [PMID: 32628673 PMCID: PMC7337285 DOI: 10.1371/journal.pone.0234763] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 03/09/2020] [Accepted: 06/02/2020] [Indexed: 11/18/2022] Open
Abstract
This work applies a data-driven coding method for prediction of the COVID-19 spreading profile in any given population that shows an initial phase of epidemic progression. Based on the historical data collected for COVID-19 spreading in 367 cities in China and the set of parameters of the augmented Susceptible-Exposed-Infected-Removed (SEIR) model obtained for each city, a set of profile codes representing a variety of transmission mechanisms and contact topologies is formed. By comparing the data of an early outbreak of a given population with the complete set of historical profiles, the best fit profiles are selected and the corresponding sets of profile codes are used for prediction of the future progression of the epidemic in that population. Application of the method to the data collected for South Korea, Italy and Iran shows that peaks of infection cases are expected to occur before mid April, the end of March and the end of May 2020, and that the percentage of population infected in each city or region will be less than 0.01%, 0.5% and 0.5%, for South Korea, Italy and Iran, respectively.
Affiliation(s)
- Choujun Zhan
- School of Computing, South China Normal University, Guangzhou, China
| | - Chi K. Tse
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
| | - Zhikang Lai
- School of Electrical and Computer Engineering, Nanfang College of Sun Yat-Sen University, Guangzhou, China
| | - Tianyong Hao
- School of Computing, South China Normal University, Guangzhou, China
| | - Jingjing Su
- Nethersole School of Nursing, The Chinese University of Hong Kong, Hong Kong, China
|
24
|
Hyder A, May AA. Translational data analytics in exposure science and environmental health: a citizen science approach with high school students. Environ Health 2020; 19:73. [PMID: 32611428 PMCID: PMC7329470 DOI: 10.1186/s12940-020-00627-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/24/2020] [Accepted: 06/22/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND Translational data analytics aims to apply data analytics principles and techniques to bring about broader societal or human impact. Translational data analytics for environmental health is an emerging discipline and the objective of this study is to describe a real-world example of this emerging discipline. METHODS We implemented a citizen-science project at a local high school. Multiple cohorts of citizen scientists, who were students, fabricated and deployed low-cost air quality sensors. A cloud-computing solution provided real-time air quality data for risk screening purposes, data analytics and curricular activities. RESULTS The citizen-science project engaged with 14 high school students over a four-year period that is continuing to this day. The project led to the development of a website that displayed sensor-based measurements in local neighborhoods and a GitHub-like repository for open source code and instructions. Preliminary results showed a reasonable comparison between sensor-based and EPA land-based federal reference monitor data for CO and NOx. CONCLUSIONS Initial sensor-based data collection efforts showed reasonable agreement with land-based federal reference monitors but more work needs to be done to validate these results. Lessons learned were: 1) the need for sustained funding because citizen science-based project timelines are a function of community needs/capacity and building interdisciplinary rapport in academic settings and 2) the need for a dedicated staff to manage academic-community relationships.
Affiliation(s)
- Ayaz Hyder
- Division of Environmental Health Sciences, College of Public Health, The Ohio State University, 1841 Neil Ave., Cunz Hall, Room 380D, Columbus, OH 43210 USA
- Translational Data Analytics Institute, The Ohio State University, 1841 Neil Ave., Cunz Hall, Room 380D, Columbus, OH 43210 USA
| | - Andrew A. May
- Department of Civil, Environmental and Geodetic Engineering, College of Engineering, The Ohio State University, 2070 Neil Avenue, 483A Hitchcock Hall, Columbus, OH 43210 USA
- Ohio State University Center for Automotive Research, 2070 Neil Avenue, 483A Hitchcock Hall, Columbus, OH 43210 USA
|
25
|
Abstract
Sandro Galea and co-authors discuss a forthcoming Collection on data science and social determinants of health.
Affiliation(s)
- Sandro Galea
- School of Public Health, Boston University, Boston, Massachusetts, United States of America
| | - Salma M. Abdalla
- School of Public Health, Boston University, Boston, Massachusetts, United States of America
|
26
|
Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, Kirchler M, Iwanir R, Mumford JA, Adcock RA, Avesani P, Baczkowski BM, Bajracharya A, Bakst L, Ball S, Barilari M, Bault N, Beaton D, Beitner J, Benoit RG, Berkers RMWJ, Bhanji JP, Biswal BB, Bobadilla-Suarez S, Bortolini T, Bottenhorn KL, Bowring A, Braem S, Brooks HR, Brudner EG, Calderon CB, Camilleri JA, Castrellon JJ, Cecchetti L, Cieslik EC, Cole ZJ, Collignon O, Cox RW, Cunningham WA, Czoschke S, Dadi K, Davis CP, Luca AD, Delgado MR, Demetriou L, Dennison JB, Di X, Dickie EW, Dobryakova E, Donnat CL, Dukart J, Duncan NW, Durnez J, Eed A, Eickhoff SB, Erhart A, Fontanesi L, Fricke GM, Fu S, Galván A, Gau R, Genon S, Glatard T, Glerean E, Goeman JJ, Golowin SAE, González-García C, Gorgolewski KJ, Grady CL, Green MA, Guassi Moreira JF, Guest O, Hakimi S, Hamilton JP, Hancock R, Handjaras G, Harry BB, Hawco C, Herholz P, Herman G, Heunis S, Hoffstaedter F, Hogeveen J, Holmes S, Hu CP, Huettel SA, Hughes ME, Iacovella V, Iordan AD, Isager PM, Isik AI, Jahn A, Johnson MR, Johnstone T, Joseph MJE, Juliano AC, Kable JW, Kassinopoulos M, Koba C, Kong XZ, Koscik TR, Kucukboyaci NE, Kuhl BA, Kupek S, Laird AR, Lamm C, Langner R, Lauharatanahirun N, Lee H, Lee S, Leemans A, Leo A, Lesage E, Li F, Li MYC, Lim PC, Lintz EN, Liphardt SW, Losecaat Vermeer AB, Love BC, Mack ML, Malpica N, Marins T, Maumet C, McDonald K, McGuire JT, Melero H, Méndez Leal AS, Meyer B, Meyer KN, Mihai G, Mitsis GD, Moll J, Nielson DM, Nilsonne G, Notter MP, Olivetti E, Onicas AI, Papale P, Patil KR, Peelle JE, Pérez A, Pischedda D, Poline JB, Prystauka Y, Ray S, Reuter-Lorenz PA, Reynolds RC, Ricciardi E, Rieck JR, Rodriguez-Thompson AM, Romyn A, Salo T, Samanez-Larkin GR, Sanz-Morales E, Schlichting ML, Schultz DH, Shen Q, Sheridan MA, Silvers JA, Skagerlund K, Smith A, Smith DV, Sokol-Hessner P, Steinkamp SR, Tashjian SM, Thirion B, Thorp JN, Tinghög G, Tisdall L, Tompson SH, Toro-Serey C, Torre Tresols JJ, Tozzi L, Truong V, Turella L, van 't Veer AE, Verguts T, Vettel JM, Vijayarajah S, Vo K, Wall MB, Weeda WD, Weis S, White DJ, Wisniewski D, Xifra-Porxas A, Yearling EA, Yoon S, Yuan R, Yuen KSL, Zhang L, Zhang X, Zosky JE, Nichols TE, Poldrack RA, Schonberg T. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 2020; 582:84-88. [PMID: 32483374 PMCID: PMC7771346 DOI: 10.1038/s41586-020-2314-9] [Citation(s) in RCA: 423] [Impact Index Per Article: 105.8] [Received: 11/14/2019] [Accepted: 04/07/2020] [Indexed: 01/13/2023]
Abstract
Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2-5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
Affiliation(s)
- Rotem Botvinik-Nezer
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Department of Neurobiology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, USA
| | - Felix Holzmeister
- Department of Banking and Finance, University of Innsbruck, Innsbruck, Austria
| | - Colin F Camerer
- HSS and CNS, California Institute of Technology, Pasadena, CA, USA
| | - Anna Dreber
- Department of Economics, Stockholm School of Economics, Stockholm, Sweden
- Department of Economics, University of Innsbruck, Innsbruck, Austria
| | - Juergen Huber
- Department of Banking and Finance, University of Innsbruck, Innsbruck, Austria
| | - Magnus Johannesson
- Department of Economics, Stockholm School of Economics, Stockholm, Sweden
| | - Michael Kirchler
- Department of Banking and Finance, University of Innsbruck, Innsbruck, Austria
| | - Roni Iwanir
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Department of Neurobiology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Jeanette A Mumford
- Center for Healthy Minds, University of Wisconsin-Madison, Madison, WI, USA
| | - R Alison Adcock
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, USA
| | - Paolo Avesani
- Neuroinformatics Laboratory, Fondazione Bruno Kessler, Trento, Italy
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
| | - Blazej M Baczkowski
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Aahana Bajracharya
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA
| | - Leah Bakst
- Department of Psychological and Brain Sciences, Boston University, Boston, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, MA, USA
| | - Sheryl Ball
- Department of Economics, Virginia Tech, Blacksburg, VA, USA
- School of Neuroscience, Virginia Tech, Blacksburg, VA, USA
| | - Marco Barilari
- Crossmodal Perception and Plasticity Laboratory, Institutes for Research in Psychology (IPSY) and Neurosciences (IoNS), UCLouvain, Louvain-la-Neuve, Belgium
| | - Nadège Bault
- School of Psychology, University of Plymouth, Plymouth, UK
| | - Derek Beaton
- Rotman Research Institute, Baycrest Health Sciences Centre, Toronto, Ontario, Canada
| | - Julia Beitner
- Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
- Department of Psychology, Goethe University, Frankfurt am Main, Germany
| | - Roland G Benoit
- Max Planck Research Group: Adaptive Memory, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Ruud M W J Berkers
- Max Planck Research Group: Adaptive Memory, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Jamil P Bhanji
- Department of Psychology, Rutgers University-Newark, Newark, NJ, USA
| | - Bharat B Biswal
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | | | - Tiago Bortolini
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
| | | | - Alexander Bowring
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Senne Braem
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Department of Psychology, Vrije Universiteit Brussel, Brussels, Belgium
| | - Hayley R Brooks
- Department of Psychology, University of Denver, Denver, CO, USA
| | - Emily G Brudner
- Department of Psychology, Rutgers University-Newark, Newark, NJ, USA
| | | | - Julia A Camilleri
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Jaime J Castrellon
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Luca Cecchetti
- MoMiLab Research Unit, IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Edna C Cieslik
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Zachary J Cole
- Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Olivier Collignon
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
- Crossmodal Perception and Plasticity Laboratory, Institutes for Research in Psychology (IPSY) and Neurosciences (IoNS), UCLouvain, Louvain-la-Neuve, Belgium
| | - Robert W Cox
- National Institute of Mental Health (NIMH), National Institutes of Health, Bethesda, MD, USA
| | | | - Stefan Czoschke
- Institute of Medical Psychology, Goethe University, Frankfurt am Main, Germany
| | | | - Charles P Davis
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Brain Imaging Research Center, University of Connecticut, Storrs, CT, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA
| | - Alberto De Luca
- PROVIDI Lab, Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands
| | | | - Lysia Demetriou
- Section of Endocrinology and Investigative Medicine, Faculty of Medicine, Imperial College London, London, UK
- Nuffield Department of Women's and Reproductive Health, University of Oxford, Oxford, UK
| | | | - Xin Di
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Erin W Dickie
- Krembil Centre for Neuroinformatics, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
| | - Ekaterina Dobryakova
- Center for Traumatic Brain Injury Research, Kessler Foundation, East Hanover, NJ, USA
| | - Claire L Donnat
- Department of Statistics, Stanford University, Stanford, CA, USA
| | - Juergen Dukart
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Niall W Duncan
- Graduate Institute of Mind, Brain and Consciousness, Taipei Medical University, Taipei, Taiwan
- Brain and Consciousness Research Centre, TMU-ShuangHo Hospital, New Taipei City, Taiwan
| | - Joke Durnez
- Department of Psychology and Stanford Center for Reproducible Neuroscience, Stanford University, Stanford, CA, USA
| | - Amr Eed
- Instituto de Neurociencias, CSIC-UMH, Alicante, Spain
| | - Simon B Eickhoff
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Andrew Erhart
- Department of Psychology, University of Denver, Denver, CO, USA
| | - Laura Fontanesi
- Faculty of Psychology, University of Basel, Basel, Switzerland
| | - G Matthew Fricke
- Computer Science Department, University of New Mexico, Albuquerque, NM, USA
| | - Shiguang Fu
- School of Management, Zhejiang University of Technology, Hangzhou, China
- Institute of Neuromanagement, Zhejiang University of Technology, Hangzhou, China
| | - Adriana Galván
- Department of Psychology, University of California Los Angeles, Los Angeles, CA, USA
| | - Remi Gau
- Crossmodal Perception and Plasticity Laboratory, Institutes for Research in Psychology (IPSY) and Neurosciences (IoNS), UCLouvain, Louvain-la-Neuve, Belgium
| | - Sarah Genon
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Tristan Glatard
- Department of Computer Science and Software Engineering, Concordia University, Montreal, Quebec, Canada
| | - Enrico Glerean
- Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland
| | - Jelle J Goeman
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Sergej A E Golowin
- Graduate Institute of Mind, Brain and Consciousness, Taipei Medical University, Taipei, Taiwan
| | | | | | - Cheryl L Grady
- Rotman Research Institute, Baycrest Health Sciences Centre, Toronto, Ontario, Canada
| | - Mikella A Green
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - João F Guassi Moreira
- Department of Psychology, University of California Los Angeles, Los Angeles, CA, USA
| | - Olivia Guest
- Department of Experimental Psychology, University College London, London, UK
- Research Centre on Interactive Media, Smart Systems and Emerging Technologies - RISE, Nicosia, Cyprus
| | - Shabnam Hakimi
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
| | - J Paul Hamilton
- Center for Social and Affective Neuroscience, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
| | - Roeland Hancock
- Brain Imaging Research Center, University of Connecticut, Storrs, CT, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA
| | - Giacomo Handjaras
- MoMiLab Research Unit, IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Bronson B Harry
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, New South Wales, Australia
| | - Colin Hawco
- Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
| | - Peer Herholz
- McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, Montreal, Quebec, Canada
| | - Gabrielle Herman
- Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
| | - Stephan Heunis
- Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Department of Research and Development, Epilepsy Centre Kempenhaeghe, Heeze, The Netherlands
| | - Felix Hoffstaedter
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Jeremy Hogeveen
- Department of Psychology, University of New Mexico, Albuquerque, NM, USA
- Psychology Clinical Neuroscience Center, University of New Mexico, Albuquerque, NM, USA
| | - Susan Holmes
- Department of Statistics, Stanford University, Stanford, CA, USA
| | - Chuan-Peng Hu
- Leibniz-Institut für Resilienzforschung (LIR), Mainz, Germany
| | - Scott A Huettel
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Matthew E Hughes
- School of Health Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
| | - Vittorio Iacovella
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
| | | | - Peder M Isager
- Department of Industrial Engineering and Innovation Sciences, Eindhoven University of Technology, Eindhoven, The Netherlands
| | - Ayse I Isik
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
| | - Andrew Jahn
- fMRI Laboratory, University of Michigan, Ann Arbor, MI, USA
| | - Matthew R Johnson
- Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE, USA
- Center for Brain, Biology and Behavior, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Tom Johnstone
- School of Health Sciences, Swinburne University of Technology, Hawthorn, Victoria, Australia
| | - Michael J E Joseph
- Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
| | - Anthony C Juliano
- Center for Neuropsychology and Neuroscience Research, Kessler Foundation, East Hanover, NJ, USA
| | - Joseph W Kable
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
- MindCORE, University of Pennsylvania, Philadelphia, PA, USA
| | - Michalis Kassinopoulos
- Graduate Program in Biological and Biomedical Engineering, McGill University, Montreal, Quebec, Canada
| | - Cemal Koba
- MoMiLab Research Unit, IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Xiang-Zhen Kong
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
| | - Timothy R Koscik
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | - Nuri Erkut Kucukboyaci
- Center for Traumatic Brain Injury Research, Kessler Foundation, East Hanover, NJ, USA
- Department of Physical Medicine and Rehabilitation, Rutgers New Jersey Medical School, Newark, NJ, USA
| | - Brice A Kuhl
- Department of Psychology, University of Oregon, Eugene, OR, USA
| | - Sebastian Kupek
- Faculty of Economics and Statistics, University of Innsbruck, Innsbruck, Austria
| | - Angela R Laird
- Department of Physics, Florida International University, Miami, Florida, USA
| | - Claus Lamm
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
- Vienna Cognitive Science Hub, University of Vienna, Vienna, Austria
| | - Robert Langner
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Nina Lauharatanahirun
- US CCDC Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA
- Annenberg School for Communication, University of Pennsylvania, Philadelphia, PA, USA
| | - Hongmi Lee
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA
| | - Sangil Lee
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA
| | - Alexander Leemans
- PROVIDI Lab, Image Sciences Institute, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Andrea Leo
- MoMiLab Research Unit, IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Elise Lesage
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
| | - Flora Li
- Fralin Biomedical Research Institute, Roanoke, VA, USA
- Economics Experimental Lab, Nanjing Audit University, Nanjing, China
| | - Monica Y C Li
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Brain Imaging Research Center, University of Connecticut, Storrs, CT, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA
- Haskins Laboratories, New Haven, CT, USA
| | - Phui Cheng Lim
- Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE, USA
- Center for Brain, Biology and Behavior, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Evan N Lintz
- Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE, USA
| | | | - Annabel B Losecaat Vermeer
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
| | - Bradley C Love
- Department of Experimental Psychology, University College London, London, UK
- The Alan Turing Institute, London, UK
| | - Michael L Mack
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
| | - Norberto Malpica
- Laboratorio de Análisis de Imagen Médica y Biometría (LAIMBIO), Universidad Rey Juan Carlos, Madrid, Spain
| | - Theo Marins
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
| | - Camille Maumet
- Inria, Univ Rennes, CNRS, Inserm, IRISA UMR 6074, Empenn ERL U 1228, Rennes, France
| | - Kelsey McDonald
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Joseph T McGuire
- Department of Psychological and Brain Sciences, Boston University, Boston, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, MA, USA
| | - Helena Melero
- Laboratorio de Análisis de Imagen Médica y Biometría (LAIMBIO), Universidad Rey Juan Carlos, Madrid, Spain
- Departamento de Psicobiología, División de Psicología, CES Cardenal Cisneros, Madrid, Spain
- Northeastern University Biomedical Imaging Center, Northeastern University, Boston, MA, USA
| | - Adriana S Méndez Leal
- Department of Psychology, University of California Los Angeles, Los Angeles, CA, USA
| | - Benjamin Meyer
- Leibniz-Institut für Resilienzforschung (LIR), Mainz, Germany
- Neuroimaging Center (NIC), Focus Program Translational Neurosciences (FTN), Johannes Gutenberg University Medical Center Mainz, Mainz, Germany
| | - Kristin N Meyer
- Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Glad Mihai
- Max Planck Research Group: Neural Mechanisms of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
| | - Georgios D Mitsis
- Department of Bioengineering, McGill University, Montreal, Quebec, Canada
| | - Jorge Moll
- D'Or Institute for Research and Education (IDOR), Rio de Janeiro, Brazil
- Department of Psychology, Stanford University, Stanford, CA, USA
| | - Dylan M Nielson
- Data Science and Sharing Team, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
| | - Gustav Nilsonne
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Psychology, Stockholm University, Stockholm, Sweden
| | - Michael P Notter
- The Laboratory for Investigative Neurophysiology (The LINE), Department of Radiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland
| | - Emanuele Olivetti
- Neuroinformatics Laboratory, Fondazione Bruno Kessler, Trento, Italy
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
| | - Adrian I Onicas
- MoMiLab Research Unit, IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Paolo Papale
- MoMiLab Research Unit, IMT School for Advanced Studies Lucca, Lucca, Italy
- Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, The Netherlands
| | - Kaustubh R Patil
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, MO, USA
| | - Alexandre Pérez
- McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, Montreal, Quebec, Canada
| | - Doris Pischedda
- Bernstein Center for Computational Neuroscience and Berlin Center for Advanced Neuroimaging and Clinic for Neurology, Charité Universitätsmedizin, corporate member of Freie Universität Berlin, Humboldt Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
- Cluster of Excellence Science of Intelligence, Technische Universität Berlin and Humboldt Universität zu Berlin, Berlin, Germany
- NeuroMI - Milan Center for Neuroscience, Milan, Italy
| | - Jean-Baptiste Poline
- McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Henry H. Wheeler, Jr. Brain Imaging Center, Helen Wills Neuroscience Institute, University of California Berkeley, Berkeley, CA, USA
| | - Yanina Prystauka
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Brain Imaging Research Center, University of Connecticut, Storrs, CT, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA
| | - Shruti Ray
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
| | | | - Richard C Reynolds
- Scientific and Statistical Computing Core, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA
| | - Emiliano Ricciardi
- MoMiLab Research Unit, IMT School for Advanced Studies Lucca, Lucca, Italy
| | - Jenny R Rieck
- Rotman Research Institute, Baycrest Health Sciences Centre, Toronto, Ontario, Canada
| | - Anais M Rodriguez-Thompson
- Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Anthony Romyn
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
| | - Taylor Salo
- Department of Psychology, Florida International University, Miami, FL, USA
| | - Gregory R Samanez-Larkin
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Emilio Sanz-Morales
- Laboratorio de Análisis de Imagen Médica y Biometría (LAIMBIO), Universidad Rey Juan Carlos, Madrid, Spain
| | | | - Douglas H Schultz
- Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE, USA
- Center for Brain, Biology and Behavior, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Qiang Shen
- School of Management, Zhejiang University of Technology, Hangzhou, China
- Institute of Neuromanagement, Zhejiang University of Technology, Hangzhou, China
| | - Margaret A Sheridan
- Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Jennifer A Silvers
- Department of Psychology, University of California Los Angeles, Los Angeles, CA, USA
| | - Kenny Skagerlund
- Department of Behavioural Sciences and Learning, Linköping University, Linköping, Sweden
- Center for Social and Affective Neuroscience, Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden
| | - Alec Smith
- Department of Economics, Virginia Tech, Blacksburg, VA, USA
- School of Neuroscience, Virginia Tech, Blacksburg, VA, USA
| | - David V Smith
- Department of Psychology, Temple University, Philadelphia, PA, USA
| | | | - Simon R Steinkamp
- Institute of Neuroscience and Medicine, Cognitive Neuroscience (INM-3), Research Centre Jülich, Jülich, Germany
| | - Sarah M Tashjian
- Department of Psychology, University of California Los Angeles, Los Angeles, CA, USA
| | | | - John N Thorp
- Department of Psychology, Columbia University, New York, NY, USA
| | - Gustav Tinghög
- Department of Management and Engineering, Linköping University, Linköping, Sweden
- Department of Health, Medicine and Caring Sciences, Linköping University, Linköping, Sweden
| | - Loreen Tisdall
- Department of Psychology, Stanford University, Stanford, CA, USA
- Center for Cognitive and Decision Sciences, University of Basel, Basel, Switzerland
| | - Steven H Tompson
- US CCDC Army Research Laboratory, Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA
| | - Claudio Toro-Serey
- Department of Psychological and Brain Sciences, Boston University, Boston, MA, USA
- Center for Systems Neuroscience, Boston University, Boston, MA, USA
| | | | - Leonardo Tozzi
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
| | - Vuong Truong
- Graduate Institute of Mind, Brain and Consciousness, Taipei Medical University, Taipei, Taiwan
- Brain and Consciousness Research Centre, TMU-ShuangHo Hospital, New Taipei City, Taiwan
| | - Luca Turella
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
| | - Anna E van 't Veer
- Methodology and Statistics Unit, Institute of Psychology, Leiden University, Leiden, The Netherlands
| | - Tom Verguts
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
| | - Jean M Vettel
- US Combat Capabilities Development Command Army Research Laboratory, Aberdeen, MD, USA
- University of California Santa Barbara, Santa Barbara, CA, USA
- University of Pennsylvania, Philadelphia, PA, USA
| | - Sagana Vijayarajah
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
| | - Khoi Vo
- Center for Cognitive Neuroscience, Duke University, Durham, NC, USA
- Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
| | - Matthew B Wall
- Invicro, London, UK
- Faculty of Medicine, Imperial College London, London, UK
- Clinical Psychopharmacology Unit, University College London, London, UK
| | - Wouter D Weeda
- Methodology and Statistics Unit, Institute of Psychology, Leiden University, Leiden, The Netherlands
| | - Susanne Weis
- Institute of Neuroscience and Medicine, Brain and Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
| | - David J White
- Centre for Human Psychopharmacology, Swinburne University, Hawthorn, Victoria, Australia
| | - David Wisniewski
- Department of Experimental Psychology, Ghent University, Ghent, Belgium
| | - Alba Xifra-Porxas
- Graduate Program in Biological and Biomedical Engineering, McGill University, Montreal, Quebec, Canada
| | - Emily A Yearling
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, USA
- Brain Imaging Research Center, University of Connecticut, Storrs, CT, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA
| | - Sangsuk Yoon
- Department of Management and Marketing, School of Business, University of Dayton, Dayton, OH, USA
| | - Rui Yuan
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
| | - Kenneth S L Yuen
- Leibniz-Institut für Resilienzforschung (LIR), Mainz, Germany
- Neuroimaging Center (NIC), Focus Program Translational Neurosciences (FTN), Johannes Gutenberg University Medical Center Mainz, Mainz, Germany
| | - Lei Zhang
- Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
| | - Xu Zhang
- Brain Imaging Research Center, University of Connecticut, Storrs, CT, USA
- Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, CT, USA
- Biomedical Engineering Department, University of Connecticut, Storrs, CT, USA
| | - Joshua E Zosky
- Department of Psychology, University of Nebraska-Lincoln, Lincoln, NE, USA
- Center for Brain, Biology and Behavior, University of Nebraska-Lincoln, Lincoln, NE, USA
| | - Thomas E Nichols
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | | | - Tom Schonberg
- Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel.
- Department of Neurobiology, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.
|
27
|
Elias D, Campaña H, Poletta F, Heisecke S, Gili J, Ratowiecki J, Gimenez L, Pawluk M, Santos MR, Cosentino V, Uranga R, Rittler M, Lopez Camelo J. A graph theory approach to analyze birth defect associations. PLoS One 2020; 15:e0233529. [PMID: 32442191 PMCID: PMC7244144 DOI: 10.1371/journal.pone.0233529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/28/2019] [Accepted: 05/06/2020] [Indexed: 01/11/2023] Open
Abstract
Birth defects are prenatal morphological or functional anomalies. Associations among them are studied to identify their etiopathogenesis. Graph theory methods allow the relationships among a complete set of anomalies to be analyzed. A graph consists of nodes, which represent the entities (birth defects in the present work), and edges, which join nodes to indicate the relationships among them. The aim of the present study was to validate graph theory methods for studying birth defect associations. All birth defect monitoring records from the Estudio Colaborativo Latino Americano de Malformaciones Congénitas gathered between 1967 and 2017 were used. From around 5 million live and stillborn infants, 170,430 had one or more birth defects. A volume-adjusted chi-square statistic was used to determine the association strength between two birth defects and to weight the graph edges. The complete birth defect graph showed a log-normal degree distribution, and its characteristics differed from those of random, scale-free, and small-world graphs. The graph comprised 118 nodes and 550 edges. Birth defects with the highest centrality values were nonspecific codes such as "Other upper limb anomalies". After partition, the graph yielded 12 groups; most of them were recognizable and included conditions such as VATER and OEIS associations, and Patau syndrome. Our findings validate graph theory methods for studying birth defect associations. This method may help to identify underlying etiopathogeneses and to improve coding systems.
Affiliation(s)
- Dario Elias
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
| | - Hebe Campaña
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Comisión de Investigaciones Científicas, Buenos Aires, Argentina
| | - Fernando Poletta
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Instituto Nacional de Genética Médica Populacional, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
| | - Silvina Heisecke
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
| | - Juan Gili
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Instituto Académico Pedagógico de Ciencias Humanas, Universidad Nacional de Villa María, Córdoba, Argentina
| | - Julia Ratowiecki
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
| | - Lucas Gimenez
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Instituto Nacional de Genética Médica Populacional, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
| | - Mariela Pawluk
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
| | - Maria Rita Santos
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Comisión de Investigaciones Científicas, Buenos Aires, Argentina
- Instituto Multidisciplinario de Biología Celular, Buenos Aires, Argentina
| | - Viviana Cosentino
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
| | - Rocio Uranga
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Hospital San Juan de Dios, Buenos Aires, Argentina
| | - Monica Rittler
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Hospital Materno Infantil Ramón Sarda, Buenos Aires, Argentina
| | - Jorge Lopez Camelo
- Laboratorio de Epidemiología Genética, Centro de Educación Médica e Investigaciones Clínicas-Consejo Nacional de Investigaciones Científicas y Técnicas (CEMIC-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Estudio Colaborativo Latino Americano de Malformaciones Congénitas, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Instituto Nacional de Genética Médica Populacional, CEMIC-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- * E-mail:
|
28
|
Abstract
With increasing demand for training in data science, extracurricular or "ad hoc" education efforts have emerged to help individuals acquire relevant skills and expertise. Although extracurricular efforts already exist for many computationally intensive disciplines, their support of data science education has significantly helped in coping with the speed of innovation in data science practice and formal curricula. While the proliferation of ad hoc efforts is an indication of their popularity, less has been documented about the needs that they are designed to meet, the limitations that they face, and practical suggestions for holding successful efforts. To holistically understand the role of different ad hoc formats for data science, we surveyed organizers of ad hoc data science education efforts to understand how they perceived the events to have gone, including areas of strength and areas requiring growth. We also gathered recommendations from these past events for future organizers. Our results suggest that the perceived benefits of ad hoc efforts go beyond developing technical skills and may provide continued benefit in conjunction with formal curricula, which warrants further investigation. As increasing numbers of researchers from computational fields with a history of complex data become involved with ad hoc efforts to share their skills, the lessons learned that we extract from the surveys will provide concrete suggestions for the practitioner-leaders interested in creating, improving, and sustaining future efforts.
Affiliation(s)
- Orianna DeMasi
- Department of Computer Science, University of California, Davis, California, United States of America
| | - Alexandra Paxton
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Center for the Ecological Study of Perception and Action, University of Connecticut, Storrs, Connecticut, United States of America
| | - Kevin Koy
- IDEO, San Francisco, California, United States of America
|
29
|
McDonough CW, Breitenstein MK, Shahin M, Empey PE, Freimuth RR, Li L, Liebman M, Tuteja S. Translational Informatics Connects Real-World Information to Knowledge in an Increasingly Data-Driven World. Clin Pharmacol Ther 2020; 107:738-741. [PMID: 31837229 PMCID: PMC7678684 DOI: 10.1002/cpt.1719] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/20/2019] [Accepted: 11/01/2019] [Indexed: 11/07/2022]
Affiliation(s)
| | | | | | | | | | - Lang Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH
| | | | - Sony Tuteja
- University of Pennsylvania Perelman School of Medicine, Philadelphia, PA
|
30
|
Abstract
Background. Accurate diagnosis of patients' preferences is central to shared decision making. Missing from clinical practice is an approach that links pretreatment preferences and patient-reported outcomes. Objective. We propose a Bayesian collaborative filtering (CF) algorithm that combines pretreatment preferences and patient-reported outcomes to provide treatment recommendations. Design. We present the methodological details of a Bayesian CF algorithm designed to accomplish 3 tasks: 1) eliciting patient preferences using conjoint analysis surveys, 2) clustering patients into preference phenotypes, and 3) making treatment recommendations based on the posttreatment satisfaction of like-minded patients. We conduct a series of simulation studies to test the algorithm and to compare it to a 2-stage approach. Results. The Bayesian CF algorithm and 2-stage approaches performed similarly when there was extensive overlap between preference phenotypes. When the treatment was moderately associated with satisfaction, both methods made accurate recommendations. The kappa estimates measuring agreement between the true and predicted recommendations were 0.70 (95% confidence interval = 0.052-0.88) and 0.73 (0.56-0.90) under the Bayesian CF and 2-stage approaches, respectively. The 2-stage approach failed to converge in settings in which clusters were well separated, whereas the Bayesian CF algorithm produced acceptable results, with kappas of 0.73 (0.56-0.90) and 0.83 (0.69-0.97) for scenarios with moderate and large treatment effects, respectively. Limitations. Our approach assumes that the patient population is composed of distinct preference phenotypes, there is association between treatment and outcomes, and treatment effects vary across phenotypes. Findings are also limited to simulated data. Conclusion. The Bayesian CF algorithm is feasible, provides accurate cluster treatment recommendations, and outperforms 2-stage estimation when clusters are well separated. As such, the approach serves as a roadmap for incorporating predictive analytics into shared decision making.
Affiliation(s)
- Azza Shaoibi
- Epidemiology Analytics, Janssen Research and Development, Titusville, NJ, USA
| | - Brian Neelon
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
| | - Leslie A Lenert
- Epidemiology Analytics, Janssen Research and Development, Titusville, NJ, USA
- Department of Medicine, Medical University of South Carolina, Charleston, SC, USA
|
31
|
Xu H, Li J, Jiang X, Chen Q. Electronic Health Records for Drug Repurposing: Current Status, Challenges, and Future Directions. Clin Pharmacol Ther 2020; 107:712-714. [PMID: 32012237 PMCID: PMC10815929 DOI: 10.1002/cpt.1769] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/23/2019] [Accepted: 01/06/2020] [Indexed: 12/20/2022]
Abstract
It is well recognized that the global pharmaceutical industry now faces challenges such as high costs and low productivity when developing new drugs (e.g., it is estimated that the average cost for developing a new drug ranges from US $2 billion to $3 billion with the total time to bring it to the market being about 13–15 years).1 Therefore, drug repurposing (also called drug repositioning/reprofiling), which finds new indications for existing drugs, has received great attention in the past decade. Drug repurposing can reduce drug development time, while improving success rates because the toxicity profiles of existing drugs are already known. Studies have shown that new applications for repurposed drugs have nearly a 30% success rate for US Food and Drug Administration (FDA) approval, whereas traditional new drug applications have < 10% approval rate.
Affiliation(s)
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Jianfu Li
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Xiaoqian Jiang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Qingxia Chen
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
32
|
Polasek TM, Rostami-Hodjegan A. Virtual Twins: Understanding the Data Required for Model-Informed Precision Dosing. Clin Pharmacol Ther 2020; 107:742-745. [PMID: 32056199 DOI: 10.1002/cpt.1778] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/20/2019] [Accepted: 01/13/2020] [Indexed: 12/16/2022]
Affiliation(s)
- Thomas M Polasek
- Certara, Princeton, New Jersey, USA
- Department of Clinical Pharmacology, Royal Adelaide Hospital, Adelaide, Australia
- Centre for Medicines Use and Safety, Monash University, Melbourne, Australia
| | - Amin Rostami-Hodjegan
- Certara, Princeton, New Jersey, USA
- Centre for Applied Pharmacokinetic Research, University of Manchester, Manchester, UK
|
33
|
Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020; 21:30. [PMID: 32033565 PMCID: PMC7006217 DOI: 10.1186/s13059-020-1935-5] [Citation(s) in RCA: 668] [Impact Index Per Article: 167.0] [Received: 08/28/2019] [Accepted: 01/15/2020] [Indexed: 12/11/2022] Open
Abstract
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Affiliation(s)
- Shanika L. Amarasinghe
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Luke Zappia
- Bioinformatics, Murdoch Children’s Research Institute, Parkville, 3052 Australia
- School of Biosciences, Faculty of Science, The University of Melbourne, Parkville, 3010 Australia
| | - Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010 Australia
| | - Quentin Gouil
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
|
34
|
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 534] [Impact Index Per Article: 133.5] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Affiliation(s)
- David Lähnemann
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
| | - Ewa Szczurek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Davis J. McCarthy
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
- Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
| | - Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
| | - Catalina A. Vallejos
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- The Alan Turing Institute, British Library, London, UK
| | - Kieran R. Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Data Science Institute, University of British Columbia, Vancouver, Canada
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Ahmed Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
- Department of Pathology, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, USA
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | | | - Samuel Aparicio
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
| | - Jasmijn Baaijens
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
| | - Marleen Balvert
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
| | - Buys de Barbanson
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Antonio Cappuccio
- Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
| | - Giacomo Corleone
- Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
| | - Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Maria Florescu
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Rens Holmer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Thamar Jessurun Lobo
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Emma M. Keizer
- Biometris, Wageningen University & Research, Wageningen, The Netherlands
| | - Indu Khatri
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
| | - Szymon M. Kielbasa
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Alexey M. Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Tzu-Hao Kuo
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Boudewijn P.F. Lelieveldt
- PRB lab, Delft University of Technology, Delft, The Netherlands
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Ion I. Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, USA
| | - John C. Marioni
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
| | - Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Felix Mölder
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
| | - Amir Niknejad
- Computation molecular design, Zuse Institute Berlin, Berlin, Germany
- Mathematics Department, Mount Saint Vincent, New York, USA
| | - Alicja Rączkowska
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
| | - Marcel Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
| | - Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
| | - Antonios Somarakis
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Oliver Stegle
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
| | - Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
| | - Huan Yang
- Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
| | - Alice C. McHardy
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Sohrab P. Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
| | - Alexander Schönhuth
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
|
35
|
Abstract
The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity: RNA-seq data must be normalized before any pattern of differential (or non-differential) expression can be ascertained, while prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods have successfully overcome this problem by assuming that most transcripts are not differentially expressed. However, when RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that relies neither on this assumption nor on prior knowledge about the reference transcripts. This algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm to our synthesized validation data showed that it could recover the reference transcripts with high precision, thus resulting in high-quality normalization. On a realistic data set from the ENCODE project, this algorithm gave good results and could finish in a reasonable time. These preliminary results imply that we may be able to break the long-persisting circularity problem in RNA-seq normalization.
Affiliation(s)
- Diem-Trang Tran
- School of Computing, University of Utah, Salt Lake City, Utah, United States of America
- * E-mail:
| | - Aditya Bhaskara
- School of Computing, University of Utah, Salt Lake City, Utah, United States of America
| | - Balagurunathan Kuberan
- Department of Medicinal Chemistry, University of Utah, Salt Lake City, Utah, United States of America
- Department of Biology, University of Utah, Salt Lake City, Utah, United States of America
| | - Matthew Might
- Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
| |
Collapse
|
36
|
Pittard WS, Villaveces CK, Li S. A Bioinformatics Primer to Data Science, with Examples for Metabolomics. Methods Mol Biol 2020; 2104:245-263. [PMID: 31953822 DOI: 10.1007/978-1-0716-0239-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
With the increasing importance of big data in biomedicine, skills in data science are a foundation for individual career development and for the progress of science. This chapter is a practical guide to working with high-throughput biomedical data. It covers how to understand and set up the computing environment, to start a research project with proper and effective data management, and to perform common bioinformatics tasks such as data wrangling, quality control, statistical analysis, and visualization, with examples on metabolomics data. Concepts and tools related to coding and scripting are discussed. Version control, knitr and Jupyter notebooks are important to project management, collaboration, and research reproducibility. Overall, this chapter describes a core set of skills for working in bioinformatics and can serve as a reference text at the level of a graduate course that interfaces with data science.
Affiliation(s)
- W Stephen Pittard
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Shuzhao Li
  - Department of Medicine, Emory University School of Medicine, Atlanta, GA, USA
|
37
|
|
38
|
Olatosi B, Zhang J, Weissman S, Hu J, Haider MR, Li X. Using big data analytics to improve HIV medical care utilisation in South Carolina: A study protocol. BMJ Open 2019; 9:e027688. [PMID: 31326931 PMCID: PMC6661700 DOI: 10.1136/bmjopen-2018-027688] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/07/2018] [Revised: 03/28/2019] [Accepted: 06/04/2019] [Indexed: 12/23/2022] Open
Abstract
INTRODUCTION Linkage and retention in HIV medical care remain problematic in the USA. Extensive health utilisation data collection through electronic health records (EHR) and claims data represents new opportunities for scientific discovery. Big data science (BDS) is a powerful tool for investigating HIV care utilisation patterns. The South Carolina (SC) office of Revenue and Fiscal Affairs (RFA) data warehouse captures individual-level longitudinal health utilisation data for persons living with HIV (PLWH). The data warehouse includes EHR, claims and data from private institutions, housing, prisons, mental health, Medicare, Medicaid, State Health Plan and the department of health and human services. The purpose of this study is to describe the process for creating a comprehensive database of all SC PLWH, and plans for using BDS to explore, identify, characterise and explain new predictors of missed opportunities for HIV medical care utilisation. METHODS AND ANALYSIS This project will create person-level profiles guided by the Gelberg-Andersen Behavioral Model and describe new patterns of HIV care utilisation. The population for the comprehensive database comes from statewide HIV surveillance data (2005-2016) for all SC PLWH (N≈18000). Surveillance data are available from the state health department's enhanced HIV/AIDS Reporting System (e-HARS). Additional data pulls for the e-HARS population will include Ryan White HIV/AIDS Program Service Reports, Health Sciences SC data and Area Health Resource Files. These data will be linked to the RFA data and serve as sources for traditional and vulnerable domain Gelberg-Andersen Behavioral Model variables. The project will use BDS techniques such as machine learning to identify new predictors of HIV care utilisation behaviour among PLWH, and 'missed opportunities' for re-engaging them back into care.
ETHICS AND DISSEMINATION The study team applied for data from different sources and submitted individual Institutional Review Board (IRB) applications to the University of South Carolina (USC) IRB and other local authorities/agencies/state departments. This study was approved by the USC IRB (#Pro00068124) in 2017. To protect the identity of the persons living with HIV (PLWH), researchers will only receive linked deidentified data from the RFA. Study findings will be disseminated at local community forums, community advisory group meetings, meetings with our state agencies, local partners and other key stakeholders (including PLWH, policy-makers and healthcare providers), presentations at academic conferences and through publication in peer-reviewed articles. Data security and patient confidentiality are the bedrock of this study. Extensive data agreements ensuring data security and patient confidentiality for the deidentified linked data have been established and are stringently adhered to. The RFA is authorised to collect and merge data from these different sources and to ensure the privacy of all PLWH. The legislatively mandated SC data oversight council reviewed the proposed process stringently before approving it. Researchers will get only the encrypted deidentified dataset to prevent any breach of privacy in the data transfer, management and analysis processes. In addition, established secure data governance rules, data encryption and encrypted predictive techniques will be deployed. In addition to the data anonymisation as a part of privacy-preserving analytics, encryption schemes that protect running prediction algorithms on encrypted data will also be deployed. Best practices and lessons learnt about the complex processes involved in negotiating and navigating multiple data sharing agreements between different entities are being documented for dissemination.
Affiliation(s)
- Bankole Olatosi
  - Health Services, Policy and Management, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA
- Jiajia Zhang
  - Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, South Carolina, USA
- Sharon Weissman
  - Internal Medicine, School of Medicine, University of South Carolina, Columbia, South Carolina, USA
- Jianjun Hu
  - Department of Computer Science & Engineering, College of Engineering, University of South Carolina, Columbia, South Carolina, USA
- Mohammad Rifat Haider
  - Department of Health Promotion, Education & Behavior, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA
- Xiaoming Li
  - Health Promotion Education and Behavior, Arnold School of Public Health, University of South Carolina, Columbia, South Carolina, USA
|
39
|
Grzegorczyk M, Aderhold A, Husmeier D. Overview and Evaluation of Recent Methods for Statistical Inference of Gene Regulatory Networks from Time Series Data. Methods Mol Biol 2019; 1883:49-94. [PMID: 30547396 DOI: 10.1007/978-1-4939-8882-2_3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/14/2023]
Abstract
A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding on the best method to adopt for a particular application or data set might be a confusing task. The present chapter provides a broad overview of state-of-the-art methods with an emphasis on conceptual understanding rather than a deluge of mathematical details, and the pros and cons of the various approaches are discussed. Guidance on practical applications, with pointers to publicly available software implementations, is included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-world application taken from current plant systems biology.
Affiliation(s)
- Marco Grzegorczyk
  - Johann Bernoulli Institute, University of Groningen, Groningen, The Netherlands
- Andrej Aderhold
  - Center for Computer Science, Universidade Federal do Rio Grande, Rio Grande, Brazil
- Dirk Husmeier
  - School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
|
40
|
Kampe C, Reid G, Jones P, S C, S S, Vogel KM. Bringing the National Security Agency into the Classroom: Ethical Reflections on Academia-Intelligence Agency Partnerships. Sci Eng Ethics 2019; 25:869-898. [PMID: 29318451 DOI: 10.1007/s11948-017-9938-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2017] [Accepted: 06/21/2017] [Indexed: 06/07/2023]
Abstract
Academia-intelligence agency collaborations are on the rise for a variety of reasons. These can take many forms, one of which is in the classroom, using students to stand in for intelligence analysts. Classrooms, however, are ethically complex spaces, with students considered vulnerable populations, and become even more complex when layering multiple goals, activities, tools, and stakeholders over those traditionally present. This does not necessarily mean one must shy away from academia-intelligence agency partnerships in classrooms, but that these must be conducted carefully and reflexively. This paper hopes to contribute to this conversation by describing one purposeful classroom encounter that occurred between a professor, students, and intelligence practitioners in the fall of 2015 at North Carolina State University: an experiment conducted as part of a graduate-level political science class that involved students working with a prototype analytic technology, a type of participatory sensing/self-tracking device, developed by the National Security Agency. This experiment opened up the following questions that this paper will explore: What social, ethical, and pedagogical considerations arise with the deployment of a prototype intelligence technology in the college classroom, and how can they be addressed? How can academia-intelligence agency collaboration in the classroom be conducted in ways that provide benefits to all parties, while minimizing disruptions and negative consequences? This paper will discuss the experimental findings in the context of ethical perspectives involved in values in design and participatory/self-tracking data practices, and discuss lessons learned for the ethics of future academia-intelligence agency partnerships in the classroom.
Affiliation(s)
- Christopher Kampe
  - Communication Rhetoric and Digital Media Program, North Carolina State University, Raleigh, NC, 27695, USA
- Gwendolynne Reid
  - Oxford College of Emory University, 810 Whatcoat Street, Oxford, GA, 30054, USA
- Paul Jones
  - Laboratory for Analytic Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- Colleen S
  - Laboratory for Analytic Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- Sean S
  - Laboratory for Analytic Sciences, North Carolina State University, Raleigh, NC, 27695, USA
- Kathleen M Vogel
  - School of Public Policy, University of Maryland, College Park, 3139 Van Munching Hall, College Park, MD, 20742, USA
|
41
|
Musa A, Tripathi S, Dehmer M, Yli-Harja O, Kauffman SA, Emmert-Streib F. Systems Pharmacogenomic Landscape of Drug Similarities from LINCS data: Drug Association Networks. Sci Rep 2019; 9:7849. [PMID: 31127155 PMCID: PMC6534546 DOI: 10.1038/s41598-019-44291-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/12/2018] [Accepted: 05/08/2019] [Indexed: 02/01/2023] Open
Abstract
Modern research in the biomedical sciences is data-driven, utilizing high-throughput technologies to generate big genomic data. The Library of Integrated Network-based Cellular Signatures (LINCS) is an example of a large-scale genomic data repository providing hundreds of thousands of high-dimensional gene expression measurements for thousands of drugs and dozens of cell lines. However, the remaining challenge is how to use these data effectively for pharmacogenomics. In this paper, we use LINCS data to construct drug association networks (DANs) representing the relationships between drugs. By using the Anatomical Therapeutic Chemical (ATC) classification of drugs, we demonstrate that the DANs represent a systems pharmacogenomic landscape of drugs that meaningfully summarizes the entire LINCS repository on a genomic scale. Here we identify the modules of the DANs as therapeutic attractors of the ATC drug classes.
Affiliation(s)
- Aliyu Musa
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
- Shailesh Tripathi
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Wehrgrabengasse 1-3, 4400, Steyr, Austria
- Matthias Dehmer
  - Department for Biomedical Computer Science and Mechatronics, UMIT - The Health and Lifesciences University, Eduard Wallnoefer Zentrum 1, 6060, Hall in Tyrol, Austria
  - College of Computer and Control Engineering, Nankai University, Tianjin, 300350, P.R. China
  - Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Wehrgrabengasse 1-3, 4400, Steyr, Austria
- Olli Yli-Harja
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Computational Systems Biology Lab, Tampere University of Technology, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute for Systems Biology, Seattle, WA, 98109, USA
- Frank Emmert-Streib
  - Predictive Society and Data Analytics Lab, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere University, Tampere, Korkeakoulunkatu 10, 33720, Tampere, Finland
|
42
|
Rivière E, Quinton A, Dehail P. [Analysis of the discrimination of the final marks after the first computerized national ranking exam in Medicine in June 2016 in France]. Rev Med Interne 2019; 40:286-290. [PMID: 30902508 DOI: 10.1016/j.revmed.2018.10.386] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/10/2018] [Revised: 10/07/2018] [Accepted: 10/18/2018] [Indexed: 11/18/2022]
Abstract
INTRODUCTION The first computerised national ranking exam (cNRE) in Medicine was introduced in June 2016 for 8214 students. It was made up of 18 progressive clinical cases (PCCs) with multiple choice questions (MCQs), 120 independent MCQs and 2 scientific articles to critique. A lack of mark discrimination motivated the cNRE reform. We aimed to assess the discrimination of the final marks after this first cNRE. METHODS A national Excel® file gathering overall statistics and marks was transmitted to the medical faculties after the cNRE. The mean points deviation between two papers and the percentage of points ranking 75% of students allowed us to analyse marks' discrimination. RESULTS The national sigmoid distribution curve of the marks is superimposable on that of the previous NRE in 2015. In PCCs, 72% of students were ranked within 1090 points out of 7560 (14%). In independent MCQs, 73% of students were ranked within 434 points out of 2160 (20%). In critical analysis of articles, 75% of students were ranked within 225 points out of 1080 (21%). The above percentages of students are on the plateau of each discrimination curve for PCCs, independent MCQs and critical analysis of scientific articles. CONCLUSION The cNRE reduced the number of equally ranked students compared to 2015, with a mean deviation between two papers of 0.28 in 2016 vs 0.04 in 2015. Despite the new format introduced by the cNRE, 75% of students are still ranked within a low proportion of points that is equivalent to the previous NRE in 2015 (between 15 and 20% of points).
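The percentages quoted in the RESULTS can be checked directly from the point counts given in the abstract. The short Python sketch below only redoes that arithmetic; the variable names are hypothetical and nothing beyond the reported figures is assumed.

```python
# Point span in which most students were ranked, and total points available,
# for each part of the exam as reported in the abstract.
sections = {
    "progressive clinical cases": (1090, 7560),
    "independent MCQs": (434, 2160),
    "critical analysis of articles": (225, 1080),
}

# Share of the available points that separates the bulk of the students.
shares = {name: 100 * span / total for name, (span, total) in sections.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of available points")
```

Rounded to whole numbers, the three shares match the 14%, 20% and 21% stated in the abstract.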
Affiliation(s)
- E Rivière
  - Service de médecine interne et maladies infectieuses, hôpital Haut-Lévêque, CHU de Bordeaux, 33600 Pessac, France; Centre de recherche appliquée aux méthodes éducatives (CRAME), université de Bordeaux, 33000 Bordeaux, France; Conférences de préparation aux ECN, université de Bordeaux, 33000 Bordeaux, France; UFR des sciences médicales, université de Bordeaux, 33000 Bordeaux, France
- A Quinton
  - Centre de recherche appliquée aux méthodes éducatives (CRAME), université de Bordeaux, 33000 Bordeaux, France; UFR des sciences médicales, université de Bordeaux, 33000 Bordeaux, France
- P Dehail
  - Conférences de préparation aux ECN, université de Bordeaux, 33000 Bordeaux, France; UFR des sciences médicales, université de Bordeaux, 33000 Bordeaux, France; Service de rééducation fonctionnelle, hôpital Pellegrin, CHU de Bordeaux, 33000 Bordeaux, France
|
43
|
Alghamdi SM, Sundberg BA, Sundberg JP, Schofield PN, Hoehndorf R. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 2019; 9:4025. [PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/31/2018] [Accepted: 02/14/2019] [Indexed: 12/28/2022] Open
Abstract
Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.
Affiliation(s)
- Sarah M Alghamdi
  - King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia
  - King Abdul-Aziz University, Faculty of Computing and Information Technology, Rabigh, 25732, Saudi Arabia
- Beth A Sundberg
  - The Jackson Laboratory, 600, Main Street, Bar Harbor, ME, 04609, USA
- John P Sundberg
  - The Jackson Laboratory, 600, Main Street, Bar Harbor, ME, 04609, USA
- Paul N Schofield
  - The Jackson Laboratory, 600, Main Street, Bar Harbor, ME, 04609, USA
  - Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, UK
- Robert Hoehndorf
  - King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, 23955-6900, Saudi Arabia
|
44
|
Abstract
Data science can be incorporated into every stage of a scientific study. Here we describe how data science can be used to generate hypotheses, to design experiments, to perform experiments, and to analyse data. We also present our vision for how data science techniques will be an integral part of the laboratory of the future.
Affiliation(s)
- Daphne Ezer
  - Alan Turing Institute, London, United Kingdom
  - Department of Statistics, University of Warwick, Coventry, United Kingdom
- Kirstie Whitaker
  - Alan Turing Institute, London, United Kingdom
  - Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
|
45
|
Huang H, Tang H, Huang J, Chen B, Liu R, Tang RS, Lu Y, Yang P. Special Issue: Selected Papers of the Inaugural DahShu Data Science Symposium: Computational Precision Health (CPH 2017). J Comput Biol 2019; 24:635-636. [PMID: 28657834 DOI: 10.1089/cmb.2017.29007.hh] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
|
46
|
Lapidus M. Not All Library Analytics are Created Equal: LibAnswers to the Rescue! Med Ref Serv Q 2019; 38:41-55. [PMID: 30942679 DOI: 10.1080/02763869.2019.1548892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2018] [Revised: 11/06/2018] [Accepted: 11/07/2018] [Indexed: 06/09/2023]
Abstract
The reasons for implementing, and the advantages of switching to, the Reference Analytics system, a part of the Springshare LibAnswers platform, for collecting reference statistics at a three-campus university library are described. The benefits of using this web-based product are highlighted through a comparison with the previously used analytical tools and the annual statistical data. Transitioning to Reference Analytics allowed librarians to take advantage of such features as seamless access to reference transactions, easy customization, cross-tabulation, and data visualization, proving beneficial for overall library reference services.
Affiliation(s)
- Mariana Lapidus
  - Henrietta DeBenedictis Library, MCPHS University, Boston, Massachusetts, USA
|
47
|
Abstract
Gene regulatory networks are powerful abstractions of biological systems. Since the advent of high-throughput measurement technologies in biology in the late 1990s, reconstructing the structure of such networks has been a central computational problem in systems biology. While the problem is certainly not solved in its entirety, considerable progress has been made in the last two decades, with mature tools now available. This chapter aims to provide an introduction to the basic concepts underpinning network inference tools, attempting a categorization which highlights commonalities and relative strengths. While the chapter is meant to be self-contained, the material presented should provide a useful background to the later, more specialized chapters of this book.
Affiliation(s)
- Vân Anh Huynh-Thu
  - Department of Electrical Engineering and Computer Science, University of Liège, Liège, Belgium
|
48
|
Levin-Schwartz Y, Calhoun VD, Adalı T. A method to compare the discriminatory power of data-driven methods: Application to ICA and IVA. J Neurosci Methods 2019; 311:267-276. [PMID: 30389489 PMCID: PMC6258321 DOI: 10.1016/j.jneumeth.2018.10.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/19/2018] [Revised: 08/24/2018] [Accepted: 10/08/2018] [Indexed: 11/20/2022]
Abstract
BACKGROUND The widespread application of data-driven factorization-based methods, such as independent component analysis (ICA), to functional magnetic resonance imaging data facilitates the study of neural function and how it is disrupted by psychiatric disorders such as schizophrenia. While the increasing number of these methods motivates a comparison of their relative performance, such a comparison is difficult to perform on real fMRI data, since the ground truth is, relatively, unknown and the alignment of factors across different methods is impractical and imprecise. NEW METHOD We present a novel method, global difference maps (GDMs), to compare the results of different fMRI analysis techniques on real fMRI data, quantify their relative performances, and highlight the differences between the decompositions visually. COMPARISON WITH EXISTING METHODS We apply this method to compare the performances of two different factorization-based methods, ICA and its multiset extension independent vector analysis (IVA), for the analysis of fMRI data from 109 patients with schizophrenia and 138 healthy controls during the performance of three tasks. RESULTS Through this application of GDMs, we find that IVA can determine regions that are more discriminatory between patients and controls than ICA, though IVA is less effective at emphasizing regions found in only a subset of the tasks. CONCLUSIONS These results demonstrate that GDMs are an effective way to compare the performances of different factorization-based methods as well as regression-based analyses.
Affiliation(s)
- Yuri Levin-Schwartz
  - Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD 21250, United States
- Vince D Calhoun
  - The Mind Research Network, Albuquerque, NM 87106, United States; Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, United States
- Tülay Adalı
  - Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, MD 21250, United States
|
49
|
Stevens SLR, Kuzak M, Martinez C, Moser A, Bleeker P, Galland M. Building a local community of practice in scientific programming for life scientists. PLoS Biol 2018; 16:e2005561. [PMID: 30485260 PMCID: PMC6287879 DOI: 10.1371/journal.pbio.2005561] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Revised: 12/10/2018] [Indexed: 11/18/2022] Open
Abstract
In this paper, we describe why and how to build a local community of practice in scientific programming for life scientists who use computers and programming in their research. A community of practice is a small group of scientists who meet regularly to help each other and promote good practices in scientific programming. While most life scientists are well trained in the laboratory to conduct experiments, good practices with (big) data sets and their analysis are often missing. We propose a model on how to build such a community of practice at a local academic institution, present two real-life examples, and introduce challenges and implemented solutions. We believe that the current data deluge that life scientists face can benefit from the implementation of these small communities. Good practices spread among experimental scientists will foster open, transparent, and sound scientific results beneficial to society.
Affiliation(s)
- Sarah L. R. Stevens
  - Department of Bacteriology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Mateusz Kuzak
  - Dutch Techcentre for Life Sciences, Utrecht, Netherlands
- Aurelia Moser
  - Mozilla Foundation, Mountain View, California, United States of America
- Petra Bleeker
  - Department of Plant Physiology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
- Marc Galland
  - Department of Plant Physiology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
|
50
|
Abstract
We develop a number of data-driven investment strategies that demonstrate how machine learning and data analytics can be used to guide investments in peer-to-peer loans. We detail the process starting with the acquisition of (real) data from a peer-to-peer lending platform all the way to the development and evaluation of investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses at the undergraduate or graduate levels. Importantly, we go beyond just evaluating predictive performance of models, to assess how well the strategies would actually perform, using real, publicly available data. Our treatment is comprehensive and ranges from qualitative to technical, but is also modular, which gives instructors the flexibility to focus on specific parts of the case, depending on the topics they want to cover. The learning concepts include the following: data cleaning and ingestion, classification/probability estimation modeling, regression modeling, analytical engineering, calibration curves, data leakage, evaluation of model performance, basic portfolio optimization, evaluation of investment strategies, and using Python for data science.
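One of the listed learning concepts, the calibration curve, can be sketched in a few lines of Python. This is an illustrative example only, not code from the article: the function name, the equal-width binning, and the toy default labels are assumptions. The idea is to bin predicted default probabilities and compare the mean prediction in each bin with the observed default rate.

```python
import numpy as np

def calibration_curve(y_true, y_prob, n_bins=5):
    """Mean predicted probability and observed positive rate per equal-width bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    bins = np.digitize(y_prob, edges[1:-1])
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():  # skip empty bins
            mean_pred.append(y_prob[mask].mean())
            frac_pos.append(y_true[mask].mean())
    return np.array(mean_pred), np.array(frac_pos)

# Toy loans (assumed): ten scored at 0.1 with one default,
# ten scored at 0.9 with nine defaults.
y_prob = np.array([0.1] * 10 + [0.9] * 10)
y_true = np.array([1] + [0] * 9 + [1] * 9 + [0])

mean_pred, frac_pos = calibration_curve(y_true, y_prob)
```

For a well-calibrated model the two returned arrays lie close to each other, as they do for this toy data; systematic gaps suggest the predicted probabilities should not be used at face value when estimating expected returns.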
Affiliation(s)
- Maxime C. Cohen
  - Information, Operations, and Management Sciences, NYU Stern School of Business, New York, New York
- C. Daniel Guetta
  - Decision, Risk, and Operations Division, Columbia Business School, New York, New York
- Kevin Jiao
  - Information, Operations, and Management Sciences, NYU Stern School of Business, New York, New York
- Foster Provost
  - Information, Operations, and Management Sciences, NYU Stern School of Business, New York, New York
|