Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 54 (from Reference Citation Analysis)
Article PDFs: 23
Cited by > 0: 31
Searched Name: Data Science/methods

Number	Citation Analysis
26	Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, Kirchler M, Iwanir R, Mumford JA, Adcock RA, Avesani P, Baczkowski BM, Bajracharya A, Bakst L, Ball S, Barilari M, Bault N, Beaton D, Beitner J, Benoit RG, Berkers RMWJ, Bhanji JP, Biswal BB, Bobadilla-Suarez S, Bortolini T, Bottenhorn KL, Bowring A, Braem S, Brooks HR, Brudner EG, Calderon CB, Camilleri JA, Castrellon JJ, Cecchetti L, Cieslik EC, Cole ZJ, Collignon O, Cox RW, Cunningham WA, Czoschke S, Dadi K, Davis CP, Luca AD, Delgado MR, Demetriou L, Dennison JB, Di X, Dickie EW, Dobryakova E, Donnat CL, Dukart J, Duncan NW, Durnez J, Eed A, Eickhoff SB, Erhart A, Fontanesi L, Fricke GM, Fu S, Galván A, Gau R, Genon S, Glatard T, Glerean E, Goeman JJ, Golowin SAE, González-García C, Gorgolewski KJ, Grady CL, Green MA, Guassi Moreira JF, Guest O, Hakimi S, Hamilton JP, Hancock R, Handjaras G, Harry BB, Hawco C, Herholz P, Herman G, Heunis S, Hoffstaedter F, Hogeveen J, Holmes S, Hu CP, Huettel SA, Hughes ME, Iacovella V, Iordan AD, Isager PM, Isik AI, Jahn A, Johnson MR, Johnstone T, Joseph MJE, Juliano AC, Kable JW, Kassinopoulos M, Koba C, Kong XZ, Koscik TR, Kucukboyaci NE, Kuhl BA, Kupek S, Laird AR, Lamm C, Langner R, Lauharatanahirun N, Lee H, Lee S, Leemans A, Leo A, Lesage E, Li F, Li MYC, Lim PC, Lintz EN, Liphardt SW, Losecaat Vermeer AB, Love BC, Mack ML, Malpica N, Marins T, Maumet C, McDonald K, McGuire JT, Melero H, Méndez Leal AS, Meyer B, Meyer KN, Mihai G, Mitsis GD, Moll J, Nielson DM, Nilsonne G, Notter MP, Olivetti E, Onicas AI, Papale P, Patil KR, Peelle JE, Pérez A, Pischedda D, Poline JB, Prystauka Y, Ray S, Reuter-Lorenz PA, Reynolds RC, Ricciardi E, Rieck JR, Rodriguez-Thompson AM, Romyn A, Salo T, Samanez-Larkin GR, Sanz-Morales E, Schlichting ML, Schultz DH, Shen Q, Sheridan MA, Silvers JA, Skagerlund K, Smith A, Smith DV, Sokol-Hessner P, Steinkamp SR, Tashjian SM, Thirion B, Thorp JN, Tinghög G, Tisdall L, Tompson SH, Toro-Serey C, Torre Tresols JJ, 
Tozzi L, Truong V, Turella L, van 't Veer AE, Verguts T, Vettel JM, Vijayarajah S, Vo K, Wall MB, Weeda WD, Weis S, White DJ, Wisniewski D, Xifra-Porxas A, Yearling EA, Yoon S, Yuan R, Yuen KSL, Zhang L, Zhang X, Zosky JE, Nichols TE, Poldrack RA, Schonberg T. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 2020;582:84-88. [PMID: 32483374 PMCID: PMC7771346 DOI: 10.1038/s41586-020-2314-9] [Citation(s) in RCA: 439] [Impact Index Per Article: 109.8] [Received: 11/14/2019] [Accepted: 04/07/2020] [Indexed: 01/13/2023]
Abstract: Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses [1]. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset [2-5]. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
MeSH Headings: Female; Humans; Male; Brain/diagnostic imaging; Brain/physiology; Data Analysis; Data Science/methods; Data Science/standards; Datasets as Topic/statistics & numerical data; Functional Neuroimaging; Logistic Models; Magnetic Resonance Imaging; Meta-Analysis as Topic; Models, Neurological; Reproducibility of Results; Research Personnel/organization & administration; Research Personnel/standards; Software
Grants: R24 MH117179 NIMH NIH HHS; UL1 TR001863 NCATS NIH HHS; P20 GM109089 NIGMS NIH HHS; R01 MH083320 NIMH NIH HHS; R01 DA041353 NIDA NIH HHS; 100309/Z/12/Z Wellcome Trust; P41 EB019936 NIBIB NIH HHS; R01 MH096906 NIMH NIH HHS
27	Elias D, Campaña H, Poletta F, Heisecke S, Gili J, Ratowiecki J, Gimenez L, Pawluk M, Santos MR, Cosentino V, Uranga R, Rittler M, Lopez Camelo J. A graph theory approach to analyze birth defect associations. PLoS One 2020;15:e0233529. [PMID: 32442191 PMCID: PMC7244144 DOI: 10.1371/journal.pone.0233529] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/28/2019] [Accepted: 05/06/2020] [Indexed: 01/11/2023]
Abstract: Birth defects are prenatal morphological or functional anomalies. Associations among them are studied to identify their etiopathogenesis. The graph theory methods allow analyzing relationships among a complete set of anomalies. A graph consists of nodes which represent the entities (birth defects in the present work), and edges that join nodes indicating the relationships among them. The aim of the present study was to validate the graph theory methods to study birth defect associations. All birth defects monitoring records from the Estudio Colaborativo Latino Americano de Malformaciones Congénitas gathered between 1967 and 2017 were used. From around 5 million live and stillborn infants, 170,430 had one or more birth defects. Volume-adjusted Chi-Square was used to determine the association strength between two birth defects and to weight the graph edges. The complete birth defect graph showed a Log-Normal degree distribution and its characteristics differed from random, scale-free and small-world graphs. The graph comprised 118 nodes and 550 edges. Birth defects with the highest centrality values were nonspecific codes such as Other upper limb anomalies. After partition, the graph yielded 12 groups; most of them were recognizable and included conditions such as VATER and OEIS associations, and Patau syndrome. Our findings validate the graph theory methods to study birth defect associations. This method may contribute to identify underlying etiopathogeneses as well as to improve coding systems.
MeSH Headings: Congenital Abnormalities/epidemiology; Data Science/methods; Databases, Factual; Humans; Infant, Newborn; Statistical Distributions
Grants: Agencia Nacional de Promoción Científica y Tecnológica
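The graph representation described in the abstract above (nodes as birth defects, edges weighted by association strength) can be illustrated with a minimal Python sketch. The defect names and weights below are invented for illustration, and degree centrality is only one possible choice; the abstract does not say which centrality measure the authors used.

```python
from collections import defaultdict

# Hypothetical association strengths between pairs of birth defects.
# Names and weights are made up; they are not taken from the study.
edges = [
    ("defect_A", "defect_B", 12.5),
    ("defect_A", "defect_C", 4.0),
    ("defect_B", "defect_C", 7.5),
    ("defect_C", "defect_D", 3.0),
]


def build_graph(edge_list):
    """Return an undirected weighted adjacency map: node -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for u, v, weight in edge_list:
        graph[u][v] = weight
        graph[v][u] = weight
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def strength(graph):
    """Sum of the edge weights incident to each node (weighted degree)."""
    return {node: sum(neighbors.values()) for node, neighbors in graph.items()}


graph = build_graph(edges)
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

In this toy graph the node with the most neighbors is reported as most central, while `strength` shows that a node can carry the largest total edge weight without having the most neighbors.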
28	DeMasi O, Paxton A, Koy K. Ad hoc efforts for advancing data science education. PLoS Comput Biol 2020;16:e1007695. [PMID: 32379822 PMCID: PMC7205211 DOI: 10.1371/journal.pcbi.1007695] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract: With increasing demand for training in data science, extracurricular or "ad hoc" education efforts have emerged to help individuals acquire relevant skills and expertise. Although extracurricular efforts already exist for many computationally intensive disciplines, their support of data science education has significantly helped in coping with the speed of innovation in data science practice and formal curricula. While the proliferation of ad hoc efforts is an indication of their popularity, less has been documented about the needs that they are designed to meet, the limitations that they face, and practical suggestions for holding successful efforts. To holistically understand the role of different ad hoc formats for data science, we surveyed organizers of ad hoc data science education efforts to understand how organizers perceived the events to have gone, including areas of strength and areas requiring growth. We also gathered recommendations from these past events for future organizers. Our results suggest that the perceived benefits of ad hoc efforts go beyond developing technical skills and may provide continued benefit in conjunction with formal curricula, which warrants further investigation. As increasing numbers of researchers from computational fields with a history of complex data become involved with ad hoc efforts to share their skills, the lessons learned that we extract from the surveys will provide concrete suggestions for the practitioner-leaders interested in creating, improving, and sustaining future efforts.
MeSH Headings: Curriculum/trends; Data Science/education; Data Science/methods; Humans; Surveys and Questionnaires
29	McDonough CW, Breitenstein MK, Shahin M, Empey PE, Freimuth RR, Li L, Liebman M, Tuteja S. Translational Informatics Connects Real-World Information to Knowledge in an Increasingly Data-Driven World. Clin Pharmacol Ther 2020;107:738-741. [PMID: 31837229 PMCID: PMC7678684 DOI: 10.1002/cpt.1719] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/20/2019] [Accepted: 11/01/2019] [Indexed: 11/07/2022]
Key Words: translational informatics; electronic health records; real-world data; clinical pharmacology
MeSH Headings: Big Data; Data Science/methods; Data Science/trends; Drug Discovery/methods; Drug Discovery/trends; Humans; Medical Informatics/methods; Medical Informatics/trends; Translational Research, Biomedical/methods
Grants: K01 HL141690 NHLBI NIH HHS; K23 HL143161 NHLBI NIH HHS
30	Shaoibi A, Neelon B, Lenert LA. Shared Decision Making: From Decision Science to Data Science. Med Decis Making 2020;40:254-265. [PMID: 32024424 PMCID: PMC7676870 DOI: 10.1177/0272989x20903267] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Abstract: Background. Accurate diagnosis of patients' preferences is central to shared decision making. Missing from clinical practice is an approach that links pretreatment preferences and patient-reported outcomes. Objective. We propose a Bayesian collaborative filtering (CF) algorithm that combines pretreatment preferences and patient-reported outcomes to provide treatment recommendations. Design. We present the methodological details of a Bayesian CF algorithm designed to accomplish 3 tasks: 1) eliciting patient preferences using conjoint analysis surveys, 2) clustering patients into preference phenotypes, and 3) making treatment recommendations based on the posttreatment satisfaction of like-minded patients. We conduct a series of simulation studies to test the algorithm and to compare it to a 2-stage approach. Results. The Bayesian CF algorithm and 2-stage approaches performed similarly when there was extensive overlap between preference phenotypes. When the treatment was moderately associated with satisfaction, both methods made accurate recommendations. The kappa estimates measuring agreement between the true and predicted recommendations were 0.70 (95% confidence interval = 0.52-0.88) and 0.73 (0.56-0.90) under the Bayesian CF and 2-stage approaches, respectively. The 2-stage approach failed to converge in settings in which clusters were well separated, whereas the Bayesian CF algorithm produced acceptable results, with kappas of 0.73 (0.56-0.90) and 0.83 (0.69-0.97) for scenarios with moderate and large treatment effects, respectively. Limitations. Our approach assumes that the patient population is composed of distinct preference phenotypes, there is association between treatment and outcomes, and treatment effects vary across phenotypes. Findings are also limited to simulated data. Conclusion. The Bayesian CF algorithm is feasible, provides accurate cluster treatment recommendations, and outperforms 2-stage estimation when clusters are well separated. As such, the approach serves as a roadmap for incorporating predictive analytics into shared decision making.
Key Words: collaborative filtering; conjoint analysis; preference phenotypes; recommender systems; shared decision making; treatment recommendation
MeSH Headings: Adult; Bayes Theorem; Data Science/methods; Data Science/trends; Decision Making, Shared; Female; Humans; Male; Patient Participation/methods; Patient Participation/psychology; Patient Preference/psychology
Grants: R21 LM012866 NLM NIH HHS
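The abstract above reports kappa estimates of agreement between true and predicted treatment recommendations. As a minimal sketch, assuming the statistic is Cohen's kappa (the abstract does not name the variant), agreement between two label sequences can be computed as follows. The function name and the example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(true_labels, predicted_labels):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(true_labels)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    # Expected agreement by chance, from the marginal label frequencies.
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Made-up recommendations for ten simulated patients (treatments "A" and "B").
true_rec = ["A", "A", "A", "B", "B", "B", "A", "B", "A", "B"]
pred_rec = ["A", "A", "B", "B", "B", "A", "A", "B", "A", "B"]
kappa = cohens_kappa(true_rec, pred_rec)
print(round(kappa, 2))
```

Here the observed agreement is 0.8 and the chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5), or about 0.6.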
31	Xu H, Li J, Jiang X, Chen Q. Electronic Health Records for Drug Repurposing: Current Status, Challenges, and Future Directions. Clin Pharmacol Ther 2020;107:712-714. [PMID: 32012237 PMCID: PMC10815929 DOI: 10.1002/cpt.1769] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/23/2019] [Accepted: 01/06/2020] [Indexed: 12/20/2022]
Abstract: It is well recognized that the global pharmaceutical industry now faces challenges such as high costs and low productivity when developing new drugs (e.g., it is estimated that the average cost for developing a new drug ranges from US $2 billion to $3 billion with the total time to bring it to the market being about 13–15 years) [1]. Therefore, drug repurposing (also called drug repositioning/reprofiling), which finds new indications for existing drugs, has received great attention in the past decade. Drug repurposing can reduce drug development time, while improving success rates because the toxicity profiles of existing drugs are already known. Studies have shown that new applications for repurposed drugs have nearly a 30% success rate for US Food and Drug Administration (FDA) approval, whereas traditional new drug applications have < 10% approval rate.
MeSH Headings: Data Science/methods; Drug Repositioning/methods; Electronic Health Records/statistics & numerical data; Forecasting; Humans
Grants: U24 CA194215 NCI NIH HHS
32	Polasek TM, Rostami-Hodjegan A. Virtual Twins: Understanding the Data Required for Model-Informed Precision Dosing. Clin Pharmacol Ther 2020;107:742-745. [PMID: 32056199 DOI: 10.1002/cpt.1778] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 11/20/2019] [Accepted: 01/13/2020] [Indexed: 12/16/2022]
MeSH Headings: Data Science/methods; Humans; Liquid Biopsy/statistics & numerical data; Models, Biological; Patient-Specific Modeling; Precision Medicine/methods; Virtual Reality
33	Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020;21:30. [PMID: 32033565 PMCID: PMC7006217 DOI: 10.1186/s13059-020-1935-5] [Citation(s) in RCA: 685] [Impact Index Per Article: 171.3] [Received: 08/28/2019] [Accepted: 01/15/2020] [Indexed: 12/11/2022]
Abstract: Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Key Words: Data analysis; Long-read sequencing; Oxford Nanopore; PacBio
MeSH Headings: Animals; Data Science/methods; Data Science/standards; Genomics/methods; Genomics/standards; Humans; Nanopore Sequencing/methods; Nanopore Sequencing/standards; Whole Genome Sequencing/methods; Whole Genome Sequencing/standards
34	Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020;21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 545] [Impact Index Per Article: 136.3] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023]
Abstract: The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
MeSH Headings: Animals; Data Science/methods; Genomics/methods; Humans; RNA-Seq/methods; Single-Cell Analysis/methods
Grants: R35 HG010717 NHGRI NIH HHS; R01 HG007069 NHGRI NIH HHS; P30 CA008748 NCI NIH HHS; R00 HG009007 NHGRI NIH HHS; 22231 Cancer Research UK; C9545/A29580 Cancer Research UK; R01 EB025022 NIBIB NIH HHS; R00 HG008399 NHGRI NIH HHS
35	Tran DT, Bhaskara A, Kuberan B, Might M. A graph-based algorithm for RNA-seq data normalization. PLoS One 2020;15:e0227760. [PMID: 31978105 PMCID: PMC6980396 DOI: 10.1371/journal.pone.0227760] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/15/2019] [Accepted: 12/28/2019] [Indexed: 12/16/2022]
Abstract: The use of RNA-sequencing has garnered much attention in recent years for characterizing and understanding various biological systems. However, it remains a major challenge to gain insights from a large number of RNA-seq experiments collectively, due to the normalization problem. Normalization has been challenging due to an inherent circularity, requiring that RNA-seq data be normalized before any pattern of differential (or non-differential) expression can be ascertained; meanwhile, the prior knowledge of non-differential transcripts is crucial to the normalization process. Some methods have successfully overcome this problem by the assumption that most transcripts are not differentially expressed. However, when RNA-seq profiles become more abundant and heterogeneous, this assumption fails to hold, leading to erroneous normalization. We present a normalization procedure that does not rely on this assumption, nor prior knowledge about the reference transcripts. This algorithm is based on a graph constructed from intrinsic correlations among RNA-seq transcripts and seeks to identify a set of densely connected vertices as references. Application of this algorithm on our synthesized validation data showed that it could recover the reference transcripts with high precision, thus resulting in high-quality normalization. On a realistic data set from the ENCODE project, this algorithm gave good results and could finish in a reasonable time. These preliminary results imply that we may be able to break the long persisting circularity problem in RNA-seq normalization.
MeSH Headings: Algorithms; Computational Biology/methods; Data Science/methods; Databases, Genetic/statistics & numerical data; Feasibility Studies; RNA-Seq/methods; RNA-Seq/statistics & numerical data
Grants: P01 HL107152 NHLBI NIH HHS; P30 DK079626 NIDDK NIH HHS; National Science Foundation; National Heart, Lung, and Blood Institute
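The idea in the abstract above, a graph built from correlations among transcripts in which densely connected vertices serve as references, can be sketched loosely in Python. This is not the authors' algorithm: Pearson correlation, the fixed threshold, and the simple minimum-degree rule are simplifying assumptions, and the transcript names and expression values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_graph(expression, threshold):
    """Connect two transcripts when their correlation across samples >= threshold."""
    names = list(expression)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(expression[a], expression[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def candidate_references(graph, min_degree):
    """Crude stand-in for 'densely connected vertices': keep nodes of high degree."""
    return sorted(name for name, nbrs in graph.items() if len(nbrs) >= min_degree)


# Hypothetical read counts for four transcripts across four samples.
expression = {
    "t1": [10.0, 20.0, 30.0, 40.0],
    "t2": [11.0, 21.0, 29.0, 41.0],
    "t3": [5.0, 10.0, 15.0, 20.0],
    "t4": [40.0, 10.0, 35.0, 5.0],
}
graph = correlation_graph(expression, threshold=0.95)
references = candidate_references(graph, min_degree=2)
print(references)
```

In this toy data, t1, t2 and t3 rise and fall together across samples and end up mutually connected, while t4 follows a different pattern and stays isolated. A real implementation would need a proper dense-subgraph search rather than a degree cutoff.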
36	Pittard WS, Villaveces CK, Li S. A Bioinformatics Primer to Data Science, with Examples for Metabolomics. Methods Mol Biol 2020;2104:245-263. [PMID: 31953822 DOI: 10.1007/978-1-0716-0239-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract: With the increasing importance of big data in biomedicine, skills in data science are a foundation for the individual career development and for the progress of science. This chapter is a practical guide to working with high-throughput biomedical data. It covers how to understand and set up the computing environment, to start a research project with proper and effective data management, and to perform common bioinformatics tasks such as data wrangling, quality control, statistical analysis, and visualization, with examples on metabolomics data. Concepts and tools related to coding and scripting are discussed. Version control, knitr and Jupyter notebooks are important to project management, collaboration, and research reproducibility. Overall, this chapter describes a core set of skills to work in bioinformatics, and can serve as a reference text at the level of a graduate course and interfacing with data science.
Key Words: Bioinformatics; Cloud computing; Data management; Data science; Data visualization; Metabolomics; Quality control; Scripting
MeSH Headings: Cloud Computing; Computational Biology/methods; Computational Biology/standards; Data Management; Data Science/methods; Data Science/standards; Database Management Systems; Databases, Factual; Humans; Metabolomics/standards; Metabolomics/statistics & numerical data; Software
Grants: U2C ES030163 NIEHS NIH HHS; U2C ES026560 NIEHS NIH HHS; P50 ES026071 NIEHS NIH HHS; P30 ES019776 NIEHS NIH HHS; UH2 AI132345 NIAID NIH HHS; U01 CA235493 NCI NIH HHS
37	Glatzel M, Love S. Dear Reader: Data citation in changing times. Brain Pathol 2019;29:153-154. [PMID: 30821025 DOI: 10.1111/bpa.12702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/14/2023]
MeSH Headings: Animals; Big Data/supply & distribution; Data Science/methods; Editorial Policies; Humans
38	Olatosi B, Zhang J, Weissman S, Hu J, Haider MR, Li X. Using big data analytics to improve HIV medical care utilisation in South Carolina: A study protocol. BMJ Open 2019;9:e027688. [PMID: 31326931 PMCID: PMC6661700 DOI: 10.1136/bmjopen-2018-027688] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/07/2018] [Revised: 03/28/2019] [Accepted: 06/04/2019] [Indexed: 12/23/2022]
Abstract:
INTRODUCTION: Linkage and retention in HIV medical care remains problematic in the USA. Extensive health utilisation data collection through electronic health records (EHR) and claims data represent new opportunities for scientific discovery. Big data science (BDS) is a powerful tool for investigating HIV care utilisation patterns. The South Carolina (SC) office of Revenue and Fiscal Affairs (RFA) data warehouse captures individual-level longitudinal health utilisation data for persons living with HIV (PLWH). The data warehouse includes EHR, claims and data from private institutions, housing, prisons, mental health, Medicare, Medicaid, State Health Plan and the department of health and human services. The purpose of this study is to describe the process for creating a comprehensive database of all SC PLWH, and plans for using BDS to explore, identify, characterise and explain new predictors of missed opportunities for HIV medical care utilisation.
METHODS AND ANALYSIS: This project will create person-level profiles guided by the Gelberg-Andersen Behavioral Model and describe new patterns of HIV care utilisation. The population for the comprehensive database comes from statewide HIV surveillance data (2005-2016) for all SC PLWH (N≈18000). Surveillance data are available from the state health department's enhanced HIV/AIDS Reporting System (e-HARS). Additional data pulls for the e-HARS population will include Ryan White HIV/AIDS Program Service Reports, Health Sciences SC data and Area Health Resource Files. These data will be linked to the RFA data and serve as sources for traditional and vulnerable domain Gelberg-Andersen Behavioral Model variables. The project will use BDS techniques such as machine learning to identify new predictors of HIV care utilisation behaviour among PLWH, and 'missed opportunities' for re-engaging them back into care.
ETHICS AND DISSEMINATION: The study team applied for data from different sources and submitted individual Institutional Review Board (IRB) applications to the University of South Carolina (USC) IRB and other local authorities/agencies/state departments. This study was approved by the USC IRB (#Pro00068124) in 2017. To protect the identity of the persons living with HIV (PLWH), researchers will only receive linked deidentified data from the RFA. Study findings will be disseminated at local community forums, community advisory group meetings, meetings with our state agencies, local partners and other key stakeholders (including PLWH, policy-makers and healthcare providers), presentations at academic conferences and through publication in peer-reviewed articles. Data security and patient confidentiality are the bedrock of this study. Extensive data agreements ensuring data security and patient confidentiality for the deidentified linked data have been established and are stringently adhered to. The RFA is authorised to collect and merge data from these different sources and to ensure the privacy of all PLWH. The legislatively mandated SC data oversight council reviewed the proposed process stringently before approving it. Researchers will get only the encrypted deidentified dataset to prevent any breach of privacy in the data transfer, management and analysis processes. In addition, established secure data governance rules, data encryption and encrypted predictive techniques will be deployed. In addition to the data anonymisation as a part of privacy-preserving analytics, encryption schemes that protect running prediction algorithms on encrypted data will also be deployed. Best practices and lessons learnt about the complex processes involved in negotiating and navigating multiple data sharing agreements between different entities are being documented for dissemination.
Key Words: HIV/AIDS; big data science; health care utilisation; machine learning; predictive modeling
MeSH Headings: Big Data; Confidentiality; Data Science/methods; Electronic Health Records; HIV Infections/epidemiology; HIV Infections/therapy; Humans; Insurance Coverage/statistics & numerical data; Logistic Models; Patient Acceptance of Health Care/statistics & numerical data; Population Surveillance; Research Design; South Carolina
Grants: R01 AI127203 NIAID NIH HHS
39	Grzegorczyk M, Aderhold A, Husmeier D. Overview and Evaluation of Recent Methods for Statistical Inference of Gene Regulatory Networks from Time Series Data. Methods Mol Biol 2019;1883:49-94. [PMID: 30547396 DOI: 10.1007/978-1-4939-8882-2_3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/14/2023]
Abstract: A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding on the best method to adopt for a particular application or data set might be a confusing task. The present chapter provides a broad overview of state-of-the-art methods with an emphasis on conceptual understanding rather than a deluge of mathematical details, and the pros and cons of the various approaches are discussed. Guidance on practical applications, with pointers to publicly available software implementations, is included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-world application taken from current plant systems biology.
Key Words: Arabidopsis thaliana; Bayesian networks; Bio-PEPA; Chemical model averaging; Circadian regulation; Gaussian graphical models; Gaussian processes; Gene regulatory networks; Hierarchical Bayesian models; Network inference scoring scheme; Sparse regression
MeSH Headings: Algorithms; Arabidopsis/genetics; Bayes Theorem; Data Science/instrumentation; Data Science/methods; Gene Expression Profiling/instrumentation; Gene Expression Profiling/methods; Gene Regulatory Networks; Models, Genetic; Normal Distribution; Software; Systems Biology/instrumentation; Systems Biology/methods
40	Kampe C, Reid G, Jones P, S C, S S, Vogel KM. Bringing the National Security Agency into the Classroom: Ethical Reflections on Academia-Intelligence Agency Partnerships. Sci Eng Ethics 2019;25:869-898. [PMID: 29318451 DOI: 10.1007/s11948-017-9938-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2017] [Accepted: 06/21/2017] [Indexed: 06/07/2023]
Abstract: Academia-intelligence agency collaborations are on the rise for a variety of reasons. These can take many forms, one of which is in the classroom, using students to stand in for intelligence analysts. Classrooms, however, are ethically complex spaces, with students considered vulnerable populations, and become even more complex when layering multiple goals, activities, tools, and stakeholders over those traditionally present. This does not necessarily mean one must shy away from academia-intelligence agency partnerships in classrooms, but that these must be conducted carefully and reflexively. This paper hopes to contribute to this conversation by describing one purposeful classroom encounter that occurred between a professor, students, and intelligence practitioners in the fall of 2015 at North Carolina State University: an experiment conducted as part of a graduate-level political science class that involved students working with a prototype analytic technology, a type of participatory sensing/self-tracking device, developed by the National Security Agency. This experiment opened up the following questions that this paper will explore: What social, ethical, and pedagogical considerations arise with the deployment of a prototype intelligence technology in the college classroom, and how can they be addressed? How can academia-intelligence agency collaboration in the classroom be conducted in ways that provide benefits to all parties, while minimizing disruptions and negative consequences? This paper will discuss the experimental findings in the context of ethical perspectives involved in values in design and participatory/self-tracking data practices, and discuss lessons learned for the ethics of future academia-intelligence agency partnerships in the classroom.
Key Words: Intelligence; Participatory sensing; Prototype; Research ethics; Self-tracking; Values in design
MeSH Headings: Curriculum; Data Science/ethics; Data Science/methods; Education, Graduate/ethics; Education, Graduate/methods; Humans; North Carolina; Privacy; Software; Students; United States; United States Government Agencies; Universities; Workflow
41	Musa A, Tripathi S, Dehmer M, Yli-Harja O, Kauffman SA, Emmert-Streib F. Systems Pharmacogenomic Landscape of Drug Similarities from LINCS data: Drug Association Networks. Sci Rep 2019;9:7849. [PMID: 31127155 PMCID: PMC6534546 DOI: 10.1038/s41598-019-44291-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/12/2018] [Accepted: 05/08/2019] [Indexed: 02/01/2023]
Abstract: Modern research in the biomedical sciences is data-driven, utilizing high-throughput technologies to generate big genomic data. The Library of Integrated Network-based Cellular Signatures (LINCS) is an example of a large-scale genomic data repository providing hundreds of thousands of high-dimensional gene expression measurements for thousands of drugs and dozens of cell lines. However, the remaining challenge is how to use these data effectively for pharmacogenomics. In this paper, we use LINCS data to construct drug association networks (DANs) representing the relationships between drugs. By using the Anatomical Therapeutic Chemical (ATC) classification of drugs we demonstrate that the DANs represent a systems pharmacogenomic landscape of drugs summarizing the entire LINCS repository on a genomic scale meaningfully. Here we identify the modules of the DANs as therapeutic attractors of the ATC drug classes.
Key Words: computational biology and bioinformatics; mathematics and computing
MeSH Headings: Data Science/methods; Databases, Genetic/statistics & numerical data; Databases, Pharmaceutical/statistics & numerical data; Gene Expression Profiling; Gene Regulatory Networks; Humans; Pharmacogenetics/methods; Pharmacological Phenomena/genetics; Systems Biology/methods
42	Rivière E, Quinton A, Dehail P. [Analysis of the discrimination of the final marks after the first computerized national ranking exam in Medicine in June 2016 in France]. Rev Med Interne 2019;40:286-290. [PMID: 30902508 DOI: 10.1016/j.revmed.2018.10.386] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/10/2018] [Revised: 10/07/2018] [Accepted: 10/18/2018] [Indexed: 11/18/2022]
Abstract: INTRODUCTION: The first computerised national ranking exam (cNRE) in Medicine was introduced in June 2016 for 8214 students. It was made of 18 progressive clinical cases (PCCs) with multiple choice questions (MCQs), 120 independent MCQs and 2 scientific articles to criticize. A lack of mark discrimination grounded the cNRE reform. We aimed to assess the discrimination of the final marks after this first cNRE. METHODS: A national Excel® file gathering overall statistics and marks was transmitted to the medical faculties after the cNRE. The mean points deviation between two papers and the percentage of points ranking 75% of students allowed us to analyse marks' discrimination. RESULTS: The national distribution sigmoid curve of the marks is superimposable with previous NRE in 2015. In PCCs, 72% of students were ranked in 1090 points out of 7560 (14%). In independent MCQs, 73% of students were ranked in 434 points out of 2160 (20%). In critical analysis of articles, 75% of students were ranked in 225 points out of 1080 (21%). The above percentages of students are on the plateau of each discrimination curve for PCCs, independent MCQs and critical analysis of scientific articles. CONCLUSION: The cNRE reduced equally-ranked students compared to 2015, with a mean deviation between two papers of 0.28 in 2016 vs 0.04 in 2015. Despite the new format introduced by the cNRE, 75% of students are still ranked in a low proportion of points that is equivalent to previous NRE in 2015 (between 15 and 20% of points).
Key Words: Computerized national ranking exam; Discrimination des notes; Docimologie; Educational assessment; Mark discrimination; Medical education; Pédagogie médicale; Épreuves classantes nationales informatisées
MeSH Headings: Computers; Data Collection/instrumentation; Data Collection/standards; Data Science/instrumentation; Data Science/methods; Education, Medical/classification; Education, Medical/methods; Education, Medical/standards; Education, Medical/statistics & numerical data; Educational Measurement/methods; France/epidemiology; Humans; Medicine/instrumentation; Medicine/methods; Students, Medical/classification
43	Alghamdi SM, Sundberg BA, Sundberg JP, Schofield PN, Hoehndorf R. Quantitative evaluation of ontology design patterns for combining pathology and anatomy ontologies. Sci Rep 2019;9:4025. [PMID: 30858527 PMCID: PMC6411989 DOI: 10.1038/s41598-019-40368-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/31/2018] [Accepted: 02/14/2019] [Indexed: 12/28/2022]
Abstract: Data are increasingly annotated with multiple ontologies to capture rich information about the features of the subject under investigation. Analysis may be performed over each ontology separately, but recently there has been a move to combine multiple ontologies to provide more powerful analytical possibilities. However, it is often not clear how to combine ontologies or how to assess or evaluate the potential design patterns available. Here we use a large and well-characterized dataset of anatomic pathology descriptions from a major study of aging mice. We show how different design patterns based on the MPATH and MA ontologies provide orthogonal axes of analysis, and perform differently in over-representation and semantic similarity applications. We discuss how such a data-driven approach might be used generally to generate and evaluate ontology design patterns.
MeSH Headings: Aging/pathology; Algorithms; Animals; Biological Ontologies; Data Science/methods; Databases as Topic; Datasets as Topic; Female; Male; Mice; Semantics
Grants: P30 AG038070 NIA NIH HHS; King Abdullah University of Science and Technology (KAUST); U.S. Department of Health & Human Services | NIH | National Institute on Aging (U.S. National Institute on Aging); Warden and Fellows of Robinson College Cambridge
44	Ezer D, Whitaker K. Data science for the scientific life cycle. eLife 2019;8:e43979. [PMID: 30839275 PMCID: PMC6402833 DOI: 10.7554/elife.43979] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2018] [Accepted: 02/27/2019] [Indexed: 01/18/2023]
Abstract: Data science can be incorporated into every stage of a scientific study. Here we describe how data science can be used to generate hypotheses, to design experiments, to perform experiments, and to analyse data. We also present our vision for how data science techniques will be an integral part of the laboratory of the future.
Key Words: computational biology; data science; experimental design; none; open science; reproducibility; systems biology
MeSH Headings: Biomedical Research/methods; Biomedical Research/trends; Computational Biology/methods; Computational Biology/trends; Data Science/methods; Data Science/trends
Grants: EP/S001360/1 Engineering and Physical Sciences Research Council; TU/A/000017 Alan Turing Institute
45	Huang H, Tang H, Huang J, Chen B, Liu R, Tang RS, Lu Y, Yang P. Special Issue: Selected Papers of the Inaugural DahShu Data Science Symposium: Computational Precision Health (CPH 2017). J Comput Biol 2019;24:635-636. [PMID: 28657834 DOI: 10.1089/cmb.2017.29007.hh] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
MeSH Headings: Congresses as Topic; Data Science/methods; Precision Medicine/methods
46	Lapidus M. Not All Library Analytics are Created Equal: LibAnswers to the Rescue! Med Ref Serv Q 2019;38:41-55. [PMID: 30942679 DOI: 10.1080/02763869.2019.1548892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2018] [Revised: 11/06/2018] [Accepted: 11/07/2018] [Indexed: 06/09/2023]
Abstract: The reasons for implementing and the advantages of switching to the Reference Analytics system, a part of the Springshare LibAnswers platform, for collecting reference statistics at a three-campus university library are described. The benefits of using this web-based product are highlighted based on the comparison with the previously used analytical tools and the annual statistical data. Transitioning to Reference Analytics allowed librarians to take advantage of such features as seamless access to reference transactions, easy customization, cross-tabulation, and data visualization, proving beneficial for overall library reference services.
Key Words: Library statistics; reference services; reference statistics software; reference transactions analysis
MeSH Headings: Data Science/methods; Humans; Internet; Libraries, Medical/statistics & numerical data; Library Services/statistics & numerical data; Massachusetts; Organizational Case Studies; Reference Books, Medical; Software; User-Computer Interface
47	Huynh-Thu VA, Sanguinetti G. Gene Regulatory Network Inference: An Introductory Survey. Methods Mol Biol 2019;1883:1-23. [PMID: 30547394 DOI: 10.1007/978-1-4939-8882-2_1] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 05/26/2023]
Abstract: Gene regulatory networks are powerful abstractions of biological systems. Since the advent of high-throughput measurement technologies in biology in the late 1990s, reconstructing the structure of such networks has been a central computational problem in systems biology. While the problem is certainly not solved in its entirety, considerable progress has been made in the last two decades, with mature tools now available. This chapter aims to provide an introduction to the basic concepts underpinning network inference tools, attempting a categorization which highlights commonalities and relative strengths. While the chapter is meant to be self-contained, the material presented should provide a useful background to the later, more specialized chapters of this book.
Key Words: Data-driven methods; Dynamical models; Gene regulatory networks; Network inference; Network reverse-engineering; Probabilistic models; Unsupervised inference
MeSH Headings: Algorithms; Computational Biology/instrumentation; Computational Biology/methods; Data Science/instrumentation; Data Science/methods; Gene Expression Profiling/instrumentation; Gene Expression Profiling/methods; Gene Expression Regulation; Gene Regulatory Networks; High-Throughput Screening Assays/instrumentation; High-Throughput Screening Assays/methods; Models, Genetic; Software
48	Levin-Schwartz Y, Calhoun VD, Adalı T. A method to compare the discriminatory power of data-driven methods: Application to ICA and IVA. J Neurosci Methods 2019;311:267-276. [PMID: 30389489 PMCID: PMC6258321 DOI: 10.1016/j.jneumeth.2018.10.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/19/2018] [Revised: 08/24/2018] [Accepted: 10/08/2018] [Indexed: 11/20/2022]
Abstract: BACKGROUND: The widespread application of data-driven factorization-based methods, such as independent component analysis (ICA), to functional magnetic resonance imaging data facilitates the study of neural function and how it is disrupted by psychiatric disorders such as schizophrenia. While the increasing number of these methods motivates a comparison of their relative performance, such a comparison is difficult to perform on real fMRI data, since the ground truth is, relatively, unknown and the alignment of factors across different methods is impractical and imprecise. NEW METHOD: We present a novel method, global difference maps (GDMs), to compare the results of different fMRI analysis techniques on real fMRI data, quantify their relative performances, and highlight the differences between the decompositions visually. COMPARISON WITH EXISTING METHODS: We apply this method to compare the performances of two different factorization-based methods, ICA and its multiset extension independent vector analysis (IVA), for the analysis of fMRI data from 109 patients with schizophrenia and 138 healthy controls during the performance of three tasks. RESULTS: Through this application of GDMs, we find that IVA can determine regions that are more discriminatory between patients and controls than ICA, though IVA is less effective at emphasizing regions found in only a subset of the tasks. CONCLUSIONS: These results demonstrate that GDMs are an effective way to compare the performances of different factorization-based methods as well as regression-based analyses.
Key Words: ICA; Schizophrenia; fMRI
MeSH Headings: Brain/diagnostic imaging; Brain/physiopathology; Brain Mapping/methods; Data Interpretation, Statistical; Data Science/methods; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging; Neuropsychological Tests; Schizophrenia/diagnostic imaging; Schizophrenia/physiopathology; Schizophrenic Psychology
Grants: R01 MH118695 NIMH NIH HHS; R01 EB005846 NIBIB NIH HHS; R01 EB006841 NIBIB NIH HHS; T32 HD049311 NICHD NIH HHS; R01 EB020407 NIBIB NIH HHS
49	Stevens SLR, Kuzak M, Martinez C, Moser A, Bleeker P, Galland M. Building a local community of practice in scientific programming for life scientists. PLoS Biol 2018;16:e2005561. [PMID: 30485260 PMCID: PMC6287879 DOI: 10.1371/journal.pbio.2005561] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Revised: 12/10/2018] [Indexed: 11/18/2022]
Abstract: In this paper, we describe why and how to build a local community of practice in scientific programming for life scientists who use computers and programming in their research. A community of practice is a small group of scientists who meet regularly to help each other and promote good practices in scientific programming. While most life scientists are well trained in the laboratory to conduct experiments, good practices with (big) data sets and their analysis are often missing. We propose a model on how to build such a community of practice at a local academic institution, present two real-life examples, and introduce challenges and implemented solutions. We believe that the current data deluge that life scientists face can benefit from the implementation of these small communities. Good practices spread among experimental scientists will foster open, transparent, and sound scientific results beneficial to society.
MeSH Headings: Big Data; Community Participation/methods; Data Analysis; Data Science/methods; Education, Professional; Humans; Models, Theoretical; Research; Research Design/standards
50	Cohen MC, Guetta CD, Jiao K, Provost F. Data-Driven Investment Strategies for Peer-to-Peer Lending: A Case Study for Teaching Data Science. BIG DATA 2018;6:191-213. [PMID: 30283728 PMCID: PMC6154448 DOI: 10.1089/big.2018.0092] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract: We develop a number of data-driven investment strategies that demonstrate how machine learning and data analytics can be used to guide investments in peer-to-peer loans. We detail the process starting with the acquisition of (real) data from a peer-to-peer lending platform all the way to the development and evaluation of investment strategies based on a variety of approaches. We focus heavily on how to apply and evaluate the data science methods, and resulting strategies, in a real-world business setting. The material presented in this article can be used by instructors who teach data science courses, at the undergraduate or graduate levels. Importantly, we go beyond just evaluating predictive performance of models, to assess how well the strategies would actually perform, using real, publicly available data. Our treatment is comprehensive and ranges from qualitative to technical, but is also modular, which gives instructors the flexibility to focus on specific parts of the case, depending on the topics they want to cover. The learning concepts include the following: data cleaning and ingestion, classification/probability estimation modeling, regression modeling, analytical engineering, calibration curves, data leakage, evaluation of model performance, basic portfolio optimization, evaluation of investment strategies, and using Python for data science.
Key Words: data science; machine learning; peer-to-peer lending; teaching
MeSH Headings: Data Science/education; Data Science/methods; Datasets as Topic; Female; Humans; Investments/trends; Machine Learning; Organizational Case Studies; Peer Group