1. Chen L, Josephs N, Lin L, Zhou J, Kolaczyk ED. A Spectral-Based Framework for Hypothesis Testing in Populations of Networks. Stat Sin 2024. [DOI: 10.5705/ss.202021.0306]

2. Zhu X, Shappell H, Kramer MA, Chu CJ, Kolaczyk ED. Distinguishing between different percolation regimes in noisy dynamic networks with an application to epileptic seizures. PLoS Comput Biol 2023; 19:e1011188. [PMID: 37327238 PMCID: PMC10310035 DOI: 10.1371/journal.pcbi.1011188] [Received: 04/07/2022] [Revised: 06/29/2023] [Accepted: 05/17/2023] [Open access]
Abstract
In clinical neuroscience, epileptic seizures have been associated with the sudden emergence of coupled activity across the brain. The resulting functional networks (in which edges indicate sufficiently strong coupling between brain regions) are consistent with the notion of percolation, a phenomenon in complex networks corresponding to the sudden emergence of a giant connected component. Traditionally, work has concentrated on noise-free percolation with a monotonic process of network growth, but real-world networks are more complex. We develop a class of random graph hidden Markov models (RG-HMMs) for characterizing percolation regimes in noisy, dynamically evolving networks in the presence of edge birth and edge death. This class is used to understand the type of phase transitions undergone in a seizure and, in particular, to distinguish between different percolation regimes in epileptic seizures. We develop a hypothesis testing framework for inferring putative percolation mechanisms. As a necessary precursor, we present an EM algorithm for estimating parameters from a sequence of noisy networks observed only at a longitudinal subsampling of time points. Our results suggest that different types of percolation can occur in human seizures. The type inferred may suggest tailored treatment strategies and provide new insights into the fundamental science of epilepsy.
Affiliation(s)
- Xiaojing Zhu
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts, United States of America
- Heather Shappell
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
- Mark A. Kramer
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts, United States of America
- Catherine J. Chu
- Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Eric D. Kolaczyk
- Department of Mathematics and Statistics, McGill University, Montreal, Quebec, Canada

3. Josephs N, Lin L, Rosenberg S, Kolaczyk ED. Bayesian classification, anomaly detection, and survival analysis using network inputs with application to the microbiome. Ann Appl Stat 2023. [DOI: 10.1214/22-aoas1623]
Affiliation(s)
- Lizhen Lin
- Department of Applied and Computational Mathematics and Statistics, The University of Notre Dame

4. Petros BA, Turcinovic J, Welch NL, White LF, Kolaczyk ED, Bauer MR, Cleary M, Dobbins ST, Doucette-Stamm L, Gore M, Nair P, Nguyen TG, Rose S, Taylor BP, Tsang D, Wendlandt E, Hope M, Platt JT, Jacobson KR, Bouton T, Yune S, Auclair JR, Landaverde L, Klapperich CM, Hamer DH, Hanage WP, MacInnis BL, Sabeti PC, Connor JH, Springer M. Early Introduction and Rise of the Omicron Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Variant in Highly Vaccinated University Populations. Clin Infect Dis 2023; 76:e400-e408. [PMID: 35616119 PMCID: PMC9213864 DOI: 10.1093/cid/ciac413] [Received: 03/21/2022] [Revised: 05/10/2022] [Accepted: 05/19/2022] [Open access]
Abstract
BACKGROUND: The Omicron variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is highly transmissible in vaccinated and unvaccinated populations. The dynamics that govern its establishment and propensity toward fixation (reaching 100% frequency in the SARS-CoV-2 population) in communities remain unknown. Here, we describe the dynamics of Omicron at 3 institutions of higher education (IHEs) in the greater Boston area.
METHODS: We use diagnostic and variant-specifying molecular assays and epidemiological analytical approaches to describe the rapid dominance of Omicron following its introduction into 3 IHEs with asymptomatic surveillance programs.
RESULTS: We show that the establishment of Omicron at IHEs precedes that of the state and region and that the time to fixation is shorter at IHEs (9.5-12.5 days) than in the state (14.8 days) or region. We show that the trajectory of Omicron fixation among university employees resembles that of students, with a 2- to 3-day delay. Finally, we compare cycle threshold values in Omicron vs Delta variant cases on college campuses and identify lower viral loads among college affiliates who harbor Omicron infections.
CONCLUSIONS: We document the rapid takeover of the Omicron variant at IHEs, reaching near-fixation within the span of 9.5-12.5 days despite lower viral loads, on average, than the previously dominant Delta variant. These findings highlight the transmissibility of Omicron, its propensity to rapidly dominate small populations, and the ability of robust asymptomatic surveillance programs to offer early insights into the dynamics of pathogen arrival and spread.
Affiliation(s)
- Brittany A Petros
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
- Division of Health Sciences and Technology, Harvard Medical School and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Harvard/Massachusetts Institute of Technology, MD-PhD Program, Boston, Massachusetts, USA
- Jacquelyn Turcinovic
- National Emerging Infectious Diseases Laboratories, Boston, Massachusetts, USA
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Nicole L Welch
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
- Harvard Program in Virology, Division of Medical Sciences, Harvard Medical School, Boston, Massachusetts, USA
- Laura F White
- Department of Biostatistics, School of Public Health, Boston University, Boston, Massachusetts, USA
- Eric D Kolaczyk
- Department of Mathematics & Statistics, Boston University, Boston, Massachusetts, USA
- Rafik B. Hariri Institute for Computing and Computational Science and Engineering, Boston University, Boston, Massachusetts, USA
- Matthew R Bauer
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
- Harvard Program in Biological and Biomedical Sciences, Division of Medical Sciences, Harvard Medical School, Boston, Massachusetts, USA
- Michael Cleary
- Harvard University Clinical Laboratory, Harvard University, Cambridge, Massachusetts, USA
- Sabrina T Dobbins
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
- Lynn Doucette-Stamm
- Boston University Clinical Testing Laboratory, Boston University, Boston, Massachusetts, USA
- Mitch Gore
- Integrated DNA Technologies, Inc, Coralville, Iowa, USA
- Parvathy Nair
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Tien G Nguyen
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
- Scott Rose
- Integrated DNA Technologies, Inc, Coralville, Iowa, USA
- Bradford P Taylor
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Daniel Tsang
- Integrated DNA Technologies, Inc, Coralville, Iowa, USA
- Michele Hope
- Harvard University Clinical Laboratory, Harvard University, Cambridge, Massachusetts, USA
- Judy T Platt
- Boston University Student Health Services, Boston, Massachusetts, USA
- Karen R Jacobson
- Section of Infectious Diseases, Boston University School of Medicine and Boston Medical Center, Boston, Massachusetts, USA
- Tara Bouton
- Section of Infectious Diseases, Boston University School of Medicine and Boston Medical Center, Boston, Massachusetts, USA
- Seyho Yune
- Student Affairs, Northeastern University, Boston, Massachusetts, USA
- Jared R Auclair
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, USA
- Life Sciences Testing Center, Northeastern University, Burlington, Massachusetts, USA
- Biopharmaceutical Analysis and Training Laboratory, Burlington, Massachusetts, USA
- Lena Landaverde
- Boston University Clinical Testing Laboratory, Boston University, Boston, Massachusetts, USA
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Catherine M Klapperich
- Boston University Clinical Testing Laboratory, Boston University, Boston, Massachusetts, USA
- Boston University Student Health Services, Boston, Massachusetts, USA
- Boston University Precision Diagnostics Center, Boston University, Boston, Massachusetts, USA
- Davidson H Hamer
- National Emerging Infectious Diseases Laboratories, Boston, Massachusetts, USA
- Section of Infectious Diseases, Boston University School of Medicine and Boston Medical Center, Boston, Massachusetts, USA
- Boston University Precision Diagnostics Center, Boston University, Boston, Massachusetts, USA
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts, USA
- Center for Emerging Infectious Disease Research and Policy, Boston University, Boston, Massachusetts, USA
- William P Hanage
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Bronwyn L MacInnis
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
- Pardis C Sabeti
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, USA
- Department of Medicine, Division of Infectious Diseases, Massachusetts General Hospital, Boston, Massachusetts, USA
- Massachusetts Consortium on Pathogen Readiness, Boston, Massachusetts, USA
- John H Connor
- National Emerging Infectious Diseases Laboratories, Boston, Massachusetts, USA
- Bioinformatics Program, Boston University, Boston, Massachusetts, USA
- Department of Microbiology, Boston University School of Medicine, Boston, Massachusetts, USA
- Michael Springer
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA

5. Kańduła MM, Aldoshin AD, Singh S, Kolaczyk ED, Kreil D. ViLoN - a multi-layer network approach to data integration demonstrated for patient stratification. Nucleic Acids Res 2022; 51:e6. [PMID: 36395816 PMCID: PMC9841426 DOI: 10.1093/nar/gkac988] [Received: 11/30/2021] [Revised: 10/11/2022] [Accepted: 11/02/2022] [Open access]
Abstract
With more and more data being collected, modern network representations exploit the complementary nature of different data sources as well as similarities across patients. We here introduce the Variation of information fused Layers of Networks algorithm (ViLoN), a novel network-based approach for the integration of multiple molecular profiles. As a key innovation, it directly incorporates prior functional knowledge (KEGG, GO). In the constructed network of patients, patients are represented by networks of pathways, comprising genes that are linked by common functions and joint regulation in the disease. Patient stratification remains a key challenge both in the clinic and for research on disease mechanisms and treatments. We thus validated ViLoN for patient stratification on multiple data type combinations (gene expression, methylation, copy number), showing substantial improvements and consistently competitive performance for all. Notably, the incorporation of prior functional knowledge was critical for good results in the smaller cohorts (rectum adenocarcinoma: 90, esophageal carcinoma: 180), where alternative methods failed.
Affiliation(s)
- Maciej M Kańduła
- Institute of Molecular Biotechnology, Boku University Vienna, Austria
- Janssen Pharmaceutica NV, Beerse, Belgium
- Swati Singh
- Institute of Molecular Biotechnology, Boku University Vienna, Austria
- Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, India
- Eric D Kolaczyk
- Correspondence may also be addressed to Eric D. Kolaczyk. Tel: +1 514 398 3805;
- David P Kreil
- To whom correspondence should be addressed. Tel: +43 1 47654 79009;

6. Li W, Bulekova K, Gregor B, White LF, Kolaczyk ED. Estimation of local time-varying reproduction numbers in noisy surveillance data. Philos Trans A Math Phys Eng Sci 2022; 380:20210303. [PMID: 35965456 PMCID: PMC9376722 DOI: 10.1098/rsta.2021.0303] [Received: 08/31/2021] [Accepted: 04/11/2022]
Abstract
A valuable metric in understanding local infectious disease dynamics is the local time-varying reproduction number, i.e. the expected number of secondary local cases caused by each infected individual. Accurate estimation of this quantity requires distinguishing cases arising from local transmission from those imported from elsewhere. Realistically, we can expect identification of cases as local or imported to be imperfect. We study the propagation of such errors in estimation of the local time-varying reproduction number. In addition, we propose a Bayesian framework for estimation of the true local time-varying reproduction number when identification errors exist. We illustrate the practical performance of our estimator through simulation studies and with outbreaks of COVID-19 in Hong Kong and Victoria, Australia. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.
Affiliation(s)
- Wenrui Li
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
- Katia Bulekova
- Research Computing Services, Information Services and Technology, Boston University, Boston, MA 02215, USA
- Brian Gregor
- Research Computing Services, Information Services and Technology, Boston University, Boston, MA 02215, USA
- Laura F. White
- Department of Biostatistics, Boston University, Boston, MA 02215, USA
- Eric D. Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
- Hariri Institute for Computing, Boston University, Boston, MA 02215, USA

7. Zhou Z, Kolaczyk ED, Thompson RN, White LF. Estimation of heterogeneous instantaneous reproduction numbers with application to characterize SARS-CoV-2 transmission in Massachusetts counties. PLoS Comput Biol 2022; 18:e1010434. [PMID: 36048890 PMCID: PMC9473631 DOI: 10.1371/journal.pcbi.1010434] [Received: 11/23/2021] [Revised: 09/14/2022] [Accepted: 07/25/2022] [Open access]
Abstract
The reproductive number is an important metric that has been widely used to quantify the infectiousness of communicable diseases. The time-varying instantaneous reproductive number is useful for monitoring the real-time dynamics of a disease to inform policy making for disease control. Local estimation of this metric, for instance at a county or city level, allows for more targeted interventions to curb transmission. However, simultaneous estimation of local reproductive numbers must account for potential sources of heterogeneity in these time-varying quantities, a key element of which is human mobility. We develop a statistical method that incorporates human mobility between multiple regions for estimating region-specific instantaneous reproductive numbers. The model can also account for exogenous cases imported from outside of the regions of interest. We propose two approaches to estimate the reproductive numbers, with mobility data used to adjust incidence in the first approach and to inform a formal prior distribution in the second (Bayesian) approach. Through a simulation study, we show that region-specific reproductive numbers can be well estimated if human mobility is reasonably well approximated by available data. We use this approach to estimate the instantaneous reproductive numbers of COVID-19 for 14 counties in Massachusetts using CDC case report data and the human mobility data collected by SafeGraph. We found that, accounting for mobility, our method produces estimates of reproductive numbers that are distinct across counties. In contrast, independent estimation of county-level reproductive numbers tends to produce similar values, as trends in county case-counts for the state are fairly concordant. These approaches can also be used to estimate any heterogeneity in transmission, for instance, age-dependent instantaneous reproductive number estimates.
As people are more mobile and interact frequently in ways that permit transmission, it is important to account for this in the estimation of the reproductive number.
Affiliation(s)
- Zhenwei Zhou
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America
- Eric D. Kolaczyk
- Department of Mathematics & Statistics, Boston University, Boston, Massachusetts, United States of America
- Department of Mathematics and Statistics, McGill University, Montreal, Canada
- Robin N. Thompson
- Mathematics Institute and SBIDER, University of Warwick, Coventry, England, United Kingdom
- Laura F. White
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America

8. Li W, Bulekova K, Gregor B, White LF, Kolaczyk ED. Estimation of local time-varying reproduction numbers in noisy surveillance data. medRxiv 2022:2021.04.23.21255958. [PMID: 33948612 PMCID: PMC8095231 DOI: 10.1101/2021.04.23.21255958]
Abstract
A valuable metric in understanding local infectious disease dynamics is the local time-varying reproduction number, i.e. the expected number of secondary local cases caused by each infected individual. Accurate estimation of this quantity requires distinguishing cases arising from local transmission from those imported from elsewhere. Realistically, we can expect identification of cases as local or imported to be imperfect. We study the propagation of such errors in estimation of the local time-varying reproduction number. In addition, we propose a Bayesian framework for estimation of the true local time-varying reproduction number when identification errors exist. We illustrate the practical performance of our estimator through simulation studies and with outbreaks of COVID-19 in Hong Kong and Victoria, Australia.
Affiliation(s)
- Wenrui Li
- Department of Mathematics and Statistics, Boston University, Boston MA, USA
- Katia Bulekova
- Research Computing Services, Information Services and Technology, Boston University, Boston MA, USA
- Brian Gregor
- Research Computing Services, Information Services and Technology, Boston University, Boston MA, USA
- Laura F. White
- Department of Biostatistics, Boston University, Boston MA, USA
- Eric D. Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston MA, USA
- Hariri Institute for Computing, Boston University, Boston MA, USA

9. Li W, Kolaczyk ED, White LF. Projecting Quarantine Utilization During a Pandemic. Am J Public Health 2022; 112:277-283. [PMID: 35080960 PMCID: PMC8802605 DOI: 10.2105/ajph.2021.306573]
Abstract
Objectives. To develop an approach to project quarantine needs during an outbreak, particularly for communally housed individuals who interact with outside individuals. Methods. We developed a method that uses basic surveillance data to do short-term projections of future quarantine needs. The development of this method was rigorous, but it is conceptually simple and easy to implement and allows one to anticipate potential superspreading events. We demonstrate how this method can be used with data from the fall 2020 semester of a large urban university in Boston, Massachusetts, that provided quarantine housing for students living on campus in response to the COVID-19 pandemic. Our approach accounted for potentially infectious interactions between individuals living in university housing and those who did not. Results. Our approach was able to accurately project 10-day-ahead quarantine utilization for on-campus students in a large urban university. Our projections were most accurate when we anticipated weekend superspreading events around holidays. Conclusions. We provide an easy-to-use software tool to project quarantine utilization for institutions that can account for mixing with outside populations. This software tool has potential application for universities, corrections facilities, and the military. (Am J Public Health. 2022;112(2):277-283. https://doi.org/10.2105/AJPH.2021.306573).
Affiliation(s)
- Wenrui Li
- Department of Mathematics and Statistics, Boston University, Boston, MA
- Eric D. Kolaczyk
- Department of Mathematics and Statistics and the Hariri Institute for Computing, Boston University
- Laura F. White
- Department of Biostatistics, Boston University

10. Hamer DH, White LF, Jenkins HE, Gill CJ, Landsberg HE, Klapperich C, Bulekova K, Platt J, Decarie L, Gilmore W, Pilkington M, MacDowell TL, Faria MA, Densmore D, Landaverde L, Li W, Rose T, Burgay SP, Miller C, Doucette-Stamm L, Lockard K, Elmore K, Schroeder T, Zaia AM, Kolaczyk ED, Waters G, Brown RA. Assessment of a COVID-19 Control Plan on an Urban University Campus During a Second Wave of the Pandemic. JAMA Netw Open 2021; 4:e2116425. [PMID: 34170303 PMCID: PMC8233704 DOI: 10.1001/jamanetworkopen.2021.16425] [Received: 03/01/2021] [Accepted: 05/03/2021] [Open access]
Abstract
Importance: The COVID-19 pandemic has severely disrupted US educational institutions. Given potential adverse financial and psychosocial effects of campus closures, many institutions developed strategies to reopen campuses in the fall 2020 semester despite the ongoing threat of COVID-19. However, many institutions opted to have limited campus reopening to minimize potential risk of spread of SARS-CoV-2.
Objective: To analyze how Boston University (BU) fully reopened its campus in the fall of 2020 and controlled COVID-19 transmission despite worsening transmission in Boston, Massachusetts.
Design, Setting, and Participants: This multifaceted intervention case series was conducted at a large urban university campus in Boston, Massachusetts, during the fall 2020 semester. The BU response included a high-throughput SARS-CoV-2 polymerase chain reaction testing facility with capacity to deliver results in less than 24 hours; routine asymptomatic screening for COVID-19; daily health attestations; adherence monitoring and feedback; robust contact tracing, quarantine, and isolation in on-campus facilities; face mask use; enhanced hand hygiene; social distancing recommendations; dedensification of classrooms and public places; and enhancement of all building air systems. Data were analyzed from December 20, 2020, to January 31, 2021.
Main Outcomes and Measures: SARS-CoV-2 diagnosis confirmed by reverse transcription-polymerase chain reaction of anterior nares specimens and sources of transmission, as determined through contact tracing.
Results: Between August and December 2020, BU conducted more than 500 000 COVID-19 tests and identified 719 individuals with COVID-19, including 496 students (69.0%), 11 faculty (1.5%), and 212 staff (29.5%). Overall, 718 individuals, or 1.8% of the BU community, had test results positive for SARS-CoV-2. Of 837 close contacts traced, 86 individuals (10.3%) had test results positive for COVID-19. BU contact tracers identified a source of transmission for 370 individuals (51.5%), with 206 individuals (55.7%) identifying a non-BU source. Among 5 faculty and 84 staff with SARS-CoV-2 with a known source of infection, most reported a transmission source outside of BU (all 5 faculty members [100%] and 67 staff members [79.8%]). A BU source was identified by 108 of 183 undergraduate students with SARS-CoV-2 (59.0%) and 39 of 98 graduate students with SARS-CoV-2 (39.8%); notably, no transmission was traced to a classroom setting.
Conclusions and Relevance: In this case series of COVID-19 transmission, BU used a coordinated strategy of testing, contact tracing, isolation, and quarantine, with robust management and oversight, to control COVID-19 transmission in an urban university setting.
Affiliation(s)
- Davidson H. Hamer
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts
- Section of Infectious Disease, Department of Medicine, Boston University School of Medicine, Boston, Massachusetts
- National Emerging Infectious Disease Laboratory, Boston, Massachusetts
- Precision Diagnostics Center, Boston University, Boston, Massachusetts
| | - Laura F. White
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
| | - Helen E. Jenkins
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
| | - Christopher J. Gill
- Department of Global Health, Boston University School of Public Health, Boston, Massachusetts
| | - Hannah E. Landsberg
- Student Health Services, Healthway, Boston University, Boston, Massachusetts
| | - Catherine Klapperich
- Precision Diagnostics Center, Boston University, Boston, Massachusetts
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
| | - Katia Bulekova
- Information Services and Technology, Boston University, Boston, Massachusetts
| | - Judy Platt
- Student Health Services, Healthway, Boston University, Boston, Massachusetts
| | - Linette Decarie
- Boston University Analytical Services & Institutional Research, Boston, Massachusetts
| | - Wayne Gilmore
- Information Services and Technology, Boston University, Boston, Massachusetts
| | - Megan Pilkington
- Boston University Analytical Services & Institutional Research, Boston, Massachusetts
| | - Trevor L. MacDowell
- Information Services and Technology, Boston University, Boston, Massachusetts
| | - Mark A. Faria
- Information Services and Technology, Boston University, Boston, Massachusetts
| | - Douglas Densmore
- Electrical and Computer Engineering, Boston University, Boston, Massachusetts
- Biological Design Center, Boston University, Boston, Massachusetts
| | - Lena Landaverde
- Student Health Services, Healthway, Boston University, Boston, Massachusetts
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
| | - Wenrui Li
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts
| | - Tom Rose
- Human Resources, Boston University, Boston, Massachusetts
| | - Stephen P. Burgay
- Office of External Affairs, Boston University, Boston, Massachusetts
| | - Candice Miller
- BU Clinical Testing Laboratory, Research Department, Boston University, Boston, Massachusetts
| | - Lynn Doucette-Stamm
- BU Clinical Testing Laboratory, Research Department, Boston University, Boston, Massachusetts
| | - Kelly Lockard
- Continuous Improvement & Data Analytics, Boston University, Boston, Massachusetts
| | - Kenneth Elmore
- Office of the Provost, Boston University, Boston, Massachusetts
| | - Tracy Schroeder
- Information Services and Technology, Boston University, Boston, Massachusetts
| | - Ann M. Zaia
- Occupational Health Center, Boston University, Boston Massachusetts
| | - Eric D. Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts
- Hariri Institute for Computing, Boston University, Boston, Massachusetts
| | - Gloria Waters
- Office of the Provost, Boston University, Boston, Massachusetts
- College of Health and Rehabilitation Services, Sargent College, Boston University, Boston, Massachusetts
| | - Robert A. Brown
- College of Engineering, Boston University, Boston, Massachusetts
- Office of the President, Boston University, Boston, Massachusetts
|
11
|
Li J, Manitz J, Bertuzzo E, Kolaczyk ED. Sensor-based localization of epidemic sources on human mobility networks. PLoS Comput Biol 2021; 17:e1008545. [PMID: 33503024 PMCID: PMC7870066 DOI: 10.1371/journal.pcbi.1008545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2019] [Revised: 02/08/2021] [Accepted: 11/17/2020] [Indexed: 11/18/2022] Open
Abstract
We investigate the source detection problem in epidemiology, which is one of the most important issues for control of epidemics. Mathematically, we reformulate the problem as one of identifying the relevant component in a multivariate Gaussian mixture model. Focusing on the study of cholera and diseases with similar modes of transmission, we calibrate the parameters of our mixture model using human mobility networks within a stochastic, spatially explicit epidemiological model for waterborne disease. Furthermore, we adopt a Bayesian perspective, so that prior information on source location can be incorporated (e.g., reflecting the impact of local conditions). Posterior-based inference is performed, which permits estimates in the form of either individual locations or regions. Importantly, our estimator only requires first-arrival times of the epidemic by putative observers, typically located only at a small proportion of nodes. The proposed method is demonstrated within the context of the 2000-2002 cholera outbreak in the KwaZulu-Natal province of South Africa.
Tracking the source of an epidemic outbreak is of crucial importance as it allows for identification of communities where control efforts should be focused for both short- and long-term management and control of the disease. However, such identification is often problematic, time-consuming, and data-intensive. Recently, network-based analysis approaches have been established for source detection to account for complex modern spreading, driven substantially by human mobility. Here we develop a probabilistic framework for waterborne disease that allows investigators to infer the community or the region sparking an outbreak based on a sparse surveillance network. The framework can integrate prior information on the likelihood of a community being the source, for instance as a function of population size or hygiene conditions. Furthermore, we assign an accuracy measure to the resulting source estimate, which is crucial for its practical usability. We test the method in the context of the 2000-2002 cholera outbreak in the KwaZulu-Natal province with promising results. Moreover, we outline a series of guidelines in terms of data needs and preliminary operations to implement the proposed framework in practice.
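The mixture-model view of source detection summarized in this abstract can be written compactly. The following is only a sketch in notation chosen here for illustration, not the paper's own: s indexes candidate source nodes, t is the vector of first-arrival times recorded at the observer nodes, π_s is the prior weight on source s, and μ_s and Σ_s are the assumed mean and covariance of the arrival times when the outbreak starts at s.

```latex
P(s \mid \mathbf{t})
  = \frac{\pi_s \, \mathcal{N}(\mathbf{t};\, \boldsymbol{\mu}_s, \boldsymbol{\Sigma}_s)}
         {\sum_{s'} \pi_{s'} \, \mathcal{N}(\mathbf{t};\, \boldsymbol{\mu}_{s'}, \boldsymbol{\Sigma}_{s'})},
\qquad
\hat{s} = \arg\max_{s} P(s \mid \mathbf{t}).
```

Under this reading, a point estimate picks the node with the largest posterior weight, while a regional estimate would sum the posterior weights of the nodes in the region.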
Affiliation(s)
- Jun Li
- Department of Mathematics & Statistics, Boston University, Boston, MA, United States of America
| | - Juliane Manitz
- Department of Mathematics & Statistics, Boston University, Boston, MA, United States of America
| | - Enrico Bertuzzo
- Dipartimento di Scienze Ambientali, Informatica e Statistica, University of Venice Cà Foscari, Italy
| | - Eric D. Kolaczyk
- Department of Mathematics & Statistics, Boston University, Boston, MA, United States of America
- * E-mail:
|
12
|
Affiliation(s)
- Jinyuan Chang
- School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
| | - Eric D. Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, MA
| | - Qiwei Yao
- Department of Statistics, London School of Economics and Political Science, London, UK
|
13
|
Posner DC, Lin H, Meigs JB, Kolaczyk ED, Dupuis J. Convex combination sequence kernel association test for rare-variant studies. Genet Epidemiol 2020; 44:352-367. [PMID: 32100372 DOI: 10.1002/gepi.22287] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/11/2019] [Revised: 12/17/2019] [Accepted: 01/27/2020] [Indexed: 02/06/2023]
Abstract
We propose a novel variant set test for rare-variant association studies, which leverages multiple single-nucleotide variant (SNV) annotations. Our approach optimizes a convex combination of different sequence kernel association test (SKAT) statistics, where each statistic is constructed from a different annotation and combination weights are optimized through a multiple kernel learning algorithm. The combination test statistic is evaluated empirically through data splitting. In simulations, we find our method preserves type I error at α = 2.5 × 10⁻⁶ and has greater power than SKAT(-O) when SNV weights are not misspecified and sample sizes are large (N ≥ 5,000). We utilize our method in the Framingham Heart Study (FHS) to identify SNV sets associated with fasting glucose. While we are unable to detect any genome-wide significant associations between fasting glucose and 4-kb windows of rare variants (p < 10⁻⁷) in 6,419 FHS participants, our method identifies suggestive associations between fasting glucose and rare variants near ROCK2 (p = 2.1 × 10⁻⁵) and within CPLX1 (p = 5.3 × 10⁻⁵). These two genes were previously reported to be involved in obesity-mediated insulin resistance and glucose-induced insulin secretion by pancreatic beta-cells, respectively. These findings will need to be replicated in other cohorts and validated by functional genomic studies.
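The phrase "convex combination of SKAT statistics" can be made concrete with a short formula. This is a sketch in assumed notation, not taken from the paper: Q_a denotes the SKAT statistic built from annotation a, A is the number of annotations, and w_a are the combination weights that the abstract says are learned by multiple kernel learning.

```latex
Q_{\mathbf{w}} = \sum_{a=1}^{A} w_a \, Q_a,
\qquad w_a \ge 0 \ \text{for all } a,
\qquad \sum_{a=1}^{A} w_a = 1.
```

Setting one weight to 1 and the rest to 0 recovers a single-annotation SKAT statistic, so the combined test contains each individual test as a special case.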
Affiliation(s)
- Daniel C Posner
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts
| | - Honghuang Lin
- National Heart Lung and Blood Institute's, Boston University's Framingham Heart Study, Framingham, Massachusetts; Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, Massachusetts
| | - James B Meigs
- Division of General Internal Medicine, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
| | - Eric D Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts
| | - Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts; National Heart Lung and Blood Institute's, Boston University's Framingham Heart Study, Framingham, Massachusetts
|
14
|
Kolaczyk ED, Lin L, Rosenberg S, Walters J, Xu J. Averages of unlabeled networks: Geometric characterization and asymptotic behavior. Ann Stat 2020. [DOI: 10.1214/19-aos1820] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
|
15
|
Hahm HC, Zhou L, Lee C, Maru M, Petersen JM, Kolaczyk ED. Feasibility, preliminary efficacy, and safety of a randomized clinical trial for Asian Women's Action for Resilience and Empowerment (AWARE) intervention. Am J Orthopsychiatry 2019; 89:462-474. [PMID: 31305114 DOI: 10.1037/ort0000383] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/08/2022]
Abstract
To our knowledge, Asian Women's Action for Resilience and Empowerment (AWARE) is the first gender- and culture-specific and trauma-informed group psychotherapy intervention designed for Asian-American young women with histories of interpersonal violence and trauma and/or Post-Traumatic Stress Disorder (PTSD) diagnosis. We employed a 2-arm randomized controlled trial. Sixty-three women who met clinical criteria for trauma were randomized to the intervention (n = 32) or waitlist control (n = 31) group. We documented retention rates, preliminary efficacy for sexual risk behaviors and depressive symptoms (overall and stratified by PTSD at baseline), and safety in terms of suicidality at baseline, postintervention, and 3-month follow-up. AWARE demonstrated high retention rates, in that 87.50% of those enrolled in the program completed at least 6 out of the 8 sessions. Although there were no differences overall for sexual risk behaviors or depressive symptoms, among women with PTSD, significant reductions in depressive symptoms were observed in treatment compared to control, with an effect size of .84. Suicidal ideation and intent were reduced in both the treatment and control groups, with no attempts during the trial. AWARE is uniquely tailored to serve a pressing clinical need. These results support its feasibility and safety. A large-scale trial targeted at women with PTSD is recommended to further explore the efficacy of AWARE. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
|
16
|
Shappell H, Tripodis Y, Killiany RJ, Kolaczyk ED. A Paradigm for Longitudinal Complex Network Analysis over Patient Cohorts in Neuroscience. Netw Sci (Camb Univ Press) 2019; 7:196-214. [PMID: 33312566 PMCID: PMC7731975 DOI: 10.1017/nws.2019.9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
The study of complex brain networks, where structural or functional connections are evaluated to create an interconnected representation of the brain, has grown tremendously over the past decade. Many of the statistical network science tools for analyzing brain networks have been developed for cross-sectional studies and for the analysis of static networks. However, with both an increase in longitudinal study designs and an increased interest in the neurological network changes that occur during the progression of a disease, sophisticated methods for longitudinal brain network analysis are needed. We propose a paradigm for longitudinal brain network analysis over patient cohorts, with the key challenge being the adaptation of Stochastic Actor-Oriented Models (SAOMs) to the neuroscience setting. SAOMs are designed to capture network dynamics representing a variety of influences on network change in a continuous-time Markov chain framework. Network dynamics are characterized through both endogenous (i.e., network related) and exogenous effects, where the latter include mechanisms conjectured in the literature. We outline an application to the resting-state fMRI setting with data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. We draw illustrative conclusions at the subject level and make a comparison between elderly controls and individuals with AD.
Affiliation(s)
- Heather Shappell
- Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD
| | | | | | - Eric D. Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, MA
|
17
|
Spencer E, Martinet LE, Eskandar EN, Chu CJ, Kolaczyk ED, Cash SS, Eden UT, Kramer MA. A procedure to increase the power of Granger-causal analysis through temporal smoothing. J Neurosci Methods 2018; 308:48-61. [PMID: 30031776 PMCID: PMC6200653 DOI: 10.1016/j.jneumeth.2018.07.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/26/2018] [Revised: 07/06/2018] [Accepted: 07/14/2018] [Indexed: 11/24/2022]
Abstract
BACKGROUND: How the human brain coordinates network activity to support cognition and behavior remains poorly understood. New high-resolution recording modalities facilitate a more detailed understanding of the human brain network. Several approaches have been proposed to infer functional networks, indicating the transient coordination of activity between brain regions, from neural time series. One category of approach is based on statistical modeling of time series recorded from multiple sensors (e.g., multivariate Granger causality). However, fitting such models remains computationally challenging as the history structure may be long in neural activity, requiring many model parameters to fully capture the dynamics. NEW METHOD: We develop a method based on Granger causality that makes the assumption that the history dependence varies smoothly. We fit multivariate autoregressive models such that the coefficients of the lagged history terms are smooth functions. We do so by modelling the history terms with a lower dimensional spline basis, which requires many fewer parameters than the standard approach and increases the statistical power of the model. RESULTS: We show that this procedure allows accurate estimation of brain dynamics and functional networks in simulations and examples of brain voltage activity recorded from a patient with pharmacoresistant epilepsy. COMPARISON WITH EXISTING METHOD: The proposed method has more statistical power than the Granger method for networks of signals that exhibit extended and smooth history dependencies. CONCLUSIONS: The proposed tool permits conditional inference of functional networks from many brain regions with extended history dependence, furthering the applicability of Granger causality to brain network science.
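The parameter reduction described under the new method can be illustrated with a short formula. This is a simplified, single-signal sketch in assumed notation rather than the paper's multivariate model: x_t is the signal, p is the number of lags, a_k are the lag coefficients, B_j are spline basis functions evaluated at lag k, and c_j are the basis weights that are actually estimated.

```latex
x_t = \sum_{k=1}^{p} a_k \, x_{t-k} + \varepsilon_t,
\qquad
a_k = \sum_{j=1}^{m} c_j \, B_j(k),
\qquad m \ll p.
```

Substituting the second expression into the first leaves only m unknowns per history filter instead of p, which is where the gain in statistical power for long, smooth history dependence would come from.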
Affiliation(s)
- E Spencer
- Graduate Program in Neuroscience, Boston University, United States
| | - L-E Martinet
- Department of Neurology, Massachusetts General Hospital, United States
| | - E N Eskandar
- Department of Neurology, Massachusetts General Hospital, United States; Department of Neurological Surgery, Albert Einstein College of Medicine, Montefiore Medical Center, United States
| | - C J Chu
- Department of Neurology, Massachusetts General Hospital, United States
| | - E D Kolaczyk
- Department of Mathematics and Statistics, Boston University, United States
| | - S S Cash
- Department of Neurology, Massachusetts General Hospital, United States
| | - U T Eden
- Department of Mathematics and Statistics, Boston University, United States
| | - M A Kramer
- Department of Mathematics and Statistics, Boston University, United States.
|
18
|
Kramer JM, Helfrich C, Levin M, Hwang IT, Samuel PS, Carrellas A, Schwartz AE, Goeva A, Kolaczyk ED. Initial evaluation of the effects of an environmental-focused problem-solving intervention for transition-age young people with developmental disabilities: Project TEAM. Dev Med Child Neurol 2018. [PMID: 29528103 DOI: 10.1111/dmcn.13715] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 11/26/2022]
Abstract
AIM: Project TEAM (Teens making Environment and Activity Modifications) teaches transition-age young people with developmental disabilities, including those with co-occurring intellectual or cognitive disabilities, to identify and resolve environmental barriers to participation. We examined its effects on young people's attainment of participation goals, knowledge, problem-solving, self-determination, and self-efficacy. METHOD: We used a quasi-experimental, repeated measures design (initial, outcome, 6-week follow-up) with two groups: (1) Project TEAM (28 males, 19 females; mean age 17y 6mo); and (2) goal-setting comparison (21 males, 14 females; mean age 17y 6mo). A matched convenience sample was recruited in two US states. Attainment of participation goals and goal attainment scaling (GAS) T scores were compared at outcome. Differences between groups for all other outcomes were analyzed using linear mixed effects models. RESULTS: At outcome, Project TEAM participants demonstrated greater knowledge (estimated mean difference: 1.82; confidence interval [CI]: 0.90, 2.74) and ability to apply knowledge during participation (GAS: t[75]=4.21; CI: 5.21, 14.57) compared to goal-setting. While both groups achieved significant improvements in knowledge, problem-solving, and self-determination, increases in parent-reported self-determination remained at 6-week follow-up only for Project TEAM (estimated mean difference: 4.65; CI: 1.32, 7.98). Significantly more Project TEAM participants attained their participation goals by follow-up (Project TEAM=97.6%, goal-setting=77.1%, p=0.009). INTERPRETATION: Both approaches support attainment of participation goals. Although inconclusive, Project TEAM may uniquely support young people with developmental disabilities to act in a self-determined manner and apply an environmental problem-solving approach over time.
WHAT THIS PAPER ADDS: Individualized goal-setting, alone or during Project TEAM (Teens making Environment and Activity Modifications), appears to support attainment of participation goals. Project TEAM appears to support young people with developmental disabilities to apply an environmental problem-solving approach to participation barriers. Parents of young people with developmental disabilities report sustained changes in self-determination 6 weeks after Project TEAM.
Affiliation(s)
- Jessica M Kramer
- Department of Occupational Therapy, Boston University, Boston, MA, USA
| | - Christine Helfrich
- Division of Health Sciences, Bristol Community College, New Bedford, MA, USA
| | - Melissa Levin
- Department of Occupational Therapy, Boston University, Boston, MA, USA
| | - I-Ting Hwang
- PhD Program in Rehabilitation Sciences, Boston University, Boston, MA, USA
| | - Preethy S Samuel
- Department of Health Care Sciences, Wayne State University, Detroit, MI, USA
| | - Ann Carrellas
- Michigan Developmental Disabilities Institute, Wayne State University, Detroit, MI, USA
| | - Ariel E Schwartz
- PhD Program in Rehabilitation Sciences, Boston University, Boston, MA, USA
| | | | - Eric D Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA
|
19
|
Hachigian LJ, Carmona V, Fenster RJ, Kulicke R, Heilbut A, Sittler A, Pereira de Almeida L, Mesirov JP, Gao F, Kolaczyk ED, Heiman M. Control of Huntington's Disease-Associated Phenotypes by the Striatum-Enriched Transcription Factor Foxp2. Cell Rep 2018; 21:2688-2695. [PMID: 29212017 DOI: 10.1016/j.celrep.2017.11.018] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/09/2017] [Revised: 09/19/2017] [Accepted: 11/02/2017] [Indexed: 10/18/2022] Open
Abstract
Alteration of corticostriatal glutamatergic function is an early pathophysiological change associated with Huntington's disease (HD). The factors that regulate the maintenance of corticostriatal glutamatergic synapses post-developmentally are not well understood. Recently, the striatum-enriched transcription factor Foxp2 was implicated in the development of these synapses. Here, we show that, in mice, overexpression of Foxp2 in the adult striatum of two models of HD leads to rescue of HD-associated behaviors, while knockdown of Foxp2 in wild-type mice leads to development of HD-associated behaviors. We note that Foxp2 encodes the longest polyglutamine repeat protein in the human reference genome, and we show that it can be sequestered into aggregates with polyglutamine-expanded mutant Huntingtin protein (mHTT). Foxp2 overexpression in HD model mice leads to altered expression of several genes associated with synaptic function, genes that present additional targets for normalization of corticostriatal dysfunction in HD.
Affiliation(s)
- Lea J Hachigian
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA; Picower Institute for Learning and Memory, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Vitor Carmona
- Center for Neuroscience and Cell Biology (CNC) and Faculty of Pharmacy, The University of Coimbra Rua Larga, 3004-504 Coimbra, Portugal
| | - Robert J Fenster
- Picower Institute for Learning and Memory, Cambridge, MA 02139, USA
| | - Ruth Kulicke
- Picower Institute for Learning and Memory, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Adrian Heilbut
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Bioinformatics, Boston University, Boston, MA 02215, USA
| | - Annie Sittler
- ICM (Brain and Spine Institute) Pitié-Salpêtrière Hospital, CNRS UMR 7225, 75013 Paris, France
| | - Luís Pereira de Almeida
- Center for Neuroscience and Cell Biology (CNC) and Faculty of Pharmacy, The University of Coimbra Rua Larga, 3004-504 Coimbra, Portugal
| | - Jill P Mesirov
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Fan Gao
- Picower Institute for Learning and Memory, Cambridge, MA 02139, USA
| | - Eric D Kolaczyk
- Program in Bioinformatics, Boston University, Boston, MA 02215, USA; Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
| | - Myriam Heiman
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA; Picower Institute for Learning and Memory, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
20
|
Griffin PJ, Zhang Y, Johnson WE, Kolaczyk ED. Detection of multiple perturbations in multi-omics biological networks. Biometrics 2018; 74:1351-1361. [PMID: 29772079 DOI: 10.1111/biom.12893] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/01/2016] [Revised: 04/01/2018] [Accepted: 04/01/2018] [Indexed: 01/24/2023]
Abstract
Cellular mechanism-of-action is of fundamental concern in many biological studies. It is of particular interest for identifying the cause of disease and learning the way in which treatments act against disease. However, pinpointing such mechanisms is difficult, due to the fact that small perturbations to the cell can have wide-ranging downstream effects. Given a snapshot of cellular activity, it can be challenging to tell where a disturbance originated. The presence of an ever-greater variety of high-throughput biological data offers an opportunity to examine cellular behavior from multiple angles, but also presents the statistical challenge of how to effectively analyze data from multiple sources. In this setting, we propose a method for mechanism-of-action inference by extending network filtering to multi-attribute data. We first estimate a joint Gaussian graphical model across multiple data types using penalized regression and filter for network effects. We then apply a set of likelihood ratio tests to identify the most likely site of the original perturbation. In addition, we propose a conditional testing procedure to allow for detection of multiple perturbations. We demonstrate this methodology on paired gene expression and methylation data from The Cancer Genome Atlas (TCGA).
Affiliation(s)
- Paula J Griffin
- Department of Biostatistics, Boston University School of Public Health, Boston, U.S.A
| | - Yuqing Zhang
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, U.S.A.; Graduate Program in Bioinformatics, Boston University, Boston, U.S.A
| | - William Evan Johnson
- Department of Biostatistics, Boston University School of Public Health, Boston, U.S.A.; Division of Computational Biomedicine, Boston University School of Medicine, Boston, U.S.A.; Graduate Program in Bioinformatics, Boston University, Boston, U.S.A
| | - Eric D Kolaczyk
- Graduate Program in Bioinformatics, Boston University, Boston, U.S.A.; Department of Mathematics and Statistics, Boston University, Boston, U.S.A
|
21
|
|
22
|
Affiliation(s)
- Aleksandrina Goeva
- Department of Mathematics & Statistics, Boston University, Boston, MA, USA
| | - Eric D. Kolaczyk
- Department of Mathematics & Statistics, Boston University, Boston, MA, USA
|
23
|
Lu C, O'Connor GT, Dupuis J, Kolaczyk ED. Meta-Analysis for Penalized Regression Methods with Multi-Cohort Genome-Wide Association Studies. Hum Hered 2016; 81:142-149. [PMID: 28002817 DOI: 10.1159/000447969] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/16/2016] [Accepted: 06/25/2016] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVE: Penalized regression has been successfully applied in genome-wide association studies. While meta-analysis is often conducted to increase power and protect patients' confidentiality, methods for meta-analyzing results of penalized regression in a multi-cohort setting are still under development. METHODS: We propose to use a data-splitting method to obtain valid p values (or equivalently, coefficient estimates and standard errors) for meta-analysis across multiple cohorts. We examine two ways of splitting data in a multi-cohort setting and propose three methods to conduct meta-analysis based on p values. We compare the three meta-analysis methods to mega-analysis, which consists of pooling individual level data. We also apply our proposed meta-analysis approaches to the Framingham Heart Study data, where we divide the original dataset into four parts to create a multi-cohort scenario. RESULTS: The simulations suggest that splitting cohorts has better performance than splitting data within each cohort. The real data application also shows that this method provides results that are similar to the mega-analysis. CONCLUSION: After comparing the three methods that we proposed to conduct meta-analysis, we recommend splitting cohorts rather than datasets to obtain valid p values for meta-analysis of results from penalized regression in a multi-cohort setting.
Affiliation(s)
- Chen Lu
- Department of Biostatistics, Boston University School of Public Health, Boston, Mass., USA
|
24
|
Chen BH, Hivert MF, Peters MJ, Pilling LC, Hogan JD, Pham LM, Harries LW, Fox CS, Bandinelli S, Dehghan A, Hernandez DG, Hofman A, Hong J, Joehanes R, Johnson AD, Munson PJ, Rybin DV, Singleton AB, Uitterlinden AG, Ying S, Melzer D, Levy D, van Meurs JBJ, Ferrucci L, Florez JC, Dupuis J, Meigs JB, Kolaczyk ED. Peripheral Blood Transcriptomic Signatures of Fasting Glucose and Insulin Concentrations. Diabetes 2016; 65:3794-3804. [PMID: 27625022 PMCID: PMC5127245 DOI: 10.2337/db16-0470] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 04/22/2016] [Accepted: 09/04/2016] [Indexed: 01/09/2023]
Abstract
Genome-wide association studies (GWAS) have successfully identified genetic loci associated with glycemic traits. However, characterizing the functional significance of these loci has proven challenging. We sought to gain insights into the regulation of fasting insulin and fasting glucose through the use of gene expression microarray data from peripheral blood samples of participants without diabetes in the Framingham Heart Study (FHS) (n = 5,056), the Rotterdam Study (RS) (n = 723), and the InCHIANTI Study (Invecchiare in Chianti) (n = 595). Using a false discovery rate q < 0.05, we identified three transcripts associated with fasting glucose and 433 transcripts associated with fasting insulin levels after adjusting for age, sex, technical covariates, and complete blood cell counts. Among the findings, circulating IGF2BP2 transcript levels were positively associated with fasting insulin in both the FHS and RS. Using 1000 Genomes-imputed genotype data, we identified 47,587 cis-expression quantitative trait loci (eQTL) and 6,695 trans-eQTL associated with the 433 significant insulin-associated transcripts. Of note, we identified a trans-eQTL (rs592423), where the A allele was associated with higher IGF2BP2 levels and with fasting insulin in an independent genetic meta-analysis comprising 50,823 individuals. We conclude that integration of genomic and transcriptomic data implicates circulating IGF2BP2 mRNA levels in glucose and insulin homeostasis.
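The abstract reports transcripts selected at a false discovery rate of q < 0.05 but does not say which procedure was used. As an illustration only, the sketch below shows the widely used Benjamini-Hochberg step-up rule, which is an assumption here and not a statement of the authors' method; the function name `benjamini_hochberg` and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a hypothesis is rejected at FDR level q.

    Illustrative Benjamini-Hochberg step-up rule; assumed here, not taken
    from the paper.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p value satisfies p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p values, one per transcript.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, q=0.05))
```

Because the rule is step-up, a p value that misses its own rank threshold can still be rejected when a larger p value further down the ordering meets its threshold.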
Affiliation(s)
- Brian H Chen
- Longitudinal Studies Section, Translational Gerontology Branch, Intramural Research Program, National Institute on Aging, National Institutes of Health, Baltimore, MD
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD
| | - Marie-France Hivert
- Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA
- Diabetes Research Center, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Department of Medicine, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| | - Marjolein J Peters
- Department of Internal Medicine, Erasmus University Medical Center Rotterdam, Rotterdam, the Netherlands
- Netherlands Genomics Initiative-sponsored Netherlands Consortium for Healthy Aging, Leiden and Rotterdam, the Netherlands
| | - Luke C Pilling
- Epidemiology and Public Health Group, Institute of Biomedical and Clinical Sciences, University of Exeter Medical School, Exeter, U.K
| | - John D Hogan
- Program in Bioinformatics, Boston University, Boston, MA
| | - Lisa M Pham
- Program in Bioinformatics, Boston University, Boston, MA
| | - Lorna W Harries
- Institute of Biomedical and Clinical Sciences, University of Exeter Medical School, Exeter, U.K
| | - Caroline S Fox
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD
| | - Stefania Bandinelli
- Geriatric Rehabilitation Unit, Azienda Sanitaria di Firenze, Florence, Italy
| | - Abbas Dehghan
- Department of Epidemiology, Erasmus University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Dena G Hernandez
- Laboratory of Neurogenetics, Intramural Research Program, National Institute on Aging, National Institutes of Health, Bethesda, MD
| | - Albert Hofman
- Department of Epidemiology, Erasmus University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Jaeyoung Hong
- Department of Biostatistics, Boston University School of Public Health, Boston, MA
| | - Roby Joehanes
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD
- Hebrew SeniorLife, Harvard Medical School, Boston, MA
| | - Andrew D Johnson
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD
| | - Peter J Munson
- Mathematical and Statistical Computing Laboratory, Center for Information Technology, National Institutes of Health, Bethesda, MD
| | - Denis V Rybin
- Data Coordinating Center, Boston University, Boston, MA
| | - Andrew B Singleton
- Laboratory of Neurogenetics, Intramural Research Program, National Institute on Aging, National Institutes of Health, Bethesda, MD
| | - André G Uitterlinden
- Department of Internal Medicine, Erasmus University Medical Center Rotterdam, Rotterdam, the Netherlands
- Netherlands Genomics Initiative-sponsored Netherlands Consortium for Healthy Aging, Leiden and Rotterdam, the Netherlands
- Department of Epidemiology, Erasmus University Medical Center Rotterdam, Rotterdam, the Netherlands
| | - Saixia Ying
- Mathematical and Statistical Computing Laboratory, Center for Information Technology, National Institutes of Health, Bethesda, MD
| | | | - David Melzer
- Epidemiology and Public Health Group, Institute of Biomedical and Clinical Sciences, University of Exeter Medical School, Exeter, U.K
| | - Daniel Levy
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, Bethesda, MD
| | - Joyce B J van Meurs
- Department of Internal Medicine, Erasmus University Medical Center Rotterdam, Rotterdam, the Netherlands
- Netherlands Genomics Initiative-sponsored Netherlands Consortium for Healthy Aging, Leiden and Rotterdam, the Netherlands
| | - Luigi Ferrucci
- Longitudinal Studies Section, Translational Gerontology Branch, Intramural Research Program, National Institute on Aging, National Institutes of Health, Baltimore, MD
| | - Jose C Florez
- Diabetes Research Center, Department of Medicine, Massachusetts General Hospital, Boston, MA
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
- Metabolism Program and Program in Medical and Population Genetics, Broad Institute, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | - Josée Dupuis
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA
| | - James B Meigs
- Metabolism Program and Program in Medical and Population Genetics, Broad Institute, Cambridge, MA
- Department of Medicine, Harvard Medical School, Boston, MA
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA
| | - Eric D Kolaczyk
- Program in Bioinformatics, Boston University, Boston, MA
- Department of Mathematics and Statistics, Boston University, MA
| |
|
25
|
Pham LM, Carvalho L, Schaus S, Kolaczyk ED. Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach. J Am Stat Assoc 2016; 111:73-92. [PMID: 27647944 DOI: 10.1080/01621459.2015.1110523] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/22/2022]
Abstract
Cellular response to a perturbation is the result of a dynamic system of biological variables linked in a complex network. A major challenge in drug and disease studies is identifying the key factors of a biological network that are essential in determining the cell's fate. Here our goal is the identification of perturbed pathways from high-throughput gene expression data. We develop a three-level hierarchical model, where (i) the first level captures the relationship between gene expression and biological pathways using confirmatory factor analysis, (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation using a conditional autoregressive model, and (iii) the third level is a spike-and-slab prior on the perturbations. We then identify perturbations through posterior-based variable selection. We illustrate our approach using gene transcription drug perturbation profiles from the DREAM7 drug sensitivity prediction challenge data set. Our proposed method identified regulatory pathways that are known to play a causative role and that were not readily resolved using gene set enrichment analysis or exploratory factor models. Simulation results are presented assessing the performance of this model relative to a network-free variant and its robustness to inaccuracies in biological databases.
|
26
|
Viles W, Ginestet CE, Tang A, Kramer MA, Kolaczyk ED. Percolation under noise: Detecting explosive percolation using the second-largest component. Phys Rev E 2016; 93:052301. [PMID: 27300904 DOI: 10.1103/physreve.93.052301] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/10/2015] [Indexed: 11/07/2022]
Abstract
We consider the problem of distinguishing between different rates of percolation under noise. A statistical model of percolation is constructed allowing for the birth and death of edges as well as the presence of noise in the observations. This graph-valued stochastic process is composed of a latent and an observed nonstationary process, where the observed graph process is corrupted by type-I and type-II errors. This produces a hidden Markov graph model. We show that for certain choices of parameters controlling the noise, the classical (Erdős-Rényi) percolation is visually indistinguishable from a more rapid form of percolation. In this setting, we compare two different criteria for discriminating between these two percolation models, based on the interquartile range (IQR) of the first component's size, and on the maximal size of the second-largest component. We show through data simulations that this second criterion outperforms the IQR of the first component's size, in terms of discriminatory power. The maximal size of the second component therefore provides a useful statistic for distinguishing between different rates of percolation, under physically motivated conditions for the birth and death of edges, and under noise. The potential application of the proposed criteria for the detection of clinically relevant percolation in the context of applied neuroscience is also discussed.
Affiliation(s)
- Wes Viles
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA
| | - Cedric E Ginestet
- Department of Biostatistics, Institute of Psychiatry, Psychology and Neuroscience, King's College, London, United Kingdom
| | - Ariana Tang
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA
| | - Mark A Kramer
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA
| | - Eric D Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA
| |
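The second-largest-component statistic described in the abstract above can be illustrated with a minimal sketch. It assumes a much simpler setting than the paper's hidden Markov graph model: edges are only born (no edge death), there is no observation noise, and edges are drawn uniformly at random in the Erdős-Rényi style (repeated edges are allowed). The names `UnionFind`, `two_largest_components`, and `max_second_component` are hypothetical and chosen here for illustration only.

```python
import random


class UnionFind:
    """Disjoint-set structure that tracks component sizes."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def two_largest_components(uf, n):
    """Return (largest, second-largest) component sizes; second is 0 if absent."""
    sizes = sorted((uf.size[r] for r in range(n) if uf.find(r) == r), reverse=True)
    second = sizes[1] if len(sizes) > 1 else 0
    return sizes[0], second


def max_second_component(n, n_edges, seed=0):
    """Add n_edges random edges one at a time (assumption: noise-free, birth only)
    and return the maximum size the second-largest component ever reaches."""
    rng = random.Random(seed)
    uf = UnionFind(n)
    best_second = 0
    for _ in range(n_edges):
        a, b = rng.sample(range(n), 2)  # two distinct vertices
        uf.union(a, b)
        _, second = two_largest_components(uf, n)
        best_second = max(best_second, second)
    return best_second
```

Calling `max_second_component(200, 400, seed=1)` returns the statistic for one simulated growth path. Comparing its distribution across different edge-selection rules is the kind of discrimination the abstract describes, although the paper's setting additionally includes edge death and type-I and type-II observation errors.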
|
27
|
Zhang S, Wei JS, Li SQ, Badgett TC, Song YK, Agarwal S, Coarfa C, Tolman C, Hurd L, Liao H, He J, Wen X, Liu Z, Thiele CJ, Westermann F, Asgharzadeh S, Seeger RC, Maris JM, Guidry Auvil JM, Smith MA, Kolaczyk ED, Shohet J, Khan J. MYCN controls an alternative RNA splicing program in high-risk metastatic neuroblastoma. Cancer Lett 2016; 371:214-24. [PMID: 26683771 PMCID: PMC4738031 DOI: 10.1016/j.canlet.2015.11.045] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 08/05/2015] [Revised: 11/29/2015] [Accepted: 11/30/2015] [Indexed: 12/20/2022]
Abstract
The molecular mechanisms underlying the aggressive behavior of MYCN-driven neuroblastoma (NBL) are under intense investigation; however, little is known about the impact of this family of transcription factors on the splicing program. Here we used high-throughput RNA sequencing to systematically study the expression of RNA isoforms in stage 4 MYCN-amplified NBL, an aggressive subtype of metastatic NBL. We show that MYCN-amplified NBL tumors display a distinct gene splicing pattern affecting multiple cancer hallmark functions. Six splicing factors displayed unique differential expression patterns in MYCN-amplified tumors and cell lines, and the binding motifs for some of these splicing factors are significantly enriched in differentially spliced genes. Direct binding of MYCN to promoter regions of the splicing factors PTBP1 and HNRNPA1, detected by ChIP-seq, demonstrates that MYCN controls the splicing pattern by direct regulation of the expression of these key splicing factors. Furthermore, high expression of PTBP1 and HNRNPA1 was significantly associated with poor overall survival of stage 4 NBL patients (p ≤ 0.05). Knocking down PTBP1, HNRNPA1 and their downstream target PKM2, a pro-tumor-growth isoform, results in repressed growth of NBL cells. Therefore, our study reveals a novel role of MYCN in controlling the global splicing program through regulation of splicing factors, in addition to its well-known role in the transcription program. These findings suggest therapeutic potential in targeting the key splicing factors or gene isoforms in high-risk NBL with MYCN amplification.
Affiliation(s)
- Shile Zhang
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA; Program in Bioinformatics, Boston University, Boston, MA 02218, USA
| | - Jun S Wei
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Samuel Q Li
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Tom C Badgett
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA; Pediatric Hematology and Oncology, Kentucky Children's Hospital, Lexington, KY 40536, USA
| | - Young K Song
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Saurabh Agarwal
- Texas Children's Cancer Center, Center for Cell and Gene Therapy, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Cristian Coarfa
- Texas Children's Cancer Center, Center for Cell and Gene Therapy, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Catherine Tolman
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Laura Hurd
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Hongling Liao
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Jianbin He
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Xinyu Wen
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Zhihui Liu
- Cell & Molecular Biology Section, Pediatric Oncology Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Carol J Thiele
- Cell & Molecular Biology Section, Pediatric Oncology Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA
| | - Frank Westermann
- Neuroblastoma Genomics, B030, German Cancer Research Center, Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
| | - Shahab Asgharzadeh
- Division of Hematology/Oncology, The Children's Hospital Los Angeles, Los Angeles, CA 90027, USA; Saban Research Institute, The Children's Hospital Los Angeles, Los Angeles, CA 90027, USA; Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - Robert C Seeger
- Division of Hematology/Oncology, The Children's Hospital Los Angeles, Los Angeles, CA 90027, USA; Saban Research Institute, The Children's Hospital Los Angeles, Los Angeles, CA 90027, USA; Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - John M Maris
- Center for Childhood Cancer Research, the Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Division of Oncology, the Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA; Abramson Family Cancer Research Institute, Philadelphia, PA 19104, USA
| | | | - Malcolm A Smith
- Clinical Investigation Branch, National Cancer Institute, Rockville, MD 20850, USA
| | - Eric D Kolaczyk
- Program in Bioinformatics, Boston University, Boston, MA 02218, USA; Department of Mathematics & Statistics, Boston University, Boston, MA 02218, USA
| | - Jason Shohet
- Texas Children's Cancer Center, Center for Cell and Gene Therapy, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Javed Khan
- Oncogenomics Section, Genetics Branch, National Cancer Institute, National Institute of Health, Bethesda, MD 20892, USA.
| |
|
28
|
Telesford QK, Simpson SL, Kolaczyk ED. Editorial: Complexity and emergence in brain network analyses. Front Comput Neurosci 2015; 9:65. [PMID: 26082712 PMCID: PMC4451334 DOI: 10.3389/fncom.2015.00065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2015] [Accepted: 05/17/2015] [Indexed: 11/23/2022] Open
Affiliation(s)
- Qawi K. Telesford
- Complex Systems Group, Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA
- *Correspondence: Qawi K. Telesford,
| | - Sean L. Simpson
- Laboratory for Complex Brain Networks, Division of Public Health Sciences, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Eric D. Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA
| |
|
29
|
Abstract
The modeling and analysis of networks and network data has seen an explosion of interest in recent years and represents an exciting direction for potential growth in statistics. Despite the already substantial amount of work done in this area to date by researchers from various disciplines, however, there remain many questions of a decidedly foundational nature (natural analogues of standard questions already posed and addressed in more classical areas of statistics) that have yet to even be posed, much less addressed. Here we raise and consider one such question in connection with network modeling. Specifically, we ask, "Given an observed network, what is the sample size?" Using simple, illustrative examples from the class of exponential random graph models, we show that the answer to this question can very much depend on basic properties of the networks expected under the model, as the number of vertices n_V in the network grows. In particular, adopting the (asymptotic) scaling of the variance of the maximum likelihood parameter estimates as a notion of effective sample size, say n_eff, we show that whether the networks are sparse or not under our model (i.e., having relatively few or many edges between vertices, respectively) is sufficient to yield an order of magnitude difference in n_eff, from O(n_V) to [Formula: see text]. We then explore some practical implications of this result, using both simulation and data on food-sharing from Lamalera, Indonesia.
Affiliation(s)
- Eric D Kolaczyk
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA
| | - Pavel N Krivitsky
- School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW 2500, Australia
| |
|
30
|
Zhang Y, Kolaczyk ED, Spencer BD. Estimating network degree distributions under sampling: An inverse problem, with applications to monitoring social media networks. Ann Appl Stat 2015. [DOI: 10.1214/14-aoas800] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 11/19/2022]
|
31
|
Christadore LM, Pham L, Kolaczyk ED, Schaus SE. Improvement of experimental testing and network training conditions with genome-wide microarrays for more accurate predictions of drug gene targets. BMC Syst Biol 2014; 8:7. [PMID: 24444313 PMCID: PMC3911882 DOI: 10.1186/1752-0509-8-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/29/2012] [Accepted: 11/21/2013] [Indexed: 11/10/2022]
Abstract
Background: Genome-wide microarrays have been useful for predicting chemical-genetic interactions at the gene level. However, interpreting genome-wide microarray results can be overwhelming due to the vast output of gene expression data combined with the off-target transcriptional responses often induced by a drug treatment. This study demonstrates how experimental and computational methods can interact with each other to arrive at more accurate predictions of drug-induced perturbations. We present a two-stage strategy that links microarray experimental testing and network training conditions to predict gene perturbations for a drug with a known mechanism of action in a well-studied organism. Results: S. cerevisiae cells were treated with the antifungal fluconazole, and expression profiling was conducted under different biological conditions using Affymetrix genome-wide microarrays. Transcripts were filtered with a formal network-based method, sparse simultaneous equation models and Lasso regression (SSEM-Lasso), under different network training conditions. Gene expression results were evaluated using both gene set and single gene target analyses, and the drug’s transcriptional effects were narrowed first by pathway and then by individual genes. Variables included (i) testing conditions (exposure time and concentration) and (ii) network training conditions (training compendium modifications). Two analyses of SSEM-Lasso output, gene set and single gene, were conducted to gain a better understanding of how SSEM-Lasso predicts perturbation targets. Conclusions: This study demonstrates that genome-wide microarrays can be optimized using a two-stage strategy for a more in-depth understanding of how a cell manifests biological reactions to a drug treatment at the transcription level. Additionally, a more detailed understanding of how the statistical model, SSEM-Lasso, propagates perturbations through a network of gene regulatory interactions is achieved.
Affiliation(s)
| | | | | | - Scott E Schaus
- Department of Chemistry, Boston University, Boston, MA, USA.
| |
|
32
|
Lu C, Latourelle J, O'Connor GT, Dupuis J, Kolaczyk ED. Network-guided sparse regression modeling for detection of gene-by-gene interactions. Bioinformatics 2013; 29:1241-9. [PMID: 23599501 DOI: 10.1093/bioinformatics/btt139] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
MOTIVATION: Genetic variants identified by genome-wide association studies to date explain only a small fraction of total heritability. Gene-by-gene interaction is one important potential source of unexplained total heritability. We propose a novel approach to detect such interactions that uses penalized regression and sparse estimation principles, and incorporates outside biological knowledge through a network-based penalty. RESULTS: We tested our new method on simulated and real data. Simulation showed that with reasonable outside biological knowledge, our method performs noticeably better than stage-wise strategies (i.e. selecting main effects first, and interactions second, from those main effects selected) in finding true interactions, especially when the marginal strength of main effects is weak. We applied our method to Framingham Heart Study data on total plasma immunoglobulin E (IgE) concentrations and found a number of interactions among different classes of human leukocyte antigen genes that may interact to influence the risk of developing IgE dysregulation and allergy. AVAILABILITY: The proposed method is implemented in R and available at http://math.bu.edu/people/kolaczyk/software.html. CONTACT: chenlu@bu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Lu
- Department of Biostatistics, Boston University School of Public Health, Pulmonary Center, Department of Medicine and Department of Neurology, Boston University School of Medicine, Boston, MA, USA.
| | | | | | | | | |
|
33
|
Katenka N, Kolaczyk ED. Inference and characterization of multi-attribute networks with application to computational biology. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas550] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
|
34
|
Kramer MA, Kolaczyk ED, Eden UT, Cash SS. Functional networks and dynamics in human seizure activity. BMC Neurosci 2011. [PMCID: PMC3240434 DOI: 10.1186/1471-2202-12-s1-p32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|
35
|
Fast EM, Toomey ME, Panaram K, Desjardins D, Kolaczyk ED, Frydman HM. Wolbachia enhance Drosophila stem cell proliferation and target the germline stem cell niche. Science 2011; 334:990-2. [PMID: 22021671 DOI: 10.1126/science.1209609] [Citation(s) in RCA: 135] [Impact Index Per Article: 10.4] [Indexed: 01/28/2023]
Abstract
Wolbachia are widespread maternally transmitted intracellular bacteria that infect most insect species and are able to alter the reproduction of innumerous hosts. The cellular bases of these alterations remain largely unknown. Here, we report that Drosophila mauritiana infected with a native Wolbachia wMau strain produces about four times more eggs than the noninfected counterpart. Wolbachia infection leads to an increase in the mitotic activity of germline stem cells (GSCs), as well as a decrease in programmed cell death in the germarium. Our results suggest that up-regulation of GSC division is mediated by a tropism of Wolbachia for the GSC niche, the cellular microenvironment that supports GSCs.
Affiliation(s)
- Eva M Fast
- Department of Biology, Boston University, Boston, MA 02215, USA
| | | | | | | | | | | |
|
36
|
Abstract
Predicting the functional roles of proteins based on various genome-wide data, such as protein-protein association networks, has become a canonical problem in computational biology. Approaching this task as a binary classification problem, we develop a network-based extension of the spatial auto-probit model. In particular, we develop a hierarchical Bayesian probit-based framework for modeling binary network-indexed processes, with a latent multivariate conditional autoregressive Gaussian process. The latter allows for the easy incorporation of protein-protein association network topologies, either binary or weighted, in modeling protein functional similarity. We use this framework to predict protein functions, for functions defined as terms in the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functionality. Furthermore, we show how a natural extension of this framework can be used to model and correct for the high percentage of false negative labels in training data derived from GO, a serious shortcoming endemic to biological databases of this type. The performance of our method is evaluated and compared with standard algorithms on weighted yeast protein-protein association networks, extracted from a recently developed integrative database called Search Tool for the Retrieval of INteracting Genes/proteins (STRING). Results show that our basic method is competitive with these other methods, and that the extended method, which incorporates the uncertainty in negative labels among the training data, can yield nontrivial improvements in predictive accuracy.
Affiliation(s)
- Xiaoyu Jiang
- Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, Ridgefield, Connecticut 06877, USA
| | | | | |
|
37
|
Cosgrove EJ, Gardner TS, Kolaczyk ED. On the choice and number of microarrays for transcriptional regulatory network inference. BMC Bioinformatics 2010; 11:454. [PMID: 20825684 PMCID: PMC2949888 DOI: 10.1186/1471-2105-11-454] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/05/2010] [Accepted: 09/09/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Transcriptional regulatory network inference (TRNI) from large compendia of DNA microarrays has become a fundamental approach for discovering transcription factor (TF)-gene interactions at the genome-wide level. In correlation-based TRNI, network edges can in principle be evaluated using standard statistical tests. However, while such tests nominally assume independent microarray experiments, we expect dependency between the experiments in microarray compendia, due to both project-specific factors (e.g., microarray preparation, environmental effects) in the multi-project compendium setting and effective dependency induced by gene-gene correlations. Herein, we characterize the nature of dependency in an Escherichia coli microarray compendium and explore its consequences on the problem of determining which and how many arrays to use in correlation-based TRNI. RESULTS: We present evidence of substantial effective dependency among microarrays in this compendium, and characterize that dependency with respect to experimental condition factors. We then introduce a measure n_eff of the effective number of experiments in a compendium, and find that corresponding to the dependency observed in this particular compendium there is a huge reduction in effective sample size, i.e., n_eff = 14.7 versus n = 376. Furthermore, we found that the n_eff of select subsets of experiments actually exceeded n_eff of the full compendium, suggesting that the adage 'less is more' applies here. Consistent with this latter result, we observed improved performance in TRNI using subsets of the data compared to results using the full compendium. We identified experimental condition factors that trend with changes in TRNI performance and n_eff, including growth phase and media type. Finally, using the set of known E. coli genetic regulatory interactions from RegulonDB, we demonstrated that false discovery rates (FDR) derived from n_eff-adjusted p-values were well-matched to FDR based on the RegulonDB truth set. CONCLUSIONS: These results support utilization of n_eff as a potent descriptor of microarray compendia. In addition, they highlight a straightforward correlation-based method for TRNI with demonstrated meaningful statistical testing for significant edges, readily applicable to compendia from any species, even when a truth set is not available. This work facilitates a more refined approach to construction and utilization of mRNA expression compendia in TRNI.
Affiliation(s)
- Elissa J Cosgrove
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA
| | | | | |
|
38
|
Abstract
A method of 'network filtering' has been proposed recently to detect the effects of certain external perturbations on the interacting members in a network. However, with large networks, the goal of detection seems a priori difficult to achieve, especially since the number of observations available often is much smaller than the number of variables describing the effects of the underlying network. Under the assumption that the network possesses a certain sparsity property, we provide a formal characterization of the accuracy with which the external effects can be detected, using a network filtering system that combines Lasso regression in a sparse simultaneous equation model with simple residual analysis. We explore the implications of the technical conditions underlying our characterization, in the context of various network topologies, and we illustrate our method using simulated data.
|
39
|
Kramer MA, Eden UT, Cash SS, Kolaczyk ED. Network inference with confidence from multivariate time series. Phys Rev E Stat Nonlin Soft Matter Phys 2009; 79:061916. [PMID: 19658533 DOI: 10.1103/physreve.79.061916] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Received: 03/09/2009] [Revised: 05/14/2009] [Indexed: 05/22/2023]
Abstract
Networks--collections of interacting elements or nodes--abound in the natural and manmade worlds. For many networks, complex spatiotemporal dynamics stem from patterns of physical interactions unknown to us. To infer these interactions, it is common to include edges between those nodes whose time series exhibit sufficient functional connectivity, typically defined as a measure of coupling exceeding a predetermined threshold. However, when uncertainty exists in the original network measurements, uncertainty in the inferred network is likely, and hence a statistical propagation of error is needed. In this manuscript, we describe a principled and systematic procedure for the inference of functional connectivity networks from multivariate time series data. Our procedure yields as output both the inferred network and a quantification of uncertainty of the most fundamental interest: uncertainty in the number of edges. To illustrate this approach, we apply a measure of linear coupling to simulated data and electrocorticogram data recorded from a human subject during an epileptic seizure. We demonstrate that the procedure is accurate and robust in both the determination of edges and the reporting of uncertainty associated with that determination.
Affiliation(s)
- Mark A Kramer
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts 02215, USA.
| | | | | | | |
|
40
|
|
41
|
Cosgrove EJ, Zhou Y, Gardner TS, Kolaczyk ED. Predicting gene targets of perturbations via network-based filtering of mRNA expression compendia. Bioinformatics 2008; 24:2482-90. [PMID: 18779235 DOI: 10.1093/bioinformatics/btn476] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION DNA microarrays are routinely applied to study diseased or drug-treated cell populations. A critical challenge is distinguishing the genes directly affected by these perturbations from the hundreds of genes that are indirectly affected. Here, we developed a sparse simultaneous equation model (SSEM) of mRNA expression data and applied Lasso regression to estimate the model parameters, thus constructing a network model of gene interaction effects. This inferred network model was then used to filter data from a given experimental condition of interest and predict the genes directly targeted by that perturbation. RESULTS Our proposed SSEM-Lasso method demonstrated substantial improvement in sensitivity compared with other tested methods for predicting the targets of perturbations in both simulated datasets and microarray compendia. In simulated data, for two different network types, and over a wide range of signal-to-noise ratios, our algorithm demonstrated a 167% increase in sensitivity on average for the top 100 ranked genes, compared with the next best method. Our method also performed well in identifying targets of genetic perturbations in microarray compendia, with up to a 24% improvement in sensitivity on average for the top 100 ranked genes. The overall performance of our network-filtering method shows promise for identifying the direct targets of genetic dysregulation in cancer and disease from expression profiles. AVAILABILITY Microarray data are available at the Many Microbe Microarrays Database (M3D, http://m3d.bu.edu). Algorithm scripts are available at the Gardner Lab website (http://gardnerlab.bu.edu/SSEMLasso).
Affiliation(s)
- Elissa J Cosgrove
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
|
42
|
Jiang X, Nariai N, Steffen M, Kasif S, Kolaczyk ED. Integration of relational and hierarchical network information for protein function prediction. BMC Bioinformatics 2008; 9:350. [PMID: 18721473 PMCID: PMC2535605 DOI: 10.1186/1471-2105-9-350] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 03/14/2008] [Accepted: 08/22/2008] [Indexed: 11/22/2022] Open
Abstract
Background: In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules which protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to the predictions.
Results: We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing.
Conclusion: A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is uniformly consistent across GO-term depth. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction, one that exploits the ontological structure of protein annotation databases in a principled manner, can offer substantial advantages over the successive application of 'flat' network-based methods.
Affiliation(s)
- Xiaoyu Jiang
- Department of Mathematics and Statistics, Boston University, Boston, MA 02215, USA.
|
43
|
Viger F, Barrat A, Dall'Asta L, Zhang CH, Kolaczyk ED. What is the real size of a sampled network? The case of the Internet. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 75:056111. [PMID: 17677137 DOI: 10.1103/physreve.75.056111] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 01/10/2007] [Indexed: 05/16/2023]
Abstract
Most data concerning the topology of complex networks are the result of mapping projects which bear intrinsic limitations and cannot give access to complete, unbiased datasets. A particularly interesting case is represented by the physical Internet. Router-level Internet mapping projects generally consist of sampling the network from a limited set of sources by using traceroute probes. This methodology, akin to the merging of spanning trees from the different sources to a set of destinations, leads necessarily to a partial, incomplete map of the Internet. The determination of the real Internet topology characteristics from such sampled maps is therefore, in part, a problem of statistical inference. In this paper we present a twofold contribution in order to address this problem. First, we argue that inference of some of the standard topological quantities is, in fact, a version of the so-called "species" problem in statistics, which is important in categorizing the problem and providing some indication of its inherent difficulties. Second, we tackle the issue of estimating arguably the most basic of network characteristics (its number of nodes) and propose two estimators for this quantity, based on subsampling principles. Numerical simulations, as well as an experiment based on probing the Internet, suggest the feasibility of accounting for measurement bias in reporting Internet topology characteristics.
Affiliation(s)
- Fabien Viger
- LIP6, UMR 7606 du CNRS, Université de Paris-6, 4 place Jussieu, 75005, Paris, France
|
44
|
Abstract
Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function than proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study we show that, at 50% precision, the integrated model increases recall by 18% compared to using PPI data alone. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
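The neighbor assumption behind a functional linkage graph can be illustrated with a simple neighbor-voting score. The following is a minimal sketch only: it is not the probabilistic model described in the abstract, and the function name `neighbor_vote`, the protein labels, and the term labels are all hypothetical placeholders.

```python
from collections import Counter


def neighbor_vote(graph, annotations, protein):
    """Score each term by the fraction of annotated neighbors that carry it.

    graph: dict mapping a protein to the set of its graph neighbors.
    annotations: dict mapping an annotated protein to its set of terms.
    Returns an empty dict when the protein has no annotated neighbors.
    """
    neighbors = [n for n in graph.get(protein, ()) if n in annotations]
    if not neighbors:
        return {}
    counts = Counter(term for n in neighbors for term in annotations[n])
    return {term: count / len(neighbors) for term, count in counts.items()}


# Hypothetical toy graph: P1 is unannotated and linked to three annotated proteins.
graph = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1"},
    "P3": {"P1"},
    "P4": {"P1"},
}
annotations = {
    "P2": {"term_A"},
    "P3": {"term_A", "term_B"},
    "P4": {"term_A"},
}

scores = neighbor_vote(graph, annotations, "P1")
for term in sorted(scores):
    print(term, round(scores[term], 3))
```

In this toy case every annotated neighbor of P1 carries term_A, so it receives the highest score; a term carried by only one of three neighbors scores lower.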
Affiliation(s)
- Naoki Nariai
- Bioinformatics Program, Boston University, Boston, Massachusetts, United States of America.
|
45
|
Abstract
We present a modelling framework for detection of potentially anomalous structure in aggregate spatial disease incidence data in a manner sensitive to localization at multiple scales and/or positions. The key technical contribution is the re-casting of the components of a multiscale disease mapping methodology, recently introduced by the authors in an earlier paper, into a form appropriate for hypothesis testing. In particular, we describe how hypotheses of spatially clustered variations in disease incidence may be linked in one-to-one correspondence with collections of hypotheses on the values of certain multiscale parameters associated with a user-defined hierarchy of nested partitions of an overall spatial region. A Bayesian hypothesis testing methodology is developed in the context of a standard Poisson measurement model, over the collection of possible multiscale hypotheses. We discuss the specification of hyperparameters and prior distributions on the space of models. The methodology is illustrated on both simulated and real data.
Affiliation(s)
- Mary M Louie
- National Center for Health Statistics, 3311 Toledo Road, Room 3215, Hyattsville, MD 20782, USA.
|
46
|
Abstract
The effects of spatial scale in disease mapping are well-recognized, in that the information conveyed by such maps varies with scale. Here we provide an inferential framework, in the context of tract count data, for describing the distribution of relative risk simultaneously across a hierarchy of multiple scales. In particular, we offer a multiscale extension of the canonical standardized mortality ratio (SMR), consisting of Bayesian posterior-based strategies for both estimation and characterization of uncertainty. As a result, a hierarchy of informative disease and confidence maps can be produced, without the need to first try to identify a single appropriate scale of analysis. We explore the behaviour of the proposed methodology in a small simulation study, and we illustrate its usage through an application to data on gastric cancer in Tuscany.
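For orientation, the canonical SMR is the ratio of observed to expected counts, and its value depends on the scale at which counts are aggregated. The following is a minimal sketch of that scale dependence only: it does not implement the Bayesian multiscale estimator described in the abstract, and the tract names, region names, and counts are invented for illustration.

```python
def smr(observed, expected):
    """Canonical standardized mortality ratio: observed count / expected count."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Hypothetical tract-level counts, each tract nested in one coarser region.
tracts = {
    "T1": {"region": "North", "observed": 12, "expected": 8.0},
    "T2": {"region": "North", "observed": 4, "expected": 8.0},
    "T3": {"region": "South", "observed": 9, "expected": 6.0},
}

# Fine scale: one SMR per tract.
tract_smr = {name: smr(t["observed"], t["expected"]) for name, t in tracts.items()}

# Coarse scale: sum observed and expected counts within each region first.
totals = {}
for t in tracts.values():
    obs, exp = totals.get(t["region"], (0, 0.0))
    totals[t["region"]] = (obs + t["observed"], exp + t["expected"])
region_smr = {region: smr(obs, exp) for region, (obs, exp) in totals.items()}

print(tract_smr)
print(region_smr)
```

In this toy data the North region shows no excess at the coarse scale even though one of its tracts is elevated and the other is depressed, which is the kind of scale effect the abstract refers to.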
Affiliation(s)
- Mary M Louie
- Channing Laboratory, Brigham and Women's Hospital and Harvard Medical School, 181 Longwood Avenue, Boston, MA 02115, USA.
|
47
|
|
48
|
|
49
|
Abstract
Network traffic arises from the superposition of Origin-Destination (OD) flows. Hence, a thorough understanding of OD flows is essential for modeling network traffic, and for addressing a wide variety of problems including traffic engineering, traffic matrix estimation, capacity planning, forecasting and anomaly detection. However, to date, OD flows have not been closely studied, and there is very little known about their properties. We present the first analysis of complete sets of OD flow time-series, taken from two different backbone networks (Abilene and Sprint-Europe). Using Principal Component Analysis (PCA), we find that the set of OD flows has small intrinsic dimension. In fact, even in a network with over a hundred OD flows, these flows can be accurately modeled in time using a small number (10 or fewer) of independent components or dimensions. We also show how to use PCA to systematically decompose the structure of OD flow time-series into three main constituents: common periodic trends, short-lived bursts, and noise. We provide insight into how the various constituents contribute to the overall structure of OD flows and explore the extent to which this decomposition varies over time.
|
50
|
Affiliation(s)
- Robert D. Nowak
- Department of Electrical and Computer Engineering, University of Wisconsin
|