1
Kowalski MH, Wessels HH, Linder J, Dalgarno C, Mascio I, Choudhary S, Hartman A, Hao Y, Kundaje A, Satija R. Multiplexed single-cell characterization of alternative polyadenylation regulators. Cell 2024; 187:4408-4425.e23. [PMID: 38925112 DOI: 10.1016/j.cell.2024.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 03/12/2024] [Accepted: 06/05/2024] [Indexed: 06/28/2024]
Abstract
Most mammalian genes have multiple polyA sites, representing a substantial source of transcript diversity regulated by the cleavage and polyadenylation (CPA) machinery. To better understand how these proteins govern polyA site choice, we introduce CPA-Perturb-seq, a multiplexed perturbation screen dataset of 42 CPA regulators with a 3' scRNA-seq readout that enables transcriptome-wide inference of polyA site usage. We develop a framework to detect perturbation-dependent changes in polyadenylation and characterize modules of co-regulated polyA sites. We find groups of intronic polyA sites regulated by distinct components of the nuclear RNA life cycle, including elongation, splicing, termination, and surveillance. We train and validate a deep neural network (APARENT-Perturb) for tandem polyA site usage, delineating a cis-regulatory code that predicts perturbation response and reveals interactions between regulatory complexes. Our work highlights the potential for multiplexed single-cell perturbation screens to further our understanding of post-transcriptional regulation.
Affiliation(s)
- Madeline H Kowalski
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York University Grossman School of Medicine, New York, NY, USA
- Hans-Hermann Wessels
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Johannes Linder
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Isabella Mascio
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Saket Choudhary
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Yuhan Hao
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Rahul Satija
- New York Genome Center, New York, NY, USA; Center for Genomics and Systems Biology, New York University, New York, NY, USA; New York University Grossman School of Medicine, New York, NY, USA
2
Osberg TM, Doxbeck CR. Partying during a pandemic: role of descriptive partying norms, residence, college alcohol beliefs, and political ideology in COVID-19 party behavior. JOURNAL OF AMERICAN COLLEGE HEALTH : J OF ACH 2023; 71:2938-2948. [PMID: 34855573 DOI: 10.1080/07448481.2021.2008400] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/17/2021] [Revised: 10/18/2021] [Accepted: 11/12/2021] [Indexed: 06/13/2023]
Abstract
OBJECTIVE Non-adherence to COVID-19 guidelines is a major public health issue. This study explored factors that explain college student party behavior (PB; defined as attending a college party where COVID-19 guidelines, including masks and social distancing, were ignored) during the pandemic. METHOD First-year students at a northeastern university (N = 207; 72% women) responded to an online survey during the Fall 2020 semester. RESULTS The percentage of students who participated in on-campus partying during the past month was 11.6%, with 20.3% participating in off-campus partying. Living on campus and higher perceived norms for partying were associated with higher levels of on-campus PB, whereas higher perceived norms for partying, stronger college alcohol beliefs, and a more conservative political ideology accounted for significant variance in off-campus PB. CONCLUSIONS Efforts to reduce party behavior should target misperceptions of party behavior norms as well as college alcohol beliefs, and should take into account students' residence and political ideology.
Affiliation(s)
- Courtney R Doxbeck
- Department of Counseling, School, and Educational Psychology, University at Buffalo, Buffalo, New York, USA
3
Liu J, Xu K, Wu T, Yao L, Nguyen TT, Jeste D, Zhang X. Deciphering the 'gut-brain axis' through microbiome diversity. Gen Psychiatr 2023; 36:e101090. [PMID: 37920405 PMCID: PMC10618967 DOI: 10.1136/gpsych-2023-101090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 09/05/2023] [Indexed: 11/04/2023]
Abstract
Motivated by breakthroughs in high-throughput sequencing technology and the data it generates, this paper proposes a distance-based framework to meet the emerging need to extract insights from high-dimensional microbiome data in psychiatric studies. By shifting the focus from traditional methods, which model the observations from each subject, to between-subject attributes that aggregate the entire feature vectors of two or more subjects, the approach departs from the conventional treatment of high-dimensional observations via microbiome diversity. To this end, we extend the classical generalised linear models to articulate the multivariable regression relationship between distance-based variables. We also discuss a robust and computationally feasible semiparametric inference technique. Benefitting from the latest advances in semiparametric efficiency theory for such attributes, the proposed estimators enjoy robustness and good asymptotic properties that guarantee sensitivity in detecting signals between clinical outcomes and microbiome diversity. The framework offers a readily implementable and easily interpretable solution for deciphering the gut-brain axis in mental health research.
Affiliation(s)
- Jinyuan Liu
- Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA
- Ke Xu
- Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA
- Tsungchin Wu
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, UC San Diego, La Jolla, California, USA
- Lydia Yao
- Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, USA
- Tanya T Nguyen
- Department of Psychiatry, Stein Institute for Research on Aging, UC San Diego, La Jolla, California, USA
- Dilip Jeste
- Department of Psychiatry, Stein Institute for Research on Aging, UC San Diego, La Jolla, California, USA
- Xinlian Zhang
- Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, UC San Diego, La Jolla, California, USA
4
Fu J, Koslovsky MD, Neophytou AM, Vannucci M. A Bayesian joint model for compositional mediation effect selection in microbiome data. Stat Med 2023. [PMID: 37173609 DOI: 10.1002/sim.9764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Revised: 04/17/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Analyzing multivariate count data generated by high-throughput sequencing technology in microbiome research studies is challenging due to the high-dimensional and compositional structure of the data and overdispersion. In practice, researchers are often interested in investigating how the microbiome may mediate the relation between an assigned treatment and an observed phenotypic response. Existing approaches designed for compositional mediation analysis are unable to simultaneously determine the presence of direct effects, relative indirect effects, and overall indirect effects, while quantifying their uncertainty. We propose a formulation of a Bayesian joint model for compositional data that allows for the identification, estimation, and uncertainty quantification of various causal estimands in high-dimensional mediation analysis. We conduct simulation studies and compare our method's mediation effects selection performance with existing methods. Finally, we apply our method to a benchmark data set investigating the sub-therapeutic antibiotic treatment effect on body weight in early-life mice.
Affiliation(s)
- Jingyan Fu
- Department of Statistics, Rice University, Houston, Texas, USA
- Matthew D Koslovsky
- Department of Statistics, Colorado State University, Fort Collins, Colorado, USA
- Andreas M Neophytou
- Department of Environmental & Radiological Health Sciences, Colorado State University, Fort Collins, Colorado, USA
- Marina Vannucci
- Department of Statistics, Rice University, Houston, Texas, USA
5
Pedone M, Amedei A, Stingo FC. Subject-specific Dirichlet-multinomial regression for multi-district microbiota data analysis. Ann Appl Stat 2023. [DOI: 10.1214/22-aoas1641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Affiliation(s)
- Matteo Pedone
- Department of Statistics, Computer Science, Applications, University of Florence
- Amedeo Amedei
- Department of Clinical and Experimental Medicine, University of Florence
- Francesco C. Stingo
- Department of Statistics, Computer Science, Applications, University of Florence
6
D’Angelo N, Adelfio G, Chiodi M, D’Alessandro A. Statistical Picking of Multivariate Waveforms. SENSORS (BASEL, SWITZERLAND) 2022; 22:9636. [PMID: 36560007 PMCID: PMC9788455 DOI: 10.3390/s22249636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/14/2022] [Accepted: 12/06/2022] [Indexed: 06/17/2023]
Abstract
In this paper, we propose a new approach based on fitting a generalized linear regression model to detect points of change in the variance of a multivariate-covariance Gaussian variable whose variance function is piecewise constant. Applied to multivariate waveforms, the method detects change points simultaneously in functional time series. The proposed approach can be used as a new picking algorithm to automatically identify the arrival times of P- and S-waves in different seismograms recording the same seismic event. A seismogram is a record of ground motion at a measuring station as a function of time; it typically records motion along three orthogonal axes (X, Y, and Z), with the Z-axis perpendicular to the Earth's surface and the X- and Y-axes parallel to the surface, generally oriented North-South and East-West, respectively. The proposed method was tested on a dataset of simulated waveforms to assess how its performance varies with waveform characteristics. In an application to real seismic data, our results demonstrated the ability of the multivariate algorithm to pick arrival times in quite noisy waveforms from low-magnitude seismic events.
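The idea of locating a change in variance can be illustrated with a simpler technique than the authors' GLM: a profile-likelihood scan for a single change point. The sketch below is an assumption-laden illustration, not the paper's picker. It assumes zero-mean, independent channels (for example the X, Y and Z components), exactly one change point shared by all channels, and a hypothetical function name `variance_change_point`.

```python
import numpy as np


def variance_change_point(x, min_seg=10):
    """Locate one common change point in the variance of zero-mean signals.

    Hypothetical helper, not the authors' GLM-based method. Assumes each
    column of ``x`` is an independent zero-mean Gaussian channel whose
    variance is piecewise constant with a single jump shared by all channels.
    Returns k, the index of the first sample after the change (or None if
    the series is too short for two segments of length ``min_seg``).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single trace as one channel
    n = x.shape[0]
    csum = np.cumsum(x ** 2, axis=0)  # running sum of squares per channel
    total = csum[-1]

    best_k, best_cost = None, np.inf
    for k in range(min_seg, n - min_seg + 1):
        var_before = csum[k - 1] / k                  # MLE variance of x[:k]
        var_after = (total - csum[k - 1]) / (n - k)   # MLE variance of x[k:]
        # Twice the profile negative log-likelihood, dropping additive
        # constants, summed over channels because they are assumed independent.
        cost = np.sum(k * np.log(var_before) + (n - k) * np.log(var_after))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Usage: quiet noise followed by a louder "arrival" at sample 300 on 3 channels.
rng = np.random.default_rng(0)
signal = np.vstack([
    rng.normal(0.0, 0.1, size=(300, 3)),
    rng.normal(0.0, 1.0, size=(300, 3)),
])
print(variance_change_point(signal))  # expected to be close to 300
```

Summing the per-channel costs is what makes the detection simultaneous across components. A practical picker would also need to handle non-zero means, several phase arrivals (P and S), and correlated channels.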
Affiliation(s)
- Nicoletta D’Angelo
- Dipartimento di Scienze Economiche, Aziendali e Statistiche, Università degli Studi di Palermo, 90128 Palermo, Italy
- Giada Adelfio
- Dipartimento di Scienze Economiche, Aziendali e Statistiche, Università degli Studi di Palermo, 90128 Palermo, Italy
- Osservatorio Nazionale Terremoti, Istituto Nazionale di Geofisica e Vulcanologia (INGV), 90146 Palermo, Italy
- Marcello Chiodi
- Dipartimento di Scienze Economiche, Aziendali e Statistiche, Università degli Studi di Palermo, 90128 Palermo, Italy
- Osservatorio Nazionale Terremoti, Istituto Nazionale di Geofisica e Vulcanologia (INGV), 90146 Palermo, Italy
- Antonino D’Alessandro
- Osservatorio Nazionale Terremoti, Istituto Nazionale di Geofisica e Vulcanologia (INGV), 90146 Palermo, Italy
7
Ryan KJ, Brydon MS, Leatherman ER, Hamada MS. Analysis of overlapping count data. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2126496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- Kenneth J. Ryan
- Department of Statistics, West Virginia University, Morgantown, West Virginia, USA
- Michaela S. Brydon
- Department of Mathematics and Statistics, Kenyon College, Gambier, Ohio, USA
- Erin R. Leatherman
- Department of Mathematics and Statistics, Kenyon College, Gambier, Ohio, USA
- Michael S. Hamada
- Statistical Science Group, Los Alamos National Laboratory, Los Alamos, New Mexico, USA
8
Liu J, Zhang X, Chen T, Wu T, Lin T, Jiang L, Lang S, Liu L, Natarajan L, Tu J, Kosciolek T, Morton J, Nguyen T, Schnabl B, Knight R, Feng C, Zhong Y, Tu X. A semiparametric model for between-subject attributes: Applications to beta-diversity of microbiome data. Biometrics 2022; 78:950-962. [PMID: 34010477 PMCID: PMC8602427 DOI: 10.1111/biom.13487] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2020] [Revised: 04/23/2021] [Accepted: 05/03/2021] [Indexed: 01/25/2023]
Abstract
The human microbiome plays an important role in our health, and identifying factors associated with microbiome composition provides insights into underlying disease mechanisms. By amplifying and sequencing marker genes with high-throughput sequencing and binning highly similar sequences together, we obtain operational taxonomic unit (OTU) profiles for each subject. Because OTU data are high-dimensional and non-normal, diversity measures are introduced as summaries at the microbial community level, including the distance-based beta-diversity between individuals. Analyses of such between-subject attributes are not amenable to the predominant within-subject statistical paradigm, such as t-tests and linear regression. In this paper, we propose a new approach that models beta-diversity as a response within a regression setting by utilizing functional response models (FRMs), a class of semiparametric models for between- as well as within-subject attributes. The new approach not only addresses limitations of current methods for beta-diversity with cross-sectional data but also provides a basis for extending the approach to longitudinal and other clustered data in the future. The proposed approach is illustrated with both real and simulated data.
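The abstract treats beta-diversity as a between-subject attribute: one value per pair of subjects rather than one per subject. A minimal sketch, assuming Bray-Curtis dissimilarity as the distance (the abstract does not name one) and hypothetical helper names, shows how an n-subject OTU table becomes n(n-1)/2 pairwise responses. The FRM estimation itself is not shown.

```python
import numpy as np


def bray_curtis_matrix(counts):
    """Pairwise Bray-Curtis dissimilarities between subjects.

    ``counts`` has one row per subject and one column per OTU.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            dist[i, j] = dist[j, i] = num / den if den > 0 else 0.0
    return dist


def pairwise_response(dist):
    """Flatten the upper triangle: one between-subject response per pair (i < j)."""
    rows, cols = np.triu_indices(dist.shape[0], k=1)
    pairs = list(zip(rows.tolist(), cols.tolist()))
    return pairs, dist[rows, cols]


# Usage with a toy OTU table (3 subjects x 3 OTUs).
otu = np.array([
    [10, 0, 5],
    [10, 0, 5],
    [0, 8, 2],
])
pairs, y = pairwise_response(bray_curtis_matrix(otu))
print(pairs)  # [(0, 1), (0, 2), (1, 2)]
print(y)      # pair-level responses 0.0, 0.84, 0.84
```

Because the pairwise responses share subjects (pairs (0, 2) and (1, 2) both involve subject 2), they are not independent; handling that dependence is what a semiparametric between-subject model has to address.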
Affiliation(s)
- J. Liu
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.; Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
- X. Zhang
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- T. Chen
- Department of Mathematics, University of Toledo, Toledo, Ohio, U.S.A.
- T. Wu
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.; Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
- T. Lin
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- L. Jiang
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.; Center for Microbiome Innovation, UC San Diego, San Diego, California, U.S.A.
- S. Lang
- Department of Medicine, UC San Diego, San Diego, California, U.S.A.
- L. Liu
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- L. Natarajan
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.
- J.X. Tu
- Physical Medicine and Rehabilitation, University of Virginia Health System, Charlottesville, Virginia, U.S.A.
- T. Kosciolek
- Department of Pediatrics, UC San Diego, San Diego, California, U.S.A.; Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- J. Morton
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, New York, U.S.A.
- T.T. Nguyen
- Department of Psychiatry, UC San Diego, San Diego, California, U.S.A.; Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
- B. Schnabl
- Department of Medicine, UC San Diego, San Diego, California, U.S.A.
- R. Knight
- Department of Pediatrics, UC San Diego, San Diego, California, U.S.A.; Department of Computer Science and Engineering, UC San Diego, San Diego, California, U.S.A.; Department of Bioengineering, UC San Diego, San Diego, California, U.S.A.; Center for Microbiome Innovation, UC San Diego, San Diego, California, U.S.A.
- C. Feng
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York, U.S.A.
- Y. Zhong
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A.
- X.M. Tu
- Department of Family Medicine and Public Health, UC San Diego, San Diego, California, U.S.A.; Stein Institute for Research on Aging, UC San Diego, San Diego, California, U.S.A.
9
Shahsavari S, Mohammadi A, Mostafaei S, Zereshki E, Tabatabaei SM, Zhaleh M, Shahsavari M, Zeini F. Analysis of injuries and deaths from road traffic accidents in Iran: bivariate regression approach. BMC Emerg Med 2022; 22:130. [PMID: 35843936 PMCID: PMC9290223 DOI: 10.1186/s12873-022-00686-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2022] [Accepted: 07/01/2022] [Indexed: 11/10/2022]
Abstract
Background This study aims to estimate and compare the parameters of several univariate and bivariate count models to identify the factors affecting the number of deaths and the number of injuries in road accidents. Methods The accident data used in this study relate to Kermanshah province from March 2020 to March 2021. Accident locations were divided into 125 areas based on density characteristics. In this one-year period, 3090 accidents occurred on the suburban roads of Kermanshah province, resulting in 398 deaths and 4805 injuries. Accident information, including the longitude and latitude of the accident location, type of accident (fatal and injury), number of deaths, number of injuries, accident type, the reason for the accident, and the kind of accident, was included as population-level variables in the regression models. We investigated four bivariate count regression models frequently used for accident data in the literature. Results In bivariate analysis, the AIC of the saturated model decreased reasonably compared with the reduced model for all models except DNM. For the injury models, MSE was lowest for DIBP (137.87), followed by BNB (289.46), BP (412.36) and DNM (3640.89). These results also hold for the death models. In univariate analysis, however, only the injury models gave approximately reasonable results. Conclusions Our findings show that the IDBP model is better suited for evaluating accident datasets than the other models. Motorcycle accidents, pedestrian accidents, left turn deviance, and dangerous speeding were all significant variables in the IDBP death model, and these parameters were linked to accident mortality. Supplementary Information The online version contains supplementary material available at 10.1186/s12873-022-00686-6.
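The BP (bivariate Poisson) model compared above is commonly built by trivariate reduction: X = Y1 + Y0 and Y = Y2 + Y0 with independent Poisson components, so the shared rate lam0 is the covariance between, for example, deaths and injuries. A minimal sketch of that joint probability mass function, assuming this common parameterization (the abstract does not state one), fixed rates with no covariates, and hypothetical function names. The BNB, DIBP and DNM variants are not shown.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def bivariate_poisson_pmf(x, y, lam1, lam2, lam0):
    """Joint pmf of (X, Y) with X = Y1 + Y0 and Y = Y2 + Y0.

    Y0, Y1, Y2 are independent Poisson(lam0), Poisson(lam1), Poisson(lam2),
    so Cov(X, Y) = lam0. The pmf sums over the possible values k of the
    shared component Y0.
    """
    return sum(
        poisson_pmf(k, lam0) * poisson_pmf(x - k, lam1) * poisson_pmf(y - k, lam2)
        for k in range(min(x, y) + 1)
    )


def bivariate_poisson_loglik(pairs, lam1, lam2, lam0):
    """Log-likelihood of observed (x, y) count pairs under fixed rates."""
    return sum(math.log(bivariate_poisson_pmf(x, y, lam1, lam2, lam0)) for x, y in pairs)


# Usage with hypothetical rates: the pmf should sum to (almost exactly) 1
# over a grid wide enough to hold nearly all of the probability mass.
grid = range(40)
total = sum(bivariate_poisson_pmf(x, y, 2.0, 3.0, 1.0) for x in grid for y in grid)
print(round(total, 6))  # 1.0
```

In a BP regression, each rate would typically be linked to covariates through a log link and the log-likelihood maximized over the coefficients; fitted models can then be compared with criteria such as AIC, as in the abstract.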
Affiliation(s)
- Soodeh Shahsavari
- Department of Health Information Technology, Faculty of Allied Sciences, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Ali Mohammadi
- Department of Health Information Technology, Faculty of Allied Sciences, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Shayan Mostafaei
- Department of Biostatistics, Faculty of Health, Kermanshah University of Medical Sciences, Kermanshah, Iran; Inflammation Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Ehsan Zereshki
- Department of Biostatistics, Faculty of Health, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Seyyed Mohammad Tabatabaei
- Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohsen Zhaleh
- Department of Anatomy and Cell Biology, Medicine Faculty, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Meisam Shahsavari
- Imam Ali Hospital Heart Center, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Frouzan Zeini
- Department of Biostatistics, Faculty of Health, Kermanshah University of Medical Sciences, Kermanshah, Iran
10
Corsini N, Viroli C. Dealing with overdispersion in multivariate count data. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
11
Fokianos K, Fried R, Kharin Y, Voloshko V. Statistical analysis of multivariate discrete-valued time series. J MULTIVARIATE ANAL 2022. [DOI: 10.1016/j.jmva.2021.104805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
12
Zamzami N, Bouguila N. Sparse Count Data Clustering Using an Exponential Approximation to Generalized Dirichlet Multinomial Distributions. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:89-102. [PMID: 33079676 DOI: 10.1109/tnnls.2020.3027539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Clustering frequency vectors is a challenging task on large data sets given their high dimensionality and sparsity. The generalized Dirichlet multinomial (GDM) distribution is a competitive generative model for count data in terms of accuracy, yet its parameter estimation process is slow. The exponential-family approximation of the multivariate Pólya distribution has been shown to be efficient to train and to cluster data directly, without dimensionality reduction. In this article, we derive an exponential-family approximation to the GDM distribution, which we call EGDM. A mixture model is developed based on this new member of the exponential family of distributions, and its parameters are learned through the deterministic annealing expectation-maximization (DAEM) approach, yielding a new clustering algorithm for count data. Moreover, we propose to estimate the optimal number of EGDM mixture components based on the minimum message length (MML) criterion. We conducted a set of empirical experiments on text, image, and video clustering to evaluate the performance of the proposed approach. Results show that the new model attains superior performance and is considerably faster than the corresponding method for GDM distributions.
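The multivariate Pólya distribution mentioned above is better known as the Dirichlet-multinomial (DM); the GDM generalizes it. A minimal sketch of the plain DM log-likelihood that count-data mixture models of this kind evaluate per component. It is not the authors' EGDM approximation, the GDM itself, or the DAEM fitting procedure, and the function name is hypothetical.

```python
import math


def dirichlet_multinomial_logpmf(x, alpha):
    """Log pmf of the Dirichlet-multinomial (multivariate Polya) distribution.

    x: non-negative integer counts for one observation (e.g. word counts
       for one document); alpha: positive concentration parameters.
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    n = sum(x)
    a = sum(alpha)
    # Multinomial coefficient n! / prod(x_j!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in x)
    # Gamma(A) / Gamma(n + A) with A = sum(alpha)
    log_ratio = math.lgamma(a) - math.lgamma(n + a)
    # prod_j Gamma(x_j + alpha_j) / Gamma(alpha_j)
    log_terms = sum(math.lgamma(c + al) - math.lgamma(al) for c, al in zip(x, alpha))
    return log_coef + log_ratio + log_terms


# Usage: with two categories, one trial and alpha = (1, 1), both outcomes are
# equally likely, so the probability of (1, 0) is one half.
print(math.exp(dirichlet_multinomial_logpmf([1, 0], [1.0, 1.0])))  # approximately 0.5
```

Zero counts contribute nothing to `log_coef` or `log_terms`, which is one reason DM-type likelihoods can be evaluated cheaply on sparse count vectors.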
13
Vilor-Tejedor N, Garrido-Martín D, Rodriguez-Fernandez B, Lamballais S, Guigó R, Gispert JD. Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO! Comput Struct Biotechnol J 2021; 19:5800-5810. [PMID: 34765095 PMCID: PMC8567328 DOI: 10.1016/j.csbj.2021.10.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/22/2021] [Revised: 10/08/2021] [Accepted: 10/12/2021] [Indexed: 12/01/2022]
Abstract
Imaging genetic studies aim to test how genetic information influences brain structure and function by combining neuroimaging-based brain features and genetic data from the same individual. Most studies focus on individual correlation and association tests between genetic variants and a single measurement of the brain. Despite the great success of univariate approaches, given the capacity of neuroimaging methods to provide a multiplicity of cerebral phenotypes, the development and application of multivariate methods become crucial. In this article, we review novel methods and strategies focused on the analysis of multiple phenotypes and genetic data. We also discuss relevant aspects of multi-trait modelling in the context of neuroimaging data.
Affiliation(s)
- Natalia Vilor-Tejedor
- Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Universitat Pompeu Fabra, Barcelona, Spain
- Diego Garrido-Martín
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Sander Lamballais
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, Netherlands
- Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Juan Domingo Gispert
- Barcelonaβeta Brain Research Center (BBRC), Pasqual Maragall Foundation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- IMIM (Hospital del Mar Medical Research Institute), Barcelona, Spain
- Centro de Investigación Biomédica en Red Bioingeniería, Biomateriales y Nanomedicina, Madrid, Spain
14
Zhou C, Zhao H, Wang T. Transformation and differential abundance analysis of microbiome data incorporating phylogeny. Bioinformatics 2021; 37:4652-4660. [PMID: 34302462 DOI: 10.1093/bioinformatics/btab543] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/06/2021] [Revised: 05/31/2021] [Accepted: 07/22/2021] [Indexed: 12/13/2022]
Abstract
MOTIVATION Microbiome data have proven extremely useful for understanding microbial communities and their impacts in health and disease. Although microbiome analysis methods and standards are evolving rapidly, obtaining meaningful and interpretable results from microbiome studies still requires careful statistical treatment. In particular, many existing and emerging methods for differential abundance analysis fail to account for the fact that microbiome data are high-dimensional and sparse, compositional, negatively and positively correlated, and phylogenetically structured. To better describe microbiome data and improve the power of differential abundance testing, there is still a great need for the continued development of appropriate statistical methodology. RESULTS In this paper, we propose a model-based approach for microbiome data transformation, and a phylogenetically informed procedure for differential abundance (DA) testing based on the transformed data. First, we extend the Dirichlet-tree multinomial (DTM) to zero-inflated DTM (ZIDTM) for multivariate modeling of microbial counts, addressing data sparsity, and correlation and phylogeny among bacterial taxa. Then, within this framework and using a Bayesian formulation, we introduce posterior mean transformation to convert raw counts into nonzero relative abundances that sum to one, accounting for the compositional nature of microbiome data. Second, using the transformed data, we propose adaptive analysis of composition of microbiomes (adaANCOM) for DA testing by constructing log-ratios adaptively on the tree for each taxon, greatly reducing the computational complexity of ANCOM in high dimensions. Finally, we present extensive simulation studies, an analysis of HMP data across 18 body sites and 2 visits, and an application to a gut microbiome and malnutrition study, to investigate the performance of posterior mean transformation and adaANCOM.
Comparisons with ANCOM and other DA testing procedures show that adaANCOM controls the false discovery rate well, allows for easy interpretation of the results, and is computationally efficient for high-dimensional problems. AVAILABILITY The developed R package is available at https://github.com/ZRChao/adaANCOM. For replicability purposes, scripts for our simulations and data analysis are available at https://github.com/ZRChao/Papers_supplementary. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
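The posterior mean transformation described above replaces raw counts with posterior means of relative abundances. A minimal sketch, assuming a plain multinomial model with a symmetric Dirichlet(0.5) prior (an illustrative choice, not the paper's) in place of the zero-inflated Dirichlet-tree (ZIDTM); the function name is hypothetical. It shows why the output is nonzero and sums to one, which keeps subsequent log-ratios finite.

```python
import numpy as np


def posterior_mean_transform(counts, alpha=0.5):
    """Map raw counts to nonzero relative abundances that sum to one.

    Simplified stand-in for the ZIDTM-based transformation: assumes a flat
    multinomial model (no tree, no zero inflation) with a Dirichlet(alpha)
    prior on each sample's proportions. The posterior is then
    Dirichlet(counts + alpha), whose mean is
    (counts + alpha) / (total count + sum of alpha).
    counts: array of shape (n_samples, n_taxa);
    alpha: scalar or array of length n_taxa.
    """
    counts = np.asarray(counts, dtype=float)
    post = counts + np.asarray(alpha, dtype=float)  # posterior Dirichlet parameters
    return post / post.sum(axis=1, keepdims=True)


# Usage: zero counts become small positive proportions.
counts = np.array([[0, 2, 8], [5, 5, 0]])
props = posterior_mean_transform(counts)
print(props.sum(axis=1))        # each row sums to 1
print((props > 0).all())        # True
# Log-ratios between taxa (the building block of ANCOM-style tests) are finite.
print(np.log(props[:, 0] / props[:, 1]))
```

The prior strength controls how far zero counts are pulled away from zero; in the paper this role is played by the fitted ZIDTM rather than a fixed constant.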
Affiliation(s)
- Chao Zhou
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China; SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, Connecticut, U.S.A.; SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China
- Tao Wang
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, Shanghai, China; SJTU-Yale Joint Center for Biostatistics and Data Science, Shanghai Jiao Tong University, Shanghai, China; MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China
15
16
Koslovsky MD, Hoffman KL, Daniel CR, Vannucci M. A Bayesian model of microbiome data for simultaneous identification of covariate associations and prediction of phenotypic outcomes. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1354] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
17
Ma Z, Hanson TE, Ho Y. Flexible bivariate correlated count data regression. Stat Med 2020; 39:3476-3490. [DOI: 10.1002/sim.8676] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/05/2019] [Revised: 06/07/2020] [Accepted: 06/08/2020] [Indexed: 02/06/2023]
Affiliation(s)
- Zichen Ma
- Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Yen-Yi Ho
- Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
18
Koslovsky MD, Vannucci M. MicroBVS: Dirichlet-tree multinomial regression models with Bayesian variable selection - an R package. BMC Bioinformatics 2020; 21:301. [PMID: 32660471 PMCID: PMC7359232 DOI: 10.1186/s12859-020-03640-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/02/2019] [Accepted: 07/02/2020] [Indexed: 11/29/2022]
Abstract
Background Understanding the relation between the human microbiome and modulating factors, such as diet, may help researchers design intervention strategies that promote and maintain healthy microbial communities. Numerous analytical tools are available to help identify these relations, oftentimes via automated variable selection methods. However, available tools frequently ignore evolutionary relations among microbial taxa, potential relations between modulating factors, as well as model selection uncertainty. Results We present MicroBVS, an R package for Dirichlet-tree multinomial models with Bayesian variable selection, for the identification of covariates associated with microbial taxa abundance data. The underlying Bayesian model accommodates phylogenetic structure in the abundance data and various parameterizations of covariates’ prior probabilities of inclusion. Conclusion While developed to study the human microbiome, our software can be employed in various research applications, where the aim is to generate insights into the relations between a set of covariates and compositional data with or without a known tree-like structure.
19
Lynch ML, Dudek MF, Bowman SE. A Searchable Database of Crystallization Cocktails in the PDB: Analyzing the Chemical Condition Space. Patterns (N Y) 2020; 1:100024. [PMID: 32776019 PMCID: PMC7409820 DOI: 10.1016/j.patter.2020.100024] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/31/2019] [Revised: 03/22/2020] [Accepted: 03/30/2020] [Indexed: 10/26/2022]
Abstract
Nearly 90% of structural models in the Protein Data Bank (PDB), the central resource worldwide for three-dimensional structural information, are currently derived from macromolecular crystallography (MX). A major bottleneck in determining MX structures is finding conditions in which a biomolecule will crystallize. Here, we present a searchable database of the chemicals associated with successful crystallization experiments from the PDB. We use these data to examine the relationship between protein secondary structure and average molecular weight of polyethylene glycol and to investigate patterns in crystallization conditions. Our analyses reveal striking patterns of both redundancy of chemical compositions in crystallization experiments and extreme sparsity of specific chemical combinations, underscoring the challenges faced in generating predictive models for de novo optimal crystallization experiments.
Affiliation(s)
- Miranda L. Lynch
- High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
- Max F. Dudek
- University of Pittsburgh, Pittsburgh, PA 15261, USA
- Sarah E.J. Bowman
- High-Throughput Crystallization Screening Center, Hauptman-Woodward Medical Research Institute, Buffalo, NY 14203, USA
- Department of Biochemistry, Jacobs School of Medicine & Biomedical Sciences at the University at Buffalo, Buffalo, NY 14203, USA
20
Stoner O, Shaddick G, Economou T, Gumy S, Lewis J, Lucio I, Ruggeri G, Adair‐Rohani H. Global household energy model: a multivariate hierarchical approach to estimating trends in the use of polluting and clean fuels for cooking. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12428] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Affiliation(s)
- Sophie Gumy
- World Health Organization Geneva Switzerland
- Itzel Lucio
- World Health Organization Geneva Switzerland
21
Joseph TA, Pasarkar AP, Pe'er I. Efficient and Accurate Inference of Mixed Microbial Population Trajectories from Longitudinal Count Data. Cell Syst 2020; 10:463-469.e6. [PMID: 32684275 DOI: 10.1016/j.cels.2020.05.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/01/2020] [Revised: 03/18/2020] [Accepted: 03/19/2020] [Indexed: 11/15/2022]
Abstract
The recently completed second phase of the Human Microbiome Project has highlighted the relationship between dynamic changes in the microbiome and disease, motivating new microbiome study designs based on longitudinal sampling. Yet, analysis of such data is hindered by presence of technical noise, high dimensionality, and data sparsity. Here, we introduce LUMINATE (longitudinal microbiome inference and zero detection), a fast and accurate method for inferring relative abundances from noisy read count data. We demonstrate that LUMINATE is orders of magnitude faster than current approaches, with better or similar accuracy. We further show that LUMINATE can accurately distinguish biological zeros, when a taxon is absent from the community, from technical zeros, when a taxon is below the detection threshold. We conclude by demonstrating the utility of LUMINATE on a real dataset, showing that LUMINATE smooths trajectories observed from noisy data. LUMINATE is freely available from https://github.com/tyjo/luminate.
Affiliation(s)
- Tyler A Joseph
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Amey P Pasarkar
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University, New York, NY 10027, USA; Data Science Institute, Columbia University, New York, NY 10027, USA
22
Yang D, Johnson J, Zhou X, Deych E, Shands B, Hanson B, Sodergren E, Weinstock G, Shannon WD. New statistical method identifies cytokines that distinguish stool microbiomes. Sci Rep 2019; 9:20082. [PMID: 31882682 PMCID: PMC6934614 DOI: 10.1038/s41598-019-56397-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/08/2019] [Accepted: 11/07/2019] [Indexed: 12/20/2022]
Abstract
Regressing an outcome or dependent variable onto a set of input or independent variables allows the analyst to measure associations between the two, so that changes in the outcome can be described and predicted by changes in the inputs. Classical statistics offers many ways of doing this when the dependent variable has certain properties (e.g., a scalar, survival time, count). However, little progress has been made on regression models for microbiome taxa counts as the dependent variable that do not impose extremely strict conditions on the data. In this paper, we propose and apply a new regression model that combines the Dirichlet-multinomial distribution with recursive partitioning, providing a fully non-parametric regression model. This model, called DM-RPart, is applied to cytokine data and microbiome taxa count data. It is applicable to any microbiome taxa count data with metadata, is fit automatically, and is intuitively interpretable. The model can be applied to any microbiome or other compositional data, and software (R package HMP) is available through CRAN.
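To make the Dirichlet-multinomial part of DM-RPart concrete, the sketch below evaluates the Dirichlet-multinomial log-probability of a single taxa count vector. In principle, a recursive-partitioning model could score candidate splits by summing such terms within each node. How the HMP package actually fits the model and chooses splits is not reproduced here. The function name and the example parameters are our own.

```python
from math import exp, lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log P(x | alpha) for one count vector x under a Dirichlet-multinomial.

    log P = log n! + log Gamma(A) - log Gamma(n + A)
            + sum_k [log Gamma(x_k + alpha_k) - log Gamma(alpha_k) - log x_k!],
    with n = sum(x) and A = sum(alpha).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n, a0 = sum(counts), sum(alpha)
    out = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for x, a in zip(counts, alpha):
        out += lgamma(x + a) - lgamma(a) - lgamma(x + 1)
    return out


# Hypothetical taxa counts for one sample and hypothetical concentration parameters.
print(exp(dirichlet_multinomial_logpmf([12, 3, 0, 5], [2.0, 1.0, 0.5, 1.5])))
```

Working on the log scale with `lgamma` avoids overflow in the factorials and gamma functions once read depths reach the thousands.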
Affiliation(s)
- Jethro Johnson
- Jackson Laboratory for Genomic Medicine, Hartford, CT, USA
- Xin Zhou
- Jackson Laboratory for Genomic Medicine, Hartford, CT, USA
- Blake Hanson
- University of Texas Health Sciences Center, Houston, TX, USA
- William D Shannon
- BioRankings, St. Louis, MO, USA; Washington University School of Medicine, St Louis, MO, USA
23
Chauvet J, Trottier C, Bry X. Component-Based Regularization of Multivariate Generalized Linear Mixed Models. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2019.1598870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Catherine Trottier
- IMAG, Univ Montpellier, CNRS, Montpellier, France
- Univ Paul-Valéry Montpellier 3, Montpellier, France
- Xavier Bry
- IMAG, Univ Montpellier, CNRS, Montpellier, France
24
Tang ZZ, Chen G. Zero-inflated generalized Dirichlet multinomial regression model for microbiome compositional data analysis. Biostatistics 2019; 20:698-713. [PMID: 29939212 PMCID: PMC7410344 DOI: 10.1093/biostatistics/kxy025] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 12/20/2017] [Revised: 04/26/2018] [Accepted: 05/06/2018] [Indexed: 12/19/2022]
Abstract
There is heightened interest in using high-throughput sequencing technologies to quantify abundances of microbial taxa and linking the abundance to human diseases and traits. Proper modeling of multivariate taxon counts is essential to the power of detecting this association. Existing models are limited in handling excessive zero observations in taxon counts and in flexibly accommodating complex correlation structures and dispersion patterns among taxa. In this article, we develop a new probability distribution, zero-inflated generalized Dirichlet multinomial (ZIGDM), that overcomes these limitations in modeling multivariate taxon counts. Based on this distribution, we propose a ZIGDM regression model to link microbial abundances to covariates (e.g. disease status) and develop a fast expectation-maximization algorithm to efficiently estimate parameters in the model. The derived tests enable us to reveal rich patterns of variation in microbial compositions including differential mean and dispersion. The advantages of the proposed methods are demonstrated through simulation studies and an analysis of a gut microbiome dataset.
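The sketch below gives a rough picture of the ZIGDM distribution. A generalized Dirichlet can be generated by stick-breaking with independent Beta fractions. Zero inflation then sets a fraction exactly to zero with some probability. The code simulates one count vector under that construction, which is our reading of the model. The parameter names (`pi`, `a`, `b`) and the function are illustrative, and the paper's covariate links and EM algorithm are not shown.

```python
import numpy as np


def sample_zigdm(total, pi, a, b, rng):
    """Draw one count vector of length len(pi) + 1 by zero-inflated stick-breaking.

    At step j the stick fraction Z_j is exactly 0 with probability pi[j] and is
    otherwise Beta(a[j], b[j]); taxon j receives Z_j times the remaining stick.
    """
    remaining = 1.0
    probs = []
    for pj, aj, bj in zip(pi, a, b):
        z = 0.0 if rng.random() < pj else rng.beta(aj, bj)
        probs.append(remaining * z)
        remaining *= 1.0 - z
    probs.append(remaining)  # the last taxon gets whatever stick is left
    probs = np.array(probs)
    return rng.multinomial(total, probs), probs


# Hypothetical parameters for four taxa (three stick-breaking steps).
rng = np.random.default_rng(1)
counts, probs = sample_zigdm(1000, [0.3, 0.0, 0.5], [2.0, 1.0, 3.0], [5.0, 1.0, 2.0], rng)
print(counts, probs)
```

Each stick-breaking step has its own Beta parameters, so taxa can differ in dispersion and correlation pattern. A plain Dirichlet ties all of these together through one concentration vector.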
Affiliation(s)
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA; Wisconsin Institute for Discovery, Madison, WI, USA
- Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
25
Das I, Sen S, Chaganty NR, Sengupta P. Regression for doubly inflated multivariate Poisson distributions. J Stat Comput Simul 2019. [DOI: 10.1080/00949655.2019.1625051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Ishapathik Das
- Department of Mathematics, Indian Institute of Technology Tirupati, Tirupati, India
- Sumen Sen
- Department of Mathematics and Statistics, Austin Peay State University, Clarksville, TN, USA
- N. Rao Chaganty
- Department of Mathematics and Statistics, Old Dominion University, Norfolk, VA, USA
26
D’Angelo F, Ceccarelli M, Tala, Garofano L, Zhang J, Frattini V, Caruso FP, Lewis G, Alfaro KD, Bauchet L, Berzero G, Cachia D, Cangiano M, Capelle L, de Groot J, DiMeco F, Ducray F, Farah W, Finocchiaro G, Goutagny S, Kamiya-Matsuoka C, Lavarino C, Loiseau H, Lorgis V, Marras CE, McCutcheon I, Nam DH, Ronchi S, Saletti V, Seizeur R, Slopis J, Suñol M, Vandenbos F, Varlet P, Vidaud D, Watts C, Tabar V, Reuss DE, Kim SK, Meyronet D, Mokhtari K, Salvador H, Bhat KP, Eoli M, Sanson M, Lasorella A, Iavarone A. The molecular landscape of glioma in patients with Neurofibromatosis 1. Nat Med 2019; 25:176-187. [PMID: 30531922 PMCID: PMC6857804 DOI: 10.1038/s41591-018-0263-8] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 04/16/2018] [Accepted: 10/17/2018] [Indexed: 12/30/2022]
Abstract
Neurofibromatosis type 1 (NF1) is a common tumor predisposition syndrome in which glioma is one of the prevalent tumors. Gliomagenesis in NF1 results in a heterogeneous spectrum of low- to high-grade neoplasms occurring during the entire lifespan of patients. The pattern of genetic and epigenetic alterations of glioma that develops in NF1 patients and the similarities with sporadic glioma remain unknown. Here, we present the molecular landscape of low- and high-grade gliomas in patients affected by NF1 (NF1-glioma). We found that the predisposing germline mutation of the NF1 gene was frequently converted to homozygosity and the somatic mutational load of NF1-glioma was influenced by age and grade. High-grade tumors harbored genetic alterations of TP53 and CDKN2A, frequent mutations of ATRX associated with Alternative Lengthening of Telomere, and were enriched in genetic alterations of transcription/chromatin regulation and PI3 kinase pathways. Low-grade tumors exhibited fewer mutations that were over-represented in genes of the MAP kinase pathway. Approximately 50% of low-grade NF1-gliomas displayed an immune signature, T lymphocyte infiltrates, and increased neo-antigen load. DNA methylation assigned NF1-glioma to LGm6, a poorly defined Isocitrate Dehydrogenase 1 wild-type subgroup enriched with ATRX mutations. Thus, the profiling of NF1-glioma defined a distinct landscape that recapitulates a subset of sporadic tumors.
Affiliation(s)
- Fulvio D’Angelo
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; BIOGEM Istituto di Ricerche Genetiche ‘G. Salvatore’, Ariano Irpino, Italy; These authors contributed equally: F. D’Angelo, M. Ceccarelli
- Michele Ceccarelli
- BIOGEM Istituto di Ricerche Genetiche ‘G. Salvatore’, Ariano Irpino, Italy; Department of Science and Technology, Università degli Studi del Sannio, Benevento, Italy; These authors contributed equally: F. D’Angelo, M. Ceccarelli
- Tala
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA
- Luciano Garofano
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; BIOGEM Istituto di Ricerche Genetiche ‘G. Salvatore’, Ariano Irpino, Italy
- Jing Zhang
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA
- Véronique Frattini
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA
- Francesca P. Caruso
- BIOGEM Istituto di Ricerche Genetiche ‘G. Salvatore’, Ariano Irpino, Italy; Department of Science and Technology, Università degli Studi del Sannio, Benevento, Italy
- Genevieve Lewis
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA
- Kristin D. Alfaro
- The University of Texas M.D. Anderson Cancer Center John Mendelsohn Faculty Center (FC7.3025) – Neuro-Oncology – Unit 0431, Houston, TX, USA
- Luc Bauchet
- Department of Neurosurgery, Gui de Chauliac Hospital, Montpellier University Medical Center, Montpellier, France
- Giulia Berzero
- Sorbonne Universités UPMC Université Paris 06, UMR S 1127, Inserm U 1127, CNRS UMR 7225, ICM, APHP, Paris, France
- David Cachia
- Department of Neuro-Oncology, Medical University of South Carolina, Charleston, SC, USA; Department of Neurosurgery, Medical University of South Carolina, Charleston, SC, USA
- Mario Cangiano
- BIOGEM Istituto di Ricerche Genetiche ‘G. Salvatore’, Ariano Irpino, Italy
- Laurent Capelle
- AP-HP, Hôpital de la Pitié-Salpêtrière, Service de Neurochirurgie, Paris, France
- John de Groot
- The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Francesco DiMeco
- Department of Neurological Surgery, Carlo Besta Neurological Institute, Milan, Italy; Department of Pathophysiology and Transplantation, University of Milan, Milan, Italy; Hunterian Brain Tumor Research Laboratory CRB2 2M41, Baltimore, MD, USA
- François Ducray
- Service de Neuro-Oncologie, Hospices Civils de Lyon, Université Claude Bernard Lyon 1, Department of Cancer Cell Plasticity, Cancer Research Center of Lyon, INSERM U1052, CNRS UMR5286, Lyon, France
- Walid Farah
- Department of Neurosurgery, CHU, Dijon, France
- Gaetano Finocchiaro
- Unit of Molecular Neuro-Oncology, IRCCS Foundation, Carlo Besta Neurological Institute, Milan, Italy
- Stéphane Goutagny
- Service de Neurochirurgie, Hôpital Beaujon, Assistance Publique-Hôpitaux de Paris, Clichy, France
- Cinzia Lavarino
- Developmental Tumor Laboratory, Fundación Sant Joan de Déu, Barcelona, Spain
- Hugues Loiseau
- Department of Neurosurgery, Bordeaux University Hospital; Labex TRAIL (ANR-10-LABX-57); EA 7435 – IMOTION, Bordeaux University, Bordeaux, France
- Véronique Lorgis
- Department of Medical Oncology, Centre GF Leclerc, Dijon, France
- Carlo E. Marras
- Pediatric Neurosurgery Unit, Department of Neuroscience and Neurorehabilitation, Bambino Gesù Children’s Hospital, Rome, Italy
- Ian McCutcheon
- The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Do-Hyun Nam
- Department of Neurosurgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea; Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Republic of Korea
- Susanna Ronchi
- Sorbonne Universités UPMC Université Paris 06, UMR S 1127, Inserm U 1127, CNRS UMR 7225, ICM, APHP, Paris, France
- Veronica Saletti
- Developmental Neurology Unit, IRCCS Foundation, Carlo Besta Neurological Institute, Milan, Italy
- Romuald Seizeur
- Service de Neurochirurgie, Hôpital de la Cavale Blanche, CHRU de Brest, Université de Brest, Brest, France
- John Slopis
- The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Mariona Suñol
- Department of Pathology, Hospital Sant Joan de Déu, Barcelona, Spain
- Fanny Vandenbos
- Central Laboratory of Pathology, Pasteur I University Hospital, Nice, France
- Pascale Varlet
- Department of Neuropathology, Sainte-Anne Hospital, Paris, France; IMA-Brain, Inserm U894, Institute of Psychiatry and Neuroscience of Paris, Paris, France
- Dominique Vidaud
- EA7331, Université Paris Descartes, France; Service de Génétique et Biologie Moléculaires, Hôpital Cochin, AP-HP, Paris, France
- Colin Watts
- Institute of Cancer and Genomic Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom
- Viviane Tabar
- Department of Neurosurgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- David E. Reuss
- Clinical Cooperation Unit Neuropathology, German Cancer Research Center (DKFZ), Heidelberg, Germany; Department of Neuropathology, Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Seung-Ki Kim
- Division of Pediatric Neurosurgery, Seoul National University Children’s Hospital, Seoul National University College of Medicine, Seoul, Republic of Korea
- David Meyronet
- Centre de Pathologie et Neuropathologie Est, Hospices Civils de Lyon, Lyon, France
- Karima Mokhtari
- Sorbonne Universités UPMC Université Paris 06, UMR S 1127, Inserm U 1127, CNRS UMR 7225, ICM, APHP, Paris, France
- Hector Salvador
- Pediatric Oncology Unit, Hospital Sant Joan de Déu, Esplugues, Barcelona, Spain
- Krishna P. Bhat
- The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
- Marica Eoli
- Unit of Molecular Neuro-Oncology, IRCCS Foundation, Carlo Besta Neurological Institute, Milan, Italy
- Marc Sanson
- Sorbonne Universités UPMC Université Paris 06, UMR S 1127, Inserm U 1127, CNRS UMR 7225, ICM, APHP, Paris, France
- Anna Lasorella
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Department of Pediatrics, Columbia University Medical Center, New York, NY, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
- Antonio Iavarone
- Institute for Cancer Genetics, Columbia University Medical Center, New York, NY, USA; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA; Department of Neurology, Columbia University Medical Center, New York, NY, USA; These authors jointly supervised this work: A. Lasorella, A. Iavarone; Correspondence and requests for materials should be addressed to A.L. or A.I.
27
Kim J, Zhang Y, Day J, Zhou H. MGLM: An R Package for Multivariate Categorical Data Analysis. R J 2018; 10:73-90. [PMID: 32523781 DOI: 10.32614/rj-2018-015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
Data with multiple responses is ubiquitous in modern applications. However, few tools are available for regression analysis of multivariate counts. The most popular multinomial-logit model has a very restrictive mean-variance structure, limiting its applicability to many data sets. This article introduces an R package MGLM, short for multivariate response generalized linear models, that expands the current tools for regression analysis of polytomous data. Distribution fitting, random number generation, regression, and sparse regression are treated in a unifying framework. The algorithm, usage, and implementation details are discussed.
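The "restrictive mean-variance structure" of the multinomial can be seen directly: once the category probabilities are fixed, the variances are fixed too. The sketch below compares multinomial variances with those of the Dirichlet-multinomial, one overdispersed alternative, using the standard inflation factor (n + A) / (1 + A), where A is the sum of the concentration parameters. The helper names are hypothetical and do not correspond to MGLM functions.

```python
def multinomial_variance(n, p):
    """Per-category count variances under Multinomial(n, p): n * p_k * (1 - p_k)."""
    return [n * pk * (1.0 - pk) for pk in p]


def dirichlet_multinomial_variance(n, alpha):
    """Per-category count variances under a Dirichlet-multinomial(n, alpha).

    The means match Multinomial(n, alpha / A) with A = sum(alpha), but each
    variance is inflated by (n + A) / (1 + A), which is >= 1 for n >= 1.
    """
    a0 = sum(alpha)
    factor = (n + a0) / (1.0 + a0)
    p = [a / a0 for a in alpha]
    return [v * factor for v in multinomial_variance(n, p)]


# Same mean counts (50, 50) out of n = 100, very different spreads.
print(multinomial_variance(100, [0.5, 0.5]))
print(dirichlet_multinomial_variance(100, [2.0, 2.0]))
```

With n = 100 and alpha = (2, 2), the inflation factor is 104 / 5 = 20.8, so each category's variance rises from 25 to about 520. As A grows large the factor tends to 1 and the multinomial is recovered.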
Affiliation(s)
- Juhyun Kim
- Department of Biostatistics, University of California, Los Angeles
- Joshua Day
- Department of Statistics, North Carolina State University
- Hua Zhou
- Department of Biostatistics, University of California, Los Angeles
28
Bolouri H, Farrar JE, Triche T, Ries RE, Lim EL, Alonzo TA, Ma Y, Moore R, Mungall AJ, Marra MA, Zhang J, Ma X, Liu Y, Liu Y, Auvil JMG, Davidsen TM, Gesuwan P, Hermida LC, Salhia B, Capone S, Ramsingh G, Zwaan CM, Noort S, Piccolo SR, Kolb EA, Gamis AS, Smith MA, Gerhard DS, Meshinchi S. The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions. Nat Med 2018; 24:103-112. [PMID: 29227476 PMCID: PMC5907936 DOI: 10.1038/nm.4439] [Citation(s) in RCA: 489] [Impact Index Per Article: 81.5] [Received: 02/06/2017] [Accepted: 10/12/2017] [Indexed: 02/07/2023]
Abstract
We present the molecular landscape of pediatric acute myeloid leukemia (AML) and characterize nearly 1,000 participants in Children's Oncology Group (COG) AML trials. The COG-National Cancer Institute (NCI) TARGET AML initiative assessed cases by whole-genome, targeted DNA, mRNA and microRNA sequencing and CpG methylation profiling. Validated DNA variants corresponded to diverse, infrequent mutations, with fewer than 40 genes mutated in >2% of cases. In contrast, somatic structural variants, including new gene fusions and focal deletions of MBNL1, ZEB2 and ELF1, were disproportionately prevalent in young individuals as compared to adults. Conversely, mutations in DNMT3A and TP53, which were common in adults, were conspicuously absent from virtually all pediatric cases. New mutations in GATA2, FLT3 and CBL and recurrent mutations in MYC-ITD, NRAS, KRAS and WT1 were frequent in pediatric AML. Deletions, mutations and promoter DNA hypermethylation convergently impacted Wnt signaling, Polycomb repression, innate immune cell interactions and a cluster of zinc finger-encoding genes associated with KMT2A rearrangements. These results highlight the need for and facilitate the development of age-tailored targeted therapies for the treatment of pediatric AML.
Affiliation(s)
- Hamid Bolouri
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Jason E Farrar
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences and Arkansas Children's Research Institute, Little Rock, Arkansas, USA
- Timothy Triche
- Jane Anne Nohl Division of Hematology, University of Southern California Norris Comprehensive Cancer Center, Los Angeles, California, USA
- Rhonda E Ries
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
- Emilia L Lim
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Todd A Alonzo
- Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Children's Oncology Group, Monrovia, California, USA
- Yussanne Ma
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Richard Moore
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Andrew J Mungall
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Marco A Marra
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
- Jinghui Zhang
- Division of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
- Xiaotu Ma
- Division of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
- Yu Liu
- Division of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
- Yanling Liu
- Division of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
- Tanja M Davidsen
- Office of Cancer Genomics, National Cancer Institute, Bethesda, Maryland, USA
- Patee Gesuwan
- Office of Cancer Genomics, National Cancer Institute, Bethesda, Maryland, USA
- Leandro C Hermida
- Office of Cancer Genomics, National Cancer Institute, Bethesda, Maryland, USA
- Bodour Salhia
- Department of Translational Genomics, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
- Stephen Capone
- Jane Anne Nohl Division of Hematology, University of Southern California Norris Comprehensive Cancer Center, Los Angeles, California, USA
- Giridharan Ramsingh
- Jane Anne Nohl Division of Hematology, University of Southern California Norris Comprehensive Cancer Center, Los Angeles, California, USA
- Christian Michel Zwaan
- Department of Pediatric Oncology, Erasmus MC-Sophia Children's Hospital, Rotterdam, the Netherlands
- Sanne Noort
- Department of Pediatric Oncology, Erasmus MC-Sophia Children's Hospital, Rotterdam, the Netherlands
- Stephen R Piccolo
- Department of Biology, Brigham Young University, Provo, Utah, USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
- E Anders Kolb
- Nemours Center for Cancer and Blood Disorders, Alfred I. DuPont Hospital for Children, Wilmington, Delaware, USA
- Alan S Gamis
- Division of Hematology, Oncology and Bone Marrow Transplantation, Children's Mercy Hospitals and Clinics, Kansas City, Missouri, USA
- Malcolm A Smith
- Cancer Therapy Evaluation Program, National Cancer Institute, Bethesda, Maryland, USA
- Daniela S Gerhard
- Office of Cancer Genomics, National Cancer Institute, Bethesda, Maryland, USA
- Soheil Meshinchi
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
29
Abstract
Many statistical learning methods such as matrix completion, matrix regression, and multiple response regression estimate a matrix of parameters. Nuclear norm regularization is frequently employed to achieve shrinkage and low-rank solutions. To minimize a nuclear norm regularized loss function, a vital and most time-consuming step is singular value thresholding, which seeks the singular values of a large matrix exceeding a threshold and their associated singular vectors. Currently MATLAB lacks a function for singular value thresholding. Its built-in svds function computes the top r singular values/vectors by the Lanczos iterative method but is only efficient for sparse matrix input, while the aforementioned statistical learning algorithms perform singular value thresholding on dense but structured matrices. To address this issue, we provide a MATLAB wrapper function svt that implements singular value thresholding. It encompasses both top singular value decomposition and thresholding, handles both large sparse matrices and structured matrices, and reduces the computation cost in matrix learning algorithms.
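For a small dense matrix, singular value thresholding can be written in a few lines of NumPy, as sketched below. The sketch uses a full SVD, which is exactly the cost the svt wrapper is meant to avoid for large sparse or structured inputs. Treat it only as a reference for what the operation computes. The function name is our own and is not the MATLAB svt interface.

```python
import numpy as np


def singular_value_threshold(x, tau):
    """Return (U diag(max(s - tau, 0)) V^T, singular values of x above tau).

    The first output is the proximal operator of tau * nuclear norm, the step
    repeated inside nuclear-norm-regularized matrix learning algorithms.
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)  # s is sorted in descending order
    keep = s > tau  # only singular triplets above the threshold survive
    shrunk = s[keep] - tau
    # Scale each kept left singular vector by its shrunken singular value.
    return (u[:, keep] * shrunk) @ vt[keep, :], s[keep]


y, kept = singular_value_threshold(np.diag([5.0, 2.0, 0.5]), 1.0)
print(kept)
print(y)
```

In this example the two singular values above the threshold (5 and 2) survive and are shrunk to 4 and 1, while the third is set to zero. This is how the operator produces low-rank solutions.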
Affiliation(s)
- Cai Li
- North Carolina State University
- Hua Zhou
- University of California, Los Angeles
30
Bucci V, Tzen B, Li N, Simmons M, Tanoue T, Bogart E, Deng L, Yeliseyev V, Delaney ML, Liu Q, Olle B, Stein RR, Honda K, Bry L, Gerber GK. MDSINE: Microbial Dynamical Systems INference Engine for microbiome time-series analyses. Genome Biol 2016; 17:121. [PMID: 27259475 PMCID: PMC4893271 DOI: 10.1186/s13059-016-0980-6] [Citation(s) in RCA: 151] [Impact Index Per Article: 18.9] [Received: 12/08/2015] [Accepted: 05/06/2016] [Indexed: 12/11/2022]
Abstract
Predicting dynamics of host-microbial ecosystems is crucial for the rational design of bacteriotherapies. We present MDSINE, a suite of algorithms for inferring dynamical systems models from microbiome time-series data and predicting temporal behaviors. Using simulated data, we demonstrate that MDSINE significantly outperforms the existing inference method. We then show MDSINE’s utility on two new gnotobiotic mice datasets, investigating infection with Clostridium difficile and an immune-modulatory probiotic. Using these datasets, we demonstrate new capabilities, including accurate forecasting of microbial dynamics, prediction of stable sub-communities that inhibit pathogen growth, and identification of bacteria most crucial to community integrity in response to perturbations.
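To our understanding, MDSINE's dynamical systems models build on generalized Lotka-Volterra (gLV) equations, in which each taxon's per-capita growth rate depends linearly on the abundances of all taxa. The sketch below only forward-simulates plain gLV dynamics with forward Euler steps. It does no parameter inference and ignores perturbation terms. The parameters in the example call are hypothetical.

```python
def simulate_glv(x0, growth, interactions, dt, steps):
    """Forward-Euler simulation of generalized Lotka-Volterra dynamics.

    dx_i/dt = x_i * (growth[i] + sum_j interactions[i][j] * x_j)
    Returns the trajectory as a list of abundance vectors (steps + 1 entries).
    """
    x = list(x0)
    trajectory = [x[:]]
    for _ in range(steps):
        rates = [
            growth[i] + sum(a_ij * x_j for a_ij, x_j in zip(interactions[i], x))
            for i in range(len(x))
        ]
        # Clamp at zero so a large negative step cannot produce a negative abundance.
        x = [max(x_i + dt * x_i * r_i, 0.0) for x_i, r_i in zip(x, rates)]
        trajectory.append(x[:])
    return trajectory


# Hypothetical two-taxon community in which taxon 1 suppresses taxon 0.
traj = simulate_glv([0.1, 0.1], [0.8, 0.5], [[-1.0, -0.6], [0.0, -1.0]], 0.01, 3000)
print(traj[-1])
```

Forward Euler keeps the sketch short. For stiff communities or long horizons, an adaptive ODE solver would be the safer choice.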
Affiliation(s)
- Vanni Bucci
- Department of Biology, Program in Biotechnology and Biomedical Engineering, University of Massachusetts Dartmouth, 285 Old Westport Road, N. Dartmouth, MA, 02747, USA
- Belinda Tzen
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Ning Li
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Matt Simmons
- Department of Biology, Program in Biotechnology and Biomedical Engineering, University of Massachusetts Dartmouth, 285 Old Westport Road, N. Dartmouth, MA, 02747, USA
- Takeshi Tanoue
- RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, 230-0045, Japan
- Elijah Bogart
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Luxue Deng
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Vladimir Yeliseyev
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Mary L Delaney
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Qing Liu
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Bernat Olle
- Vedanta Biosciences, 501 Boylston Street, Suite 6102, Boston, MA, 02116, USA
- Richard R Stein
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA
- Kenya Honda
- RIKEN Center for Integrative Medical Sciences (IMS), Yokohama, Kanagawa, 230-0045, Japan
- Lynn Bry
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA
- Georg K Gerber
- Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Ave, Boston, MA, 02115, USA