1
Dell'Olio A, Rubert J, Capozzi V, Tonezzer M, Betta E, Fogliano V, Biasioli F. Non-invasive VOCs detection to monitor the gut microbiota metabolism in-vitro. Sci Rep 2024; 14:15842. [PMID: 38982163; PMCID: PMC11233675; DOI: 10.1038/s41598-024-66303-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 07/01/2024] [Indexed: 07/11/2024] Open
Abstract
This work implemented a non-invasive volatile organic compounds (VOCs) monitoring approach to study how food components are metabolised by the gut microbiota in-vitro. The fermentability of a model food matrix rich in dietary fibre (oat bran), and a pure prebiotic (inulin), added to a minimal gut medium was compared by looking at global changes in the volatilome. The substrates were incubated with a stabilised human faecal inoculum over a 24-h period, and VOCs were monitored without interfering with biological processes. The fermentation was performed in nitrogen-filled vials, with controlled temperature, and tracked by automated headspace-solid-phase microextraction coupled with gas chromatography-mass spectrometry. To understand the molecular patterns over time, we applied a multivariate longitudinal statistical framework: repeated measurements-ANOVA simultaneous component analysis. The methodology was able to discriminate the studied groups by looking at VOCs temporal profiles. The volatilome showed a time-dependency that was more distinct after 12 h. Short to medium-chain fatty acids showed increased peak intensities, mainly for oat bran and for inulin, but with different kinetics. At the same time, alcohols, aldehydes, and esters showed distinct trends with discriminatory power. The proposed approach can be applied to study the intertwined pathways of gut microbiota food components interaction in-vitro.
Affiliation(s)
- Andrea Dell'Olio
  - Food Quality and Design, Wageningen University & Research, 6708 WG, Wageningen, Netherlands
  - Research and Innovation Centre, Fondazione Edmund Mach, 39098, San Michele All'Adige, Italy
- Josep Rubert
  - Food Quality and Design, Wageningen University & Research, 6708 WG, Wageningen, Netherlands
- Vittorio Capozzi
  - Institute of Food Production Sciences, National Research Council, 71121, Foggia, Italy
- Matteo Tonezzer
  - Research and Innovation Centre, Fondazione Edmund Mach, 39098, San Michele All'Adige, Italy
  - Department of Chemical and Geological Sciences, University of Cagliari, 09042, Monserrato, Italy
- Emanuela Betta
  - Research and Innovation Centre, Fondazione Edmund Mach, 39098, San Michele All'Adige, Italy
- Vincenzo Fogliano
  - Food Quality and Design, Wageningen University & Research, 6708 WG, Wageningen, Netherlands
- Franco Biasioli
  - Research and Innovation Centre, Fondazione Edmund Mach, 39098, San Michele All'Adige, Italy
2
Wang X, Campuzano S, Guenne A, Mazéas L, Chapleur O. Inhibition of anaerobic digestion by various ammonia sources resulted in subtle differences in metabolite dynamics. Chemosphere 2024; 351:141157. [PMID: 38218245; DOI: 10.1016/j.chemosphere.2024.141157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/22/2023] [Accepted: 01/07/2024] [Indexed: 01/15/2024]
Abstract
The impact of ammonia on anaerobic digestion performance and microbial dynamics has been extensively studied, but the concurrent effect of anions brought by ammonium salt should not be neglected. This paper studied this effect using metabolomics and a time-course statistical framework. Metabolomics provides novel perspectives to study microbial processes and facilitates a more profound understanding at the metabolic level. The advanced statistical framework enables deciphering the complexity of large metabolomics data sets. More specifically, a series of lab-scale batch reactors were set up with different ammonia sources added. Samples of nine time points over the degradation were analyzed with liquid chromatography-mass spectrometry. A filtering procedure was applied to select the promising metabolomic peaks from 1262 peaks, followed by modeling their intensities across time. The metabolomic peaks with similar time profiles were clustered, evidencing the correlation of different biological processes. Differential analysis was performed to seek the differences in metabolite dynamics caused by different anions. Finally, tandem mass spectrometry and metabolite annotation provided further information on the molecular structure and possible metabolic pathways. For example, the consumption of 5-aminovaleric acid, a short-chain fatty acid obtained from l-lysine degradation, was slowed down by phosphates. Overall, by investigating the effect of anions on anaerobic digestion, our study demonstrated the effectiveness of metabolomics in providing detailed information in a set of samples from different experimental conditions. With the statistical framework, the approach enables capturing subtle differences in metabolite dynamics between samples while accounting for the differences caused by time variations.
Affiliation(s)
- Xiaoqing Wang
  - Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France
- Stephany Campuzano
  - Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France
- Angéline Guenne
  - Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France
- Laurent Mazéas
  - Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France
- Olivier Chapleur
  - Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France
3
Ishibashi Y, Harada S, Eitaki Y, Kurihara A, Kato S, Kuwabara K, Iida M, Hirata A, Sata M, Matsumoto M, Shibuki T, Okamura T, Sugiyama D, Sato A, Amano K, Hirayama A, Sugimoto M, Soga T, Tomita M, Takebayashi T. A population-based urinary and plasma metabolomics study of environmental exposure to cadmium. Environ Health Prev Med 2024; 29:22. [PMID: 38556356; PMCID: PMC10992994; DOI: 10.1265/ehpm.23-00218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Accepted: 12/30/2023] [Indexed: 04/02/2024] Open
Abstract
BACKGROUND The application of metabolomics-based profiles in environmental epidemiological studies is a promising approach to refine the process of health risk assessment. We aimed to identify potential metabolomics-based profiles in urine and plasma for the detection of relatively low-level cadmium (Cd) exposure in large population-based studies. METHOD We analyzed 123 urinary metabolites and 94 plasma metabolites detected in fasting urine and plasma samples collected from 1,412 men and 2,022 women involved in the Tsuruoka Metabolomics Cohort Study. Regression analysis was performed for urinary N-acetyl-beta-D-glucosaminidase (NAG), plasma, and urinary metabolites as dependent variables, and urinary Cd (U-Cd, quartile) as an independent variable. The multivariable regression model included age, gender, systolic blood pressure, smoking, rice intake, BMI, glycated hemoglobin, low-density lipoprotein cholesterol, alcohol consumption, physical activity, educational history, dietary energy intake, urinary Na/K ratio, and uric acid. Pathway-network analysis was carried out to visualize the metabolite networks linked to Cd exposure. RESULT Urinary NAG was positively associated with U-Cd, but not at lower concentrations (Q2). Among urinary metabolites in the total population, 45 metabolites showed associations with U-Cd in the unadjusted and adjusted models after adjusting for the multiplicity of comparison with FDR. There were 12 urinary metabolites which showed consistent associations between Cd exposure from Q2 to Q4. Among plasma metabolites, six cations and one anion were positively associated with U-Cd, whereas alanine, creatinine, and isoleucine were negatively associated with U-Cd. Our results were robust by statistical adjustment of various confounders. Pathway-network analysis revealed metabolites and upstream regulator changes associated with mitochondria (ACACB, UCP2, and metabolites related to the TCA cycle). 
CONCLUSION These results suggested that U-Cd was associated with metabolites related to upstream mitochondrial dysfunction in a dose-dependent manner. Our data will help develop environmental Cd exposure profiles for human populations.
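The abstract above mentions adjusting for multiple comparisons "with FDR" but does not name the procedure. As an illustration only, the following sketch assumes the Benjamini-Hochberg step-up method, which is a common choice; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the study's exact FDR procedure is not stated in the
    abstract; BH is used here purely as a common example.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-metabolite p-values from a regression on U-Cd quartile.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [qi <= 0.05 for qi in q]
```

With these made-up inputs, two of the five tests remain below a 5% FDR threshold even though four raw p-values are under 0.05, which is the kind of filtering a multiplicity adjustment performs across many metabolites.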
Affiliation(s)
- Yoshiki Ishibashi
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Sei Harada
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
- Yoko Eitaki
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Ayako Kurihara
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Suzuka Kato
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Kazuyo Kuwabara
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Miho Iida
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Aya Hirata
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Mizuki Sata
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Minako Matsumoto
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Takuma Shibuki
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Tomonori Okamura
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
- Daisuke Sugiyama
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
  - Faculty of Nursing and Medical Care, Keio University, Fujisawa, Kanagawa, Japan
- Asako Sato
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
- Kaori Amano
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
- Akiyoshi Hirayama
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
- Masahiro Sugimoto
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
- Tomoyoshi Soga
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Kanagawa, Japan
- Masaru Tomita
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
  - Faculty of Environment and Information Studies, Keio University, Fujisawa, Kanagawa, Japan
- Toru Takebayashi
  - Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan
4
Strulovici-Barel Y, Rostami MR, Kaner RJ, Mezey JG, Crystal RG. Serial Sampling of the Small Airway Epithelium to Identify Persistent Smoking-dysregulated Genes. Am J Respir Crit Care Med 2023; 208:780-790. [PMID: 37531632; PMCID: PMC10563181; DOI: 10.1164/rccm.202204-0786oc] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 08/02/2023] [Indexed: 08/04/2023] Open
Abstract
Rationale: The small airway epithelium (beyond the sixth generation), the initiation site of smoking-induced airway disorders, is highly sensitive to the stress of smoking. Because of variations over time in smoking habits, the small airway epithelium transcriptome is dynamic, fluctuating not only among smokers but also within each smoker. Objectives: To perform accurate assessment of the smoking-related dysregulation of the human small airway epithelium despite the variation of smoking within the same individual and of the effects of smoking cessation on the dysregulated transcriptome. Methods: We conducted serial sampling of the same smokers and nonsmoker control subjects over time to identify persistent smoking dysregulation of the biology of the small airway epithelium over 1 year. We conducted serial sampling of smokers who quit smoking, before and after smoking cessation, to assess the effect of smoking cessation on the smoking-dysregulated genes. Measurements and Main Results: Repeated measures ANOVA of the small airway epithelium transcriptome sampled four times in the same individuals over 1 year enabled the identification of 475 persistent smoking-dysregulated genes. Most genes were normalized after 12 months of smoking cessation; however, 53 (11%) genes, including CYP1B1, PIR, ME1, and TRIM16, remained persistently abnormally expressed. Dysregulated pathways enriched with the nonreversible genes included xenobiotic metabolism signaling, bupropion degradation, and nicotine degradation. Conclusions: Analysis of repetitive sampling of the same individuals identified persistent smoking-induced dysregulation of the small airway epithelium transcriptome and the effect of smoking cessation. These results help identify targets for the development of therapies that can be applicable to smoking-related airway diseases.
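The abstract above relies on repeated measures ANOVA across four samplings of the same individuals. The sketch below shows, for a single gene, how a one-way within-subject F statistic can be computed by removing between-subject variation from the error term. It is a simplified illustration under stated assumptions: the function name and expression values are hypothetical, and the study's actual model (for example, how the smoker and nonsmoker groups enter it) is not described in the abstract.

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA for one variable.

    data[s][t] is the measurement for subject s at time point t.
    Returns (F statistic, df_time, df_error).
    Assumption: balanced design with no missing values.
    """
    n = len(data)        # number of subjects
    k = len(data[0])     # number of time points
    grand = sum(sum(row) for row in data) / (n * k)
    time_means = [sum(row[t] for row in data) / n for t in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Error is what remains after removing time and subject effects.
    ss_error = ss_total - ss_time - ss_subj

    df_time = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error


# Hypothetical expression of one gene in 3 subjects sampled 4 times.
expression = [
    [1, 2, 3, 4],
    [2, 3, 5, 6],
    [3, 3, 4, 6],
]
f_stat, df_time, df_error = rm_anova_f(expression)
```

Because each subject serves as their own control, stable between-subject differences are subtracted from the error term, which is why serial sampling can expose consistent time effects that a cross-sectional comparison would blur.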
Affiliation(s)
- Robert J. Kaner
  - Department of Genetic Medicine
  - Department of Medicine, Weill Cornell Medical College, New York, New York
- Jason G. Mezey
  - Department of Genetic Medicine
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York
- Ronald G. Crystal
  - Department of Genetic Medicine
  - Department of Medicine, Weill Cornell Medical College, New York, New York
5
Artymowicz M, Struck-Lewicka W, Wiczling P, Markuszewski M, Markuszewski MJ, Siluk D. Targeted quantitative metabolomics with a linear mixed-effect model for analysis of urinary nucleosides and deoxynucleosides from bladder cancer patients before and after tumor resection. Anal Bioanal Chem 2023; 415:5511-5528. [PMID: 37460824; PMCID: PMC10444683; DOI: 10.1007/s00216-023-04826-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 06/07/2023] [Accepted: 06/26/2023] [Indexed: 08/23/2023]
Abstract
In the present study, we developed and validated a fast, simple, and sensitive quantitative method for the simultaneous determination of eleven nucleosides and deoxynucleosides from urine samples. The analyses were performed with the use of liquid chromatography coupled with triple quadrupole mass spectrometry. The sample pretreatment procedure was limited to centrifugation, vortex mixing of urine samples with a methanol/water solution (1:1, v/v), evaporation and dissolution steps. The analysis lasted 20 min and was performed in dynamic multiple reaction monitoring mode (dMRM) in positive polarity. Process validation was conducted to determine the linearity, precision, accuracy, limit of quantification, stability, recovery and matrix effect. All validation procedures were carried out in accordance with current FDA and EMA regulations. The validated method was applied for the analysis of 133 urine samples derived from bladder cancer patients before tumor resection and 24 h, 2 weeks, and 3, 6, 9, and 12 months after the surgery. The obtained data sets were analyzed using a linear mixed-effect model. The analysis revealed that concentration level of 2-methylthioadenosine was decreased, while for inosine, it was increased 24 h after tumor resection in comparison to the preoperative state. The presented quantitative longitudinal study of urine nucleosides and deoxynucleosides before and up to 12 months after bladder tumor resection brings additional prospective insight into the metabolite excretion pattern in bladder cancer disease. Moreover, incurred sample reanalysis was performed proving the robustness and repeatability of the developed targeted method.
Affiliation(s)
- Małgorzata Artymowicz
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Aleja Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- Wiktoria Struck-Lewicka
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Aleja Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- Paweł Wiczling
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Aleja Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- Marcin Markuszewski
  - Department of Urology, Medical University of Gdańsk, Mariana Smoluchowskiego 17, 80-214, Gdańsk, Poland
- Michał J Markuszewski
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Aleja Gen. J. Hallera 107, 80-416, Gdańsk, Poland
- Danuta Siluk
  - Department of Biopharmaceutics and Pharmacodynamics, Medical University of Gdańsk, Aleja Gen. J. Hallera 107, 80-416, Gdańsk, Poland
6
Huang T, Staniak M, da Veiga Leprevost F, Figueroa-Navedo AM, Ivanov AR, Nesvizhskii AI, Choi M, Vitek O. Statistical Detection of Differentially Abundant Proteins in Experiments with Repeated Measures Designs and Isobaric Labeling. J Proteome Res 2023; 22:2641-2659. [PMID: 37467362; PMCID: PMC11090052; DOI: 10.1021/acs.jproteome.3c00155] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/21/2023]
Abstract
Repeated measures experimental designs, which quantify proteins in biological subjects repeatedly over multiple experimental conditions or times, are commonly used in mass spectrometry-based proteomics. Such designs distinguish the biological variation within and between the subjects and increase the statistical power of detecting within-subject changes in protein abundance. Meanwhile, proteomics experiments increasingly incorporate tandem mass tag (TMT) labeling, a multiplexing strategy that gains both relative protein quantification accuracy and sample throughput. However, combining repeated measures and TMT multiplexing in a large-scale investigation presents statistical challenges due to unique interplays of between-mixture, within-mixture, between-subject, and within-subject variation. This manuscript proposes a family of linear mixed-effects models for differential analysis of proteomics experiments with repeated measures and TMT multiplexing. These models decompose the variation in the data into the contributions from its sources as appropriate for the specifics of each experiment, enable statistical inference of differential protein abundance, and recognize a difference in the uncertainty of between-subject versus within-subject comparisons. The proposed family of models is implemented in the R/Bioconductor package MSstatsTMT v2.2.0. Evaluations of four simulated datasets and four investigations answering diverse biological questions demonstrated the value of this approach as compared to the existing general-purpose approaches and implementations.
Affiliation(s)
- Ting Huang
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Mateusz Staniak
  - Institute of Mathematics, University of Wrocław, Wrocław, Poland
- Amanda M. Figueroa-Navedo
  - Department of Chemistry and Chemical Biology, Barnett Institute of Biological and Chemical Analysis, Northeastern University, Boston, MA, USA
- Alexander R. Ivanov
  - Department of Chemistry and Chemical Biology, Barnett Institute of Biological and Chemical Analysis, Northeastern University, Boston, MA, USA
- Meena Choi
  - Departments of Microchemistry, Proteomics & Lipidomics, Genentech, South San Francisco, CA, USA
- Olga Vitek
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
7
Benchmarking tools for detecting longitudinal differential expression in proteomics data allows establishing a robust reproducibility optimization regression approach. Nat Commun 2022; 13:7877. [PMID: 36550114; PMCID: PMC9780321; DOI: 10.1038/s41467-022-35564-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 12/09/2022] [Indexed: 12/24/2022] Open
Abstract
Quantitative proteomics has matured into an established tool and longitudinal proteomics experiments have begun to emerge. However, no effective, simple-to-use differential expression method for longitudinal proteomics data has been released. Typically, such data is noisy, contains missing values, and has only few time points and biological replicates. To address this need, we provide a comprehensive evaluation of several existing differential expression methods for high-throughput longitudinal omics data and introduce a Robust longitudinal Differential Expression (RolDE) approach. The methods are evaluated using over 3000 semi-simulated spike-in proteomics datasets and three large experimental datasets. In the comparisons, RolDE performs overall best; it is most tolerant to missing values, displays good reproducibility and is the top method in ranking the results in a biologically meaningful way. Furthermore, RolDE is suitable for different types of data with typically unknown patterns in longitudinal expression and can be applied by non-experienced users.
8
Bar N, Nikparvar B, Jayavelu ND, Roessler FK. Constrained Fourier estimation of short-term time-series gene expression data reduces noise and improves clustering and gene regulatory network predictions. BMC Bioinformatics 2022; 23:330. [PMID: 35945515; PMCID: PMC9364503; DOI: 10.1186/s12859-022-04839-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Accepted: 07/12/2022] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Biological data suffers from noise that is inherent in the measurements. This is particularly true for time-series gene expression measurements. Nevertheless, in order to explore cellular dynamics, scientists employ such noisy measurements in predictive and clustering tools. However, noisy data can not only obscure the genes' temporal patterns, but applying predictive and clustering tools on noisy data may yield inconsistent, and potentially incorrect, results. RESULTS To reduce the noise of short-term (< 48 h) time-series expression data, we relied on the three basic temporal patterns of gene expression: waves, impulses and sustained responses. We constrained the estimation of the true signals to these patterns by estimating the parameters of first and second-order Fourier functions and using the nonlinear least-squares trust-region optimization technique. Our approach lowered the noise in at least 85% of synthetic time-series expression data, significantly more than the spline method ([Formula: see text]). When the data contained a higher signal-to-noise ratio, our method allowed downstream network component analyses to calculate consistent and accurate predictions, particularly when the noise variance was high. Conversely, these tools led to erroneous results from untreated noisy data. Our results suggest that at least 5-7 time points are required to efficiently de-noise logarithmic scaled time-series expression data. Investing in sampling additional time points provides little benefit to clustering and prediction accuracy. CONCLUSIONS Our constrained Fourier de-noising method helps to cluster noisy gene expression and interpret dynamic gene networks more accurately. The benefit of noise reduction is large and can constitute the difference between a successful application and a failing one.
Affiliation(s)
- Nadav Bar
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Sælandsvei 4, Trondheim, NO-7491 Norway
- Bahareh Nikparvar
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Sælandsvei 4, Trondheim, NO-7491 Norway
- Naresh Doni Jayavelu
  - Division of Medical Genetics, Department of Medicine, University of Washington Seattle, Seattle, WA 98195-7720 USA
- Fabienne Krystin Roessler
  - Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Sælandsvei 4, Trondheim, NO-7491 Norway
9
Kodikara S, Ellul S, Lê Cao KA. Statistical challenges in longitudinal microbiome data analysis. Brief Bioinform 2022; 23:bbac273. [PMID: 35830875; PMCID: PMC9294433; DOI: 10.1093/bib/bbac273] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/02/2022] [Revised: 05/28/2022] [Accepted: 06/12/2022] [Indexed: 11/13/2022] Open
Abstract
The microbiome is a complex and dynamic community of microorganisms that co-exist interdependently within an ecosystem, and interact with its host or environment. Longitudinal studies can capture temporal variation within the microbiome to gain mechanistic insights into microbial systems; however, current statistical methods are limited due to the complex and inherent features of the data. We have identified three analytical objectives in longitudinal microbial studies: (1) differential abundance over time and between sample groups, demographic factors or clinical variables of interest; (2) clustering of microorganisms evolving concomitantly across time and (3) network modelling to identify temporal relationships between microorganisms. This review explores the strengths and limitations of current methods to fulfill these objectives, compares different methods in simulation and case studies for objectives (1) and (2), and highlights opportunities for further methodological developments. R tutorials are provided to reproduce the analyses conducted in this review.
Affiliation(s)
- Saritha Kodikara
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Royal Parade, 3052, Victoria, Australia
- Susan Ellul
  - Murdoch Children’s Research Institute and Department of Paediatrics, University of Melbourne, Bouverie Street, 3052, Victoria, Australia
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, The University of Melbourne, Royal Parade, 3052, Victoria, Australia
10
Kong G, Lê Cao KA, Hannan AJ. Alterations in the Gut Fungal Community in a Mouse Model of Huntington's Disease. Microbiol Spectr 2022; 10:e0219221. [PMID: 35262396; PMCID: PMC9045163; DOI: 10.1128/spectrum.02192-21] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 11/09/2021] [Accepted: 02/14/2022] [Indexed: 12/26/2022] Open
Abstract
Huntington's disease (HD) is a neurodegenerative disorder caused by a trinucleotide expansion in the HTT gene, which is expressed throughout the brain and body, including the gut epithelium and enteric nervous system. Afflicted individuals suffer from progressive impairments in motor, psychiatric, and cognitive faculties, as well as peripheral deficits, including the alteration of the gut microbiome. However, studies characterizing the gut microbiome in HD have focused entirely on the bacterial component, while the fungal community (mycobiome) has been overlooked. The gut mycobiome has gained recognition for its role in host homeostasis and maintenance of the gut epithelial barrier. We aimed to characterize the gut mycobiome profile in HD using fecal samples collected from the R6/1 transgenic mouse model (and wild-type littermate controls) from 4 to 12 weeks of age, corresponding to presymptomatic through to early disease stages. Shotgun sequencing was performed on fecal DNA samples, followed by metagenomic analyses. The HD gut mycobiome beta diversity was significantly different from that of wild-type littermates at 12 weeks of age, while no genotype differences were observed at the earlier time points. Similarly, greater alpha diversity was observed in the HD mice by 12 weeks of age. Key taxa, including Malassezia restricta, Yarrowia lipolytica, and Aspergillus species, were identified as having a negative association with HD. Furthermore, integration of the bacterial and fungal data sets at 12 weeks of age identified negative correlations between the HD-associated fungal species and Lactobacillus reuteri. These findings provide new insights into gut microbiome alterations in HD and may help identify novel therapeutic targets. IMPORTANCE Huntington's disease (HD) is a fatal neurodegenerative disorder affecting both the mind and body. We have recently discovered that gut bacteria are disrupted in HD. 
The present study provides the first evidence of an altered gut fungal community (mycobiome) in HD. The genomes of many thousands of gut microbes were sequenced and used to assess "metagenomics", in particular the different types of fungal species in the HD versus control gut, in a mouse model. At an early disease stage, before the onset of symptoms, the overall gut mycobiome structure (array of fungi) in HD mice was distinct from that of their wild-type littermates. Alterations of multiple key fungal species were identified as being associated with the onset of disease symptoms, some of which showed strong correlations with the gut bacterial community. This study highlights the potential role of gut fungi in HD and may facilitate the development of novel therapeutic approaches.
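The abstract above compares alpha diversity (within-sample) and beta diversity (between-sample) of the mycobiome but does not say which indices were used. As a hedged illustration, the sketch below assumes the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity, two widely used choices; the function names and the count vectors are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon alpha diversity (natural log) of one sample's taxon counts."""
    total = sum(counts)
    # Taxa with zero counts contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa.

    0 means identical composition, 1 means no shared abundance.
    """
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator


# Hypothetical fungal taxon counts for one HD and one wild-type sample.
hd_sample = [40, 30, 20, 10]
wt_sample = [70, 20, 10, 0]

alpha_hd = shannon_index(hd_sample)
alpha_wt = shannon_index(wt_sample)
beta = bray_curtis(hd_sample, wt_sample)
```

In this made-up example the more even sample has the higher Shannon index. A real analysis would compute these per animal and time point and then test for genotype differences.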
Affiliation(s)
- Geraldine Kong
  - Florey Institute of Neuroscience and Mental Health, University of Melbourne, Melbourne Brain Centre, Parkville, Australia
- Kim-Anh Lê Cao
  - Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Parkville, Australia
- Anthony J. Hannan
  - Florey Institute of Neuroscience and Mental Health, University of Melbourne, Melbourne Brain Centre, Parkville, Australia
  - Department of Anatomy and Physiology, University of Melbourne, Parkville, Australia
11
Vantini M, Mannerström H, Rautio S, Ahlfors H, Stockinger B, Lähdesmäki H. PairGP: Gaussian process modeling of longitudinal data from paired multi-condition studies. Comput Biol Med 2022; 143:105268. [PMID: 35131609; DOI: 10.1016/j.compbiomed.2022.105268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2021] [Revised: 01/23/2022] [Accepted: 01/23/2022] [Indexed: 11/30/2022]
Abstract
High-throughput technologies produce gene expression time-series data that need fast and specialized algorithms to be processed. While current methods already deal with different aspects, such as the non-stationarity of the process and the temporal correlation, they often fail to take into account the pairing among replicates. We propose PairGP, a non-stationary Gaussian process method to compare gene expression time-series across several conditions that can account for paired longitudinal study designs and can identify groups of conditions that have different gene expression dynamics. We demonstrate the method on both simulated data and previously unpublished RNA sequencing (RNA-seq) time-series with five conditions. The results show the advantage of modeling the pairing effect to better identify groups of conditions with different dynamics. The pairing effect model shows a good ability to select the most probable grouping of conditions even in the presence of a high number of conditions. The developed method is of general application and can be applied to any gene expression time series dataset. The model can identify common replicate effects among the samples coming from the same biological replicates and model those as separate components. Learning the pairing effect as a separate component not only allows us to exclude it from the model to get better estimates of the condition effects, but also improves the precision of the model selection process. The pairing effect, which was previously treated as noise, is now identified as a separate component, resulting in more accurate and explanatory models of the data.
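The pairing idea in this abstract can be illustrated without a Gaussian process. The sketch below is not PairGP: it assumes a much simpler additive model in which each biological replicate adds one constant offset to all of its measurements, and the function name `remove_pairing_effect` and the toy data are hypothetical.

```python
from statistics import mean

def remove_pairing_effect(data):
    """Subtract a constant per-replicate offset from every time series.

    data maps condition -> replicate -> list of values over time.
    Assumption (not from the paper): the pairing effect is a constant
    additive shift per replicate, estimated as the replicate's mean over
    all conditions and time points minus the grand mean.
    """
    all_values = [v for cond in data.values() for series in cond.values() for v in series]
    grand_mean = mean(all_values)
    replicates = {rep for cond in data.values() for rep in cond}
    offsets = {}
    for rep in replicates:
        rep_values = [v for cond in data.values() if rep in cond for v in cond[rep]]
        offsets[rep] = mean(rep_values) - grand_mean
    adjusted = {
        cond_name: {rep: [v - offsets[rep] for v in series] for rep, series in cond.items()}
        for cond_name, cond in data.items()
    }
    return adjusted, offsets

# Toy paired design: replicate r2 is shifted by +2 in both conditions.
toy = {
    "A": {"r1": [1, 2, 3], "r2": [3, 4, 5]},
    "B": {"r1": [1, 1, 1], "r2": [3, 3, 3]},
}
adjusted, offsets = remove_pairing_effect(toy)
```

After the adjustment the two replicates coincide within each condition, so what remains is the difference in dynamics between conditions A and B. This mean-based estimate is only sensible for a balanced design in which every replicate is measured under every condition.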
Affiliation(s)
- Michele Vantini
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02 150, Finland.
| | - Henrik Mannerström
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02 150, Finland.
| | - Sini Rautio
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02 150, Finland.
| | - Helena Ahlfors
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, United Kingdom.
| | - Brigitta Stockinger
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, United Kingdom.
| | - Harri Lähdesmäki
- Department of Computer Science, Aalto University, Konemiehentie 2, Espoo, 02 150, Finland.
|
12
|
Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat Methods 2022; 19:179-186. [PMID: 35027765 PMCID: PMC8828471 DOI: 10.1038/s41592-021-01343-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 11/03/2020] [Accepted: 11/05/2021] [Indexed: 01/04/2023]
Abstract
Factor analysis is a widely used method for dimensionality reduction in genome biology, with applications from personalized health to single-cell biology. Existing factor analysis models assume independence of the observed samples, an assumption that fails in spatio-temporal profiling studies. Here we present MEFISTO, a flexible and versatile toolbox for modeling high-dimensional data when spatial or temporal dependencies between the samples are known. MEFISTO maintains the established benefits of factor analysis for multimodal data, but enables the performance of spatio-temporally informed dimensionality reduction, interpolation, and separation of smooth from non-smooth patterns of variation. Moreover, MEFISTO can integrate multiple related datasets by simultaneously identifying and aligning the underlying patterns of variation in a data-driven manner. To illustrate MEFISTO, we apply the model to different datasets with spatial or temporal resolution, including an evolutionary atlas of organ development, a longitudinal microbiome study, a single-cell multi-omics atlas of mouse gastrulation and spatially resolved transcriptomics. MEFISTO models bulk and single-cell multi-omics data with temporal or spatial dependencies for interpretable pattern discovery and integration.
|
13
|
Bodein A, Scott-Boyer MP, Perin O, Lê Cao KA, Droit A. timeOmics: an R package for longitudinal multi-omics data integration. Bioinformatics 2022; 38:577-579. [PMID: 34554215 DOI: 10.1093/bioinformatics/btab664] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/09/2021] [Revised: 08/17/2021] [Accepted: 09/15/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Multi-omics data integration enables the global analysis of biological systems and discovery of new biological insights. Multi-omics experimental designs have been further extended with a longitudinal dimension to study dynamic relationships between molecules. However, methods that integrate longitudinal multi-omics data are still in their infancy. RESULTS We introduce the R package timeOmics, a generic analytical framework for the integration of longitudinal multi-omics data. The framework includes pre-processing, modeling and clustering to identify molecular features strongly associated with time. We illustrate this framework in a case study to detect seasonal patterns of mRNA, metabolites, gut taxa and clinical variables in patients with diabetes mellitus from the integrative Human Microbiome Project. AVAILABILITY AND IMPLEMENTATION timeOmics is available on Bioconductor and github.com/abodein/timeOmics. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC G1V 0A6, Canada
| | - Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC G1V 0A6, Canada
| | - Olivier Perin
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois 93600, France
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia
| | - Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC G1V 0A6, Canada
|
14
|
Sharma A, Johnson KB, Bie B, Rhoades EE, Sen A, Kida Y, Hockings J, Gatta A, Davenport J, Arcangelini C, Ritzu J, DeVecchio J, Hughen R, Wei M, Thomas Budd G, Lynn Henry N, Eng C, Foss J, Rotroff DM. A Multimodal Approach to Discover Biomarkers for Taxane-Induced Peripheral Neuropathy (TIPN): A Study Protocol. Technol Cancer Res Treat 2022; 21:15330338221127169. [PMID: 36172750 PMCID: PMC9523841 DOI: 10.1177/15330338221127169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
Introduction: Taxanes are a class of chemotherapeutics commonly used to treat various solid tumors, including breast and ovarian cancers. Taxane-induced peripheral neuropathy (TIPN) occurs in up to 70% of patients, impacting quality of life both during and after treatment. TIPN typically manifests as tingling and numbness in the hands and feet and can cause irreversible loss of function of peripheral nerves. TIPN can be dose-limiting, potentially impacting clinical outcomes. The mechanisms underlying TIPN are poorly understood. As such, there are limited treatment options and no tools to provide early detection of those who will develop TIPN. Although some patients may have a genetic predisposition, genetic biomarkers have been inconsistent in predicting chemotherapy-induced peripheral neuropathy (CIPN). Moreover, other molecular markers (eg, metabolites, mRNA, miRNA, proteins) may be informative for predicting CIPN, but remain largely unexplored. We anticipate that combinations of multiple biomarkers will be required to consistently predict those who will develop TIPN. Methods: To address this clinical gap of identifying patients at risk of TIPN, we initiated the Genetics and Inflammatory Markers for CIPN (GENIE) study. This longitudinal multicenter observational study uses a novel, multimodal approach to evaluate genomic variation, metabolites, DNA methylation, gene expression, and circulating cytokines/chemokines prior to, during, and after taxane treatment in 400 patients with breast cancer. Molecular and patient-reported data will be collected prior to, during, and after taxane therapy. Multi-modal data will be used to develop a set of comprehensive predictive biomarker signatures of TIPN. Conclusion: The goal of this study is to enable early detection of patients at risk of developing TIPN, provide a tool to modify taxane treatment to minimize morbidity from TIPN, and improve patient quality of life. Here we provide a brief review of the current state of research into CIPN and TIPN and introduce the GENIE study design.
Affiliation(s)
- Anukriti Sharma
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, OH, USA
| | - Ken B. Johnson
- Department of Anesthesiology, University of Utah, UT, USA
| | - Bihua Bie
- Department of Anesthesiology, Cleveland Clinic, OH, USA
| | | | - Alper Sen
- Department of Anesthesiology, University of Utah, UT, USA
| | - Yuri Kida
- Department of Anesthesiology, University of Utah, UT, USA
| | - Jennifer Hockings
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, OH, USA
- Department of Pharmacy, Cleveland Clinic, OH, USA
- Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Alycia Gatta
- Taussig Cancer Institute, Cleveland Clinic, OH, USA
| | | | | | | | - Jennifer DeVecchio
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, OH, USA
| | - Ron Hughen
- Department of Anesthesiology, University of Utah, UT, USA
| | - Mei Wei
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT
| | - G. Thomas Budd
- Taussig Cancer Institute, Cleveland Clinic, OH, USA
- Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
| | - N. Lynn Henry
- University of Michigan Rogel Cancer Center, Ann Arbor, MI, USA
| | - Charis Eng
- Taussig Cancer Institute, Cleveland Clinic, OH, USA
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, OH, USA
- Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Department of Genetics and Genome Sciences, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
| | - Joseph Foss
- Department of Anesthesiology, Cleveland Clinic, OH, USA
| | - Daniel M. Rotroff
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, OH, USA
- Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Endocrinology and Metabolism Institute, Cleveland Clinic, Cleveland, OH, USA
|
15
|
Bodein A, Scott-Boyer MP, Perin O, Lê Cao KA, Droit A. Interpretation of network-based integration from multi-omics longitudinal data. Nucleic Acids Res 2021; 50:e27. [PMID: 34883510 PMCID: PMC8934642 DOI: 10.1093/nar/gkab1200] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/13/2021] [Revised: 10/19/2021] [Accepted: 11/22/2021] [Indexed: 12/26/2022] Open
Abstract
Multi-omics integration is key to fully understanding complex biological processes in a holistic manner. Furthermore, multi-omics combined with new longitudinal experimental designs can reveal dynamic relationships between omics layers and identify key players or interactions in system development or complex phenotypes. However, integration methods have to address various experimental designs and do not guarantee interpretable biological results. The new challenge of multi-omics integration is to solve interpretation and unlock the hidden knowledge within the multi-omics data. In this paper, we go beyond integration and propose a generic approach to face the interpretation problem. From multi-omics longitudinal data, this approach builds and explores hybrid multi-omics networks composed of both inferred and known relationships within and between omics layers. With smart node labelling and propagation analysis, this approach predicts regulation mechanisms and multi-omics functional modules. We applied the method to 3 case studies with various multi-omics designs and identified new multi-layer interactions involved in key biological functions that could not be revealed with single omics analysis. Moreover, we highlighted interplay in the kinetics that could help identify novel biological mechanisms. This method is available as an R package netOmics to readily suit any application.
Affiliation(s)
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Olivier Perin
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
| | - Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
|
16
|
Chapleur O, Poirier S, Guenne A, Lê Cao KA. Time-course analysis of metabolomic and microbial responses in anaerobic digesters exposed to ammonia. Chemosphere 2021; 283:131309. [PMID: 34467946 DOI: 10.1016/j.chemosphere.2021.131309] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/03/2021] [Revised: 06/01/2021] [Accepted: 06/19/2021] [Indexed: 06/13/2023]
Abstract
Omics longitudinal studies are effective experimental designs to inform on the stability and dynamics of microbial communities in response to perturbations, but time-course analytical frameworks are required to fully exploit the temporal information acquired in this context. In this study we investigate the influence of ammonia on the stability of the anaerobic digestion (AD) microbiome with a new statistical framework. Ammonia can severely reduce AD performance. Understanding how it affects microbial community development and the degradation progress is a key operational issue for proposing more stable processes. Thirty batch digesters were set up with different levels of ammonia. Microbial community structure and metabolomic profiles were monitored with 16S metabarcoding and GC-MS (gas chromatography-mass spectrometry). Digesters were first grouped according to similar degradation performances. Within each group, time profiles of OTUs and metabolites were modelled, then clustered into similar time trajectories, evidencing for example a syntrophic interaction between Syntrophomonas and Methanoculleus that was maintained up to 387 mg FAN/L. Metabolites resulting from organic matter fermentation, such as dehydroabietic or phytanic acid, decreased with increasing ammonia levels. Our analytical framework enabled us to fully account for time variability and to integrate this parameter into the data analysis.
Affiliation(s)
- Olivier Chapleur
- Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France.
| | - Simon Poirier
- Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France.
| | - Angéline Guenne
- Université Paris-Saclay, INRAE, PRocédés biOtechnologiques au Service de l'Environnement, 92761, Antony, France.
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics and the School of Mathematics and Statistics, The University of Melbourne, Parkville, Victoria, Australia.
|
17
|
Escoto-Sandoval C, Flores-Díaz A, Reyes-Valdés MH, Ochoa-Alejo N, Martínez O. A method to analyze time expression profiles demonstrated in a database of chili pepper fruit development. Sci Rep 2021; 11:13181. [PMID: 34162966 PMCID: PMC8222228 DOI: 10.1038/s41598-021-92672-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/17/2021] [Accepted: 06/14/2021] [Indexed: 12/13/2022] Open
Abstract
RNA-Seq experiments allow genome-wide estimation of relative gene expression. Estimation of gene expression at different time points generates time expression profiles of phenomena of interest, for example, fruit development. However, such profiles can be complex to analyze and interpret. We developed a methodology that transforms original RNA-Seq data from time course experiments into standardized expression profiles, which can be easily interpreted and analyzed. To exemplify this methodology we used RNA-Seq data obtained from 12 accessions of chili pepper (Capsicum annuum L.) during fruit development. All relevant data, as well as functions to perform analyses and interpretations from this experiment, were gathered into a publicly available R package: “Salsa”. Here we explain the rationale of the methodology and exemplify the use of the package to obtain valuable insights into the multidimensional time expression changes that occur during chili pepper fruit development. We hope that this tool will be of interest for researchers studying fruit development in chili pepper as well as in other angiosperms.
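The abstract does not spell out how a profile is standardized. One common choice, assumed here purely for illustration, is to rescale each gene's time profile to mean 0 and standard deviation 1, so that genes with the same shape but different expression levels become directly comparable. The function name `standardize_profile` is hypothetical and is not part of the Salsa package.

```python
from statistics import mean, pstdev

def standardize_profile(values):
    """Rescale one gene's time profile to mean 0 and standard deviation 1.

    Assumption: population standard deviation is used; the package
    described in the abstract may define its standardization differently.
    A flat profile (zero variance) is returned as all zeros.
    """
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return [0.0 for _ in values]
    return [(v - centre) / spread for v in values]

# Two hypothetical genes with the same shape at different expression levels.
low_gene = standardize_profile([2, 4, 6])
high_gene = standardize_profile([20, 40, 60])
```

Both toy genes map to the same standardized profile, which is what makes shape-based comparison and clustering across genes possible.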
Affiliation(s)
- Christian Escoto-Sandoval
- Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Unidad de Genómica Avanzada (Langebio), Irapuato, Guanajuato, 36824, Mexico
| | - Alan Flores-Díaz
- Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Unidad de Genómica Avanzada (Langebio), Irapuato, Guanajuato, 36824, Mexico
| | - M Humberto Reyes-Valdés
- Department of Plant Breeding, Universidad Autónoma Agraria Antonio Narro, Saltillo, Coahuila, 25315, Mexico
| | - Neftalí Ochoa-Alejo
- Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Departamento de Ingeniería Genética, Unidad Irapuato, Irapuato, Guanajuato, 36824, Mexico
| | - Octavio Martínez
- Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), Unidad de Genómica Avanzada (Langebio), Irapuato, Guanajuato, 36824, Mexico.
|
18
|
Hoffman GE, Roussos P. Dream: powerful differential expression analysis for repeated measures designs. Bioinformatics 2021; 37:192-201. [PMID: 32730587 DOI: 10.1093/bioinformatics/btaa687] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 01/15/2020] [Revised: 07/13/2020] [Accepted: 07/23/2020] [Indexed: 01/08/2023] Open
Abstract
SUMMARY Large-scale transcriptome studies with multiple samples per individual are widely used to study disease biology. Yet, current methods for differential expression are inadequate for cross-individual testing for these repeated measures designs. Most problematic, we observe across multiple datasets that current methods can give reproducible false-positive findings that are driven by genetic regulation of gene expression, yet are unrelated to the trait of interest. Here, we introduce a statistical software package, dream, that increases power, controls the false positive rate, enables multiple types of hypothesis tests, and integrates with standard workflows. In 12 analyses in 6 independent datasets, dream yields biological insight not found with existing software while addressing the issue of reproducible false-positive findings. AVAILABILITY AND IMPLEMENTATION Dream is available within the variancePartition Bioconductor package at http://bioconductor.org/packages/variancePartition. CONTACT gabriel.hoffman@mssm.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel E Hoffman
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Panos Roussos
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY 10468, USA
|
19
|
Oh VKS, Li RW. Temporal Dynamic Methods for Bulk RNA-Seq Time Series Data. Genes (Basel) 2021; 12:352. [PMID: 33673721 PMCID: PMC7997275 DOI: 10.3390/genes12030352] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Revised: 02/19/2021] [Accepted: 02/22/2021] [Indexed: 02/06/2023] Open
Abstract
Dynamic studies in time course experimental designs and clinical approaches have been widely used by the biomedical community. These applications are particularly relevant in stimuli-response models under environmental conditions, characterization of gradient biological processes in developmental biology, identification of therapeutic effects in clinical trials, disease progression models, cell-cycle, and circadian periodicity. Despite their feasibility and popularity, sophisticated dynamic methods that are well validated in large-scale comparative studies, in terms of statistical and computational rigor, are less benchmarked than their static counterparts. To date, a number of novel methods in bulk RNA-Seq data have been developed for the various time-dependent stimuli, circadian rhythms, cell-lineage in differentiation, and disease progression. Here, we comprehensively review a key set of representative dynamic strategies and discuss current issues associated with the detection of dynamically changing genes. We also provide recommendations for future directions for studying non-periodical and periodical time course data, and meta-dynamic datasets.
Affiliation(s)
- Vera-Khlara S. Oh
- Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA;
- Department of Computer Science and Statistics, College of Natural Sciences, Jeju National University, Jeju City 63243, Korea
| | - Robert W. Li
- Animal Genomics and Improvement Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA;
|
20
|
An integrated metagenomics and metabolomics approach implicates the microbiota-gut-brain axis in the pathogenesis of Huntington's disease. Neurobiol Dis 2020; 148:105199. [PMID: 33249136 DOI: 10.1016/j.nbd.2020.105199] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 06/13/2020] [Revised: 10/21/2020] [Accepted: 11/23/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Huntington's disease (HD) is an autosomal dominant neurodegenerative disorder with onset and severity of symptoms influenced by various environmental factors. Recent discoveries have highlighted the importance of the gastrointestinal microbiome in mediating bidirectional gut-brain-axis communication via circulating factors. Using shotgun sequencing, we investigated the gut microbiome composition in the R6/1 transgenic mouse model of HD from 4 to 12 weeks of age (early adolescent through to adult stages). Targeted metabolomics was also performed on the blood plasma of these mice (n = 9 per group) at 12 weeks of age to investigate potential effects of gut dysbiosis on the plasma metabolome profile. RESULTS Modelled time profiles of each species, KEGG Orthologs and bacterial genes revealed heightened volatility in the R6/1 mice, indicating potential early effects of the HD mutation in the gut. In addition to gut dysbiosis in R6/1 mice at 12 weeks of age, gut microbiome function was perturbed. In particular, the butanoate metabolism pathway was elevated, suggesting increased production of the protective SCFA, butyrate, in the gut. No significant alterations were found in the plasma butyrate and propionate levels in the R6/1 mice at 12 weeks of age. The statistical integration of the metagenomics and metabolomics data identified several Bacteroides species that were negatively correlated with ATP and pipecolic acid in the plasma. CONCLUSIONS The present study revealed the instability of the HD gut microbiome during the pre-motor symptomatic stage of the disease, which may have dire consequences for the host's health. Perturbation of the HD gut microbiome function prior to significant cognitive and motor dysfunction suggests a potential role of the gut in modulating the pathogenesis of HD, potentially via specific altered plasma metabolites which mediate gut-brain signaling.
|
21
|
Ahn H, Jung I, Chae H, Kang D, Jung W, Kim S. HTRgene: a computational method to perform the integrated analysis of multiple heterogeneous time-series data: case analysis of cold and heat stress response signaling genes in Arabidopsis. BMC Bioinformatics 2019; 20:588. [PMID: 31787073 PMCID: PMC6886170 DOI: 10.1186/s12859-019-3072-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/10/2023] Open
Abstract
Background: Integrated analysis that uses multiple sample gene expression data measured under the same stress can detect stress response genes more accurately than analysis of individual sample data. However, the integrated analysis is challenging since experimental conditions (strength of stress and the number of time points) are heterogeneous across multiple samples. Results: HTRgene is a computational method to perform the integrated analysis of multiple heterogeneous time-series data measured under the same stress condition. The goal of HTRgene is to identify “response order preserving DEGs”, defined as genes that are not only differentially expressed but whose response order is also preserved across multiple samples. The utility of HTRgene was demonstrated using 28 and 24 time-series sample gene expression data measured under cold and heat stress in Arabidopsis. HTRgene analysis successfully reproduced known biological mechanisms of cold and heat stress in Arabidopsis. Also, HTRgene showed higher accuracy in detecting the documented stress response genes than existing tools. Conclusions: HTRgene, a method to find the ordering of response time of genes that are commonly observed among multiple time-series samples, successfully integrated multiple heterogeneous time-series gene expression datasets. It can be applied to many research problems related to the integration of time series data analysis.
Affiliation(s)
- Hongryul Ahn
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
| | - Inuk Jung
- Department of Computer Science and Engineering, Kyungpook National University, Daegu, Korea
| | - Heejoon Chae
- Division of Computer Science, Sookmyung Women's University, Seoul, Korea
| | - Dongwon Kang
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea
| | - Woosuk Jung
- Department of Crop Science, Konkuk University, Seoul, Korea.
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, Korea. .,Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea. .,Bioinformatics Institute, Seoul National University, Seoul, Korea.
|
22
|
Bodein A, Chapleur O, Droit A, Lê Cao KA. A Generic Multivariate Framework for the Integration of Microbiome Longitudinal Studies With Other Data Types. Front Genet 2019; 10:963. [PMID: 31803221 PMCID: PMC6875829 DOI: 10.3389/fgene.2019.00963] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/04/2019] [Accepted: 09/10/2019] [Indexed: 12/12/2022] Open
Abstract
Simultaneous profiling of biospecimens using different technological platforms enables the study of many data types, encompassing microbial communities, omics, and meta-omics as well as clinical or chemistry variables. Reduction in costs now enables longitudinal or time course studies on the same biological material or system. The overall aim of such studies is to investigate relationships between these longitudinal measures in a holistic manner to further decipher the link between molecular mechanisms and microbial community structures, or host-microbiota interactions. However, analytical frameworks enabling an integrated analysis between microbial communities and other types of biological, clinical, or phenotypic data are still in their infancy. The challenges include few time points that may be unevenly spaced and unmatched between different data types, a small number of unique individual biospecimens, and high individual variability. Those challenges are further exacerbated by the inherent characteristics of microbial communities-derived data (e.g., sparse, compositional). We propose a generic data-driven framework to integrate different types of longitudinal data measured on the same biological specimens with microbial community data and select key temporal features with strong associations within the same sample group. The framework ranges from filtering and modeling to integration using smoothing splines and multivariate dimension reduction methods to address some of the analytical challenges of microbiome-derived data. We illustrate our framework on different types of multi-omics case studies in bioreactor experiments as well as human studies.
Affiliation(s)
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Olivier Chapleur
- Hydrosystems and Biopresses Research Unit, Irstea, Antony, France
| | - Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
|
23
|
Spies D, Renz PF, Beyer TA, Ciaudo C. Comparative analysis of differential gene expression tools for RNA sequencing time course data. Brief Bioinform 2019; 20:288-298. [PMID: 29028903 PMCID: PMC6357553 DOI: 10.1093/bib/bbx115] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 05/06/2017] [Indexed: 02/05/2023] Open
Abstract
RNA sequencing (RNA-seq) has become a standard procedure to investigate transcriptional changes between conditions and is routinely used in research and clinics. While standard differential expression (DE) analysis between two conditions has been extensively studied, and improved over the past decades, RNA-seq time course (TC) DE analysis algorithms are still in their early stages. In this study, we compared, for the first time, existing TC RNA-seq tools on an extensive simulation data set and validated the best performing tools on published data. Surprisingly, TC tools were outperformed by the classical pairwise comparison approach on short time series (<8 time points) in terms of overall performance and robustness to noise, mostly because of a high number of false positives, with the exception of ImpulseDE2. Overlapping of candidate lists between tools improved this shortcoming, as the majority of false-positive, but not true-positive, candidates were unique for each method. On longer time series, the pairwise approach was less efficient on the overall performance compared with splineTC and maSigPro, which did not identify any false-positive candidate.
Affiliation(s)
- Daniel Spies
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland; Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Peter F Renz
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland; Life Science Zurich Graduate School, Molecular Life Science program, University of Zürich, Switzerland
- Tobias A Beyer
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
- Constance Ciaudo
- Swiss Federal Institute of Technology Zurich, Department of Biology, IMHS, Zurich, Switzerland
|
24
|
Wanichthanarak K, Jeamsripong S, Pornputtapong N, Khoomrung S. Accounting for biological variation with linear mixed-effects modelling improves the quality of clinical metabolomics data. Comput Struct Biotechnol J 2019; 17:611-618. [PMID: 31110642 PMCID: PMC6506811 DOI: 10.1016/j.csbj.2019.04.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/23/2019] [Revised: 04/16/2019] [Accepted: 04/17/2019] [Indexed: 11/16/2022] Open
Abstract
Metabolite profiles from biological samples suffer from both technical variations and subject-specific variants. To improve the quality of metabolomics data, conventional data processing methods can be employed to remove technical variations. These methods do not consider sources of subject variation as separate factors from biological factors of interest. This can be a significant issue when performing quantitative metabolomics in clinical trials or screening for a potential biomarker in early-stage disease, because changes in metabolism or a desired-metabolite signal are small compared to the total metabolite signals. As a result, inter-individual variability can interfere with subsequent statistical analyses. Here, we propose an additional data processing step using linear mixed-effects modelling to readjust an individual metabolite signal prior to multivariate analyses. Published clinical metabolomics data were used to demonstrate and evaluate the proposed method. We observed a substantial reduction in variation of each metabolite signal after model fitting. A comparison with other strategies showed that our proposed method contributed to improved classification accuracy, precision, sensitivity and specificity. Moreover, we highlight the importance of patient metadata as it contains rich information on subject characteristics, which can be used to model and normalize metabolite abundances. The proposed method is available as an R package lmm2met.
Affiliation(s)
- Kwanjeera Wanichthanarak
- Department of Biochemistry and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wanglang Road, Bangkok Noi, Bangkok 10700, Thailand; Data Management and Statistical Analysis Center, Faculty of Public Health, Khon Kaen University, Khon Kaen 40002, Thailand
- Saharuetai Jeamsripong
- Research Unit in Microbial Food Safety and Antimicrobial Resistance, Department of Veterinary Public Health, Faculty of Veterinary Science, Chulalongkorn University, 39 Henri-Dunant Road, Pathumwan, Bangkok 10330, Thailand
- Natapol Pornputtapong
- Department of Biochemistry and Microbiology, Faculty of Pharmaceutical Sciences, Chulalongkorn University, Bangkok, 10330, Thailand; Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Sakda Khoomrung
- Department of Biochemistry and Siriraj Metabolomics and Phenomics Center, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wanglang Road, Bangkok Noi, Bangkok 10700, Thailand; Center for Innovation in Chemistry (PERCH-CIC), Faculty of Science, Mahidol University, Rama 6 Road, Bangkok 10400, Thailand
|
25
|
Wang J, Choi H, Chung NC, Cao Q, Ng DCM, Mirza B, Scruggs SB, Wang D, Garlid AO, Ping P. Integrated Dissection of Cysteine Oxidative Post-translational Modification Proteome During Cardiac Hypertrophy. J Proteome Res 2018; 17:4243-4257. [PMID: 30141336 DOI: 10.1021/acs.jproteome.8b00372] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/24/2022]
Abstract
Cysteine oxidative modification of cellular proteins is crucial for many aspects of cardiac hypertrophy development. However, integrated dissection of multiple types of cysteine oxidative post-translational modifications (O-PTM) of proteomes in cardiac hypertrophy is currently missing. Here we developed a novel discovery platform that encompasses a customized biotin switch-based quantitative proteomics pipeline and an advanced analytic workflow to comprehensively profile the landscape of cysteine O-PTM in an ISO-induced cardiac hypertrophy mouse model. Specifically, we identified a total of 1655 proteins containing 3324 oxidized cysteine sites by at least one of the following three modifications: reversible cysteine O-PTM, cysteine sulfinylation (CysSO2H), and cysteine sulfonylation (CysSO3H). Analyzing the hypertrophy signatures that are reproducibly discovered from this computational workflow unveiled four biological processes with increased cysteine O-PTM. Among them, protein phosphorylation, creatine metabolism, and response to elevated Ca2+ pathways exhibited an elevation of cysteine O-PTM in early stages, whereas glucose metabolism enzymes were increasingly modified in later stages, illustrating a temporal regulatory map in cardiac hypertrophy. Our cysteine O-PTM platform depicts a dynamic and integrated landscape of the cysteine oxidative proteome, through the extracted molecular signatures, and provides critical mechanistic insights in cardiac hypertrophy. Data are available via ProteomeXchange with identifier PXD010336.
|
26
|
van den Brink WJ, Palic S, Köhler I, de Lange ECM. Access to the CNS: Biomarker Strategies for Dopaminergic Treatments. Pharm Res 2018; 35:64. [PMID: 29450650 PMCID: PMC5814527 DOI: 10.1007/s11095-017-2333-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/28/2017] [Accepted: 12/18/2017] [Indexed: 12/26/2022]
Abstract
Despite substantial research carried out over the last decades, it remains difficult to understand the wide range of pharmacological effects of dopaminergic agents. The dopaminergic system is involved in several neurological disorders, such as Parkinson's disease and schizophrenia. This complex system features multiple pathways implicated in emotion and cognition, psychomotor functions and endocrine control through activation of G protein-coupled dopamine receptors. This review focuses on the system-wide effects of dopaminergic agents on the multiple biochemical and endocrine pathways, in particular the biomarkers (i.e., indicators of a pharmacological process) that reflect these effects. Dopaminergic treatments developed over the last decades were found to be associated with numerous biochemical pathways in the brain, including the norepinephrine and the kynurenine pathway. Additionally, they have been shown to affect peripheral systems, for example the hypothalamus-pituitary-adrenal (HPA) axis. Dopaminergic agents thus have a complex and broad pharmacological profile, rendering drug development challenging. Considering the complex system-wide pharmacological profile of dopaminergic agents, this review underlines the need for systems pharmacology studies that include: i) proteomics and metabolomics analysis; ii) longitudinal data evaluation and mathematical modeling; iii) pharmacokinetics-based interpretation of drug effects; iv) simultaneous biomarker evaluation in the brain, the cerebrospinal fluid (CSF) and plasma; and v) specific attention to condition-dependent (e.g., disease) pharmacology. Such an approach is considered essential to increase our understanding of central nervous system (CNS) drug effects and substantially improve CNS drug development.
Affiliation(s)
- Willem Johan van den Brink
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Semra Palic
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Isabelle Köhler
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Elizabeth Cunera Maria de Lange
- Division of Systems Biomedicine and Pharmacology, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.
|
27
|
Terenina E, Sautron V, Ydier C, Bazovkina D, Sevin-Pujol A, Gress L, Lippi Y, Naylies C, Billon Y, Liaubet L, Mormede P, Villa-Vialaneix N. Time course study of the response to LPS targeting the pig immune gene networks. BMC Genomics 2017; 18:988. [PMID: 29273011 PMCID: PMC5741867 DOI: 10.1186/s12864-017-4363-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/30/2017] [Accepted: 12/01/2017] [Indexed: 12/23/2022] Open
Abstract
Background: Stress is a generic term used to describe non-specific responses of the body to all kinds of challenges. A very large variability in the response can be observed across individuals, depending on numerous conditioning factors like genetics, early influences and life history. As a result, there is a wide range of individual vulnerability and resilience to stress, also called robustness. The importance of robustness-related traits in breeding strategies is increasing progressively towards the production of animals with a high level of production under a wide range of climatic conditions and management systems, together with a lower environmental impact and a high level of animal welfare. The present study aims at describing blood transcriptomic, hormonal, and metabolic responses of pigs to a systemic challenge using lipopolysaccharide (LPS). The objective is to analyze the individual variation of the biological responses in relation to the activity of the HPA axis measured by the levels of plasma cortisol after LPS and ACTH in 120 juvenile Large White (LW) pigs. The kinetics of the response was measured with biological variables and whole blood gene expression at 4 time points. A multilevel statistical analysis was used to take into account the longitudinal aspect of the data.
Results: Cortisol level reaches its peak 4 h after LPS injection. The characteristic changes of white blood cell count to LPS were observed, with a decrease of total count, maximal at t=+4 h, and the mirror changes in the respective proportions of lymphocytes and granulocytes. The lymphocytes / granulocytes ratio was maximal at t=+1 h. An integrative statistical approach was used and provided a set of candidate genes for kinetic studies and ongoing complementary studies focused on the LPS-stimulated inflammatory response.
Conclusions: The present study demonstrates the specific biomarkers indicative of an inflammation in swine. Furthermore, these stress responses persist for prolonged periods of time and at significant expression levels, making them good candidate markers for evaluating the efficacy of anti-inflammatory drugs.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4363-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Elena Terenina
- INRA, UMR 1388 GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet-Tolosan, F-31326, France.
- Valérie Sautron
- INRA, UMR 1388 GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet-Tolosan, F-31326, France
- Caroline Ydier
- INRA, UMR 1388 GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet-Tolosan, F-31326, France
- Darya Bazovkina
- Department of Behavioral Neurogenomics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090, Russia
- Amélie Sevin-Pujol
- INRA, UMR 1388 GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet-Tolosan, F-31326, France
- Laure Gress
- INRA, UMR 1388 GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet-Tolosan, F-31326, France
- Yannick Lippi
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, Toulouse, F-31027, France
- Claire Naylies
- Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRA, ENVT, INP-Purpan, UPS, Toulouse, F-31027, France
- Yvon Billon
- INRA, UE 1372 GenESI, Surgeres, F-17700, France
- Laurence Liaubet
- INRA, UMR 1388 GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet-Tolosan, F-31326, France
- Pierre Mormede
- INRA, UMR 1388 GenPhySE, Université de Toulouse, INRA, INPT, ENVT, Castanet-Tolosan, F-31326, France
|
28
|
Abstract
Transposon insertion sequencing (TIS) is a powerful high-throughput genetic technique that is transforming functional genomics in prokaryotes, because it enables genome-wide mapping of the determinants of fitness. However, current approaches for analyzing TIS data assume that selective pressures are constant over time and thus do not yield information regarding changes in the genetic requirements for growth in dynamic environments (e.g., during infection). Here, we describe structured analysis of TIS data collected as a time series, termed pattern analysis of conditional essentiality (PACE). From a temporal series of TIS data, PACE derives a quantitative assessment of each mutant’s fitness over the course of an experiment and identifies mutants with related fitness profiles. In so doing, PACE circumvents major limitations of existing methodologies, specifically the need for artificial effect size thresholds and enumeration of bacterial population expansion. We used PACE to analyze TIS samples of Edwardsiella piscicida (a fish pathogen) collected over a 2-week infection period from a natural host (the flatfish turbot). PACE uncovered more genes that affect E. piscicida’s fitness in vivo than were detected using a cutoff at a terminal sampling point, and it identified subpopulations of mutants with distinct fitness profiles, one of which informed the design of new live vaccine candidates. Overall, PACE enables efficient mining of time series TIS data and enhances the power and sensitivity of TIS-based analyses.
Transposon insertion sequencing (TIS) enables genome-wide mapping of the genetic determinants of fitness, typically based on observations at a single sampling point. Here, we move beyond analysis of endpoint TIS data to create a framework for analysis of time series TIS data, termed pattern analysis of conditional essentiality (PACE). We applied PACE to identify genes that contribute to colonization of a natural host by the fish pathogen Edwardsiella piscicida. PACE uncovered more genes that affect E. piscicida’s fitness in vivo than were detected using a terminal sampling point, and its clustering of mutants with related fitness profiles informed design of new live vaccine candidates. PACE yields insights into patterns of fitness dynamics and circumvents major limitations of existing methodologies. Finally, the PACE method should be applicable to additional “omic” time series data, including screens based on clustered regularly interspaced short palindromic repeats with Cas9 (CRISPR/Cas9).
|
29
|
Law KP, Mao X, Han TL, Zhang H. Unsaturated plasma phospholipids are consistently lower in the patients diagnosed with gestational diabetes mellitus throughout pregnancy: A longitudinal metabolomics study of Chinese pregnant women part 1. Clin Chim Acta 2017; 465:53-71. [DOI: 10.1016/j.cca.2016.12.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 10/17/2016] [Revised: 11/28/2016] [Accepted: 12/12/2016] [Indexed: 12/12/2022]
|
30
|
Straube J, Huang BE, Cao KAL. DynOmics to identify delays and co-expression patterns across time course experiments. Sci Rep 2017; 7:40131. [PMID: 28065937 PMCID: PMC5220332 DOI: 10.1038/srep40131] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/19/2016] [Accepted: 12/02/2016] [Indexed: 12/16/2022] Open
Abstract
Dynamic changes in biological systems can be captured by measuring molecular expression from different levels (e.g., genes and proteins) across time. Integration of such data aims to identify molecules that show similar expression changes over time; such molecules may be co-regulated and thus involved in similar biological processes. Combining data sources presents a systematic approach to study molecular behaviour. It can compensate for missing data in one source, and can reduce false positives when multiple sources highlight the same pathways. However, integrative approaches must accommodate the challenges inherent in ‘omics’ data, including high-dimensionality, noise, and timing differences in expression. As current methods for identification of co-expression cannot cope with this level of complexity, we developed a novel algorithm called DynOmics. DynOmics is based on the fast Fourier transform, from which the difference in expression initiation between trajectories can be estimated. This delay can then be used to realign the trajectories and identify those which show a high degree of correlation. Through extensive simulations, we demonstrate that DynOmics is efficient and accurate compared to existing approaches. We consider two case studies highlighting its application, identifying regulatory relationships across ‘omics’ data within an organism and for comparative gene expression analysis across organisms.
Affiliation(s)
- Jasmin Straube
- QFAB@QCIF Bioinformatics, Institute for Molecular Biosciences, The University of Queensland, Queensland Bioscience Precinct, St Lucia, QLD, Australia; The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
- Bevan Emma Huang
- Janssen Research & Development, LLC, Discovery Sciences, Menlo Park, USA
- Kim-Anh Lê Cao
- The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, QLD, Australia
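
The delay-estimation idea described in the DynOmics abstract above (use the fast Fourier transform to estimate how far one trajectory lags another, then realign) can be illustrated with a minimal sketch. This is not the DynOmics implementation: it is a generic FFT-based cross-correlation, written in plain Python, and the function names `fft` and `estimate_delay`, the mean-centring step, and the toy trajectories are all assumptions made for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def estimate_delay(reference, query):
    """Return the lag d (in time points) maximising the cross-correlation,
    i.e. the d for which query[t] best matches reference[t - d].
    Assumes both trajectories are sampled at the same, evenly spaced times."""
    n = len(reference)
    if len(query) != n:
        raise ValueError("trajectories must have the same length")
    # Zero-pad to a power of two >= 2n so the circular correlation
    # computed through the FFT equals the linear one.
    size = 1
    while size < 2 * n:
        size *= 2
    mean_r = sum(reference) / n
    mean_q = sum(query) / n
    a = [complex(v - mean_r) for v in reference] + [0j] * (size - n)
    b = [complex(v - mean_q) for v in query] + [0j] * (size - n)
    fa, fb = fft(a), fft(b)
    # Cross-correlation theorem: cc[d] = sum_t query[t + d] * reference[t]
    prod = [fb[k] * fa[k].conjugate() for k in range(size)]
    cc = [v.real / size for v in fft(prod, inverse=True)]
    # Negative lags are stored at the end of the padded array (index size + d).
    return max(range(-(n - 1), n), key=lambda d: cc[d % size])


# Toy trajectories: the same pulse, starting two time points later in `query`.
reference = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
query = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
delay = estimate_delay(reference, query)
```

Once a lag has been estimated, the query trajectory can be shifted by that many time points before computing a correlation with the reference, which is the realignment step the abstract describes. Real time course data are often short and unevenly sampled, so interpolation onto a common grid would be needed before a sketch like this applies.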
|
31
|
MixMC: A Multivariate Statistical Framework to Gain Insight into Microbial Communities. PLoS One 2016; 11:e0160169. [PMID: 27513472 PMCID: PMC4981383 DOI: 10.1371/journal.pone.0160169] [Citation(s) in RCA: 103] [Impact Index Per Article: 12.9] [Received: 02/29/2016] [Accepted: 07/14/2016] [Indexed: 12/14/2022] Open
Abstract
Culture-independent techniques, such as shotgun metagenomics and 16S rRNA amplicon sequencing, have dramatically changed the way we can examine microbial communities. Recently, changes in microbial community structure and dynamics have been associated with a growing list of human diseases. The identification and comparison of bacteria driving those changes requires the development of sound statistical tools, especially if microbial biomarkers are to be used in a clinical setting. We present mixMC, a novel multivariate data analysis framework for metagenomic biomarker discovery. mixMC accounts for the compositional nature of 16S data and enables detection of subtle differences when high inter-subject variability is present due to microbial sampling performed repeatedly on the same subjects, but in multiple habitats. Through data dimension reduction, the multivariate methods provide insightful graphical visualisations to characterise each type of environment in a detailed manner. We applied mixMC to 16S microbiome studies focusing on multiple body sites in healthy individuals, compared our results with existing statistical tools and illustrated the added value of using multivariate methodologies to fully characterise and compare microbial communities.
|