101
Jang H, Kwon H, Yang JJ, Hong J, Kim Y, Kim KW, Lee JS, Jang YK, Kim ST, Lee KH, Lee JH, Na DL, Seo SW, Kim HJ, Lee JM. Correlations between Gray Matter and White Matter Degeneration in Pure Alzheimer's Disease, Pure Subcortical Vascular Dementia, and Mixed Dementia. Sci Rep 2017; 7:9541. [PMID: 28842654 PMCID: PMC5573310 DOI: 10.1038/s41598-017-10074-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 10/04/2016] [Accepted: 08/04/2017] [Indexed: 11/09/2022] Open
Abstract
Alzheimer's disease dementia (ADD) and subcortical vascular dementia (SVaD) both show cortical thinning and white matter (WM) microstructural changes. We evaluated different patterns of correlation between gray matter (GM) and WM microstructural changes in pure ADD, pure SVaD, and mixed dementia. We enrolled 40 Pittsburgh compound B (PiB) positive ADD patients without WM hyperintensities (pure ADD), 32 PiB negative SVaD patients (pure SVaD), 23 PiB positive SVaD patients (mixed dementia), and 56 normal controls. WM microstructural integrity was quantified using fractional anisotropy (FA), axial diffusivity (DA), and radial diffusivity (DR) values. We used sparse canonical correlation analysis to show correlated regions of cortical thinning and WM microstructural changes. In pure ADD patients, lower FA in the frontoparietal area correlated with cortical thinning in the left inferior parietal lobule and bilateral paracentral lobules. In pure SVaD patients, lower FA and higher DR across extensive WM regions correlated with cortical thinning in bilateral fronto-temporo-parietal regions. In mixed dementia patients, DR and DA changes across extensive WM regions correlated with cortical thinning in the bilateral fronto-temporo-parietal regions. Our findings showed that the relationships between GM and WM degeneration are distinct in pure ADD, pure SVaD, and mixed dementia, suggesting that different pathomechanisms underlie their correlations.
Affiliation(s)
- Hyemin Jang
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Korea
- Hunki Kwon
  - Department of Biomedical Engineering, Hanyang University, Seoul, Korea
- Jin-Ju Yang
  - Department of Biomedical Engineering, Hanyang University, Seoul, Korea
- Jinwoo Hong
  - Department of Biomedical Engineering, Hanyang University, Seoul, Korea
- Yeshin Kim
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Korea
- Ko Woon Kim
  - Department of Neurology, Chonbuk National University Hospital, Chonbuk National University Medical School, Jeonju, Korea
- Jin San Lee
  - Department of Neurology, Kyung Hee University Hospital, Seoul, Korea
- Young Kyoung Jang
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Korea
- Sung Tae Kim
  - Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Kyung Han Lee
  - Nuclear Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jae Hong Lee
  - Department of Neurology, Asan Medical Center, Ulsan University School of Medicine, Seoul, Korea
- Duk L Na
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Korea
  - Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
  - Stem Cell & Regenerative Medicine Institute, Samsung Medical Center, Seoul, Korea
- Sang Won Seo
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Korea
  - Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
  - Department of Clinical Research Design & Evaluation, SAIHST, Sungkyunkwan University, Seoul, Korea
- Hee Jin Kim
  - Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
  - Neuroscience Center, Samsung Medical Center, Seoul, Korea
- Jong-Min Lee
  - Department of Biomedical Engineering, Hanyang University, Seoul, Korea
102
Trainor PJ, DeFilippis AP, Rai SN. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics. Metabolites 2017. [PMID: 28635678 PMCID: PMC5488001 DOI: 10.3390/metabo7020030] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 12/22/2022] Open
Abstract
Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. 
We report that in the most realistic simulation studies, which incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) was ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA, and k-NN classifiers. When non-normal error distributions were introduced, the performance of PLS-DA and k-NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, SVM and Random Forest classifiers tended to perform better than the other techniques.
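The tuning step mentioned in this abstract, choosing classifier parameters by cross-validation, can be illustrated with a toy sketch. This is not the authors' code: it uses a hand-written k-NN classifier on made-up one-dimensional data and selects k by leave-one-out cross-validation. The data, the candidate values of k, and every function name below are assumptions for illustration only.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_error(xs, ys, k):
    """Leave-one-out misclassification rate for a given k."""
    errors = 0
    for i in range(len(xs)):
        # Hold out sample i and train on the rest.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) != ys[i]:
            errors += 1
    return errors / len(xs)

# Hypothetical toy data: two well-separated phenotype classes.
xs = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

candidate_ks = [1, 3, 7]  # assumed tuning grid
cv_error = {k: loo_error(xs, ys, k) for k in candidate_ks}
# min() returns the first candidate with the lowest error, so ties go to the smaller k.
best_k = min(candidate_ks, key=lambda k: cv_error[k])
print(cv_error, best_k)
```

With k = 7 every held-out point is outvoted by the other class (three same-class neighbours against four), which shows how a poorly tuned parameter alone can make a classifier look bad.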
Affiliation(s)
- Patrick J Trainor
  - Division of Cardiovascular Medicine, Department of Medicine, University of Louisville, 580 S. Preston St., Louisville, KY 40202, USA
- Andrew P DeFilippis
  - Division of Cardiovascular Medicine, Department of Medicine, University of Louisville, 580 S. Preston St., Louisville, KY 40202, USA
- Shesh N Rai
  - Department of Bioinformatics and Biostatistics, University of Louisville, 505 S. Hancock St., Louisville, KY 40202, USA
103
Huang S, Chaudhary K, Garmire LX. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front Genet 2017; 8:84. [PMID: 28670325 PMCID: PMC5472696 DOI: 10.3389/fgene.2017.00084] [Citation(s) in RCA: 389] [Impact Index Per Article: 55.6] [Received: 03/24/2017] [Accepted: 06/01/2017] [Indexed: 01/20/2023] Open
Abstract
Multi-omics data integration is one of the major challenges in the era of precision medicine. Considerable work has been done since the advent of high-throughput studies, which have made data available for downstream analyses. To improve clinical outcome prediction, a gamut of software tools has been developed. This review outlines the progress made in the field of multi-omics integration and the comprehensive tools developed so far. Finally, we discuss integration methods for predicting patient survival.
Affiliation(s)
- Sijia Huang
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States
  - Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, United States
- Kumardeep Chaudhary
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States
- Lana X Garmire
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, United States
  - Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, United States
  - Department of Obstetrics, Gynecology, and Women's Health, John A. Burns School of Medicine, University of Hawaii at Manoa, Honolulu, HI, United States
104
Furanoterpene Diversity and Variability in the Marine Sponge Spongia officinalis, from Untargeted LC-MS/MS Metabolomic Profiling to Furanolactam Derivatives. Metabolites 2017; 7:metabo7020027. [PMID: 28608848 PMCID: PMC5487998 DOI: 10.3390/metabo7020027] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/14/2017] [Revised: 05/23/2017] [Accepted: 06/06/2017] [Indexed: 01/07/2023] Open
Abstract
The Mediterranean marine sponge Spongia officinalis has been reported as a rich source of secondary metabolites and also as a bioindicator of water quality given its capacity to concentrate trace metals. In this study, we evaluated the chemical diversity within 30 S. officinalis samples collected over three years at two sites differentially impacted by anthropogenic pollutants located near Marseille (South of France). Untargeted liquid chromatography–mass spectrometry (LC–MS) metabolomic profiling (C18 LC, ESI-Q-TOF MS) combined with XCMS Online data processing and multivariate statistical analysis revealed 297 peaks assigned to at least 86 compounds. The spatio-temporal metabolite variability was mainly attributed to variations in relative content of furanoterpene derivatives. This family was further characterized through LC–MS/MS analyses in positive and negative ion modes combined with molecular networking, together with a comprehensive NMR study of isolated representatives such as demethylfurospongin-4 and furospongin-1. The MS/MS and NMR spectroscopic data led to the identification of a new furanosesterterpene, furofficin (2), as well as two derivatives with a glycinyl lactam moiety, spongialactam A (12a) and B (12b). This study illustrates the potential of untargeted LC–MS metabolomics and molecular networking to discover new natural compounds even in an extensively studied organism such as S. officinalis. It also highlights the effect of anthropogenic pollution on the chemical profiles within the sponge.
105
Tosun D, Landau S, Aisen PS, Petersen RC, Mintun M, Jagust W, Weiner MW. Association between tau deposition and antecedent amyloid-β accumulation rates in normal and early symptomatic individuals. Brain 2017; 140:1499-1512. [DOI: 10.1093/brain/awx046] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 05/31/2016] [Accepted: 01/17/2017] [Indexed: 02/06/2023] Open
Affiliation(s)
- Duygu Tosun
  - Department of Radiology and Biomedical Imaging, University of California – San Francisco, San Francisco, CA, USA
- Susan Landau
  - Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Paul S Aisen
  - Department of Neurology, University of California-San Diego, San Diego, CA, USA
- Mark Mintun
  - Avid Radiopharmaceuticals, Philadelphia, PA, USA
- William Jagust
  - Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Michael W Weiner
  - Department of Radiology and Biomedical Imaging, University of California – San Francisco, San Francisco, CA, USA
106
Torbati ME, Mitreva M, Gopalakrishnan V. Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations. DATA 2016; 1:19. [PMID: 28239609 PMCID: PMC5325162 DOI: 10.3390/data1030019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/27/2022] Open
Abstract
Human microbiome data from genomic sequencing technologies are accumulating rapidly, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection from parasitic worms, such as helminths, can help in the detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data produces the challenge of needing more observations for accurate predictive modeling and has been dealt with previously, using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association to the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. We test our hypothesis by using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection by using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods.
In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbiota data actually results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery.
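The two performance measures named in this abstract, AUC and balanced accuracy, can be computed from first principles. The sketch below is an illustration only and not the authors' pipeline: the labels, scores, the 0.5 decision threshold, and the function names are all made up. AUC is computed as the fraction of (positive, negative) pairs that the score ranks correctly, with ties counted as half.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = infected)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)

def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical, unbalanced toy sample: 2 infected, 4 uninfected.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

bacc = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(bacc, auc)
```

Balanced accuracy weights both classes equally, which matters when infected and uninfected samples are present in unequal numbers. Plain accuracy on this toy sample would be 4/6, while balanced accuracy is lower because half of the infected cases are missed.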
Affiliation(s)
- Mahbaneh Eshaghzadeh Torbati
  - Department of Computer Science, University of Pittsburgh, 6135 Sennott Square, 210 S Bouquet St, Pittsburgh, PA 15260-9161, USA
- Makedonka Mitreva
  - Department of Medicine, Washington University School of Medicine, 660 S Euclid Ave, St. Louis, MO 63110, USA
- Vanathi Gopalakrishnan
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Suite 500, Pittsburgh, PA 15206-3701
107
Gatesoupe FJ, Huelvan C, Le Bayon N, Le Delliou H, Madec L, Mouchel O, Quazuguel P, Mazurais D, Zambonino-Infante JL. The highly variable microbiota associated to intestinal mucosa correlates with growth and hypoxia resistance of sea bass, Dicentrarchus labrax, submitted to different nutritional histories. BMC Microbiol 2016; 16:266. [PMID: 27821062 PMCID: PMC5100225 DOI: 10.1186/s12866-016-0885-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/12/2016] [Accepted: 10/30/2016] [Indexed: 01/12/2023] Open
Abstract
Background: A better understanding of how intestinal microbiota interacts with fish health is one of the keys to sustainable aquaculture development. The present experiment aimed at correlating active microbiota associated with intestinal mucosa with Specific Growth Rate (SGR) and Hypoxia Resistance Time (HRT) in European sea bass individuals subjected to different nutritional histories: the fish were fed either standard or unbalanced diets at first feeding, and then mixed before repeating the dietary challenge in a common garden approach at the juvenile stage. Results: A diet deficient in essential fatty acids (LH) lowered both SGR and HRT in sea bass, especially when the deficiency was already applied at first feeding. A protein-deficient diet with high starch supply (HG) reduced SGR to a lesser extent than LH, but it did not affect HRT. On average overall, 94% of pyrosequencing reads corresponded to Proteobacteria, and the differences in Operational Taxonomic Unit (OTU) composition were mildly significant between experimental groups, mainly due to high individual variability. The highest and the lowest Bray-Curtis indices of intra-group similarity were observed in the two groups fed standard starter diet, and then mixed before the final dietary challenge with fish already exposed to the nutritional deficiency at first feeding (0.60 and 0.42 with diets HG and LH, respectively). Most noticeably, the median percentage of Escherichia-Shigella OTU_1 was lower in the group LH with standard starter diet. Disregarding the nutritional history of each individual, strong correlations appeared between (1) OTU richness and SGR, and (2) dominance index and HRT. The two physiological traits also correlated with the relative abundance of distinct OTUs (positive correlations: Pseudomonas sp. OTU_3 and Herbaspirillum sp. OTU_10 with SGR, Paracoccus sp. OTU_4 and Vibrio sp. OTU_7 with HRT; negative correlation: Rhizobium sp. OTU_9 with HRT).
Conclusions: In sea bass, gut microbiota characteristics and physiological traits of individuals are linked together, interfering with nutritional history, and resulting in high variability among individual microbiota. Many samples and tank replicates seem necessary to further investigate the effect of experimental treatments on gut microbiota composition, and to test the hypothesis that microbiotypes may be delineated in fish.
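For readers unfamiliar with the Bray-Curtis similarity index and OTU richness cited in this abstract, a minimal sketch follows. It assumes raw OTU counts per fish and the standard definition of Bray-Curtis similarity (one minus the Bray-Curtis dissimilarity). The count vectors and function names are invented for illustration and are not taken from the study, which may have normalised its data differently.

```python
def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity: 2 * (shared abundance) / (total abundance).

    Returns 1.0 for identical profiles and 0.0 when no OTU is shared.
    """
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * shared / total

def otu_richness(counts):
    """Number of OTUs detected (count above zero) in one sample."""
    return sum(1 for c in counts if c > 0)

# Hypothetical read counts for four OTUs in two fish.
fish_a = [10, 0, 5, 5]
fish_b = [5, 5, 0, 10]

similarity = bray_curtis_similarity(fish_a, fish_b)  # 2 * 10 / 40
richness_a = otu_richness(fish_a)
richness_b = otu_richness(fish_b)
print(similarity, richness_a, richness_b)
```

An intra-group similarity such as the 0.60 and 0.42 values reported in the abstract would typically be an average of this pairwise index over all pairs of individuals in a group; that averaging step is an assumption here, not something the abstract states.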
Affiliation(s)
- François-Joël Gatesoupe
  - NUMEA, INRA, Univ. Pau & Pays Adour, 64310, Saint Pée sur Nivelle, France
  - PFOM/ARN, Ifremer, Centre de Bretagne, CS 10070, 29280, Plouzané, France
- Christine Huelvan
  - Ifremer, UMR 6539 (LEMAR), PFOM/ARN, Centre de Bretagne, CS 10070, 29280, Plouzané, France
- Nicolas Le Bayon
  - Ifremer, UMR 6539 (LEMAR), PFOM/ARN, Centre de Bretagne, CS 10070, 29280, Plouzané, France
- Hervé Le Delliou
  - Ifremer, UMR 6539 (LEMAR), PFOM/ARN, Centre de Bretagne, CS 10070, 29280, Plouzané, France
- Lauriane Madec
  - Ifremer, UMR 6539 (LEMAR), PFOM/ARN, Centre de Bretagne, CS 10070, 29280, Plouzané, France
- Olivier Mouchel
  - Ifremer, UMR 6539 (LEMAR), PFOM/ARN, Centre de Bretagne, CS 10070, 29280, Plouzané, France
- Patrick Quazuguel
  - Ifremer, UMR 6539 (LEMAR), PFOM/ARN, Centre de Bretagne, CS 10070, 29280, Plouzané, France
- David Mazurais
  - Ifremer, UMR 6539 (LEMAR), PFOM/ARN, Centre de Bretagne, CS 10070, 29280, Plouzané, France
108
Steegenga WT, Mischke M, Lute C, Boekschoten MV, Lendvai A, Pruis MGM, Verkade HJ, van de Heijning BJM, Boekhorst J, Timmerman HM, Plösch T, Müller M, Hooiveld GJEJ. Maternal exposure to a Western-style diet causes differences in intestinal microbiota composition and gene expression of suckling mouse pups. Mol Nutr Food Res 2016; 61. [PMID: 27129739 PMCID: PMC5215441 DOI: 10.1002/mnfr.201600141] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 02/12/2016] [Revised: 03/25/2016] [Accepted: 04/13/2016] [Indexed: 12/14/2022]
Abstract
Scope: The long‐lasting consequences of nutritional programming during the early phase of life have become increasingly evident. The effects of maternal nutrition on the developing intestine are still underexplored. Methods and results: In this study, we observed (1) altered microbiota composition of the colonic luminal content, and (2) differential gene expression in the intestinal wall in 2‐week‐old mouse pups born from dams exposed to a Western‐style (WS) diet during the perinatal period. A sexually dimorphic effect was found for the differentially expressed genes in the offspring of WS diet‐exposed dams but no differences between male and female pups were found for the microbiota composition. Integrative analysis of the microbiota and gene expression data revealed that the maternal WS diet independently affected gene expression and microbiota composition. However, the abundance of bacterial families not affected by the WS diet (Bacteroidaceae, Porphyromonadaceae, and Lachnospiraceae) correlated with the expression of genes playing a key role in intestinal development and functioning (e.g. Pitx2 and Ace2). Conclusion: Our data reveal that maternal consumption of a WS diet during the perinatal period alters both gene expression and microbiota composition in the intestinal tract of 2‐week‐old offspring.
Affiliation(s)
- Wilma T Steegenga
  - Nutrition, Metabolism, and Genomics Group, Division of Human Nutrition, Wageningen University, Wageningen, The Netherlands
- Mona Mischke
  - Nutrition, Metabolism, and Genomics Group, Division of Human Nutrition, Wageningen University, Wageningen, The Netherlands
- Carolien Lute
  - Nutrition, Metabolism, and Genomics Group, Division of Human Nutrition, Wageningen University, Wageningen, The Netherlands
- Mark V Boekschoten
  - Nutrition, Metabolism, and Genomics Group, Division of Human Nutrition, Wageningen University, Wageningen, The Netherlands
- Agnes Lendvai
  - Center for Liver, Digestive and Metabolic Diseases, Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Maurien G M Pruis
  - Center for Liver, Digestive and Metabolic Diseases, Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Henkjan J Verkade
  - Center for Liver, Digestive and Metabolic Diseases, Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Torsten Plösch
  - Department of Obstetrics and Gynaecology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Michael Müller
  - Nutrigenomics and Systems Nutrition, Norwich Medical School, University of East Anglia, Norwich, UK
- Guido J E J Hooiveld
  - Nutrition, Metabolism, and Genomics Group, Division of Human Nutrition, Wageningen University, Wageningen, The Netherlands
109
Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform 2016; 17:628-41. [PMID: 26969681 PMCID: PMC4945831 DOI: 10.1093/bib/bbv108] [Citation(s) in RCA: 196] [Impact Index Per Article: 24.5] [Received: 06/08/2015] [Revised: 10/26/2015] [Indexed: 01/16/2023] Open
Abstract
State-of-the-art next-generation sequencing, transcriptomics, proteomics and other high-throughput 'omics' technologies enable the efficient generation of large experimental data sets. These data may yield unprecedented knowledge about molecular pathways in cells and their role in disease. Dimension reduction approaches have been widely used in exploratory analysis of single omics data sets. This review will focus on dimension reduction approaches for simultaneous exploratory analyses of multiple data sets. These methods extract the linear relationships that best explain the correlated structure across data sets, the variability both within and between variables (or observations) and may highlight data issues such as batch effects or outliers. We explore dimension reduction techniques as one of the emerging approaches for data integration, and how these can be applied to increase our understanding of biological systems in normal physiological function and disease.
110
Monteiro JM, Rao A, Shawe-Taylor J, Mourão-Miranda J. A multiple hold-out framework for Sparse Partial Least Squares. J Neurosci Methods 2016; 271:182-94. [PMID: 27353722 PMCID: PMC5012894 DOI: 10.1016/j.jneumeth.2016.06.011] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 12/18/2015] [Revised: 06/10/2016] [Accepted: 06/15/2016] [Indexed: 12/01/2022]
Abstract
- SPLS framework which tests model reliability by fitting it to several data splits.
- Framework was applied to brain anatomy and individual items of the MMSE score.
- The adequate number of voxels and clinical items was selected automatically.
- SPLS found two associative effects between sparse brain voxels and MMSE items.
- Projection deflation provided better results than a classical PLS deflation.

Background: Supervised classification machine learning algorithms may have limitations when studying brain diseases with heterogeneous populations, as the labels might be unreliable. More exploratory approaches, such as Sparse Partial Least Squares (SPLS), may provide insights into the brain's mechanisms by finding relationships between neuroimaging and clinical/demographic data. The identification of these relationships has the potential to improve the current understanding of disease mechanisms, refine clinical assessment tools, and stratify patients. SPLS finds multivariate associative effects in the data by computing pairs of sparse weight vectors, where each pair is used to remove its corresponding associative effect from the data by matrix deflation, before computing additional pairs. New method: We propose a novel SPLS framework which selects the adequate number of voxels and clinical variables to describe each associative effect, and tests their reliability by fitting the model to different splits of the data. As a proof of concept, the approach was applied to find associations between grey matter probability maps and individual items of the Mini-Mental State Examination (MMSE) in a clinical sample with various degrees of dementia. Results: The framework found two statistically significant associative effects between subsets of brain voxels and subsets of the questions/tasks. Comparison with existing methods: SPLS was compared with its non-sparse version (PLS). The use of projection deflation versus a classical PLS deflation was also tested in both PLS and SPLS. Conclusions: SPLS outperformed PLS, finding statistically significant effects and providing higher correlation values in hold-out data. Moreover, projection deflation provided better results.
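The two mechanics this abstract describes, computing a pair of sparse weight vectors and then deflating the data before the next pair, can be sketched on a toy example. The code below is a rough illustration under stated assumptions and not the authors' implementation: it uses a soft-thresholded power iteration on the cross-covariance, a common way to obtain sparse PLS weights, with fixed penalties `lam_u` and `lam_v` chosen by hand, whereas the paper selects sparsity levels with its hold-out framework. All data, names, and parameter values are hypothetical.

```python
import math

def soft_threshold(vec, lam):
    """Shrink entries toward zero by lam; entries within [-lam, lam] become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]

def normalize(vec):
    """Scale to unit Euclidean norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def matvec(mat, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def sparse_weight_pair(X, Y, lam_u, lam_v, n_iter=50):
    """Alternate sparse updates of u and v that aim to maximise u^T X^T Y v."""
    Xt, Yt = transpose(X), transpose(Y)
    v = normalize([1.0] * len(Y[0]))
    for _ in range(n_iter):
        u = normalize(soft_threshold(matvec(Xt, matvec(Y, v)), lam_u))
        v = normalize(soft_threshold(matvec(Yt, matvec(X, u)), lam_v))
    return u, v

def projection_deflation(M, w):
    """Return M (I - w w^T) for a unit-norm w, removing the direction w from M."""
    scores = matvec(M, w)
    return [[m - s * wj for m, wj in zip(row, w)] for row, s in zip(M, scores)]

# Hypothetical centred data: 4 subjects, 3 "voxels" (X), 2 "clinical items" (Y).
X = [[1.0, 0.1, 0.0], [-1.0, 0.0, 0.1], [2.0, -0.1, 0.0], [-2.0, 0.0, -0.1]]
Y = [[1.0, 0.0], [-1.0, 0.1], [2.0, 0.0], [-2.0, -0.1]]

u, v = sparse_weight_pair(X, Y, lam_u=0.5, lam_v=0.5)
X_deflated = projection_deflation(X, u)
Y_deflated = projection_deflation(Y, v)
print(u, v)
```

In this toy case only the first voxel and the first clinical item receive non-zero weight, and after projection deflation the data no longer contain any variation along `u` and `v`, so a second call on the deflated matrices would look for a different associative effect.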
Affiliation(s)
- João M Monteiro
  - Department of Computer Science, University College London, London, UK
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- Anil Rao
  - Department of Computer Science, University College London, London, UK
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
- John Shawe-Taylor
  - Department of Computer Science, University College London, London, UK
- Janaina Mourão-Miranda
  - Department of Computer Science, University College London, London, UK
  - Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, UK
111
Chaturvedi N, de Menezes RX, Goeman JJ. A global × global test for testing associations between two large sets of variables. Biom J 2016; 59:145-158. [PMID: 27225065 DOI: 10.1002/bimj.201500106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/05/2015] [Revised: 01/06/2016] [Accepted: 03/07/2016] [Indexed: 12/30/2022]
Abstract
In high-dimensional omics studies where multiple molecular profiles are obtained for each set of patients, there is often interest in identifying complex multivariate associations, for example, copy number regulated expression levels in a certain pathway or in a genomic region. To detect such associations, we present a novel approach to test for association between two sets of variables. Our approach generalizes the global test, which tests for association between a group of covariates and a single univariate response, to allow high-dimensional multivariate response. We apply the method to several simulated datasets as well as two publicly available datasets, where we compare the performance of multivariate global test (G2) with univariate global test. The method is implemented in R and will be available as a part of the globaltest package in R.
Affiliation(s)
- Nimisha Chaturvedi
  - Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands
  - Netherlands Bioinformatics Center, Nijmegen, The Netherlands
- Renée X de Menezes
  - Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands
  - Netherlands Bioinformatics Center, Nijmegen, The Netherlands
- Jelle J Goeman
  - Biostatistics, Department for Health Evidence, Radboud University Medical Center, Nijmegen, The Netherlands
  - Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, Leiden, The Netherlands
112
Bouyioukos C, Bucchini F, Elati M, Képès F. GREAT: a web portal for Genome Regulatory Architecture Tools. Nucleic Acids Res 2016; 44:W77-82. [PMID: 27151196 PMCID: PMC4987929 DOI: 10.1093/nar/gkw384] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/23/2016] [Accepted: 04/26/2016] [Indexed: 11/15/2022] Open
Abstract
GREAT (Genome REgulatory Architecture Tools) is a novel web portal for tools designed to generate user-friendly and biologically useful analysis of genome architecture and regulation. The online tools of GREAT are freely accessible and compatible with essentially any operating system which runs a modern browser. GREAT is based on the analysis of genome layout (defined as the respective positioning of co-functional genes) and its relation with chromosome architecture and gene expression. GREAT tools allow users to systematically detect regular patterns along co-functional genomic features in an automatic way consisting of three individual steps and respective interactive visualizations. In addition to the complete analysis of regularities, GREAT tools enable the use of periodicity and position information for improving the prediction of transcription factor binding sites using a multi-view machine learning approach. The outcome of this integrative approach features a multivariate analysis of the interplay between the location of a gene and its regulatory sequence. GREAT results are plotted in web interactive graphs and are available for download either as individual plots, self-contained interactive pages or as machine readable tables for downstream analysis. The GREAT portal can be reached at the following URL https://absynth.issb.genopole.fr/GREAT and each individual GREAT tool is available for downloading.
Affiliation(s)
- Costas Bouyioukos
  - iSSB, CNRS, Genopole, UEVE, Université Paris-Saclay, 5 rue Henri Desbruères, Évry 91030 Cedex, France
- François Bucchini
  - iSSB, CNRS, Genopole, UEVE, Université Paris-Saclay, 5 rue Henri Desbruères, Évry 91030 Cedex, France
- Mohamed Elati
  - iSSB, CNRS, Genopole, UEVE, Université Paris-Saclay, 5 rue Henri Desbruères, Évry 91030 Cedex, France
- François Képès
  - iSSB, CNRS, Genopole, UEVE, Université Paris-Saclay, 5 rue Henri Desbruères, Évry 91030 Cedex, France
|
113
|
Identification of Commensal Species Positively Correlated with Early Stress Responses to a Compromised Mucus Barrier. Inflamm Bowel Dis 2016; 22:826-40. [PMID: 26926038 DOI: 10.1097/mib.0000000000000688] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
BACKGROUND Our aims were (1) to correlate changes in the microbiota to intestinal gene expression before and during the development of colitis in Muc2-/- mice and (2) to investigate whether the heterozygote Muc2+/- mouse would reveal host markers of gut barrier stress. METHODS Colon histology, transcriptomics, and microbiota profiling of faecal samples was performed on wild type, Muc2+/-, and Muc2-/- mice at 2, 4, and 8 weeks of age. RESULTS Muc2-/- mice develop colitis in proximal colon after weaning, resulting in inflammatory and adaptive immune responses, and expression of genes associated with human inflammatory bowel disease. Muc2+/- mice do not develop colitis, but produce a thinner mucus layer. The transcriptome of Muc2+/- mice revealed differential expression of genes participating in mucosal stress responses and exacerbation of a transient inflammatory state around the time of weaning. Young wild type and Muc2+/- mice have a more constrained group of bacteria as compared with the Muc2-/- mice, but at 8 weeks the microbiota composition is more similar in all mice. At all ages, microbiota composition discriminated the groups of mice according to their genotype. Specific bacterial clusters correlated with altered gene expression responses to stress and bacteria, before colitis development, including colitogenic members of the genus Bacteroides. CONCLUSIONS The abundance of Bacteroides pathobionts increased before histological signs of pathology suggesting they may play a role in triggering the development of colitis. The Muc2+/- mouse produces a thinner mucus layer and can be used to study mucus barrier stress in the absence of colitis.
|
114
|
Rohart F, Mason EA, Matigian N, Mosbergen R, Korn O, Chen T, Butcher S, Patel J, Atkinson K, Khosrotehrani K, Fisk NM, Lê Cao KA, Wells CA. A molecular classification of human mesenchymal stromal cells. PeerJ 2016; 4:e1845. [PMID: 27042394 PMCID: PMC4811172 DOI: 10.7717/peerj.1845] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 02/09/2016] [Accepted: 03/03/2016] [Indexed: 12/13/2022] Open
Abstract
Mesenchymal stromal cells (MSC) are widely used for the study of mesenchymal tissue repair, and increasingly adopted for cell therapy, despite the lack of consensus on the identity of these cells. In part this is due to the lack of specificity of MSC markers. Distinguishing MSC from other stromal cells such as fibroblasts is particularly difficult using standard analysis of surface proteins, and there is an urgent need for improved classification approaches. Transcriptome profiling is commonly used to describe and compare different cell types; however, efforts to identify specific markers of rare cellular subsets may be confounded by the small sample sizes of most studies. Consequently, it is difficult to derive reproducible, and therefore useful markers. We addressed the question of MSC classification with a large integrative analysis of many public MSC datasets. We derived a sparse classifier (The Rohart MSC test) that accurately distinguished MSC from non-MSC samples with >97% accuracy on an internal training set of 635 samples from 41 studies derived on 10 different microarray platforms. The classifier was validated on an external test set of 1,291 samples from 65 studies derived on 15 different platforms, with >95% accuracy. The genes that contribute to the MSC classifier formed a protein-interaction network that included known MSC markers. Further evidence of the relevance of this new MSC panel came from the high number of Mendelian disorders associated with mutations in more than 65% of the network. These result in mesenchymal defects, particularly impacting on skeletal growth and function. The Rohart MSC test is a simple in silico test that accurately discriminates MSC from fibroblasts, other adult stem/progenitor cell types or differentiated stromal cells. It has been implemented in the www.stemformatics.org resource, to assist researchers wishing to benchmark their own MSC datasets or data from the public domain. The code is available from the CRAN repository and all data used to generate the MSC test is available to download via the Gene Expression Omnibus or the Stemformatics resource.
Affiliation(s)
- Florian Rohart
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
  - The University of Queensland Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, Queensland, Australia
- Elizabeth A. Mason
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
- Nicholas Matigian
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
  - The University of Queensland Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, Queensland, Australia
- Rowland Mosbergen
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
  - Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia
- Othmar Korn
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
- Tyrone Chen
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
  - Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia
- Suzanne Butcher
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
  - Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia
- Jatin Patel
  - The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia
- Kerry Atkinson
  - The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia
- Kiarash Khosrotehrani
  - The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia
  - Centre for Advanced Prenatal Care, Royal Brisbane & Women’s Hospital, Brisbane, Queensland, Australia
- Nicholas M. Fisk
  - The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia
  - Centre for Advanced Prenatal Care, Royal Brisbane & Women’s Hospital, Brisbane, Queensland, Australia
- Kim-Anh Lê Cao
  - The University of Queensland Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, Queensland, Australia
- Christine A. Wells
  - Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
  - Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia
|
115
|
Liu B, Shen X, Pan W. Integrative and regularized principal component analysis of multiple sources of data. Stat Med 2016; 35:2235-50. [PMID: 26756854 DOI: 10.1002/sim.6866] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/08/2015] [Revised: 09/28/2015] [Accepted: 12/14/2015] [Indexed: 12/14/2022]
Abstract
Integration of data of disparate types has become increasingly important to enhancing the power for new discoveries by combining complementary strengths of multiple types of data. One application is to uncover tumor subtypes in human cancer research in which multiple types of genomic data are integrated, including gene expression, DNA copy number, and DNA methylation data. In spite of their successes, existing approaches based on joint latent variable models require stringent distributional assumptions and may suffer from unbalanced scales (or units) of different types of data and non-scalability of the corresponding algorithms. In this paper, we propose an alternative based on integrative and regularized principal component analysis, which is distribution-free, computationally efficient, and robust against unbalanced scales. The new method performs dimension reduction simultaneously on multiple types of data, seeking data-adaptive sparsity and scaling. As a result, in addition to feature selection for each type of data, integrative clustering is achieved. Numerically, the proposed method compares favorably against its competitors in terms of accuracy (in identifying hidden clusters), computational efficiency, and robustness against unbalanced scales. In particular, compared with a popular method, the new method was competitive in identifying tumor subtypes associated with distinct patient survival patterns when applied to a combined analysis of DNA copy number, mRNA expression, and DNA methylation data in a glioblastoma multiforme study. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Binghui Liu
  - School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin Province, China
  - School of Statistics, University of Minnesota, 224 Church St. S.E., Minneapolis, 55455, MN, U.S.A.
  - Division of Biostatistics, University of Minnesota, 420 Delaware St. S.E., Minneapolis, 55455, MN, U.S.A.
- Xiaotong Shen
  - School of Statistics, University of Minnesota, 224 Church St. S.E., Minneapolis, 55455, MN, U.S.A.
- Wei Pan
  - Division of Biostatistics, University of Minnesota, 420 Delaware St. S.E., Minneapolis, 55455, MN, U.S.A.
|
116
|
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. TRANSLATIONAL BIOINFORMATICS 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
|
117
|
Saqi M, Pellet J, Roznovat I, Mazein A, Ballereau S, De Meulder B, Auffray C. Systems Medicine: The Future of Medical Genomics, Healthcare, and Wellness. Methods Mol Biol 2016; 1386:43-60. [PMID: 26677178 DOI: 10.1007/978-1-4939-3283-2_3] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 01/27/2023]
Abstract
Recent advances in genomics have led to the rapid and relatively inexpensive collection of patient molecular data including multiple types of omics data. The integration of these data with clinical measurements has the potential to impact on our understanding of the molecular basis of disease and on disease management. Systems medicine is an approach to understanding disease through an integration of large patient datasets. It offers the possibility for personalized strategies for healthcare through the development of a new taxonomy of disease. Advanced computing will be an important component in effectively implementing systems medicine. In this chapter we describe three computational challenges associated with systems medicine: disease subtype discovery using integrated datasets, obtaining a mechanistic understanding of disease, and the development of an informatics platform for the mining, analysis, and visualization of data emerging from translational medicine studies.
Affiliation(s)
- Mansoor Saqi
  - European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Johann Pellet
  - European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Irina Roznovat
  - European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Alexander Mazein
  - European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Stéphane Ballereau
  - European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Bertrand De Meulder
  - European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
- Charles Auffray
  - European Institute for Systems Biology and Medicine, CNRS-ENS-UCBL, Université de Lyon, 50 Avenue Tony Garnier, Lyon, 69007, France
  - Université Claude Bernard, 3e étage plot 2, 50 Avenue Tony Garnier, Lyon, Cedex 07, 69366, France
|
118
|
Reis MM, Reis MG, Mills J, Ross C, Brightwell G. Characterization of volatile metabolites associated with confinement odour during the shelf-life of vacuum packed lamb meat under different storage conditions. Meat Sci 2015; 113:80-91. [PMID: 26624794 DOI: 10.1016/j.meatsci.2015.11.017] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 05/21/2015] [Revised: 11/15/2015] [Accepted: 11/18/2015] [Indexed: 11/18/2022]
Abstract
Confinement odour was investigated. Volatiles were extracted directly from the pack, using solid phase microextraction and analysed by gas chromatography-mass spectrometry. Sensory evaluation and microbiological analysis of the meat surface were also performed. Commercial samples of vacuum packed lamb legs (n=85), from two meat processing plants, were kept for 7 weeks at -1.5°C then at different regimes of temperature (-1.5 to +4°C) until 11, 12 or 13 weeks. Persistent odour was observed in 66% of samples, confinement odour in 24% and no odour in 11%. Volatiles associated with confinement odour (3-methyl-butanal, 3-hydroxy-2-butanone and sulphur dioxide) corresponded with end/sub products of glucose fermentation and catabolism of amino acids by bacteria (all of which are naturally found in meat and do not represent a risk to health). Confinement odour could indicate a stage at which the environment for bacteria growth is becoming favourable for the production of volatiles with strong odours that are noticed by the consumer.
Affiliation(s)
- Marlon M Reis
  - Food Assurance and Meat Science Team, Food and Bio-based Products Group, AgResearch, Ruakura Research Centre, 10 Bisley Road, Hamilton, New Zealand
- Mariza G Reis
  - Dairy Foods Team, Food and Bio-based Products Group, AgResearch, Ruakura Research Centre, 10 Bisley Road, Hamilton, New Zealand
- John Mills
  - Food Assurance and Meat Science Team, Food and Bio-based Products Group, AgResearch, Hopkirk Research Institute, Massey University, Corner University Ave and Library Road, Palmerston North, New Zealand
- Colleen Ross
  - Food Assurance and Meat Science Team, Food and Bio-based Products Group, AgResearch, Ruakura Research Centre, 10 Bisley Road, Hamilton, New Zealand
- Gale Brightwell
  - Food Assurance and Meat Science Team, Food and Bio-based Products Group, AgResearch, Hopkirk Research Institute, Massey University, Corner University Ave and Library Road, Palmerston North, New Zealand
|
119
|
Small RNA Transcriptome of the Oral Microbiome during Periodontitis Progression. Appl Environ Microbiol 2015; 81:6688-99. [PMID: 26187962 DOI: 10.1128/aem.01782-15] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/28/2015] [Accepted: 07/12/2015] [Indexed: 02/06/2023] Open
Abstract
The oral microbiome is one of the most complex microbial communities in the human body, and due to circumstances not completely understood, the healthy microbial community becomes dysbiotic, giving rise to periodontitis, a polymicrobial inflammatory disease. We previously reported the results of community-wide gene expression changes in the oral microbiome during periodontitis progression and identified signatures associated with increasing severity of the disease. Small noncoding RNAs (sRNAs) are key players in posttranscriptional regulation, especially in fast-changing environments such as the oral cavity. Here, we expanded our analysis to the study of the sRNA metatranscriptome during periodontitis progression on the same samples for which mRNA expression changes were analyzed. We observed differential expression of 12,097 sRNAs, identifying a total of 20 Rfam sRNA families as being overrepresented in progression and 23 at baseline. Gene ontology activities regulated by the differentially expressed (DE) sRNAs included amino acid metabolism, ethanolamine catabolism, signal recognition particle-dependent cotranslational protein targeting to membrane, intron splicing, carbohydrate metabolism, control of plasmid copy number, and response to stress. In integrating patterns of expression of protein coding transcripts and sRNAs, we found that functional activities of genes that correlated positively with profiles of expression of DE sRNAs were involved in pathogenesis, proteolysis, ferrous iron transport, and oligopeptide transport. These findings represent the first integrated sequencing analysis of the community-wide sRNA transcriptome of the oral microbiome during periodontitis progression and show that sRNAs are key regulatory elements of the dysbiotic process leading to disease.
|
120
|
Waller T, Gubała T, Sarapata K, Piwowar M, Jurkowski W. DNA microarray integromics analysis platform. BioData Min 2015; 8:18. [PMID: 26110022 PMCID: PMC4479227 DOI: 10.1186/s13040-015-0052-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/01/2014] [Accepted: 06/19/2015] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND The study of interactions between molecules belonging to different biochemical families (such as lipids and nucleic acids) requires specialized data analysis methods. This article describes the DNA Microarray Integromics Analysis Platform, a unique web application that focuses on computational integration and analysis of "multi-omics" data. Our tool supports a range of complex analyses, including, among others, low- and high-level analyses of DNA microarray data, integrated analysis of transcriptomics and lipidomics data and the ability to infer miRNA-mRNA interactions. RESULTS We demonstrate the characteristics and benefits of the DNA Microarray Integromics Analysis Platform using two different test cases. The first test case involves the analysis of the nutrimouse dataset, which contains measurements of the expression of genes involved in nutritional problems and the concentrations of hepatic fatty acids. The second test case involves the analysis of miRNA-mRNA interactions in polysaccharide-stimulated human dermal fibroblasts infected with porcine endogenous retroviruses. CONCLUSIONS The DNA Microarray Integromics Analysis Platform is a web-based graphical user interface for "multi-omics" data management and analysis. Its intuitive nature and wide range of available workflows make it an effective tool for molecular biology research. The platform is hosted at https://lifescience.plgrid.pl/.
Affiliation(s)
- Tomasz Waller
  - Institute of Computer Science, Division of Biomedical Computer Systems, University of Silesia, Katowice, Poland
  - Academic Computer Centre CYFRONET, AGH University of Science and Technology, Kraków, Poland
- Tomasz Gubała
  - Academic Computer Centre CYFRONET, AGH University of Science and Technology, Kraków, Poland
- Krzysztof Sarapata
  - Molecular Biology and Clinical Genetics Laboratory, Department of Medicine, Jagiellonian University, Kraków, Poland
- Monika Piwowar
  - Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Kraków, Poland
|
121
|
Development of a Drug-Response Modeling Framework to Identify Cell Line Derived Translational Biomarkers That Can Predict Treatment Outcome to Erlotinib or Sorafenib. PLoS One 2015; 10:e0130700. [PMID: 26107615 PMCID: PMC4480971 DOI: 10.1371/journal.pone.0130700] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 10/26/2014] [Accepted: 05/23/2015] [Indexed: 01/21/2023] Open
Abstract
Development of drug responsive biomarkers from pre-clinical data is a critical step in drug discovery, as it enables patient stratification in clinical trial design. Such translational biomarkers can be validated in early clinical trial phases and utilized as a patient inclusion parameter in later stage trials. Here we present a study on building accurate and selective drug sensitivity models for Erlotinib or Sorafenib from pre-clinical in vitro data, followed by validation of individual models on corresponding treatment arms from patient data generated in the BATTLE clinical trial. A Partial Least Squares Regression (PLSR) based modeling framework was designed and implemented, using a special splitting strategy and canonical pathways to capture robust information for model building. Erlotinib and Sorafenib predictive models could be used to identify a sub-group of patients that respond better to the corresponding treatment, and these models are specific to the corresponding drugs. The model derived signature genes reflect each drug’s known mechanism of action. Also, the models predict each drug’s potential cancer indications consistent with clinical trial results from a selection of globally normalized GEO expression datasets.
|
122
|
Lange K, Hugenholtz F, Jonathan MC, Schols HA, Kleerebezem M, Smidt H, Müller M, Hooiveld GJEJ. Comparison of the effects of five dietary fibers on mucosal transcriptional profiles, and luminal microbiota composition and SCFA concentrations in murine colon. Mol Nutr Food Res 2015; 59:1590-602. [PMID: 25914036 DOI: 10.1002/mnfr.201400597] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 08/27/2014] [Revised: 03/16/2015] [Accepted: 03/18/2015] [Indexed: 12/14/2022]
Abstract
SCOPE The aim of our study was to investigate and compare the effects of five fibers on the mucosal transcriptome, together with alterations in the luminal microbiota composition and SCFA concentrations in the colon. METHODS AND RESULTS Mice were fed fibers that differed in carbohydrate composition or a control diet for 10 days. Colonic gene expression profiles and luminal microbiota composition were determined by microarray techniques, and integrated using multivariate statistics. Our data showed a distinct reaction of the host and microbiota to resistant starch, a fiber that was not completely fermented in the colon, whereas the other fibers induced similar responses on gene expression and microbiota. Consistent associations were revealed between fiber-induced enrichment of Clostridium cluster IV and XIVa representatives, and changes in mucosal expression of genes related to energy metabolism. The nuclear receptor PPAR-γ was predicted to be an important regulator of the mucosal responses. CONCLUSION Results of this exploratory study suggest that despite different sources and composition, fermentable fibers induce a highly similar mucosal response that may at least be partially governed by PPAR-γ.
Affiliation(s)
- Katja Lange
  - Nutrition, Metabolism and Genomics group, Division of Human Nutrition, Wageningen University, Wageningen, the Netherlands
  - Netherlands Consortium for Systems Biology, Amsterdam, the Netherlands
- Floor Hugenholtz
  - Laboratory of Microbiology, Wageningen University, Wageningen, the Netherlands
  - Netherlands Consortium for Systems Biology, Amsterdam, the Netherlands
- Melliana C Jonathan
  - Laboratory of Food Chemistry, Wageningen University, Wageningen, the Netherlands
- Henk A Schols
  - Laboratory of Food Chemistry, Wageningen University, Wageningen, the Netherlands
  - TI Food and Nutrition, Wageningen, the Netherlands
- Michiel Kleerebezem
  - Netherlands Consortium for Systems Biology, Amsterdam, the Netherlands
  - TI Food and Nutrition, Wageningen, the Netherlands
  - Host-Microbe Interactomics, Wageningen University, Wageningen, the Netherlands
- Hauke Smidt
  - Laboratory of Microbiology, Wageningen University, Wageningen, the Netherlands
  - Netherlands Consortium for Systems Biology, Amsterdam, the Netherlands
  - TI Food and Nutrition, Wageningen, the Netherlands
- Michael Müller
  - Nutrition, Metabolism and Genomics group, Division of Human Nutrition, Wageningen University, Wageningen, the Netherlands
  - Netherlands Consortium for Systems Biology, Amsterdam, the Netherlands
- Guido J E J Hooiveld
  - Nutrition, Metabolism and Genomics group, Division of Human Nutrition, Wageningen University, Wageningen, the Netherlands
  - Netherlands Consortium for Systems Biology, Amsterdam, the Netherlands
|
123
|
Piwowar M, Jurkowski W. ONION: Functional Approach for Integration of Lipidomics and Transcriptomics Data. PLoS One 2015; 10:e0128854. [PMID: 26053255 PMCID: PMC4459700 DOI: 10.1371/journal.pone.0128854] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/20/2014] [Accepted: 05/03/2015] [Indexed: 12/19/2022] Open
Abstract
To date, the massive quantity of data generated by high-throughput techniques has not yet met bioinformatics treatment required to make full use of it. This is partially due to a mismatch in experimental and analytical study design but primarily due to a lack of adequate analytical approaches. When integrating multiple data types e.g. transcriptomics and metabolomics, multidimensional statistical methods are currently the techniques of choice. Typical statistical approaches, such as canonical correlation analysis (CCA), that are applied to find associations between metabolites and genes are failing due to small numbers of observations (e.g. conditions, diet etc.) in comparison to data size (number of genes, metabolites). Modifications designed to cope with this issue are not ideal due to the need to add simulated data resulting in a lack of p-value computation or by pruning of variables hence losing potentially valid information. Instead, our approach makes use of verified or putative molecular interactions or functional association to guide analysis. The workflow includes dividing of data sets to reach the expected data structure, statistical analysis within groups and interpretation of results. By applying pathway and network analysis, data obtained by various platforms are grouped with moderate stringency to avoid functional bias. As a consequence CCA and other multivariate models can be applied to calculate robust statistics and provide easy to interpret associations between metabolites and genes to leverage understanding of metabolic response. Effective integration of lipidomics and transcriptomics is demonstrated on publicly available murine nutrigenomics data sets. We are able to demonstrate that our approach improves detection of genes related to lipid metabolism, in comparison to applying statistics alone. This is measured by increased percentage of explained variance (95% vs. 75–80%) and by identifying new metabolite-gene associations related to lipid metabolism.
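The small-sample problem this abstract attributes to CCA can be illustrated with a minimal sketch. It is not the ONION workflow: the function name `canonical_correlations`, the simulated data, and the use of NumPy are all assumptions made for illustration. The sketch computes sample canonical correlations as cosines of the principal angles between the centered column spaces, and shows that when one block has more variables than observations, pure noise already yields correlations of 1.

```python
import numpy as np

def canonical_correlations(X, Y, tol=1e-10):
    """Sample canonical correlations between the columns of X and Y.

    Hypothetical helper for illustration only; rows are observations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases of the centered column spaces; the SVD lets us
    # drop numerically null directions when a block is rank deficient.
    Ux, sx, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, _ = np.linalg.svd(Yc, full_matrices=False)
    Ux = Ux[:, sx > tol * sx.max()]
    Uy = Uy[:, sy > tol * sy.max()]
    # Singular values of Ux^T Uy are the cosines of the principal angles,
    # i.e. the canonical correlations.
    rho = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)

rng = np.random.default_rng(0)
n = 20                                   # few observations (assumed)
Y = rng.standard_normal((n, 3))          # e.g. 3 metabolites, pure noise
X_small = rng.standard_normal((n, 2))    # 2 genes: fewer variables than samples
X_wide = rng.standard_normal((n, 50))    # 50 genes: more variables than samples

rho_small = canonical_correlations(X_small, Y)
rho_wide = canonical_correlations(X_wide, Y)

print("p < n:", rho_small)  # below 1 for independent noise
print("p > n:", rho_wide)   # numerically 1 although X and Y are unrelated
```

With 50 variables and 20 observations, the centered `X_wide` spans every centered direction, so any linear combination of `Y` can be matched exactly. This is why unregularized CCA carries no information in that regime and why grouping or regularization is needed.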
Affiliation(s)
- Monika Piwowar
  - Department of Bioinformatics and Telemedicine, Jagiellonian University, Kopernika 7E, 31–062 Kraków, Poland
- Wiktor Jurkowski
  - The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, United Kingdom
|
124
|
Mach N, Berri M, Estellé J, Levenez F, Lemonnier G, Denis C, Leplat JJ, Chevaleyre C, Billon Y, Doré J, Rogel-Gaillard C, Lepage P. Early-life establishment of the swine gut microbiome and impact on host phenotypes. ENVIRONMENTAL MICROBIOLOGY REPORTS 2015; 7:554-69. [PMID: 25727666 DOI: 10.1111/1758-2229.12285] [Citation(s) in RCA: 264] [Impact Index Per Article: 29.3] [Received: 12/11/2014] [Accepted: 02/22/2015] [Indexed: 05/03/2023]
Abstract
Early bacterial colonization and succession within the gastrointestinal tract has been suggested to be crucial in the establishment of specific microbiota composition and the shaping of host phenotype. Here, the composition and dynamics of faecal microbiomes were studied for 31 healthy piglets across five age strata (days 14, 36, 48, 60 and 70 after birth) together with their mothers. Faecal microbiome composition was assessed by 16S rRNA gene 454-pyrosequencing. Bacteroidetes and Firmicutes were the predominant phyla present at each age. For all piglets, luminal secretory IgA concentration was measured at day 70, and body weight was recorded until day 70. The microbiota of suckling piglets was mainly represented by Bacteroides, Oscillibacter, Escherichia/Shigella, Lactobacillus and unclassified Ruminococcaceae genera. This pattern contrasted with that of Acetivibrio, Dialister, Oribacterium, Succinivibrio and Prevotella genera, which appeared increased after weaning. Lactobacillus fermentum might be vertically transferred via breast milk or faeces. The microbiota composition coevolved with their hosts towards two different clusters after weaning, primarily distinguished by unclassified Ruminococcaceae and Prevotella abundances. Prevotella was positively correlated with luminal secretory IgA concentrations, and body weight. Our study opens up new possibilities for health and feed efficiency manipulation via genetic selection and nutrition in the agricultural domain.
Affiliation(s)
- Núria Mach
- INRA, UMR1319 MICALIS, Jouy-en-Josas, France
- AgroParisTech, UMR1319 MICALIS, Jouy-en-Josas, France
- INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas, France
- AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, France
| | - Mustapha Berri
- UMR1282 ISP, INRA, Nouzilly, France
- UMR1282 ISP, Université de Tours, Tours, France
| | - Jordi Estellé
- INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas, France
- AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, France
| | - Florence Levenez
- INRA, UMR1319 MICALIS, Jouy-en-Josas, France
- AgroParisTech, UMR1319 MICALIS, Jouy-en-Josas, France
| | - Gaëtan Lemonnier
- INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas, France
- AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, France
| | - Catherine Denis
- INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas, France
- AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, France
| | - Jean-Jacques Leplat
- INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas, France
- AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, France
- CEA, DSV-IRCM-LREG, Jouy-en-Josas, France
| | - Claire Chevaleyre
- UMR1282 ISP, INRA, Nouzilly, France
- UMR1282 ISP, Université de Tours, Tours, France
| | | | - Joël Doré
- INRA, UMR1319 MICALIS, Jouy-en-Josas, France
- AgroParisTech, UMR1319 MICALIS, Jouy-en-Josas, France
| | - Claire Rogel-Gaillard
- INRA, UMR 1313 Génétique Animale et Biologie Intégrative, Jouy-en-Josas, France
- AgroParisTech, UMR 1313 Génétique Animale et Biologie Intégrative, France
| | - Patricia Lepage
- INRA, UMR1319 MICALIS, Jouy-en-Josas, France
- AgroParisTech, UMR1319 MICALIS, Jouy-en-Josas, France
| |
|
125
|
Gupta A, Mayer EA, Sanmiguel CP, Van Horn JD, Woodworth D, Ellingson BM, Fling C, Love A, Tillisch K, Labus JS. Patterns of brain structural connectivity differentiate normal weight from overweight subjects. NeuroImage: Clinical 2015; 7:506-17. [PMID: 25737959 PMCID: PMC4338207 DOI: 10.1016/j.nicl.2015.01.005] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 02/06/2023]
Abstract
Background Alterations in the hedonic component of ingestive behaviors have been implicated as a possible risk factor in the pathophysiology of overweight and obese individuals. Neuroimaging evidence from individuals with increasing body mass index suggests structural, functional, and neurochemical alterations in the extended reward network and associated networks. Aim To apply a multivariate pattern analysis to distinguish normal weight and overweight subjects based on gray and white-matter measurements. Methods Structural images (N = 120, overweight N = 63) and diffusion tensor images (DTI) (N = 60, overweight N = 30) were obtained from healthy control subjects. For the total sample the mean age for the overweight group (females = 32, males = 31) was 28.77 years (SD = 9.76) and for the normal weight group (females = 32, males = 25) was 27.13 years (SD = 9.62). Regional segmentation and parcellation of the brain images was performed using Freesurfer. Deterministic tractography was performed to measure the normalized fiber density between regions. A multivariate pattern analysis approach was used to examine whether brain measures can distinguish overweight from normal weight individuals. Results 1. White-matter classification: The classification algorithm, based on 2 signatures with 17 regional connections, achieved 97% accuracy in discriminating overweight individuals from normal weight individuals. For both brain signatures, greater connectivity as indexed by increased fiber density was observed in overweight compared to normal weight individuals between the reward network regions and regions of the executive control, emotional arousal, and somatosensory networks. In contrast, the opposite pattern (decreased fiber density) was found between ventromedial prefrontal cortex and the anterior insula, and between thalamus and executive control network regions. 2. Gray-matter classification: The classification algorithm, based on 2 signatures with 42 morphological features, achieved 69% accuracy in discriminating overweight from normal weight individuals. In both brain signatures, regions of the reward, salience, executive control and emotional arousal networks were associated with lower morphological values in overweight individuals compared to normal weight individuals, while the opposite pattern was seen for regions of the somatosensory network. Conclusions 1. An increased BMI (i.e., overweight subjects) is associated with distinct changes in gray-matter and fiber density of the brain. 2. Classification algorithms based on white-matter connectivity involving regions of the reward and associated networks can identify specific targets for mechanistic studies and future drug development aimed at abnormal ingestive behavior and overweight/obesity. Multivariate analysis can be used to classify overweight from normal weight individuals. Anatomical connectivity achieved 97% accuracy in the classification algorithm. Greater connectivity was observed in extended reward and somatosensory regions. Morphological gray-matter achieved 69% accuracy in the classification algorithm. Lower morphological values were observed in regions of the extended reward network.
Key Words
- ACC, anterior cingulate cortex
- ANOVA, analysis of variance
- Anatomical white-matter connectivity
- BMI, body mass index
- CT, cortical thickness
- Classification algorithm
- DTI, diffusion tensor imaging
- DWI, diffusion-weighted MRIs
- FA, flip angle
- FACT, fiber assignment by continuous tracking
- FDR, false-discovery rate
- FOV, field of view
- GLM, general linear model
- GMV, gray matter volume
- HAD, Hospital Anxiety and Depression Scale
- HC, healthy control
- MC, mean curvature
- Morphological gray-matter
- Multivariate analysis
- NPV, negative predictive value
- OFG, orbitofrontal gyrus
- Obesity
- Overweight
- PPC, posterior parietal cortex
- PPV, positive predictive value
- Reward network
- SA, surface area
- SPSS, statistical package for the social sciences
- TE, echo time
- TR, repetition time
- VIP, variable importance in projection
- VTA, ventral tegmental area
- aMCC, anterior mid cingulate cortex
- dlPFC, dorsolateral prefrontal cortex
- sPLS-DA, sparse partial least squares for discrimination analysis
- sgACC, subgenual anterior cingulate cortex
- vmPFC, ventromedial prefrontal cortex
Affiliation(s)
- Arpana Gupta
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA ; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA ; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA
| | - Emeran A Mayer
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA ; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA ; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA ; Ahmanson-Lovelace Brain Mapping Center, UCLA, Los Angeles, CA, USA
| | - Claudia P Sanmiguel
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA ; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA ; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA
| | - John D Van Horn
- The Institute for Neuroimaging and Informatics, Keck School of Medicine, USC, Los Angeles, CA, USA
| | - Davis Woodworth
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA ; Radiology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Benjamin M Ellingson
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA ; Radiology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
| | - Connor Fling
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA
| | - Aubrey Love
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA
| | - Kirsten Tillisch
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA ; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA ; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA ; Integrative Medicine, GLA VHA, UCLA, Los Angeles, CA, USA
| | - Jennifer S Labus
- Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA ; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA ; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA
| |
|
126
|
Rajasundaram D, Runavot JL, Guo X, Willats WGT, Meulewaeter F, Selbig J. Understanding the relationship between cotton fiber properties and non-cellulosic cell wall polysaccharides. PLoS One 2014; 9:e112168. [PMID: 25383868 PMCID: PMC4226482 DOI: 10.1371/journal.pone.0112168] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 09/17/2014] [Accepted: 10/06/2014] [Indexed: 12/03/2022] Open
Abstract
A detailed knowledge of cell wall heterogeneity and complexity is crucial for understanding plant growth and development. One key challenge is to establish links between polysaccharide-rich cell walls and their phenotypic characteristics. It is of particular interest for some plant material, like cotton fibers, which are of both biological and industrial importance. To this end, we attempted to study cotton fiber characteristics together with glycan arrays using regression based approaches. Taking advantage of the comprehensive microarray polymer profiling technique (CoMPP), 32 cotton lines from different cotton species were studied. The glycan array was generated by sequential extraction of cell wall polysaccharides from mature cotton fibers and screening samples against eleven extensively characterized cell wall probes. Also, phenotypic characteristics of cotton fibers such as length, strength, elongation and micronaire were measured. The relationship between the two datasets was established in an integrative manner using linear regression methods. In the conducted analysis, we demonstrated the usefulness of regression based approaches in establishing a relationship between glycan measurements and phenotypic traits. In addition, the analysis also identified specific polysaccharides which may play a major role during fiber development for the final fiber characteristics. Three different regression methods identified a negative correlation between micronaire and the xyloglucan and homogalacturonan probes. Moreover, homogalacturonan and callose were shown to be significant predictors for fiber length. The role of these polysaccharides was already pointed out in previous cell wall elongation studies. Additional relationships were predicted for fiber strength and elongation which will need further experimental validation.
Affiliation(s)
- Dhivyaa Rajasundaram
- Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, 14476, Germany
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
| | - Jean-Luc Runavot
- Bayer CropScience NV-Innovation Center, Technologiepark 38, 9052 Gent, Belgium
| | - Xiaoyuan Guo
- Department of Plant and Environmental Sciences, Faculty of Sciences, University of Copenhagen, Thorvaldsensvej, 40 1.1871, Fredriksberg C, Denmark
| | - William G. T. Willats
- Department of Plant and Environmental Sciences, Faculty of Sciences, University of Copenhagen, Thorvaldsensvej, 40 1.1871, Fredriksberg C, Denmark
| | - Frank Meulewaeter
- Bayer CropScience NV-Innovation Center, Technologiepark 38, 9052 Gent, Belgium
| | - Joachim Selbig
- Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, 14476, Germany
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
| |
|
127
|
Lin D, Cao H, Calhoun VD, Wang YP. Sparse models for correlative and integrative analysis of imaging and genetic data. J Neurosci Methods 2014; 237:69-78. [PMID: 25218561 DOI: 10.1016/j.jneumeth.2014.09.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 05/05/2014] [Revised: 08/27/2014] [Accepted: 09/01/2014] [Indexed: 11/29/2022]
Abstract
The development of advanced medical imaging technologies and high-throughput genomic measurements has enhanced our ability to understand their interplay as well as their relationship with human behavior by integrating these two types of datasets. However, the high dimensionality and heterogeneity of these datasets presents a challenge to conventional statistical methods; there is a high demand for the development of both correlative and integrative analysis approaches. Here, we review our recent work on developing sparse representation based approaches to address this challenge. We show how sparse models are applied to the correlation and integration of imaging and genetic data for biomarker identification. We present examples on how these approaches are used for the detection of risk genes and classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the integration of multiple imaging and genomic datasets including their interactions such as epistasis.
Affiliation(s)
- Dongdong Lin
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA; Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA.
| | - Hongbao Cao
- Unit on Statistical Genomics, Intramural Program of Research, National Institute of Mental Health, NIH, Bethesda 20852, USA.
| | - Vince D Calhoun
- The Mind Research Network & LBERI, Albuquerque, NM 87106, USA; Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA.
| | - Yu-Ping Wang
- Department of Biomedical Engineering, Tulane University, New Orleans, LA, 70118, USA; Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, 70112, USA.
| |
|
128
|
Zendehdel R. Oxidative Damage Modeling by Biomonitoring of Exposure to Metals for Manual Metal Arc Welders. Health Scope 2014. [DOI: 10.17795/jhealthscope-16440] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/24/2022]
|
129
|
Lin D, Calhoun VD, Wang YP. Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Med Image Anal 2014; 18:891-902. [PMID: 24247004 PMCID: PMC4007390 DOI: 10.1016/j.media.2013.10.010] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 04/03/2013] [Revised: 08/27/2013] [Accepted: 10/16/2013] [Indexed: 10/26/2022]
Abstract
Both genetic variants and brain region abnormalities are recognized as important factors for complex diseases (e.g., schizophrenia). In this paper, we investigated the correspondence between single nucleotide polymorphisms (SNPs) and brain activity measured by functional magnetic resonance imaging (fMRI) to understand how genetic variation influences brain activity. A group sparse canonical correlation analysis method (group sparse CCA) was developed to explore the correlation between these two datasets, which are high dimensional: the number of SNPs/voxels is far greater than the number of samples. Unlike the existing sparse CCA methods (sCCA), our approach can exploit structural information in the correlation analysis by introducing group constraints. A simulation study demonstrates that it outperforms the existing sCCA. We applied this method to real data and identified two pairs of significant canonical variates with average correlations of 0.4527 and 0.4292, respectively, which were used to identify genes and voxels associated with schizophrenia. The selected genes are mostly from 5 schizophrenia (SZ)-related signalling pathways. The brain mappings of the selected voxels also indicate the abnormal brain regions susceptible to schizophrenia. A gene and brain region of interest (ROI) correlation analysis was further performed to confirm the significant correlations between genes and ROIs.
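The idea of a group constraint can be illustrated with a small sketch of group soft-thresholding, the proximal operator of a group-lasso penalty. This is not the authors' algorithm; it only shows, under stated assumptions, how a group penalty keeps or removes features (for example, SNPs belonging to one gene) as a block. The function name `group_soft_threshold`, the `groups` labelling and the square-root group-size weighting are illustrative assumptions.

```python
import numpy as np


def group_soft_threshold(w, groups, lam):
    """Shrink each group of coefficients towards zero as a block.

    w      : 1-D array of coefficients (e.g. a canonical loading vector).
    groups : 1-D array of group labels, one per coefficient (hypothetical
             labelling, e.g. the gene each SNP belongs to).
    lam    : penalty strength; larger values remove more groups.
    """
    w = np.asarray(w, dtype=float)
    groups = np.asarray(groups)
    out = np.zeros_like(w)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(w[idx])
        # Assumed convention: scale the threshold by sqrt(group size).
        thresh = lam * np.sqrt(idx.sum())
        if norm > thresh:
            # The whole group survives and is shrunk by a common factor.
            out[idx] = (1.0 - thresh / norm) * w[idx]
        # Otherwise the whole group stays exactly zero.
    return out


if __name__ == "__main__":
    w = np.array([3.0, 4.0, 0.1, 0.1, 0.1])
    groups = np.array([0, 0, 1, 1, 1])
    print(group_soft_threshold(w, groups, lam=1.0))
```

In this toy call the first group has a large joint norm and is retained with shrinkage, while the second group is set to zero as a whole, which is the behaviour a group constraint is meant to produce.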
Affiliation(s)
- Dongdong Lin
- Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA; Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA 70118, USA.
| | - Vince D Calhoun
- The Mind Research Network, Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA.
| | - Yu-Ping Wang
- Biomedical Engineering Department, Tulane University, New Orleans, LA 70118, USA; Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA 70118, USA; Center for Systems Biomedicine, Shanghai University for Science and Technology, Shanghai, China.
| |
|
130
|
Meng C, Kuster B, Culhane AC, Gholami AM. A multivariate approach to the integration of multi-omics datasets. BMC Bioinformatics 2014; 15:162. [PMID: 24884486 PMCID: PMC4053266 DOI: 10.1186/1471-2105-15-162] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Received: 01/22/2014] [Accepted: 05/14/2014] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND To leverage the potential of multi-omics studies, exploratory data analysis methods that provide systematic integration and comparison of multiple layers of omics information are required. We describe multiple co-inertia analysis (MCIA), an exploratory data analysis method that identifies co-relationships between multiple high dimensional datasets. Based on a covariance optimization criterion, MCIA simultaneously projects several datasets into the same dimensional space, transforming diverse sets of features onto the same scale, to extract the most variant features from each dataset and facilitate biological interpretation and pathway analysis. RESULTS We demonstrate integration of multiple layers of information using MCIA, applied to two typical "omics" research scenarios. The integration of transcriptome and proteome profiles of cells in the NCI-60 cancer cell line panel revealed distinct, complementary features, which together increased the coverage and power of pathway analysis. Our analysis highlighted the importance of the leukemia extravasation signaling pathway in leukemia, which was not highly ranked in the analysis of any individual dataset. Secondly, we compared transcriptome profiles of high grade serous ovarian tumors that were obtained on two different microarray platforms and by next generation RNA-sequencing, to identify the most informative platform and extract robust biomarkers of molecular subtypes. We discovered that RNA-sequencing data processed using RPKM had greater variance than data processed with MapSplice and RSEM. We provided novel markers highly associated with tumor molecular subtype combined from four data platforms. MCIA is implemented and available in the R/Bioconductor "omicade4" package. CONCLUSION We believe MCIA is an attractive method for data integration and visualization of several datasets of multi-omics features observed on the same set of individuals. The method is not dependent on feature annotation, and thus it can extract important features even when they are not present across all datasets. MCIA provides simple graphical representations for the identification of relationships between large datasets.
Affiliation(s)
- Chen Meng
- Chair of Proteomics and Bioanalytics, Technische Universität München, Freising, Germany
| | - Bernhard Kuster
- Chair of Proteomics and Bioanalytics, Technische Universität München, Freising, Germany
- Center for Integrated Protein Science Munich, Freising, Germany
| | - Aedín C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
| | | |
|
131
|
Jiang M, Wang C, Zhang Y, Feng Y, Wang Y, Zhu Y. Sparse partial-least-squares discriminant analysis for different geographical origins of Salvia miltiorrhiza by ¹H-NMR-based metabolomics. Phytochemical Analysis 2014; 25:50-58. [PMID: 23868756 DOI: 10.1002/pca.2461] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 02/19/2013] [Revised: 06/09/2013] [Accepted: 06/09/2013] [Indexed: 06/02/2023]
Abstract
INTRODUCTION ¹H nuclear magnetic resonance (NMR) spectroscopy has clear advantages for detecting various primary and secondary metabolites in plants simultaneously, in a non-targeted and non-destructive manner. OBJECTIVE To establish a method for detecting both primary and secondary metabolites in Salvia miltiorrhiza and screening potential geographical biomarkers effectively. METHODS Primary and secondary metabolites of S. miltiorrhiza were detected and identified by ¹H-NMR fingerprint. Sparse partial-least-squares discriminant analysis (sPLS-DA) was undertaken for classification and variable selection in a one-step procedure, and classification error rates were used to validate the sPLS-DA clustering. Potential candidate metabolites characterising the different geographical origins of S. miltiorrhiza were identified according to the sparse loading vectors. The levels of these metabolites were quantified and evaluated by Kruskal-Wallis tests, which showed significant differences. RESULTS Twenty-six primary and secondary metabolites were identified in samples from different regions. The results suggest that malonate and succinate can possibly be recognised as the key markers for discriminating the geographical origin of S. miltiorrhiza, based on their regulation of and influence on the root respiratory rates of plants. CONCLUSION ¹H-NMR metabolic profiling in combination with PLS-DA provided a very efficient and visualised representation of similarities and dissimilarities between S. miltiorrhiza samples.
Affiliation(s)
- Miaomiao Jiang
- Tianjin State Key Laboratory of Modern Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin, 300193, PR China; Key Laboratory of TCM Chemistry and Analysis, Tianjin University of Traditional Chinese Medicine, Tianjin, 300193, PR China; Research and Development Center of TCM, Tianjin International Joint Academy of Biotechnology and Medicine, Tianjin, 300457, PR China
| | | | | | | | | | | |
|
132
|
Mach N, Gao Y, Lemonnier G, Lecardonnel J, Oswald IP, Estellé J, Rogel-Gaillard C. The peripheral blood transcriptome reflects variations in immunity traits in swine: towards the identification of biomarkers. BMC Genomics 2013; 14:894. [PMID: 24341289 PMCID: PMC3878494 DOI: 10.1186/1471-2164-14-894] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 06/28/2013] [Accepted: 12/04/2013] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Immune traits (ITs) are potentially relevant criteria to characterize an individual's immune response. Our aim was to investigate whether the peripheral blood transcriptome can provide a significant and comprehensive view of IT variations in pig. RESULTS Sixty-day-old Large White pigs classified as extreme for in vitro production of IL2, IL10, IFNγ and TNFα, phagocytosis activity, in vivo CD4⁻/CD8⁺ or TCRγδ⁺ cell counts, and anti-Mycoplasma antibody levels were chosen to perform a blood transcriptome analysis with a porcine generic array enriched with immunity-related genes. Differentially expressed (DE) genes for in vitro production of IL2 and IL10, phagocytosis activity and CD4⁻/CD8⁺ cell counts were identified. Gene set enrichment analysis revealed a significant over-representation of immune response functions. To validate the microarray-based results, a subset of DE genes was confirmed by RT-qPCR. An independent set of 74 animals was used to validate the covariation between gene expression levels and ITs. Five potential gene biomarkers were found for prediction of IL2 (RALGDS), phagocytosis (ALOX12) or CD4⁻/CD8⁺ cell count (GNLY, KLRG1 and CX3CR1). On average, these biomarkers performed with a sensitivity of 79% and a specificity of 86%. CONCLUSIONS Our results confirmed that gene expression profiling in blood represents a relevant molecular phenotype to refine ITs in pig and to identify potential biomarkers that can provide new insights into immune response analysis.
Affiliation(s)
- Núria Mach
- INRA, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
- AgroParisTech, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
| | - Yu Gao
- Department of Nutritional Sciences, University of Wisconsin-Madison, Madison, USA
| | - Gaëtan Lemonnier
- INRA, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
- AgroParisTech, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
| | - Jérôme Lecardonnel
- INRA, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
- AgroParisTech, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
| | - Isabelle P Oswald
- INRA, UMR1331, Toxalim, Research Centre in Food Toxicology, F-31027 Toulouse, France
- Université de Toulouse III, INP, Toxalim, F- 31076 Toulouse, France
| | - Jordi Estellé
- INRA, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
- AgroParisTech, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
| | - Claire Rogel-Gaillard
- INRA, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
- AgroParisTech, UMR1313 Génétique Animale et Biologie Intégrative, F-78350 Jouy-en-Josas, France
| |
|
133
|
Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population. Neuroimage 2013; 84:698-711. [PMID: 24096125 DOI: 10.1016/j.neuroimage.2013.09.048] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 05/14/2013] [Revised: 09/11/2013] [Accepted: 09/20/2013] [Indexed: 12/12/2022] Open
Abstract
This study establishes that sparse canonical correlation analysis (SCCAN) identifies generalizable, structural MRI-derived cortical networks that relate to five distinct categories of cognition. We obtain multivariate psychometrics from the domain-specific sub-scales of the Philadelphia Brief Assessment of Cognition (PBAC). By using a training and separate testing stage, we find that PBAC-defined cognitive domains of language, visuospatial functioning, episodic memory, executive control, and social functioning correlate with unique and distributed areas of gray matter (GM). In contrast, a parallel univariate framework fails to identify, from the training data, regions that are also significant in the left-out test dataset. The cohort includes 164 patients with Alzheimer's disease, behavioral-variant frontotemporal dementia, semantic variant primary progressive aphasia, non-fluent/agrammatic primary progressive aphasia, or corticobasal syndrome. The analysis is implemented with open-source software for which we provide examples in the text. In conclusion, we show that multivariate techniques identify biologically-plausible brain regions supporting specific cognitive domains. The findings are identified in training data and confirmed in test data.
|
134
|
Lin D, Zhang J, Li J, Calhoun VD, Deng HW, Wang YP. Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics 2013; 14:245. [PMID: 23937249 PMCID: PMC3751310 DOI: 10.1186/1471-2105-14-245] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 04/01/2013] [Accepted: 08/08/2013] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNP), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influences on complex diseases. It is challenging to explore the relationship between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. The conventional CCA method does not work effectively if the number of data samples is significantly less than that of biomarkers, which is a typical case for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalizations with the l-1 norm (CCA-l1) or a combination of the l-1 and l-2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group). RESULTS We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulated data and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features. CONCLUSIONS The CCA-sparse group method incorporates group effects of features into the correlation analysis while performing individual feature selection simultaneously. It outperforms the two sCCA methods (CCA-l1 and CCA-group) by identifying the correlated features with more true positives while controlling total discordance at a lower level on the simulated data, even if the group effect does not exist or there are irrelevant features grouped with true correlated features. Compared with our proposed CCA-sparse group model, CCA-l1 tends to select fewer true correlated features while CCA-group tends to select more redundant features.
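The l-1 penalised form of sparse CCA referred to in this abstract can be sketched with alternating soft-thresholded updates on the cross-covariance matrix. This is a minimal illustration, not the CCA-sparse group algorithm proposed by the authors: it assumes a common simplification in which the within-set covariance matrices are treated as identity, and the names `sparse_cca_l1`, `soft_threshold`, `toy_data` and the penalty values are hypothetical. The toy data are synthetic and carry no biological meaning.

```python
import numpy as np


def soft_threshold(a, lam):
    """Element-wise soft-thresholding, the proximal operator of the l-1 norm."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def _normalize(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def sparse_cca_l1(X, Y, lam_u=0.1, lam_v=0.1, n_iter=100, tol=1e-8):
    """First pair of sparse canonical vectors (u, v) for data blocks X and Y.

    Assumption: within-set covariances are approximated by the identity,
    so only the cross-covariance X^T Y drives the updates.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y / (X.shape[0] - 1)           # sample cross-covariance
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]                                # start from the leading singular vector
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u_new = _normalize(soft_threshold(C @ v, lam_u))
        v_new = _normalize(soft_threshold(C.T @ u_new, lam_v))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v


def toy_data(seed=0, n=200, p=20, q=15):
    """Synthetic example: 3 columns of X and 2 columns of Y share one latent signal."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q))
    X[:, :3] = z[:, None] + 0.5 * rng.standard_normal((n, 3))
    Y[:, :2] = z[:, None] + 0.5 * rng.standard_normal((n, 2))
    return X, Y


if __name__ == "__main__":
    X, Y = toy_data()
    u, v = sparse_cca_l1(X, Y, lam_u=0.6, lam_v=0.6)
    print("selected X features:", np.flatnonzero(u))
    print("selected Y features:", np.flatnonzero(v))
```

With more features than samples, the unpenalised problem is ill-posed; the thresholding step is what forces most loadings to exactly zero. A group-aware variant would replace the element-wise threshold with one that acts on predefined blocks of features.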
Affiliation(s)
- Dongdong Lin
  - Biomedical Engineering Department, Tulane University, New Orleans, LA, USA
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Jigang Zhang
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Jingyao Li
  - Biomedical Engineering Department, Tulane University, New Orleans, LA, USA
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Vince D Calhoun
  - The Mind Research Network, Albuquerque, NM, 87131, USA
  - Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, 87131, USA
- Hong-Wen Deng
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang
  - Biomedical Engineering Department, Tulane University, New Orleans, LA, USA
  - Center of Genomics and Bioinformatics, Tulane University, New Orleans, LA, USA
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
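The abstract above describes a penalty that selects whole groups of features while also selecting individual features inside each group. The sketch below illustrates that idea only; it is not the authors' algorithm. It uses the standard proximal (thresholding) operator of a sparse group lasso penalty, and the function names, the unweighted group penalty, and the toy coefficients are all assumptions made for illustration.

```python
import math


def soft_threshold(value: float, lam: float) -> float:
    """Shrink one coefficient toward zero by lam (the l-1 part of the penalty)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_group_threshold(coefs, groups, lam_group, lam_l1):
    """Apply feature-level shrinkage, then group-level shrinkage.

    coefs  : list of floats, one per feature (hypothetical loading vector)
    groups : list of index lists, e.g. SNPs that belong to the same gene
    """
    out = [0.0] * len(coefs)
    for members in groups:
        # Step 1: individual feature selection inside the group.
        shrunk = [soft_threshold(coefs[i], lam_l1) for i in members]
        # Step 2: keep the group only if its remaining norm exceeds lam_group.
        norm = math.sqrt(sum(v * v for v in shrunk))
        if norm > lam_group:
            scale = 1.0 - lam_group / norm
            for i, v in zip(members, shrunk):
                out[i] = scale * v
    return out


# Toy example: two groups of two features each.
coefs = [3.5, 0.3, 0.2, -0.1]
groups = [[0, 1], [2, 3]]
selected = sparse_group_threshold(coefs, groups, lam_group=1.0, lam_l1=0.5)
print(selected)
```

In this toy run the second group is removed as a whole, and within the first group only the strong feature survives, which is the behaviour the abstract attributes to combining group and individual penalties.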
|
135
|
GUESS-ing polygenic associations with multiple phenotypes using a GPU-based evolutionary stochastic search algorithm. PLoS Genet 2013; 9:e1003657. [PMID: 23950726 PMCID: PMC3738451 DOI: 10.1371/journal.pgen.1003657] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 12/04/2012] [Accepted: 05/30/2013] [Indexed: 01/06/2023] Open
Abstract
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n>100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. 
This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.
Nowadays, the availability of cheaper and accurate assays to quantify multiple (endo)phenotypes in large population cohorts allows multi-trait studies. However, these studies are limited by the lack of flexible models integrated with efficient computational tools for genome-wide multi SNPs-traits analyses. To overcome this problem, we propose a novel Bayesian analysis strategy and a new algorithmic implementation which exploits parallel processing architecture for fully multivariate modeling of groups of correlated phenotypes at the genome-wide scale. In addition to increased power of our algorithm over alternative Bayesian and well-established non-Bayesian multi-phenotype methods, we provide an application to a real case study of several blood lipid traits, and show how our method recovered most of the major associations and is better at refining multi-trait polygenic associations than alternative methods. We reveal and replicate in independent cohorts new associations with two phenotypic groups that were not detected by competing multivariate approaches and not noticed by a large meta-GWAS. We also discuss the applicability of the proposed method to large meta-analyses involving hundreds of thousands of individuals and to diverse genomic datasets where complex dependencies in the predictor space are present.
|
136
|
Valledor L, Furuhashi T, Hanak AM, Weckwerth W. Systemic cold stress adaptation of Chlamydomonas reinhardtii. Mol Cell Proteomics 2013; 12:2032-47. [PMID: 23564937 PMCID: PMC3734567 DOI: 10.1074/mcp.m112.026765] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 12/18/2012] [Revised: 03/15/2013] [Indexed: 11/06/2022] Open
Abstract
Chlamydomonas reinhardtii is currently one of the most important model organisms, phylogenetically situated between higher plants and animals (Merchant et al. 2007). Stress adaptation of this unicellular model alga is a focus of research because of its relevance to biomass and biofuel production. Here, we have studied cold stress adaptation of C. reinhardtii, which has not hitherto been described for this alga although it has been intensively studied in higher plants. Toward this goal, high throughput mass spectrometry was employed to integrate proteome, metabolome, physiological and cell-morphological changes during a time-course from 0 to 120 h. These data were complemented with RT-qPCR for target genes involved in central metabolism, signaling, and lipid biosynthesis. Using this approach, dynamics in central metabolism were linked to cold-stress dependent sugar and autophagy pathways as well as novel genes in C. reinhardtii such as CKIN1, CKIN2 and a hitherto functionally not annotated protein named CKIN3. Cold stress extensively affected the physiology and the organization of the cell. Gluconeogenesis and starch biosynthesis pathways are activated, leading to a pronounced starch and sugar accumulation. Quantitative lipid profiles indicate a sharp decrease in the lipophilic fraction and an increase in polyunsaturated fatty acids, suggesting this as a mechanism of maintaining membrane fluidity. The proteome is completely remodeled during cold stress: specific candidates of the ribosome and the spliceosome indicate altered biosynthesis and degradation of proteins important for adaptation to low temperatures. Specific proteasome degradation may be mediated by the observed cold-specific changes in the ubiquitinylation system. Sparse partial least squares regression analysis was applied for protein correlation network analysis using proteins as predictors and Fv/Fm, FW, total lipids, and starch as responses. We also applied Granger causality analysis and revealed correlations between proteins and metabolites otherwise not detectable. Twenty percent of the proteins responsive to cold are uncharacterized proteins. This presents a considerable resource for new discoveries in cold stress biology in algae and plants.
Affiliation(s)
- Luis Valledor
  - Department of Molecular Systems Biology, Faculty of Life Sciences, University of Vienna, Austria, Althanstrasse 14, A-1090, Vienna, Austria
- Takeshi Furuhashi
  - Department of Molecular Systems Biology, Faculty of Life Sciences, University of Vienna, Austria, Althanstrasse 14, A-1090, Vienna, Austria
- Anne-Mette Hanak
  - Department of Molecular Systems Biology, Faculty of Life Sciences, University of Vienna, Austria, Althanstrasse 14, A-1090, Vienna, Austria
- Wolfram Weckwerth
  - Department of Molecular Systems Biology, Faculty of Life Sciences, University of Vienna, Austria, Althanstrasse 14, A-1090, Vienna, Austria
|
137
|
Chadeau-Hyam M, Campanella G, Jombart T, Bottolo L, Portengen L, Vineis P, Liquet B, Vermeulen RCH. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environmental and Molecular Mutagenesis 2013; 54:542-557. [PMID: 23918146 DOI: 10.1002/em.21797] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 02/21/2013] [Revised: 05/21/2013] [Accepted: 05/28/2013] [Indexed: 05/28/2023]
Abstract
Recent technological advances in molecular biology have given rise to numerous large-scale datasets whose analysis imposes serious methodological challenges mainly relating to the size and complex structure of the data. Considerable experience in analyzing such data has been gained over the past decade, mainly in genetics, from the Genome-Wide Association Study era, and more recently in transcriptomics and metabolomics. Building upon the corresponding literature, we provide here a nontechnical overview of well-established methods used to analyze OMICS data within three main types of regression-based approaches: univariate models including multiple testing correction strategies, dimension reduction techniques, and variable selection models. Our methodological description focuses on methods for which ready-to-use implementations are available. We describe the main underlying assumptions, the main features, and advantages and limitations of each of the models. This descriptive summary constitutes a useful tool for driving methodological choices while analyzing OMICS data, especially in environmental epidemiology, where the emergence of the exposome concept clearly calls for unified methods to analyze marginally and jointly complex exposure and OMICS datasets.
Affiliation(s)
- Marc Chadeau-Hyam
  - Department of Epidemiology and Biostatistics, MRC-HPA Centre for Environment and Health, School of Public Health, Imperial College London, Norfolk Place, London, W2 1PG, United Kingdom.
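The overview above lists multiple testing correction as part of the univariate approach but, in the abstract, does not name a specific procedure. As one widely used example, the sketch below implements the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. The function name and the toy p-values are assumptions for illustration, not material from the paper.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * alpha,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank whose ordered p-value falls under its threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Toy p-values, e.g. one per OMICS feature tested against an exposure.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(pvals, alpha=0.05)
print(sum(decisions), "of", len(pvals), "features retained")
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ordered list meets its threshold.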
|
138
|
Kulpa DA, Lawani M, Cooper A, Peretz Y, Ahlers J, Sékaly RP. PD-1 coinhibitory signals: the link between pathogenesis and protection. Semin Immunol 2013; 25:219-27. [PMID: 23548749 DOI: 10.1016/j.smim.2013.02.002] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 01/07/2013] [Accepted: 02/15/2013] [Indexed: 12/31/2022]
Abstract
In the majority of HIV-1 infected individuals, the adaptive immune response drives virus escape resulting in persistent viremia and a lack of immune-mediated control. The expression of negative regulatory molecules such as PD-1 during chronic HIV infection provides a useful marker to differentiate functional memory T cell subsets and the frequency of T cells with an exhausted phenotype. In addition, cell-based measurements of virus persistence equate with activation markers and the frequency of CD4 T cells expressing PD-1. High-level expression of PD-1 and its ligands PD-L1 and PD-L2 is found on hematopoietic and non-hematopoietic cells, and is upregulated by chronic antigen stimulation, Type I and Type II interferons (IFNs), and homeostatic cytokines. In HIV infected subjects, PD-1 levels on CD4 and CD8 T cells remain high following combination anti-retroviral therapy (cART). Systems biology approaches have begun to elucidate signal transduction pathways regulated by PD-1 expression in CD4 and CD8 T cell subsets that become dysfunctional through chronic TCR activation and PD-1 signaling. In this review, we summarize our current understanding of transcriptional signatures and signal transduction pathways associated with immune exhaustion with a focus on recent work in our laboratory characterizing the role of PD-1 in T cell dysfunction and HIV pathogenesis. We also highlight the therapeutic potential of blocking PD-1-PD-L1 and other immune checkpoints for activating potent cellular immune responses against chronic viral infections and cancer.
Affiliation(s)
- Deanna A Kulpa
  - Division of Infectious Diseases, Vaccine and Gene Therapy Institute-Florida (VGTI-FL), Port Saint Lucie, FL, United States
|
139
|
Shen R, Wang S, Mo Q. Sparse integrative clustering of multiple omics data sets. Ann Appl Stat 2013; 7:269-294. [PMID: 24587839 DOI: 10.1214/12-aoas578] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Indexed: 01/04/2023]
Abstract
High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation, and gene expression associated with a disease. An integrated genomic profiling approach measuring multiple omics data types simultaneously in the same set of biological samples would render an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso (Tibshirani, 1996), elastic net (Zou and Hastie, 2005), and fused lasso (Tibshirani et al., 2005) methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design (Fang and Wang, 1994) is used to seek "experimental" points that are scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic, and transcriptomic data for subtype analysis in breast and lung cancer data sets.
|
140
|
Inter-individual differences in response to dietary intervention: integrating omics platforms towards personalised dietary recommendations. Proc Nutr Soc 2013; 72:207-18. [PMID: 23388096 DOI: 10.1017/s0029665113000025] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 12/12/2022]
Abstract
Technologic advances now make it possible to collect large amounts of genetic, epigenetic, metabolomic and gut microbiome data. These data have the potential to transform approaches towards nutrition counselling by allowing us to recognise and embrace the metabolic, physiologic and genetic differences among individuals. The ultimate goal is to be able to integrate these multi-dimensional data so as to characterise the health status and disease risk of an individual and to provide personalised recommendations to maximise health. To this end, accurate and predictive systems-based measures of health are needed that incorporate molecular signatures of genes, transcripts, proteins, metabolites and microbes. Although we are making progress within each of these omics arenas, we have yet to integrate effectively multiple sources of biologic data so as to provide comprehensive phenotypic profiles. Observational studies have provided some insights into associative interactions between genetic or phenotypic variation and diet and their impact on health; however, very few human experimental studies have addressed these relationships. Dietary interventions that test prescribed diets in well-characterised study populations and that monitor system-wide responses (ideally using several omics platforms) are needed to make correlation-causation connections and to characterise phenotypes under controlled conditions. Given the growth in our knowledge, there is the potential to develop personalised dietary recommendations. However, developing these recommendations assumes that an improved understanding of the phenotypic complexities of individuals and their responses to the complexities of their diets will lead to a sustainable, effective approach to promote health and prevent disease - therein lies our challenge.
|
141
|
Liquet B, Cao KAL, Hocini H, Thiébaut R. A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC Bioinformatics 2012; 13:325. [PMID: 23216942 PMCID: PMC3627901 DOI: 10.1186/1471-2105-13-325] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 05/21/2012] [Accepted: 11/26/2012] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND High throughput 'omics' experiments are usually designed to compare changes observed between different conditions (or interventions) and to identify biomarkers capable of characterizing each condition. We consider the complex structure of repeated measurements from different assays where different conditions are applied to the same subjects. RESULTS We propose a two-step analysis combining a multilevel approach and a multivariate approach to reveal separately the effects of conditions within subjects from the biological variation between subjects. The approach is extended to two-factor designs and to the integration of two matched data sets. It allows internal variable selection to highlight genes able to discriminate the net condition effect within subjects. A simulation study was performed to demonstrate the good performance of the multilevel multivariate approach compared to a classical multivariate method. The multilevel multivariate approach outperformed the classical multivariate approach with respect to the classification error rate and the selection of relevant genes. The approach was applied to an HIV-vaccine trial evaluating the response with gene expression and cytokine secretion. The discriminant multilevel analysis selected a relevant subset of genes while the integrative multilevel analysis highlighted clusters of genes and cytokines that were highly correlated across the samples. CONCLUSIONS Our combined multilevel multivariate approach may help in finding signatures of vaccine effect and allows for a better understanding of immunological mechanisms activated by the intervention. The integrative analysis revealed clusters of genes that were associated with cytokine secretion. These clusters can be seen as gene signatures to predict future cytokine response. The approach is implemented in the R package mixOmics (http://cran.r-project.org/) with associated tutorials to perform the analysis.
Affiliation(s)
- Benoit Liquet
  - Univ. Bordeaux, ISPED, centre INSERM U-897-Epidémiologie-Biostatistique, Bordeaux, F-33000, FRANCE
  - INSERM, ISPED, centre INSERM U-897-Epidémiologie-Biostatistique, Bordeaux, F-33000, FRANCE
  - Vaccine Research Institute ANRS, Paris, France
- Kim-Anh Lê Cao
  - Queensland Facility for Advanced Bioinformatics and the institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia
- Hakim Hocini
  - INSERM U955 Eq 16, UPEC Université, Créteil, FRANCE
  - Vaccine Research Institute ANRS, Paris, France
- Rodolphe Thiébaut
  - Univ. Bordeaux, ISPED, centre INSERM U-897-Epidémiologie-Biostatistique, Bordeaux, F-33000, FRANCE
  - INSERM, ISPED, centre INSERM U-897-Epidémiologie-Biostatistique, Bordeaux, F-33000, FRANCE
  - Vaccine Research Institute ANRS, Paris, France
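The multilevel step in the abstract above separates condition effects within subjects from variation between subjects. The abstract does not give the formula, so the following is a minimal sketch under one assumption: for a one-factor repeated-measures design, the between-subject part of each sample is the mean profile of its subject and the within-subject part is the deviation from that mean. The function name and the toy data are hypothetical.

```python
def split_within_between(samples, subjects):
    """Split each sample into a between-subject and a within-subject part.

    samples  : list of rows, each a list of floats (one value per gene)
    subjects : list of subject identifiers, one per row
    Returns (between, within) with samples[i] == between[i] + within[i].
    """
    n_features = len(samples[0])

    # Accumulate per-subject sums and counts to get subject mean profiles.
    sums, counts = {}, {}
    for row, subject in zip(samples, subjects):
        total = sums.setdefault(subject, [0.0] * n_features)
        for j, value in enumerate(row):
            total[j] += value
        counts[subject] = counts.get(subject, 0) + 1
    means = {s: [v / counts[s] for v in sums[s]] for s in sums}

    between = [list(means[s]) for s in subjects]
    within = [
        [value - mean for value, mean in zip(row, means[s])]
        for row, s in zip(samples, subjects)
    ]
    return between, within


# Two subjects, each measured under two conditions, two genes per sample.
samples = [[1.0, 10.0], [3.0, 14.0], [5.0, 20.0], [9.0, 20.0]]
subjects = ["A", "A", "B", "B"]
between, within = split_within_between(samples, subjects)
print(within)
```

A multivariate method would then be fitted on the within-subject part, so that large baseline differences between subjects do not mask the condition effect.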
|
142
|
González I, Cao KAL, Davis MJ, Déjean S. Visualising associations between paired 'omics' data sets. BioData Min 2012; 5:19. [PMID: 23148523 PMCID: PMC3630015 DOI: 10.1186/1756-0381-5-19] [Citation(s) in RCA: 195] [Impact Index Per Article: 16.3] [Received: 02/10/2012] [Accepted: 10/15/2012] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Each omics platform is now able to generate a large amount of data. Genomics, proteomics, metabolomics, interactomics are compiled at an ever increasing pace and now form a core part of the fundamental systems biology framework. Recently, several integrative approaches have been proposed to extract meaningful information. However, these approaches lack visualisation outputs to fully unravel the complex associations between different biological entities. RESULTS The multivariate statistical approaches 'regularized Canonical Correlation Analysis' and 'sparse Partial Least Squares regression' were recently developed to integrate two types of high-dimensional 'omics' data and to select relevant information. Using the results of these methods, we propose to revisit a few graphical outputs to better understand the relationships between two 'omics' data sets and to better visualise the correlation structure between the different biological entities. These graphical outputs include Correlation Circle plots, Relevance Networks and Clustered Image Maps. We demonstrate the usefulness of such graphical outputs on several biological data sets and further assess their biological relevance using gene ontology analysis. CONCLUSIONS Such graphical outputs are undoubtedly useful to aid the interpretation of these promising integrative analysis tools and will certainly help in addressing fundamental biological questions and understanding systems as a whole. AVAILABILITY The graphical tools described in this paper are implemented in the freely available R package mixOmics and in its associated web application.
Affiliation(s)
- Ignacio González
  - Institut de Mathématiques - Université de Toulouse III et CNRS, UMR 5219, F-31062 Toulouse, France
- Kim-Anh Lê Cao
  - Queensland Facility for Advanced Bioinformatics and the Institute for Molecular Bioscience, The University of Queensland, 4072 St Lucia, QLD, Australia
- Melissa J Davis
  - Queensland Facility for Advanced Bioinformatics and the Institute for Molecular Bioscience, The University of Queensland, 4072 St Lucia, QLD, Australia
- Sébastien Déjean
  - Institut de Mathématiques - Université de Toulouse III et CNRS, UMR 5219, F-31062 Toulouse, France
|
143
|
Carreno-Quintero N, Bouwmeester HJ, Keurentjes JJB. Genetic analysis of metabolome-phenotype interactions: from model to crop species. Trends Genet 2012; 29:41-50. [PMID: 23084137 DOI: 10.1016/j.tig.2012.09.006] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Received: 07/20/2012] [Revised: 09/18/2012] [Accepted: 09/20/2012] [Indexed: 10/27/2022]
Abstract
The past decade has seen increased interest from the scientific community, and particularly plant biologists, in integrating metabolic approaches into research aimed at unraveling phenotypic diversity and its underlying genetic variation. Advances in plant metabolomics have enabled large-scale analyses that have identified qualitative and quantitative variation in the metabolic content of various species, and this variation has been linked to genetic factors through genetic-mapping approaches, providing a glimpse of the genetic architecture of the plant metabolome. Parallel analyses of morphological phenotypes and physiological performance characteristics have further enhanced our understanding of the complex molecular mechanisms regulating these quantitative traits. This review aims to illustrate the advantages of including assessments of phenotypic and metabolic diversity in investigations of the genetic basis of complex traits, and the value of this approach in studying agriculturally important crops. We highlight the ground-breaking work on model species and discuss recent achievements in important crop species.
|
144
|
Tong P, Coombes KR. integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory. Bioinformatics 2012; 28:2861-9. [PMID: 23014630 DOI: 10.1093/bioinformatics/bts561] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
MOTIVATION Identifying genes altered in cancer plays a crucial role in both understanding the mechanism of carcinogenesis and developing novel therapeutics. It is known that there are various mechanisms of regulation that can lead to gene dysfunction, including copy number change, methylation, abnormal expression, mutation and so on. Nowadays, all these types of alterations can be simultaneously interrogated by different types of assays. Although many methods have been proposed to identify altered genes from a single assay, there is no method that can deal with multiple assays accounting for different alteration types systematically. RESULTS In this article, we propose a novel method, integration using item response theory (integIRTy), to identify altered genes by using item response theory that allows integrated analysis of multiple high-throughput assays. When applied to a single assay, the proposed method is more robust and reliable than conventional methods such as Student's t-test or the Wilcoxon rank-sum test. When used to integrate multiple assays, integIRTy can identify novel altered genes that cannot be found by looking at individual assays separately. We applied integIRTy to three public cancer datasets (ovarian carcinoma, breast cancer, glioblastoma) for cross-assay type integration, all of which show encouraging results. AVAILABILITY AND IMPLEMENTATION The R package integIRTy is available at the web site http://bioinformatics.mdanderson.org/main/OOMPA:Overview. CONTACT kcoombes@mdanderson.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pan Tong
  - Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, TX, USA
|
145
|
Cao H, Lei S, Deng HW, Wang YP. Identification of genes for complex diseases using integrated analysis of multiple types of genomic data. PLoS One 2012; 7:e42755. [PMID: 22957024 PMCID: PMC3434191 DOI: 10.1371/journal.pone.0042755] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/17/2012] [Accepted: 07/10/2012] [Indexed: 12/17/2022] Open
Abstract
Various types of genomic data (e.g., SNPs and mRNA transcripts) have been employed to identify risk genes for complex diseases. However, the analysis of these data has largely been performed in isolation. Combining these multiple data for integrative analysis can take advantage of complementary information and thus can have higher power to identify genes (and/or their functions) that would otherwise be impossible with individual data analysis. Due to the different nature, structure, and format of diverse sets of genomic data, multiple genomic data integration is challenging. Here we address the problem by developing a sparse representation based clustering (SRC) method for integrative data analysis. As an example, we applied the SRC method to the integrative analysis of 376821 SNPs in 200 subjects (100 cases and 100 controls) and expression data for 22283 genes in 80 subjects (40 cases and 40 controls) to identify significant genes for osteoporosis (OP). Comparing our results with previous studies, we identified some genes known to be related to OP risk (e.g., 'THSD4', 'CRHR1', 'HSD11B1', 'THSD7A', 'BMPR1B', 'ADCY10', 'PRL', 'CA8', 'ESRRA', 'CALM1', 'CALM1', 'SPARC', and 'LRP1'). Moreover, we uncovered novel osteoporosis susceptibility genes ('DICER1', 'PTMA', etc.) that were not found previously but, according to existing studies, play functionally important roles in osteoporosis etiology. In addition, the genes identified by the SRC method can lead to higher accuracy for the diagnosis/classification of osteoporosis subjects when compared with the traditional T-test and Fisher-exact test, which further validates the proposed SRC approach for integrative analysis.
Affiliation(s)
- Hongbao Cao
  - Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
- Shufeng Lei
  - Center for Genetic Epidemiology and Genomics, School of Public Health, Medical College of Soochow University, Suzhou, P. R. China
- Hong-Wen Deng
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
- Yu-Ping Wang
  - Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America
  - Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, Louisiana, United States of America
|
146
|
McWilliams B, Montana G. Multi-view predictive partitioning in high dimensions. Stat Anal Data Min 2012. [DOI: 10.1002/sam.11144] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
|
147
|
Integrative subtype discovery in glioblastoma using iCluster. PLoS One 2012; 7:e35236. [PMID: 22539962 PMCID: PMC3335101 DOI: 10.1371/journal.pone.0035236] [Citation(s) in RCA: 147] [Impact Index Per Article: 12.3] [Received: 07/20/2011] [Accepted: 03/13/2012] [Indexed: 12/31/2022] Open
Abstract
Large-scale cancer genome projects, such as the Cancer Genome Atlas (TCGA) project, are comprehensive molecular characterization efforts to accelerate our understanding of cancer biology and the discovery of new therapeutic targets. The accumulating wealth of multidimensional data provides a new paradigm for important research problems including cancer subtype discovery. The current standard approach relies on separate clustering analyses followed by manual integration. Results can be highly data type dependent, restricting the ability to discover new insights from multidimensional data. In this study, we present an integrative subtype analysis of the TCGA glioblastoma (GBM) data set. Our analysis revealed new insights through integrated subtype characterization. We found three distinct integrated tumor subtypes. Subtype 1 lacks the classical GBM events of chr 7 gain and chr 10 loss. This subclass is enriched for the G-CIMP phenotype and shows hypermethylation of genes involved in brain development and neuronal differentiation. The tumors in this subclass display a Proneural expression profile. Subtype 2 is characterized by a near complete association with EGFR amplification, overrepresentation of promoter methylation of homeobox and G-protein signaling genes, and a Classical expression profile. Subtype 3 is characterized by NF1 and PTEN alterations and exhibits a Mesenchymal-like expression profile. The data analysis workflow we propose provides a unified and computationally scalable framework to harness the full potential of large-scale integrated cancer genomic data for integrative subtype discovery.
|
148
|
Abstract
Canonical correlation analysis (CCA) is a widely used multivariate method for assessing the association between two sets of variables. However, when the number of variables far exceeds the number of subjects, such as in the case of large-scale genomic studies, the traditional CCA method is not appropriate. In addition, when the variables are highly correlated, the sample covariance matrices become unstable or undefined. To overcome these two issues, sparse canonical correlation analysis (SCCA) for multiple data sets has been proposed using a Lasso type of penalty. However, these methods do not have direct control over the sparsity of the solution. An additional step that uses the Bayesian Information Criterion (BIC) has also been suggested to further filter out unimportant features. In this paper, a comparison of four penalty functions (Lasso, Elastic-net, SCAD and Hard-threshold) for SCCA with and without the BIC filtering step has been carried out using both real and simulated genotypic and mRNA expression data. This study indicates that the SCAD penalty with BIC filter would be a preferable penalty function for application of SCCA to genomic data.
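The four penalties compared in this abstract are often applied, in iterative sparse CCA schemes, as elementwise thresholding rules on the canonical weight vectors. The following is a minimal sketch of the textbook thresholding operators only, not the implementation used in the cited study. The function names, the elastic-net scaling convention, and the default SCAD constant a = 3.7 are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso rule: shrink every entry toward zero by lam, zeroing small ones."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hard_threshold(z, lam):
    """Hard-threshold rule: keep entries with |z| > lam unchanged, zero the rest."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) > lam, z, 0.0)

def elastic_net_threshold(z, lam1, lam2):
    """Elastic-net rule (one common convention): soft-threshold, then ridge-shrink."""
    return soft_threshold(z, lam1) / (1.0 + lam2)

def scad_threshold(z, lam, a=3.7):
    """SCAD rule: soft-threshold small entries, leave large entries unshrunk,
    and interpolate linearly in between (requires a > 2)."""
    z = np.asarray(z, dtype=float)
    abs_z = np.abs(z)
    middle = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    return np.where(abs_z <= 2.0 * lam, soft_threshold(z, lam),
                    np.where(abs_z <= a * lam, middle, z))

# Hypothetical weight vector used only to compare the four rules side by side.
weights = np.array([-3.0, -1.0, 0.5, 1.5, 5.0])
for name, result in [
    ("lasso", soft_threshold(weights, 1.0)),
    ("hard", hard_threshold(weights, 1.0)),
    ("elastic-net", elastic_net_threshold(weights, 1.0, 1.0)),
    ("scad", scad_threshold(weights, 1.0)),
]:
    print(name, result)
```

In this sketch the Lasso rule biases even large weights by the full threshold, whereas the SCAD rule returns large weights unchanged, which is one commonly cited motivation for SCAD-type penalties.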
|
149
|
Cao H, Lei S, Deng HW, Wang YP. Identification of genes for complex diseases by integrating multiple types of genomic data. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012; 2012:5541-5544. [PMID: 23367184 PMCID: PMC4164202 DOI: 10.1109/embc.2012.6347249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Combining multiple types of genomic data for integrative analyses can take advantage of complementary information and thus can have higher power to identify genes/variables that would otherwise be impossible to detect with individual data analysis. Here we proposed a sparse representation based clustering (SRC) method for integrative data analyses, and applied the SRC method to the integrative analysis of 376821 SNPs in 200 subjects (100 cases and 100 controls) and expression data for 22283 genes in 80 subjects (40 cases and 40 controls) to identify significant genes for osteoporosis (OP). Comparing our results with previous studies, we identified some genes known to be related to OP risk, as well as some novel osteoporosis susceptibility genes ('DICER1', 'PTMA', etc.) that may play important roles in osteoporosis etiology. In addition, the genes identified by the SRC method lead to higher accuracy in identifying osteoporosis subjects than the traditional t-test and Fisher's exact test, which further validates the proposed SRC approach for integrative analysis.
Affiliation(s)
- Hongbao Cao
- Department of Biomedical Engineering, Tulane University, New Orleans, USA
- Shufeng Lei
- Laboratory of Molecular and Statistical Genetics, Hunan Normal University, Changsha, P. R. China
- Hong-Wen Deng
- Department of Biostatistics, Tulane University, New Orleans, USA
- Yu-Ping Wang
- Department of Electrical Engineering and Department of Biostatistics, Tulane University, New Orleans, USA (Tel: 504-988-1341)
|
150
|
Van Deun K, Wilderjans TF, van den Berg RA, Antoniadis A, Van Mechelen I. A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics 2011; 12:448. [PMID: 22085701 PMCID: PMC3283562 DOI: 10.1186/1471-2105-12-448] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/14/2011] [Accepted: 11/15/2011] [Indexed: 12/05/2022] Open
Abstract
Background: High throughput data are complex and methods that reveal structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because contributions of each of the biomolecules (transcripts, proteins) have to be taken into account.
Results: We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, sparse group lasso, and elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks.
Conclusion: Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses, and second, interpretation of the results is highly facilitated by their sparseness. The approach offered is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach).
Availability: The additional file contains a MATLAB implementation of the sparse simultaneous component method.
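The contrast drawn here between the group lasso and the elitist lasso can be illustrated numerically. The sketch below uses the standard unweighted forms of the two penalties (sum of block-wise L2 norms, and sum of squared block-wise L1 norms); the block layout, the loading vectors, and the function names are hypothetical, and the cited MATLAB implementation may weight or scale the penalties differently.

```python
import numpy as np

def group_lasso_penalty(w, blocks):
    """Sum of the L2 norms of each block of loadings (unweighted form)."""
    return sum(np.linalg.norm(w[idx]) for idx in blocks)

def elitist_lasso_penalty(w, blocks):
    """Sum of the squared L1 norms of each block of loadings."""
    return sum(np.sum(np.abs(w[idx])) ** 2 for idx in blocks)

# Two hypothetical data blocks (e.g. two measurement platforms), two variables each.
blocks = [np.array([0, 1]), np.array([2, 3])]

# Two loading vectors with the same overall L2 norm:
w_one_block = np.array([1.0, 1.0, 0.0, 0.0])    # all weight in the first block
w_both_blocks = np.array([1.0, 0.0, 1.0, 0.0])  # one variable selected per block

for name, w in [("one block", w_one_block), ("both blocks", w_both_blocks)]:
    print(name,
          "group lasso:", group_lasso_penalty(w, blocks),
          "elitist lasso:", elitist_lasso_penalty(w, blocks))
```

Under these assumptions the group lasso assigns the smaller penalty to the vector concentrated in a single block, while the elitist lasso assigns the smaller penalty to the vector that selects a few variables from every block, which matches the behaviour described in the abstract.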
Affiliation(s)
- Katrijn Van Deun
- Center for Computational Systems Biology SymBioSys, Katholieke Universiteit Leuven, 3000 Leuven, Belgium.
|