Reference Citation Analysis

For: Lê Cao KA, Martin PG, Robert-Granié C, Besse P. Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 2009;10:34. [PMID: 19171069 DOI: 10.1186/1471-2105-10-34] [Citation(s) in RCA: 162] [Impact Index Per Article: 10.8] [Received: 09/23/2008] [Accepted: 01/26/2009] [Indexed: 11/10/2022]

Cited by Other Article(s)

101

Jang H, Kwon H, Yang JJ, Hong J, Kim Y, Kim KW, Lee JS, Jang YK, Kim ST, Lee KH, Lee JH, Na DL, Seo SW, Kim HJ, Lee JM. Correlations between Gray Matter and White Matter Degeneration in Pure Alzheimer's Disease, Pure Subcortical Vascular Dementia, and Mixed Dementia. Sci Rep 2017;7:9541. [PMID: 28842654 PMCID: PMC5573310 DOI: 10.1038/s41598-017-10074-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 10/04/2016] [Accepted: 08/04/2017] [Indexed: 11/09/2022]

Affiliation(s)

Hyemin Jang: Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Neuroscience Center, Samsung Medical Center, Seoul, Korea
Hunki Kwon: Department of Biomedical Engineering, Hanyang University, Seoul, Korea
Jin-Ju Yang: Department of Biomedical Engineering, Hanyang University, Seoul, Korea
Jinwoo Hong: Department of Biomedical Engineering, Hanyang University, Seoul, Korea
Yeshin Kim: Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Neuroscience Center, Samsung Medical Center, Seoul, Korea
Ko Woon Kim: Department of Neurology, Chonbuk National University Hospital, Chonbuk National University Medical School, Jeonju, Korea
Jin San Lee: Department of Neurology, Kyung Hee University Hospital, Seoul, Korea
Young Kyoung Jang: Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Neuroscience Center, Samsung Medical Center, Seoul, Korea
Sung Tae Kim: Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
Kyung Han Lee: Nuclear Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
Jae Hong Lee: Department of Neurology, Asan Medical Center, Ulsan University School of Medicine, Seoul, Korea
Duk L Na: Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Neuroscience Center, Samsung Medical Center, Seoul, Korea; Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea; Stem Cell & Regenerative Medicine Institute, Samsung Medical Center, Seoul, Korea
Sang Won Seo: Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Neuroscience Center, Samsung Medical Center, Seoul, Korea; Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea; Department of Clinical Research Design & Evaluation, SAIHST, Sungkyunkwan University, Seoul, Korea
Hee Jin Kim: Department of Neurology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Neuroscience Center, Samsung Medical Center, Seoul, Korea
Jong-Min Lee: Department of Biomedical Engineering, Hanyang University, Seoul, Korea

102

Trainor PJ, DeFilippis AP, Rai SN. Evaluation of Classifier Performance for Multiclass Phenotype Discrimination in Untargeted Metabolomics. Metabolites 2017. [PMID: 28635678 PMCID: PMC5488001 DOI: 10.3390/metabo7020030] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 12/22/2022]

Abstract

Statistical classification is a critical component of utilizing metabolomics data for examining the molecular determinants of phenotypes. Despite this, a comprehensive and rigorous evaluation of the accuracy of classification techniques for phenotype discrimination given metabolomics data has not been conducted. We conducted such an evaluation using both simulated and real metabolomics datasets, comparing Partial Least Squares-Discriminant Analysis (PLS-DA), Sparse PLS-DA, Random Forests, Support Vector Machines (SVM), Artificial Neural Network, k-Nearest Neighbors (k-NN), and Naïve Bayes classification techniques for discrimination. We evaluated the techniques on simulated data generated to mimic global untargeted metabolomics data by incorporating realistic block-wise correlation and partial correlation structures for mimicking the correlations and metabolite clustering generated by biological processes. Over the simulation studies, covariance structures, means, and effect sizes were stochastically varied to provide consistent estimates of classifier performance over a wide range of possible scenarios. The effects of the presence of non-normal error distributions, the introduction of biological and technical outliers, unbalanced phenotype allocation, missing values due to abundances below a limit of detection, and the effect of prior-significance filtering (dimension reduction) were evaluated via simulation. In each simulation, classifier parameters, such as the number of hidden nodes in a Neural Network, were optimized by cross-validation to minimize the probability of detecting spurious results due to poorly tuned classifiers. Classifier performance was then evaluated using real metabolomics datasets of varying sample medium, sample size, and experimental design. 
We report that in the most realistic simulation studies, which incorporated non-normal error distributions, unbalanced phenotype allocation, outliers, missing values, and dimension reduction, classifier performance (least to greatest error) ranked as follows: SVM, Random Forest, Naïve Bayes, sPLS-DA, Neural Networks, PLS-DA, and k-NN. When non-normal error distributions were introduced, the performance of the PLS-DA and k-NN classifiers deteriorated further relative to the remaining techniques. Over the real datasets, a trend toward better performance of the SVM and Random Forest classifiers was observed.
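The abstract states that classifier parameters were tuned by cross-validation before performance was compared. The Python below is a minimal sketch of that idea, not the authors' code. It chooses the number of neighbors k for a k-NN classifier by k-fold cross-validation. It assumes numeric feature vectors and, for brevity, uses unstratified folds and a single repetition. All function names are hypothetical. The k of k-NN and the number of folds are unrelated quantities.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def kfold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 once and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_error(X, y, k, n_folds=5, seed=0):
    """Misclassification rate of k-NN; each sample is predicted exactly once, while held out."""
    errors = 0
    for fold in kfold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        errors += sum(knn_predict(train_X, train_y, X[i], k) != y[i] for i in fold)
    return errors / len(X)


def tune_k(X, y, candidates, n_folds=5, seed=0):
    """Return the candidate k with the lowest cross-validated error (the first one on ties)."""
    return min(candidates, key=lambda k: cv_error(X, y, k, n_folds, seed))
```

Consider two well-separated clusters, X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)] with y = ["a"] * 4 + ["b"] * 4. Every held-out point's nearest neighbor lies in its own cluster, so cv_error(X, y, 1) is 0.0 and tune_k(X, y, [1, 3, 5]) returns 1. A real evaluation would nest this tuning inside an outer loop, so that the reported error is not measured on the same data used to choose k.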

103

Huang S, Chaudhary K, Garmire LX. More Is Better: Recent Progress in Multi-Omics Data Integration Methods. Front Genet 2017;8:84. [PMID: 28670325 PMCID: PMC5472696 DOI: 10.3389/fgene.2017.00084] [Citation(s) in RCA: 389] [Impact Index Per Article: 55.6] [Received: 03/24/2017] [Accepted: 06/01/2017] [Indexed: 01/20/2023]

104

Furanoterpene Diversity and Variability in the Marine Sponge Spongia officinalis, from Untargeted LC-MS/MS Metabolomic Profiling to Furanolactam Derivatives. Metabolites 2017;7:metabo7020027. [PMID: 28608848 PMCID: PMC5487998 DOI: 10.3390/metabo7020027] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/14/2017] [Revised: 05/23/2017] [Accepted: 06/06/2017] [Indexed: 01/07/2023]

105

Tosun D, Landau S, Aisen PS, Petersen RC, Mintun M, Jagust W, Weiner MW. Association between tau deposition and antecedent amyloid-β accumulation rates in normal and early symptomatic individuals. Brain 2017;140:1499-1512. [DOI: 10.1093/brain/awx046] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 05/31/2016] [Accepted: 01/17/2017] [Indexed: 02/06/2023]

106

Torbati ME, Mitreva M, Gopalakrishnan V. Application of Taxonomic Modeling to Microbiota Data Mining for Detection of Helminth Infection in Global Populations. Data 2016;1:19. [PMID: 28239609 PMCID: PMC5325162 DOI: 10.3390/data1030019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/27/2022]

Abstract

Human microbiome data from genomic sequencing technologies are accumulating rapidly, giving insight into the bacterial taxa that contribute to health and disease. Predictive modeling of such microbiota count data to classify human infection by parasitic worms, such as helminths, can help detect and manage these infections across global populations. Real-world microbiome datasets are typically sparse: they contain measurements for hundreds of bacterial species, of which only a few are detected in the analyzed bio-specimens. Because of this sparsity, accurate predictive modeling needs more observations, a challenge previously addressed with various feature-reduction methods. To our knowledge, integrative methods such as transfer learning have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge from different but related datasets. One way to incorporate this knowledge is through a meaningful mapping among the features of these datasets. In this paper, we claim that such a mapping exists among the members of each cluster, where clusters are grouped by phylogenetic dependency among taxa and by their association with the phenotype. We validate this claim by showing that models built on such a grouped feature space show no performance deterioration for the given classification task. We test this hypothesis with classification models that detect helminth infection from the microbiota of human fecal samples obtained in Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods.
In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods, to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency for grouping our microbiota data actually results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery.
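The study reports its gains in terms of AUC and balanced accuracy (Bacc). The Python below is a minimal sketch of both measures. It assumes binary labels coded 1 for infected and 0 for uninfected, plus a real-valued classifier score for AUC. The function names are illustrative, not the authors'. AUC is computed as the Mann-Whitney probability that a random positive outscores a random negative, which equals the area under the empirical ROC curve. Bacc is the mean of sensitivity and specificity.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Probability that a randomly chosen positive (label 1) receives a higher score
    than a randomly chosen negative (label 0); tied scores count as 1/2.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

Consider a classifier that labels every sample as infected in the unbalanced case balanced_accuracy([1, 1, 1, 1, 0], [1, 1, 1, 1, 1]). It is right 80% of the time but scores 0.5, which is why balanced accuracy suits unbalanced class allocations.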

107

Gatesoupe FJ, Huelvan C, Le Bayon N, Le Delliou H, Madec L, Mouchel O, Quazuguel P, Mazurais D, Zambonino-Infante JL. The highly variable microbiota associated to intestinal mucosa correlates with growth and hypoxia resistance of sea bass, Dicentrarchus labrax, submitted to different nutritional histories. BMC Microbiol 2016;16:266. [PMID: 27821062 PMCID: PMC5100225 DOI: 10.1186/s12866-016-0885-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/12/2016] [Accepted: 10/30/2016] [Indexed: 01/12/2023]

Abstract

Background

A better understanding of how intestinal microbiota interact with fish health is one of the keys to sustainable aquaculture development. The present experiment aimed to correlate the active microbiota associated with the intestinal mucosa with the Specific Growth Rate (SGR) and Hypoxia Resistance Time (HRT) of individual European sea bass subjected to different nutritional histories. The fish were fed either standard or unbalanced diets at first feeding, then mixed before the dietary challenge was repeated at the juvenile stage in a common garden approach.

Results

A diet deficient in essential fatty acids (LH) lowered both SGR and HRT in sea bass, especially when the deficiency had already been applied at first feeding. A protein-deficient diet with a high starch supply (HG) reduced SGR to a lesser extent than LH but did not affect HRT. On average, 94% of pyrosequencing reads corresponded to Proteobacteria. Differences in Operational Taxonomic Unit (OTU) composition between experimental groups were only mildly significant, mainly because of high individual variability. The highest and lowest Bray-Curtis indices of intra-group similarity were found in the two groups fed the standard starter diet. These groups were then mixed, before the final dietary challenge, with fish already exposed to the nutritional deficiency at first feeding (0.60 and 0.42 with diets HG and LH, respectively). Most noticeably, the median percentage of Escherichia-Shigella OTU_1 was lower in the LH group with the standard starter diet. Regardless of each individual's nutritional history, strong correlations appeared between (1) OTU richness and SGR, and (2) dominance index and HRT. The two physiological traits also correlated with the relative abundance of distinct OTUs (positive correlations: Pseudomonas sp. OTU_3 and Herbaspirillum sp. OTU_10 with SGR, Paracoccus sp. OTU_4 and Vibrio sp. OTU_7 with HRT; negative correlation: Rhizobium sp. OTU_9 with HRT).
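The Bray-Curtis similarity between two samples with abundance vectors x and y is 1 − Σ|x_i − y_i| / Σ(x_i + y_i). The abstract does not say how the per-group indices (0.60 and 0.42) were aggregated. The Python sketch below therefore assumes that a group's index is the mean similarity over all within-group pairs of fish, computed on non-negative OTU abundances. The function names are hypothetical.

```python
from itertools import combinations


def bray_curtis_similarity(x, y):
    """1 minus the Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return 1.0 - numerator / denominator


def mean_intra_group_similarity(samples):
    """Average Bray-Curtis similarity over all unordered pairs of samples in one group."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(bray_curtis_similarity(a, b) for a, b in pairs) / len(pairs)
```

For instance, mean_intra_group_similarity([[1, 0], [1, 0], [0, 1]]) is 1/3: one identical pair (similarity 1) and two pairs with no shared OTUs (similarity 0). Lower values indicate more individual variability within a group.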

Conclusions

In sea bass, gut microbiota characteristics and individual physiological traits are linked, interact with nutritional history, and result in high variability among individual microbiota. Many samples and tank replicates seem necessary to further investigate the effect of experimental treatments on gut microbiota composition, and to test whether microbiotypes can be delineated in fish.

Electronic supplementary material

The online version of this article (doi:10.1186/s12866-016-0885-2) contains supplementary material, which is available to authorized users.

108

Steegenga WT, Mischke M, Lute C, Boekschoten MV, Lendvai A, Pruis MGM, Verkade HJ, van de Heijning BJM, Boekhorst J, Timmerman HM, Plösch T, Müller M, Hooiveld GJEJ. Maternal exposure to a Western-style diet causes differences in intestinal microbiota composition and gene expression of suckling mouse pups. Mol Nutr Food Res 2016;61. [PMID: 27129739 PMCID: PMC5215441 DOI: 10.1002/mnfr.201600141] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 02/12/2016] [Revised: 03/25/2016] [Accepted: 04/13/2016] [Indexed: 12/14/2022]

109

Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform 2016;17:628-41. [PMID: 26969681 PMCID: PMC4945831 DOI: 10.1093/bib/bbv108] [Citation(s) in RCA: 196] [Impact Index Per Article: 24.5] [Received: 06/08/2015] [Revised: 10/26/2015] [Indexed: 01/16/2023]

110

Monteiro JM, Rao A, Shawe-Taylor J, Mourão-Miranda J. A multiple hold-out framework for Sparse Partial Least Squares. J Neurosci Methods 2016;271:182-94. [PMID: 27353722 PMCID: PMC5012894 DOI: 10.1016/j.jneumeth.2016.06.011] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 12/18/2015] [Revised: 06/10/2016] [Accepted: 06/15/2016] [Indexed: 12/01/2022]

Abstract

• An SPLS framework that tests model reliability by fitting the model to several data splits.
• The framework was applied to brain anatomy and to individual items of the MMSE.
• The adequate number of voxels and clinical items was selected automatically.
• SPLS found two associative effects between sparse brain voxels and MMSE items.
• Projection deflation provided better results than classical PLS deflation.

Background

Supervised classification machine learning algorithms may have limitations when studying brain diseases with heterogeneous populations, as the labels might be unreliable. More exploratory approaches, such as Sparse Partial Least Squares (SPLS), may provide insights into the brain's mechanisms by finding relationships between neuroimaging and clinical/demographic data. The identification of these relationships has the potential to improve the current understanding of disease mechanisms, refine clinical assessment tools, and stratify patients. SPLS finds multivariate associative effects in the data by computing pairs of sparse weight vectors, where each pair is used to remove its corresponding associative effect from the data by matrix deflation, before computing additional pairs.
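The paragraph above describes SPLS as computing pairs of sparse weight vectors and deflating the data after each pair. The Python below is a hedged sketch of one common way to do this, not the framework proposed in this paper (it has no hold-out splits or significance testing). It runs alternating soft-thresholded power iterations on the cross-product matrix XᵀY, in the style of penalized matrix decomposition. It then applies projection deflation, X ← X(I − u uᵀ), which is one of the two deflation schemes compared in the abstract. Several assumptions are made: fixed thresholds lam_u and lam_v instead of the paper's automatic selection of voxels and clinical items, column-centered inputs, and plain Python lists. All helper names are hypothetical.

```python
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def soft_threshold(z, lam):
    """Shrink each entry toward zero by lam; entries with |z_i| <= lam become exactly 0."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in z]


def normalize(z):
    norm = math.sqrt(sum(a * a for a in z))
    if norm == 0.0:
        raise ValueError("threshold removed every variable; use a smaller lam")
    return [a / norm for a in z]


def spls_pair(X, Y, lam_u, lam_v, n_iter=100):
    """One pair of sparse unit weight vectors: u over X's columns, v over Y's columns.

    Alternates soft-thresholded power iterations on C = X^T Y, which is the
    cross-covariance matrix up to a factor 1/(n-1) when X and Y are column-centered.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    C = matmul(transpose(X), Y)        # p x q
    Ct = transpose(C)                  # q x p
    v = normalize([1.0] * len(Ct))     # fixed start vector, for reproducibility
    for _ in range(n_iter):
        u = normalize(soft_threshold(matvec(C, v), lam_u))
        v = normalize(soft_threshold(matvec(Ct, u), lam_v))
    return u, v


def projection_deflate(M, w):
    """M <- M (I - w w^T): remove the unit direction w from every row of M."""
    out = []
    for row in M:
        proj = sum(a * b for a, b in zip(row, w))
        out.append([m - proj * wi for m, wi in zip(row, w)])
    return out
```

Consider the small column-centered example X = [[1, 1], [-1, 1], [2, -1], [-2, -1]] and Y = [[1, 0], [-1, 0], [2, 1], [-2, -1]], whose cross-product matrix is XᵀY = [[10, 4], [0, 0]]. Here spls_pair(X, Y, lam_u=1.0, lam_v=5.0) returns u = [1.0, 0.0] and v = [1.0, 0.0]: the threshold on v removes the weaker Y variable, whereas lam_v=0.0 keeps it with a weight of about 0.37. A further pair would be sought by calling spls_pair on projection_deflate(X, u) and projection_deflate(Y, v).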

New method

We propose a novel SPLS framework which selects the adequate number of voxels and clinical variables to describe each associative effect, and tests their reliability by fitting the model to different splits of the data. As a proof of concept, the approach was applied to find associations between grey matter probability maps and individual items of the Mini-Mental State Examination (MMSE) in a clinical sample with various degrees of dementia.

Results

The framework found two statistically significant associative effects between subsets of brain voxels and subsets of the questions/tasks.

Comparison with existing methods

SPLS was compared with its non-sparse version (PLS). The use of projection deflation versus a classical PLS deflation was also tested in both PLS and SPLS.

Conclusions

SPLS outperformed PLS, finding statistically significant effects and providing higher correlation values in hold-out data. Moreover, projection deflation provided better results.

111

Chaturvedi N, de Menezes RX, Goeman JJ. A global × global test for testing associations between two large sets of variables. Biom J 2016;59:145-158. [PMID: 27225065 DOI: 10.1002/bimj.201500106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/05/2015] [Revised: 01/06/2016] [Accepted: 03/07/2016] [Indexed: 12/30/2022]

112

Bouyioukos C, Bucchini F, Elati M, Képès F. GREAT: a web portal for Genome Regulatory Architecture Tools. Nucleic Acids Res 2016;44:W77-82. [PMID: 27151196 PMCID: PMC4987929 DOI: 10.1093/nar/gkw384] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 02/23/2016] [Accepted: 04/26/2016] [Indexed: 11/15/2022]

113

Identification of Commensal Species Positively Correlated with Early Stress Responses to a Compromised Mucus Barrier. Inflamm Bowel Dis 2016;22:826-40. [PMID: 26926038 DOI: 10.1097/mib.0000000000000688] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]

114

Rohart F, Mason EA, Matigian N, Mosbergen R, Korn O, Chen T, Butcher S, Patel J, Atkinson K, Khosrotehrani K, Fisk NM, Lê Cao KA, Wells CA. A molecular classification of human mesenchymal stromal cells. PeerJ 2016;4:e1845. [PMID: 27042394 PMCID: PMC4811172 DOI: 10.7717/peerj.1845] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 02/09/2016] [Accepted: 03/03/2016] [Indexed: 12/13/2022]

Abstract

Mesenchymal stromal cells (MSC) are widely used for the study of mesenchymal tissue repair and are increasingly adopted for cell therapy, despite the lack of consensus on the identity of these cells. This is partly due to the lack of specificity of MSC markers. Distinguishing MSC from other stromal cells such as fibroblasts is particularly difficult using standard analysis of surface proteins, and there is an urgent need for improved classification approaches. Transcriptome profiling is commonly used to describe and compare different cell types; however, efforts to identify specific markers of rare cellular subsets may be confounded by the small sample sizes of most studies. Consequently, it is difficult to derive reproducible, and therefore useful, markers. We addressed the question of MSC classification with a large integrative analysis of many public MSC datasets. We derived a sparse classifier (the Rohart MSC test) that distinguished MSC from non-MSC samples with >97% accuracy on an internal training set of 635 samples from 41 studies derived on 10 different microarray platforms. The classifier was validated on an external test set of 1,291 samples from 65 studies derived on 15 different platforms, with >95% accuracy. The genes that contribute to the MSC classifier formed a protein-interaction network that included known MSC markers. Further evidence of the relevance of this new MSC panel came from the high number of Mendelian disorders associated with mutations in more than 65% of the network. These disorders result in mesenchymal defects, particularly affecting skeletal growth and function. The Rohart MSC test is a simple in silico test that accurately discriminates MSC from fibroblasts, other adult stem/progenitor cell types, or differentiated stromal cells. It has been implemented in the www.stemformatics.org resource to assist researchers wishing to benchmark their own MSC datasets or data from the public domain.
The code is available from the CRAN repository, and all data used to generate the MSC test are available to download via the Gene Expression Omnibus or the Stemformatics resource.

Affiliation(s)

Florian Rohart: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia; The University of Queensland Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, Queensland, Australia
Elizabeth A. Mason: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
Nicholas Matigian: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia; The University of Queensland Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, Queensland, Australia
Rowland Mosbergen: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia; Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia
Othmar Korn: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
Tyrone Chen: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia; Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia
Suzanne Butcher: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia; Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia
Jatin Patel: The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia
Kerry Atkinson: The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia
Kiarash Khosrotehrani: The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia; Centre for Advanced Prenatal Care, Royal Brisbane & Women’s Hospital, Brisbane, Queensland, Australia
Nicholas M. Fisk: The University of Queensland Centre for Clinical Research, University of Queensland, Brisbane, Queensland, Australia; Centre for Advanced Prenatal Care, Royal Brisbane & Women’s Hospital, Brisbane, Queensland, Australia
Kim-Anh Lê Cao: The University of Queensland Diamantina Institute, Translational Research Institute, University of Queensland, Brisbane, Queensland, Australia
Christine A. Wells: Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia; Department of Anatomy and Neuroscience, Faculty of Medicine, University of Melbourne, Melbourne, Victoria, Australia

115

Liu B, Shen X, Pan W. Integrative and regularized principal component analysis of multiple sources of data. Stat Med 2016;35:2235-50. [PMID: 26756854 DOI: 10.1002/sim.6866] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 02/08/2015] [Revised: 09/28/2015] [Accepted: 12/14/2015] [Indexed: 12/14/2022]

116

He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. Translational Bioinformatics 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]

117

Saqi M, Pellet J, Roznovat I, Mazein A, Ballereau S, De Meulder B, Auffray C. Systems Medicine: The Future of Medical Genomics, Healthcare, and Wellness. Methods Mol Biol 2016;1386:43-60. [PMID: 26677178 DOI: 10.1007/978-1-4939-3283-2_3] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 01/27/2023]

118

Reis MM, Reis MG, Mills J, Ross C, Brightwell G. Characterization of volatile metabolites associated with confinement odour during the shelf-life of vacuum packed lamb meat under different storage conditions. Meat Sci 2015;113:80-91. [PMID: 26624794 DOI: 10.1016/j.meatsci.2015.11.017] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 05/21/2015] [Revised: 11/15/2015] [Accepted: 11/18/2015] [Indexed: 11/18/2022]

119

Small RNA Transcriptome of the Oral Microbiome during Periodontitis Progression. Appl Environ Microbiol 2015;81:6688-99. [PMID: 26187962 DOI: 10.1128/aem.01782-15] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/28/2015] [Accepted: 07/12/2015] [Indexed: 02/06/2023]

120

Waller T, Gubała T, Sarapata K, Piwowar M, Jurkowski W. DNA microarray integromics analysis platform. BioData Min 2015;8:18. [PMID: 26110022 PMCID: PMC4479227 DOI: 10.1186/s13040-015-0052-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/01/2014] [Accepted: 06/19/2015] [Indexed: 01/12/2023]

121

Development of a Drug-Response Modeling Framework to Identify Cell Line Derived Translational Biomarkers That Can Predict Treatment Outcome to Erlotinib or Sorafenib. PLoS One 2015;10:e0130700. [PMID: 26107615 PMCID: PMC4480971 DOI: 10.1371/journal.pone.0130700] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 10/26/2014] [Accepted: 05/23/2015] [Indexed: 01/21/2023]

122

Lange K, Hugenholtz F, Jonathan MC, Schols HA, Kleerebezem M, Smidt H, Müller M, Hooiveld GJEJ. Comparison of the effects of five dietary fibers on mucosal transcriptional profiles, and luminal microbiota composition and SCFA concentrations in murine colon. Mol Nutr Food Res 2015;59:1590-602. [PMID: 25914036 DOI: 10.1002/mnfr.201400597] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Received: 08/27/2014] [Revised: 03/16/2015] [Accepted: 03/18/2015] [Indexed: 12/14/2022]

123

Piwowar M, Jurkowski W. ONION: Functional Approach for Integration of Lipidomics and Transcriptomics Data. PLoS One 2015;10:e0128854. [PMID: 26053255 PMCID: PMC4459700 DOI: 10.1371/journal.pone.0128854] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/20/2014] [Accepted: 05/03/2015] [Indexed: 12/19/2022]

Abstract

To date, the bioinformatic treatment needed to make full use of the massive quantity of data generated by high-throughput techniques has not kept pace with data generation. This is partly due to a mismatch between experimental and analytical study design, but primarily due to a lack of adequate analytical approaches. When integrating multiple data types, e.g. transcriptomics and metabolomics, multidimensional statistical methods are currently the techniques of choice. Typical statistical approaches applied to find associations between metabolites and genes, such as canonical correlation analysis (CCA), fail because the number of observations (e.g. conditions, diets) is small compared with the data size (number of genes and metabolites). Modifications designed to cope with this issue are not ideal: they either add simulated data, which prevents p-value computation, or prune variables, which loses potentially valid information. Instead, our approach uses verified or putative molecular interactions or functional associations to guide the analysis. The workflow consists of dividing the data sets to reach the expected data structure, statistical analysis within groups, and interpretation of results. By applying pathway and network analysis, data obtained on various platforms are grouped with moderate stringency to avoid functional bias. As a consequence, CCA and other multivariate models can be applied to calculate robust statistics and provide easy-to-interpret associations between metabolites and genes, improving understanding of the metabolic response. Effective integration of lipidomics and transcriptomics is demonstrated on publicly available murine nutrigenomics data sets. We demonstrate that our approach improves detection of genes related to lipid metabolism compared with applying statistics alone. This is measured by an increased percentage of explained variance (95% vs. 75–80%) and by the identification of new metabolite-gene associations related to lipid metabolism.
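The claim that classical CCA fails when observations are few relative to variables can be checked numerically. The sketch below is our own illustration in Python with NumPy, not part of the ONION workflow; the function name `first_canonical_correlation` and the simulated dimensions are hypothetical. It computes the leading canonical correlation as the cosine of the smallest principal angle between the two centered column spaces. With only 10 observations and dozens of pure-noise variables per block, both column spaces fill the whole centered sample space, so the "correlation" is 1 even though the blocks are independent.

```python
import numpy as np


def first_canonical_correlation(X, Y, tol=1e-10):
    """Largest canonical correlation from classical (unregularized) CCA.

    Each block is column-centered, an orthonormal basis of its column space
    is taken from the SVD (dropping numerically zero directions), and the
    largest singular value of Qx.T @ Qy is returned.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Ux, sx, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, sy, _ = np.linalg.svd(Yc, full_matrices=False)
    Qx = Ux[:, sx > tol * sx.max()]  # basis of the centered column space of X
    Qy = Uy[:, sy > tol * sy.max()]  # basis of the centered column space of Y
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False).max()


rng = np.random.default_rng(0)

# Few observations, many variables: e.g. 10 conditions, 50 genes, 30 metabolites
X_small = rng.standard_normal((10, 50))
Y_small = rng.standard_normal((10, 30))  # independent of X_small
print(first_canonical_correlation(X_small, Y_small))  # ~1.0 for unrelated noise

# Many observations, few variables: the spurious correlation disappears
X_big = rng.standard_normal((500, 5))
Y_big = rng.standard_normal((500, 5))
print(first_canonical_correlation(X_big, Y_big))  # small
```

Regularized or sparse variants, or reducing the variables per analysis (as the grouping step described above does), are the usual ways around this degeneracy.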

124

Mach N, Berri M, Estellé J, Levenez F, Lemonnier G, Denis C, Leplat JJ, Chevaleyre C, Billon Y, Doré J, Rogel-Gaillard C, Lepage P. Early-life establishment of the swine gut microbiome and impact on host phenotypes. Environ Microbiol Rep 2015;7:554-69. [PMID: 25727666 DOI: 10.1111/1758-2229.12285] [Citation(s) in RCA: 264] [Impact Index Per Article: 29.3] [Received: 12/11/2014] [Accepted: 02/22/2015] [Indexed: 05/03/2023]

125

Gupta A, Mayer EA, Sanmiguel CP, Van Horn JD, Woodworth D, Ellingson BM, Fling C, Love A, Tillisch K, Labus JS. Patterns of brain structural connectivity differentiate normal weight from overweight subjects. Neuroimage Clin 2015;7:506-17. [PMID: 25737959 PMCID: PMC4338207 DOI: 10.1016/j.nicl.2015.01.005] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 02/06/2023]

Abstract

Background

Alterations in the hedonic component of ingestive behaviors have been implicated as a possible risk factor in the pathophysiology of overweight and obese individuals. Neuroimaging evidence from individuals with increasing body mass index suggests structural, functional, and neurochemical alterations in the extended reward network and associated networks.

Aim

To apply a multivariate pattern analysis to distinguish normal weight and overweight subjects based on gray and white-matter measurements.

Methods

Structural images (N = 120, overweight N = 63) and diffusion tensor images (DTI) (N = 60, overweight N = 30) were obtained from healthy control subjects. For the total sample, the mean age was 28.77 years (SD = 9.76) for the overweight group (females = 32, males = 31) and 27.13 years (SD = 9.62) for the normal weight group (females = 32, males = 25). Regional segmentation and parcellation of the brain images were performed using FreeSurfer. Deterministic tractography was performed to measure the normalized fiber density between regions. A multivariate pattern analysis approach was used to examine whether brain measures can distinguish overweight from normal weight individuals.

Results

1. White-matter classification: The classification algorithm, based on 2 signatures with 17 regional connections, achieved 97% accuracy in discriminating overweight from normal weight individuals. For both brain signatures, overweight individuals showed greater connectivity, as indexed by increased fiber density, than normal weight individuals between reward network regions and regions of the executive control, emotional arousal, and somatosensory networks. In contrast, the opposite pattern (decreased fiber density) was found between the ventromedial prefrontal cortex and the anterior insula, and between the thalamus and executive control network regions.
2. Gray-matter classification: The classification algorithm, based on 2 signatures with 42 morphological features, achieved 69% accuracy in discriminating overweight from normal weight individuals. In both brain signatures, regions of the reward, salience, executive control, and emotional arousal networks were associated with lower morphological values in overweight individuals compared with normal weight individuals, while the opposite pattern was seen for regions of the somatosensory network.

Conclusions

1. An increased BMI (i.e., overweight subjects) is associated with distinct changes in gray-matter morphology and fiber density of the brain.
2. Classification algorithms based on white-matter connectivity involving regions of the reward and associated networks can identify specific targets for mechanistic studies and future drug development aimed at abnormal ingestive behavior and at overweight/obesity.

• Multivariate analysis can be used to distinguish overweight from normal weight individuals.
• Anatomical connectivity achieved 97% accuracy in the classification algorithm.
• Greater connectivity was observed in extended reward and somatosensory regions.
• Gray-matter morphology achieved 69% accuracy in the classification algorithm.
• Lower morphological values were observed in regions of the extended reward network.

Affiliation(s)

Arpana Gupta Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA
Emeran A Mayer Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA; Ahmanson-Lovelace Brain Mapping Center, UCLA, Los Angeles, CA, USA
Claudia P Sanmiguel Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA
John D Van Horn The Institute for Neuroimaging and Informatics, Keck School of Medicine, USC, Los Angeles, CA, USA
Davis Woodworth Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA; Radiology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
Benjamin M Ellingson Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA; Radiology, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
Connor Fling Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA
Aubrey Love Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA
Kirsten Tillisch Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA; Integrative Medicine, GLA VHA, UCLA, Los Angeles, CA, USA
Jennifer S Labus Gail and Gerald Oppenheimer Family Center for Neurobiology of Stress, Ingestive Behavior and Obesity Program (IBOP), UCLA, Los Angeles, CA, USA; David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Division of Digestive Diseases, UCLA, Los Angeles, CA, USA

126

Rajasundaram D, Runavot JL, Guo X, Willats WGT, Meulewaeter F, Selbig J. Understanding the relationship between cotton fiber properties and non-cellulosic cell wall polysaccharides. PLoS One 2014;9:e112168. [PMID: 25383868 PMCID: PMC4226482 DOI: 10.1371/journal.pone.0112168] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 09/17/2014] [Accepted: 10/06/2014] [Indexed: 12/03/2022]

Abstract

A detailed knowledge of cell wall heterogeneity and complexity is crucial for understanding plant growth and development. One key challenge is to establish links between polysaccharide-rich cell walls and their phenotypic characteristics. This is of particular interest for some plant materials, like cotton fibers, which are of both biological and industrial importance. To this end, we set out to study cotton fiber characteristics together with glycan arrays using regression-based approaches. Taking advantage of the comprehensive microarray polymer profiling technique (CoMPP), 32 cotton lines from different cotton species were studied. The glycan array was generated by sequential extraction of cell wall polysaccharides from mature cotton fibers and screening of the samples against eleven extensively characterized cell wall probes. Phenotypic characteristics of cotton fibers such as length, strength, elongation and micronaire were also measured. The relationship between the two datasets was established in an integrative manner using linear regression methods. In this analysis, we demonstrated the usefulness of regression-based approaches in establishing a relationship between glycan measurements and phenotypic traits. In addition, the analysis identified specific polysaccharides that may play a major role during fiber development in determining the final fiber characteristics. Three different regression methods identified a negative correlation between micronaire and the xyloglucan and homogalacturonan probes. Moreover, homogalacturonan and callose were shown to be significant predictors of fiber length. The role of these polysaccharides has already been noted in previous cell wall elongation studies. Additional relationships were predicted for fiber strength and elongation, which will need further experimental validation.

127

Lin D, Cao H, Calhoun VD, Wang YP. Sparse models for correlative and integrative analysis of imaging and genetic data. J Neurosci Methods 2014;237:69-78. [PMID: 25218561 DOI: 10.1016/j.jneumeth.2014.09.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 05/05/2014] [Revised: 08/27/2014] [Accepted: 09/01/2014] [Indexed: 11/29/2022]

128

Zendehdel R. Oxidative Damage Modeling by Biomonitoring of Exposure to Metals for Manual Metal Arc Welders. Health Scope 2014. [DOI: 10.17795/jhealthscope-16440] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/24/2022]

129

Lin D, Calhoun VD, Wang YP. Correspondence between fMRI and SNP data by group sparse canonical correlation analysis. Med Image Anal 2014;18:891-902. [PMID: 24247004 PMCID: PMC4007390 DOI: 10.1016/j.media.2013.10.010] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 04/03/2013] [Revised: 08/27/2013] [Accepted: 10/16/2013] [Indexed: 10/26/2022]

130

Meng C, Kuster B, Culhane AC, Gholami AM. A multivariate approach to the integration of multi-omics datasets. BMC Bioinformatics 2014;15:162. [PMID: 24884486 PMCID: PMC4053266 DOI: 10.1186/1471-2105-15-162] [Citation(s) in RCA: 178] [Impact Index Per Article: 17.8] [Received: 01/22/2014] [Accepted: 05/14/2014] [Indexed: 02/07/2023]

Abstract

BACKGROUND

To leverage the potential of multi-omics studies, exploratory data analysis methods that provide systematic integration and comparison of multiple layers of omics information are required. We describe multiple co-inertia analysis (MCIA), an exploratory data analysis method that identifies co-relationships between multiple high-dimensional datasets. Based on a covariance optimization criterion, MCIA simultaneously projects several datasets into the same dimensional space, transforming diverse sets of features onto the same scale, to extract the most variant features from each dataset and facilitate biological interpretation and pathway analysis.

RESULTS

We demonstrate integration of multiple layers of information using MCIA, applied to two typical "omics" research scenarios. The integration of transcriptome and proteome profiles of cells in the NCI-60 cancer cell line panel revealed distinct, complementary features, which together increased the coverage and power of pathway analysis. Our analysis highlighted the importance of the leukemia extravasation signaling pathway in leukemia, which was not highly ranked in the analysis of any individual dataset. Secondly, we compared transcriptome profiles of high-grade serous ovarian tumors obtained on two different microarray platforms and by next-generation RNA sequencing, to identify the most informative platform and extract robust biomarkers of molecular subtypes. We found that RNA-sequencing data processed using RPKM had greater variance than data processed with MapSplice and RSEM. We provide novel markers, combined from four data platforms, that are highly associated with tumor molecular subtype. MCIA is implemented and available in the R/Bioconductor "omicade4" package.

CONCLUSION

We believe MCIA is an attractive method for data integration and visualization of several datasets of multi-omics features observed on the same set of individuals. The method does not depend on feature annotation, and thus it can extract important features even when they are not present across all datasets. MCIA provides simple graphical representations for identifying relationships between large datasets.

131

Jiang M, Wang C, Zhang Y, Feng Y, Wang Y, Zhu Y. Sparse partial-least-squares discriminant analysis for different geographical origins of Salvia miltiorrhiza by 1H-NMR-based metabolomics. Phytochem Anal 2014;25:50-58. [PMID: 23868756 DOI: 10.1002/pca.2461] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 02/19/2013] [Revised: 06/09/2013] [Accepted: 06/09/2013] [Indexed: 06/02/2023]

132

Mach N, Gao Y, Lemonnier G, Lecardonnel J, Oswald IP, Estellé J, Rogel-Gaillard C. The peripheral blood transcriptome reflects variations in immunity traits in swine: towards the identification of biomarkers. BMC Genomics 2013;14:894. [PMID: 24341289 PMCID: PMC3878494 DOI: 10.1186/1471-2164-14-894] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 06/28/2013] [Accepted: 12/04/2013] [Indexed: 01/21/2023]

133

Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population. Neuroimage 2013;84:698-711. [PMID: 24096125 DOI: 10.1016/j.neuroimage.2013.09.048] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 05/14/2013] [Revised: 09/11/2013] [Accepted: 09/20/2013] [Indexed: 12/12/2022]

134

Lin D, Zhang J, Li J, Calhoun VD, Deng HW, Wang YP. Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics 2013;14:245. [PMID: 23937249 PMCID: PMC3751310 DOI: 10.1186/1471-2105-14-245] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 04/01/2013] [Accepted: 08/08/2013] [Indexed: 11/30/2022]

Abstract

BACKGROUND

The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNPs), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors and of their influence on complex diseases. Exploring the relationships between these different types of genomic data sets is challenging. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. Conventional CCA does not work effectively if the number of samples is much smaller than the number of biomarkers, which is typical for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalties based on the ℓ1 norm (CCA-l1) or on a combination of the ℓ1 and ℓ2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group).

RESULTS

We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulation and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features.

CONCLUSIONS

The CCA-sparse group method incorporates group effects of features into the correlation analysis while simultaneously performing individual feature selection. On the simulated data, it outperforms the two sCCA methods (CCA-l1 and CCA-group), identifying correlated features with more true positives while controlling total discordance at a lower level, even when the group effect does not exist or irrelevant features are grouped with truly correlated features. Compared with our proposed CCA-sparse group model, CCA-l1 tends to select fewer truly correlated features, while CCA-group tends to select more redundant features.

135

GUESS-ing polygenic associations with multiple phenotypes using a GPU-based evolutionary stochastic search algorithm. PLoS Genet 2013;9:e1003657. [PMID: 23950726 PMCID: PMC3738451 DOI: 10.1371/journal.pgen.1003657] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 12/04/2012] [Accepted: 05/30/2013] [Indexed: 01/06/2023]

Abstract

Genome-wide association studies (GWAS) have yielded significant advances in defining the genetic architecture of complex traits and diseases. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS, where the detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex linkage disequilibrium patterns between SNPs and by correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations, and studied the genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175) compared with the largest published meta-GWAS (n > 100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Among the new findings provided by GUESS, we revealed strong associations of SORT1 with the TG-APOB phenotypic group and of LIPC with the TG-HDL phenotypic group; these associations were overlooked in the larger meta-GWAS and not revealed by competing approaches, and we replicated them in two independent cohorts. Moreover, in a simulation study that mimics real-case scenarios, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multiple phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, for instance gene expression and exome sequencing data, where complex dependencies are present in the predictor space.

Nowadays, the availability of cheaper and more accurate assays to quantify multiple (endo)phenotypes in large population cohorts makes multi-trait studies possible. However, these studies are limited by the lack of flexible models integrated with efficient computational tools for genome-wide multi-SNP, multi-trait analyses. To overcome this problem, we propose a novel Bayesian analysis strategy and a new algorithmic implementation that exploits a parallel processing architecture for fully multivariate modeling of groups of correlated phenotypes at the genome-wide scale. In addition to demonstrating the increased power of our algorithm over alternative Bayesian and well-established non-Bayesian multi-phenotype methods, we provide an application to a real case study of several blood lipid traits, and show how our method recovered most of the major associations and is better at refining multi-trait polygenic associations than alternative methods. We reveal, and replicate in independent cohorts, new associations with two phenotypic groups that were not detected by competing multivariate approaches and not noticed by a large meta-GWAS. We also discuss the applicability of the proposed method to large meta-analyses involving hundreds of thousands of individuals and to diverse genomic datasets where complex dependencies in the predictor space are present.

136

Valledor L, Furuhashi T, Hanak AM, Weckwerth W. Systemic cold stress adaptation of Chlamydomonas reinhardtii. Mol Cell Proteomics 2013;12:2032-47. [PMID: 23564937 PMCID: PMC3734567 DOI: 10.1074/mcp.m112.026765] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 12/18/2012] [Revised: 03/15/2013] [Indexed: 11/06/2022]

Abstract

Chlamydomonas reinhardtii is one of the most important model organisms today, phylogenetically situated between higher plants and animals (Merchant et al. 2007). Stress adaptation of this unicellular model alga is a research focus because of its relevance to biomass and biofuel production. Here, we studied cold stress adaptation of C. reinhardtii, which has not previously been described for this alga although it has been studied intensively in higher plants. Toward this goal, high-throughput mass spectrometry was employed to integrate proteome, metabolome, physiological and cell-morphological changes during a time course from 0 to 120 h. These data were complemented with RT-qPCR for target genes involved in central metabolism, signaling, and lipid biosynthesis. Using this approach, dynamics in central metabolism were linked to cold-stress-dependent sugar and autophagy pathways as well as to novel genes in C. reinhardtii such as CKIN1, CKIN2 and a previously functionally unannotated protein named CKIN3. Cold stress extensively affected the physiology and the organization of the cell. Gluconeogenesis and starch biosynthesis pathways are activated, leading to pronounced starch and sugar accumulation. Quantitative lipid profiles indicate a sharp decrease in the lipophilic fraction and an increase in polyunsaturated fatty acids, suggesting a mechanism for maintaining membrane fluidity. The proteome is completely remodeled during cold stress: specific candidates of the ribosome and the spliceosome indicate altered biosynthesis and degradation of proteins important for adaptation to low temperatures. Specific proteasome degradation may be mediated by the observed cold-specific changes in the ubiquitinylation system. Sparse partial least squares regression analysis was applied for protein correlation network analysis, using proteins as predictors and Fv/Fm, FW, total lipids, and starch as responses. We also applied Granger causality analysis and revealed correlations between proteins and metabolites that were otherwise not detectable. Twenty percent of the proteins responsive to cold are uncharacterized. This presents a considerable resource for new discoveries in cold stress biology in algae and plants.

137

Chadeau-Hyam M, Campanella G, Jombart T, Bottolo L, Portengen L, Vineis P, Liquet B, Vermeulen RCH. Deciphering the complex: methodological overview of statistical models to derive OMICS-based biomarkers. Environ Mol Mutagen 2013;54:542-557. [PMID: 23918146 DOI: 10.1002/em.21797] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 02/21/2013] [Revised: 05/21/2013] [Accepted: 05/28/2013] [Indexed: 05/28/2023]

138

Kulpa DA, Lawani M, Cooper A, Peretz Y, Ahlers J, Sékaly RP. PD-1 coinhibitory signals: the link between pathogenesis and protection. Semin Immunol 2013;25:219-27. [PMID: 23548749 DOI: 10.1016/j.smim.2013.02.002] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 01/07/2013] [Accepted: 02/15/2013] [Indexed: 12/31/2022]

139

Shen R, Wang S, Mo Q. Sparse integrative clustering of multiple omics data sets. Ann Appl Stat 2013;7:269-294. [PMID: 24587839 DOI: 10.1214/12-aoas578] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Indexed: 01/04/2023]

140

Inter-individual differences in response to dietary intervention: integrating omics platforms towards personalised dietary recommendations. Proc Nutr Soc 2013;72:207-18. [PMID: 23388096 DOI: 10.1017/s0029665113000025] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Indexed: 12/12/2022]

Abstract

Technologic advances now make it possible to collect large amounts of genetic, epigenetic, metabolomic and gut microbiome data. These data have the potential to transform approaches towards nutrition counselling by allowing us to recognise and embrace the metabolic, physiologic and genetic differences among individuals. The ultimate goal is to be able to integrate these multi-dimensional data so as to characterise the health status and disease risk of an individual and to provide personalised recommendations to maximise health. To this end, accurate and predictive systems-based measures of health are needed that incorporate molecular signatures of genes, transcripts, proteins, metabolites and microbes. Although we are making progress within each of these omics arenas, we have yet to integrate effectively multiple sources of biologic data so as to provide comprehensive phenotypic profiles. Observational studies have provided some insights into associative interactions between genetic or phenotypic variation and diet and their impact on health; however, very few human experimental studies have addressed these relationships. Dietary interventions that test prescribed diets in well-characterised study populations and that monitor system-wide responses (ideally using several omics platforms) are needed to make correlation-causation connections and to characterise phenotypes under controlled conditions. Given the growth in our knowledge, there is the potential to develop personalised dietary recommendations. However, developing these recommendations assumes that an improved understanding of the phenotypic complexities of individuals and their responses to the complexities of their diets will lead to a sustainable, effective approach to promote health and prevent disease - therein lies our challenge.

141

Liquet B, Lê Cao KA, Hocini H, Thiébaut R. A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC Bioinformatics 2012;13:325. [PMID: 23216942 PMCID: PMC3627901 DOI: 10.1186/1471-2105-13-325] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Received: 05/21/2012] [Accepted: 11/26/2012] [Indexed: 01/21/2023]

Abstract

BACKGROUND

High throughput 'omics' experiments are usually designed to compare changes observed between different conditions (or interventions) and to identify biomarkers capable of characterizing each condition. We consider the complex structure of repeated measurements from different assays where different conditions are applied on the same subjects.

RESULTS

We propose a two-step analysis combining a multilevel approach and a multivariate approach to separate the effects of conditions within subjects from the biological variation between subjects. The approach is extended to two-factor designs and to the integration of two matched data sets. It allows internal variable selection to highlight genes able to discriminate the net condition effect within subjects. A simulation study was performed to compare the multilevel multivariate approach with a classical multivariate method; the multilevel multivariate approach outperformed the classical approach with respect to both the classification error rate and the selection of relevant genes. The approach was applied to an HIV-vaccine trial evaluating the response through gene expression and cytokine secretion. The discriminant multilevel analysis selected a relevant subset of genes, while the integrative multilevel analysis highlighted clusters of genes and cytokines that were highly correlated across the samples.

CONCLUSIONS

Our combined multilevel multivariate approach may help in finding signatures of vaccine effect and allows a better understanding of the immunological mechanisms activated by the intervention. The integrative analysis revealed clusters of genes that were associated with cytokine secretion. These clusters can be seen as gene signatures to predict future cytokine response. The approach is implemented in the R package mixOmics (http://cran.r-project.org/) with associated tutorials to perform the analysis.
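The first step of the multilevel approach, separating within-subject from between-subject variation, amounts to centering each subject's samples on that subject's own mean. The Python/NumPy sketch below illustrates this split for a paired design. It is an assumption-labelled illustration, not the mixOmics implementation; `split_within_between` and the simulated data are hypothetical. The within-subject part is what would then be passed to the multivariate (discriminant or integrative) step.

```python
import numpy as np


def split_within_between(X, subject):
    """Split X (samples x variables) into between- and within-subject parts.

    X_between: each row replaced by the mean of its subject's rows.
    X_within:  X - X_between, the per-subject centered data that the
               multivariate step would analyse.
    """
    X = np.asarray(X, dtype=float)
    subject = np.asarray(subject)
    X_between = np.empty_like(X)
    for s in np.unique(subject):
        rows = subject == s
        X_between[rows] = X[rows].mean(axis=0)
    return X_between, X - X_between


# Hypothetical paired design: 4 subjects, each measured before/after an intervention
rng = np.random.default_rng(2)
subject = np.repeat(np.arange(4), 2)                 # [0, 0, 1, 1, 2, 2, 3, 3]
condition = np.tile([0, 1], 4)                       # 0 = before, 1 = after
baseline = rng.normal(0, 5, size=(4, 3))[subject]    # large subject-specific offsets
effect = condition[:, None] * np.array([1.0, 0.0, 0.0])  # +1 on variable 0 after intervention
X = baseline + effect + rng.normal(0, 0.05, size=(8, 3))

Xb, Xw = split_within_between(X, subject)
print(np.round(Xw[:, 0], 2))  # within-subject values of variable 0
```

In the raw data the condition effect on variable 0 is hidden behind subject offsets with a standard deviation of 5; after the split, each subject's within-subject values for that variable differ by roughly the effect size of 1 between conditions.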

142

González I, Lê Cao KA, Davis MJ, Déjean S. Visualising associations between paired 'omics' data sets. BioData Min 2012;5:19. [PMID: 23148523 PMCID: PMC3630015 DOI: 10.1186/1756-0381-5-19] [Citation(s) in RCA: 195] [Impact Index Per Article: 16.3] [Received: 02/10/2012] [Accepted: 10/15/2012] [Indexed: 12/11/2022]

143

Carreno-Quintero N, Bouwmeester HJ, Keurentjes JJB. Genetic analysis of metabolome-phenotype interactions: from model to crop species. Trends Genet 2012;29:41-50. [PMID: 23084137 DOI: 10.1016/j.tig.2012.09.006] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Received: 07/20/2012] [Revised: 09/18/2012] [Accepted: 09/20/2012] [Indexed: 10/27/2022]

144

Tong P, Coombes KR. integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory. Bioinformatics 2012;28:2861-9. [PMID: 23014630 DOI: 10.1093/bioinformatics/bts561] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]

145

Cao H, Lei S, Deng HW, Wang YP. Identification of genes for complex diseases using integrated analysis of multiple types of genomic data. PLoS One 2012;7:e42755. [PMID: 22957024 PMCID: PMC3434191 DOI: 10.1371/journal.pone.0042755] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/17/2012] [Accepted: 07/10/2012] [Indexed: 12/17/2022]

146

McWilliams B, Montana G. Multi-view predictive partitioning in high dimensions. Stat Anal Data Min 2012. [DOI: 10.1002/sam.11144] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]

147

Integrative subtype discovery in glioblastoma using iCluster. PLoS One 2012;7:e35236. [PMID: 22539962 PMCID: PMC3335101 DOI: 10.1371/journal.pone.0035236] [Citation(s) in RCA: 147] [Impact Index Per Article: 12.3] [Received: 07/20/2011] [Accepted: 03/13/2012] [Indexed: 12/31/2022]

148

Comparison of Penalty Functions for Sparse Canonical Correlation Analysis. Comput Stat Data Anal 2012;56:245-254. [PMID: 21984855 DOI: 10.1016/j.csda.2011.07.012] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]

149

Cao H, Lei S, Deng HW, Wang YP. Identification of genes for complex diseases by integrating multiple types of genomic data. Annu Int Conf IEEE Eng Med Biol Soc 2012;2012:5541-5544. [PMID: 23367184 PMCID: PMC4164202 DOI: 10.1109/embc.2012.6347249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]

150

Van Deun K, Wilderjans TF, van den Berg RA, Antoniadis A, Van Mechelen I. A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics 2011;12:448. [PMID: 22085701 PMCID: PMC3283562 DOI: 10.1186/1471-2105-12-448] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/14/2011] [Accepted: 11/15/2011] [Indexed: 12/05/2022]

Abstract

1 Background

High-throughput data are complex, and methods that reveal the structure underlying such data are particularly useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Increasingly, the challenge is to reveal structure across several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities. Simultaneous component methods are particularly promising for this task. However, interpreting principal and simultaneous components is often daunting because the contribution of every biomolecule (transcript, protein) must be taken into account.
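
As a concrete illustration of the PCA/SVD connection: the first principal component loading vector is the leading right singular vector of the column-centred data matrix, equivalently the leading eigenvector of its cross-product (covariance) matrix. The sketch below finds it with plain power iteration so that it has no dependencies; this is an illustrative choice, and in practice one would call an SVD routine such as `numpy.linalg.svd`. The function name `first_principal_component` is hypothetical.

```python
import math
import random


def first_principal_component(X, n_iter=500, seed=0):
    """Leading PCA loading vector and singular value via power iteration.

    X: list of samples (rows). Columns are mean-centred, then the leading
    right singular vector v of the centred matrix Xc is found by repeatedly
    applying Xc^T Xc. Assumes Xc is not all zeros.
    Returns (v, sigma) with sigma = ||Xc v||.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]  # random start vector
    for _ in range(n_iter):
        u = [sum(r[j] * v[j] for j in range(p)) for r in Xc]           # u = Xc v
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]  # w = Xc^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalise to unit length
    u = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    sigma = math.sqrt(sum(x * x for x in u))
    return v, sigma


# Variance is larger along the first axis, so v should point along it.
v, sigma = first_principal_component([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
```

Here the centred cross-product matrix is diag(8, 2), so `v` converges to (±1, 0) and `sigma` to the square root of 8; the sign of a singular vector is arbitrary.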

2 Results

We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties, which account in different ways for the block structure of the integrated data, can be tuned. This yields known penalized approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can easily be transposed to regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of the different penalties with respect to sparseness across and within data blocks.
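
To make the list of penalties concrete, the sketch below writes one common form of each for loadings partitioned into data blocks (for example, one block per measurement platform) and combines them with tuning weights. The weights, such as the square-root-of-block-size factor in the group lasso, and the names `block_penalty` and `lam_*` are assumptions chosen for illustration; the exact parameterization used in the paper may differ.

```python
import math


def lasso(w):
    """L1 norm: sum of absolute values."""
    return sum(abs(x) for x in w)


def ridge(w):
    """Squared L2 norm: sum of squares."""
    return sum(x * x for x in w)


def group_lasso(blocks):
    """Sum over blocks of sqrt(block size) * L2 norm of the block."""
    return sum(math.sqrt(len(b)) * math.sqrt(ridge(b)) for b in blocks)


def elitist_lasso(blocks):
    """Sum over blocks of the squared L1 norm of the block."""
    return sum(lasso(b) ** 2 for b in blocks)


def block_penalty(blocks, lam_lasso=0.0, lam_ridge=0.0,
                  lam_group=0.0, lam_elitist=0.0):
    """Weighted sum of the four penalties on block-structured loadings.

    Special cases: lasso (lam_lasso only), ridge (lam_ridge only),
    elastic net (lasso + ridge), group lasso (lam_group only),
    sparse group lasso (lasso + group), elitist lasso (lam_elitist only).
    """
    w = [x for b in blocks for x in b]
    return (lam_lasso * lasso(w) + lam_ridge * ridge(w)
            + lam_group * group_lasso(blocks)
            + lam_elitist * elitist_lasso(blocks))


# Two platform blocks; the second block is entirely zero.
loadings = [[3.0, -4.0], [0.0, 0.0, 0.0]]
elastic_net = block_penalty(loadings, lam_lasso=1.0, lam_ridge=1.0)  # 7.0 + 25.0 = 32.0
```

An all-zero block contributes nothing to the group and elitist terms, which is what lets these block-aware penalties reward switching a platform off entirely or spreading weight across platforms.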

3 Conclusion

Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses; second, the sparseness of the results greatly facilitates their interpretation. The approach is flexible and allows the block structure to be taken into account in different ways. As such, it can find structures that are tied exclusively to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach).
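
The difference between block-wise and entry-wise sparsity can be seen from the proximal (shrinkage) operators of the lasso and the group lasso, which many penalized solvers apply at each iteration. This is a generic sketch, not the update used in the paper's MATLAB code, and the elitist lasso operator is omitted. The group operator can zero an entire platform block, so a component ends up loading on a single platform, whereas the lasso operator zeros individual entries within blocks.

```python
import math


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in w]


def group_soft_threshold(blocks, t):
    """Proximal operator of t * sum_k ||w_k||_2: shrink whole blocks by t in norm.

    A block whose L2 norm is at most t is set entirely to zero.
    """
    out = []
    for b in blocks:
        norm = math.sqrt(sum(x * x for x in b))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * x for x in b])
    return out


# Loadings of one component, split into two platform blocks.
blocks = [[3.0, 4.0], [0.3, -0.4]]
grouped = group_soft_threshold(blocks, 1.0)  # second block becomes all zero
entrywise = soft_threshold([x for b in blocks for x in b], 0.35)  # only 0.3 is zeroed
```

With the group operator the first block is merely rescaled (by 0.8) while the weak second block vanishes; with the lasso operator at threshold 0.35 the second block keeps one small nonzero entry.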

4 Availability

The additional file contains a MATLAB implementation of the sparse simultaneous component method.
