2
Mary-Huard T, Das S, Mukhopadhyay I, Robin S. Querying multiple sets of P-values through composed hypothesis testing. Bioinformatics 2021; 38:141-148. [PMID: 34478490 DOI: 10.1093/bioinformatics/btab592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2020] [Revised: 07/16/2021] [Accepted: 07/27/2021] [Indexed: 02/05/2023]
Abstract
MOTIVATION: Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis often comes as a set of P-values resulting from previous analyses, that need to be combined flexibly to explore complex hypotheses, while guaranteeing a low proportion of false discoveries.

RESULTS: We introduce the generic concept of composed hypothesis, which corresponds to an arbitrary complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task and show that finding items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be efficiently performed and provide a thorough classification rule to control for type I error. The performance and the usefulness of the approach are illustrated in simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provided valuable biological insight on the considered application cases.

AVAILABILITY AND IMPLEMENTATION: The QCH methodology is available in the qch package hosted on CRAN. Additionally, R codes to reproduce the Einkorn example are available on the personal webpage of the first author: https://www6.inrae.fr/mia-paris/Equipes/Membres/Tristan-Mary-Huard.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
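The classification view described in the Results of this abstract can be illustrated with a minimal Python sketch. It is not the qch package's implementation: it assumes the null and alternative densities of each test have already been estimated and evaluated for every item, that the configuration weights of the mixture are known, and it uses a generic posterior-based false discovery rule as a stand-in for the authors' classification rule. All function and variable names are hypothetical.

```python
def composed_posteriors(f0, f1, weights, in_h1):
    """Posterior probability that each item falls under the composed H1.

    f0[i][q], f1[i][q]: null / alternative density of item i in test q
                        (assumed to be estimated beforehand).
    weights: dict mapping each configuration in {0,1}^Q to its mixture weight.
    in_h1:   predicate telling whether a configuration belongs to composed H1.
    """
    posteriors = []
    for d0, d1 in zip(f0, f1):
        numerator = 0.0
        denominator = 0.0
        for config, weight in weights.items():
            likelihood = weight
            for q, state in enumerate(config):
                likelihood *= d1[q] if state else d0[q]
            denominator += likelihood
            if in_h1(config):
                numerator += likelihood
        posteriors.append(numerator / denominator)
    return posteriors


def reject_with_fdr(posteriors, alpha):
    """Reject the items with the largest posteriors while the average
    posterior null probability of the rejected set stays at or below alpha
    (a generic rule, assumed here for illustration)."""
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    rejected, expected_errors = [], 0.0
    for i in order:
        candidate = expected_errors + (1.0 - posteriors[i])
        if candidate / (len(rejected) + 1) > alpha:
            break
        expected_errors = candidate
        rejected.append(i)
    return rejected


# Toy data: 3 items, Q = 2 tests. P-values are uniform under H0, so f0 = 1.
f0 = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
f1 = [[20.0, 20.0], [20.0, 0.1], [0.1, 0.1]]
weights = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

# Composed H1: the item is non-null in both tests.
posts = composed_posteriors(f0, f1, weights, lambda c: c == (1, 1))
print([round(p, 3) for p in posts])   # [0.907, 0.087, 0.008]
print(reject_with_fdr(posts, 0.1))    # [0]
```

Changing the predicate (for example `lambda c: sum(c) >= 1`) queries a different composed hypothesis without refitting anything, which is the flexibility the abstract emphasizes.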
Affiliation(s)
- Tristan Mary-Huard
- Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France; Génétique Quantitative et Evolution (GQE)-Le Moulon, Université Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France
- Sarmistha Das
- Human Genetics Unit, Indian Statistical Institute, Kolkata 700108, India
- Stéphane Robin
- Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France; Centre d'Écologie et des Sciences de la Conservation (CESCO), MNHN, CNRS, Sorbonne Université, Paris 75005, France
5
Guo X, Song Y, Liu S, Gao M, Qi Y, Shang X. Linking genotype to phenotype in multi-omics data of small sample. BMC Genomics 2021; 22:537. [PMID: 34256701 PMCID: PMC8278664 DOI: 10.1186/s12864-021-07867-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/24/2021] [Accepted: 06/30/2021] [Indexed: 12/03/2022]
Abstract
BACKGROUND: Genome-wide association studies (GWAS) that link genotype to phenotype represent an effective means to associate an individual genetic background with a disease or trait. However, single-omics data only provide limited information on biological mechanisms, and it is necessary to improve the accuracy for predicting the biological association between genotype and phenotype by integrating multi-omics data. Typically, gene expression data are integrated to analyze the effect of single nucleotide polymorphisms (SNPs) on phenotype. Such multi-omics data integration mainly follows two approaches: multi-staged analysis and meta-dimensional analysis, which respectively ignore intra-omics and inter-omics associations. Moreover, both approaches require omics data from a single sample set, and the large feature set of SNPs necessitates a large sample size for model establishment, but it is difficult to obtain multi-omics data from a single, large sample set.

RESULTS: To address this problem, we propose a method of genotype-phenotype association based on multi-omics data from small samples. The workflow of this method includes clustering genes using a protein-protein interaction network and gene expression data, screening gene clusters with group lasso, obtaining SNP clusters corresponding to the selected gene clusters through expression quantitative trait locus data, integrating SNP clusters and corresponding gene clusters and phenotypes into three-layer network blocks, analyzing and predicting based on each block, and obtaining the final prediction by taking the average.

CONCLUSIONS: We compare this method to others using two datasets and find that our method shows better results in both cases. Our method can effectively solve the prediction problem for small-sample multi-omics data, and provide valuable resources for further studies on the fusion of more omics data.
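The last steps of the workflow in this abstract (mapping selected gene clusters to SNP clusters through eQTL data, predicting per block, and averaging) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene clusters are taken as already selected, eQTL data are reduced to hypothetical (SNP, gene) pairs, and the per-block model is replaced by a toy scoring function. None of the names or values come from the paper.

```python
from statistics import mean


def snp_clusters_from_eqtl(gene_clusters, eqtl_pairs):
    """For each selected gene cluster, collect the SNPs linked to any of its
    genes by an eQTL (SNP, gene) pair."""
    return [
        sorted({snp for snp, gene in eqtl_pairs if gene in genes})
        for genes in gene_clusters
    ]


def make_toy_predictor(snps):
    """Hypothetical stand-in for a per-block model: the score is the mean
    allele indicator over the block's SNPs."""
    def predict(sample):
        return mean(sample[s] for s in snps)
    return predict


def ensemble_predict(block_predictors, sample):
    """Final prediction: the average of the per-block predictions."""
    return mean(predict(sample) for predict in block_predictors)


# Hypothetical inputs: two selected gene clusters and four eQTL pairs.
gene_clusters = [{"G1", "G2"}, {"G3"}]
eqtl_pairs = [("rs1", "G1"), ("rs2", "G2"), ("rs3", "G3"), ("rs1", "G3")]

snp_clusters = snp_clusters_from_eqtl(gene_clusters, eqtl_pairs)
print(snp_clusters)  # [['rs1', 'rs2'], ['rs1', 'rs3']]

predictors = [make_toy_predictor(snps) for snps in snp_clusters]
sample = {"rs1": 1, "rs2": 0, "rs3": 1}
print(ensemble_predict(predictors, sample))  # 0.75
```

Because each block only involves the SNPs tied to one gene cluster, each per-block model sees far fewer features than a genome-wide model would, which is the reason the abstract gives for working with small samples.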
Affiliation(s)
- Xinpeng Guo
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- School of Air and Missile Defense, Air Force Engineering University, Xi'an, 710051, People's Republic of China
- Yafei Song
- School of Air and Missile Defense, Air Force Engineering University, Xi'an, 710051, People's Republic of China
- Shuhui Liu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- Meihong Gao
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- Yang Qi
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, People's Republic of China.
6
Fu Y, Xu J, Tang Z, Wang L, Yin D, Fan Y, Zhang D, Deng F, Zhang Y, Zhang H, Wang H, Xing W, Yin L, Zhu S, Zhu M, Yu M, Li X, Liu X, Yuan X, Zhao S. A gene prioritization method based on a swine multi-omics knowledgebase and a deep learning model. Commun Biol 2020; 3:502. [PMID: 32913254 PMCID: PMC7483748 DOI: 10.1038/s42003-020-01233-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 03/18/2020] [Accepted: 08/07/2020] [Indexed: 12/27/2022]
Abstract
The analyses of multi-omics data have revealed candidate genes for objective traits. However, they are integrated poorly, especially in non-model organisms, and they pose a great challenge for prioritizing candidate genes for follow-up experimental verification. Here, we present a general convolutional neural network model that integrates multi-omics information to prioritize the candidate genes of objective traits. By applying this model to Sus scrofa, which is a non-model organism, but one of the most important livestock animals, the model precision was 72.9%, recall 73.5%, and F1-Measure 73.4%, demonstrating a good prediction performance compared with previous studies in Arabidopsis thaliana and Oryza sativa. Additionally, to facilitate the use of the model, we present ISwine (http://iswine.iomics.pro/), which is an online comprehensive knowledgebase in which we incorporated almost all the published swine multi-omics data. Overall, the results suggest that the deep learning strategy will greatly facilitate analyses of multi-omics integration in the future.

Yuhua Fu et al. develop a CNN model that integrates multi-omics information to prioritize candidate genes of objective traits. Their model performs well when applied to important livestock non-model animals like Sus scrofa. Finally, the authors present ISwine, an online comprehensive knowledgebase which includes all published swine omics data to facilitate the integration of heterogeneous data.
Affiliation(s)
- Yuhua Fu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China; School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, Hubei, P.R. China
- Jingya Xu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Zhenshuang Tang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Lu Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Dong Yin
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Yu Fan
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Dongdong Zhang
- School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, Hubei, P.R. China
- Fei Deng
- School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, Hubei, P.R. China
- Yanping Zhang
- School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, Hubei, P.R. China
- Haohao Zhang
- School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, Hubei, P.R. China
- Haiyan Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Wenhui Xing
- School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, Hubei, P.R. China
- Lilin Yin
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Shilin Zhu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Mengjin Zhu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Mei Yu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Xinyun Li
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China
- Xiaolei Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China.
- Xiaohui Yuan
- School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, Hubei, P.R. China.
- Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, Key Laboratory of Swine Genetics and Breeding, Ministry of Agriculture, College of Animal Science and Technology, Huazhong Agricultural University, 430070, Wuhan, Hubei, P.R. China.