1
Shi Y, Fang J, Li J, Yu K, Zhu J, Lu Y. Fracture risk prediction in diabetes patients based on Lasso feature selection and Machine Learning. Comput Methods Biomech Biomed Engin 2024:1-17. [PMID: 39257307 DOI: 10.1080/10255842.2024.2400325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 08/12/2024] [Accepted: 08/21/2024] [Indexed: 09/12/2024]
Abstract
Fracture risk among individuals with diabetes poses significant clinical challenges due to the multifaceted relationship between diabetes and bone health. Diabetes not only affects bone density but also alters bone quality and structure, thereby increasing susceptibility to fractures. Given the rising prevalence of diabetes worldwide and its associated complications, accurate prediction of fracture risk in diabetic individuals has emerged as a pressing clinical need. This study aims to investigate the factors influencing fracture risk among diabetic patients. We propose a framework that combines Lasso feature selection with eight classification algorithms. First, Lasso regression is used to select 24 significant features. We then use grid search and 5-fold cross-validation to train and tune the selected classification algorithms: KNN, Naive Bayes, Decision Tree, Random Forest, AdaBoost, XGBoost, Multi-layer Perceptron (MLP), and Support Vector Machine (SVM). Among the models trained on these important features, Random Forest performs best, with a predictive accuracy of 93.87%. A comparative analysis across all features, the important features, and the remaining features demonstrates the crucial role of the Lasso-selected features in predicting fracture risk among diabetic patients. In addition, using a feature importance ranking algorithm, we identify several features of significant reference value for predicting early fracture risk in diabetic individuals.
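The Lasso selection step can be illustrated with a minimal sketch, assuming plain cyclic coordinate descent on a squared-error loss with an L1 penalty. The abstract does not state the solver, the penalty strength or the preprocessing the authors used, so the function names, `alpha` and the toy data below are hypothetical. Features whose coefficients are shrunk exactly to zero are the ones discarded before the classifiers are trained.

```python
from typing import List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator: shrinks z towards zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X: Sequence[Sequence[float]], y: Sequence[float],
                             alpha: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    No intercept is fitted, so y and the columns of X should be centred
    (and ideally standardised) beforehand. Hypothetical helper, not the authors' code.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: coefficient stays 0
            # Correlation of column j with the partial residual (feature j added back)
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - Xw
                w[j] = new_wj
    return w


def selected_features(w: Sequence[float], tol: float = 1e-10) -> List[int]:
    """Indices of features whose Lasso coefficient is non-zero."""
    return [j for j, wj in enumerate(w) if abs(wj) > tol]


# Hypothetical toy data with centred columns: y depends only on feature 0.
X = [[1.0, 0.1, 1.0], [-1.0, 0.2, 1.0], [2.0, -0.1, -1.0],
     [-2.0, -0.2, -1.0], [0.5, 0.3, 0.0], [-0.5, -0.3, 0.0]]
y = [3.0 * row[0] for row in X]
w = lasso_coordinate_descent(X, y, alpha=0.1)
print(selected_features(w))
```

In practice `alpha` would itself be tuned, for example by cross-validation; the size of the selected subset (24 features in the abstract) depends on that choice.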
Affiliation(s)
- Yu Shi
- School of Computer Science & Technology, Soochow University, Suzhou, China
- Junhua Fang
- School of Computer Science & Technology, Soochow University, Suzhou, China
- Jiayi Li
- School of Computer Science & Technology, Soochow University, Suzhou, China
- Kaiwen Yu
- Orthopedics Department, The First Affiliated Hospital of Soochow University, Suzhou, China
- Jingbo Zhu
- Orthopedics Department, The First Affiliated Hospital of Soochow University, Suzhou, China
- Yan Lu
- Orthopedics Department, The First Affiliated Hospital of Soochow University, Suzhou, China
2
Rönn T, Perfilyev A, Oskolkov N, Ling C. Predicting type 2 diabetes via machine learning integration of multiple omics from human pancreatic islets. Sci Rep 2024; 14:14637. [PMID: 38918439 PMCID: PMC11199577 DOI: 10.1038/s41598-024-64846-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 06/13/2024] [Indexed: 06/27/2024]
Abstract
Type 2 diabetes (T2D) is the fastest growing non-infectious disease worldwide. Impaired insulin secretion from pancreatic beta-cells is a hallmark of T2D, but the mechanisms behind this defect are insufficiently characterized. Integrating multiple layers of biomedical information, such as different Omics, may allow a more accurate understanding of complex diseases such as T2D. Our aim was to explore and use Machine Learning to integrate multiple sources of biological/molecular information (multiOmics), in our case RNA-sequencing, DNA methylation, SNP and phenotypic data from islet donors with T2D and non-diabetic controls. We exploited Machine Learning to perform multiOmics integration of DNA methylation, expression, SNPs, and phenotypes from pancreatic islets of 110 individuals, with ~30% being T2D cases. DNA methylation was analyzed using the Infinium MethylationEPIC array, expression was analyzed using RNA-sequencing, and SNPs were analyzed using HumanOmniExpress arrays. Supervised linear multiOmics integration via DIABLO, based on Partial Least Squares (PLS), achieved an accuracy of 91 ± 15% for T2D prediction, with an area under the curve of 0.96 ± 0.08 on the test dataset after cross-validation. Biomarkers identified by this multiOmics integration, including SACS and TXNIP DNA methylation, OPRD1 and RHOT1 expression and a SNP annotated to ANO1, provide novel insights into the interplay between different biological mechanisms contributing to T2D. This Machine Learning approach to multiOmics cross-sectional data from human pancreatic islets achieved a promising accuracy of T2D prediction, which may potentially find broad applications in clinical diagnostics. In addition, it delivered novel candidate biomarkers for T2D and links between them across the different Omics.
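The area under the ROC curve quoted above can be computed directly from ranked predictions. The sketch below is a minimal illustration, assuming binary labels (1 = T2D, 0 = control) and one predicted score per donor; it is not the DIABLO/PLS pipeline itself, and the function name and example scores are hypothetical. A value reported as mean ± SD, such as 0.96 ± 0.08, would then presumably come from repeating this on each cross-validation test fold and summarising the results.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case scores higher than a random control.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    ties count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case (1) and one control (0)")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted T2D scores for two cases (1) and two controls (0)
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 0.75
```

The pairwise loop is quadratic in cohort size, which is unproblematic for a cohort of around a hundred donors.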
Affiliation(s)
- Tina Rönn
- Epigenetics and Diabetes Unit, Department of Clinical Sciences, Lund University Diabetes Centre, Scania University Hospital, Lund University, 205 02, Malmö, Sweden
- Alexander Perfilyev
- Epigenetics and Diabetes Unit, Department of Clinical Sciences, Lund University Diabetes Centre, Scania University Hospital, Lund University, 205 02, Malmö, Sweden
- Nikolay Oskolkov
- Science for Life Laboratory, Department of Biology, National Bioinformatics Infrastructure Sweden, Lund University, Sölvegatan 35, 223 62, Lund, Sweden
- Charlotte Ling
- Epigenetics and Diabetes Unit, Department of Clinical Sciences, Lund University Diabetes Centre, Scania University Hospital, Lund University, 205 02, Malmö, Sweden
3
Meng Y, Davison J, Clarke JT, Zobel M, Gerz M, Moora M, Öpik M, Bueno CG. Environmental modulation of plant mycorrhizal traits in the global flora. Ecol Lett 2023; 26:1862-1876. [PMID: 37766496 DOI: 10.1111/ele.14309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 08/15/2023] [Accepted: 08/21/2023] [Indexed: 09/29/2023]
Abstract
Mycorrhizal symbioses are known to strongly influence plant performance, structure plant communities and shape ecosystem dynamics. Plant mycorrhizal traits, such as those characterising mycorrhizal type (arbuscular (AM), ecto-, ericoid or orchid mycorrhiza) and status (obligately (OM), facultatively (FM) or non-mycorrhizal) offer valuable insight into plant belowground functionality. Here, we compile available plant mycorrhizal trait information and global occurrence data (∼ 100 million records) for 11,770 vascular plant species. Using a plant phylogenetic mega-tree and high-resolution climatic and edaphic data layers, we assess phylogenetic and environmental correlates of plant mycorrhizal traits. We find that plant mycorrhizal type is more phylogenetically conserved than plant mycorrhizal status, while environmental variables (both climatic and edaphic; notably soil texture) explain more variation in mycorrhizal status, especially FM. The previously underestimated role of environmental conditions has far-reaching implications for our understanding of ecosystem functioning under changing climatic and soil conditions.
Affiliation(s)
- Yiming Meng
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- John Davison
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- John T Clarke
- GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Ecology and Biogeography, Nicolaus Copernicus University in Toruń, Toruń, Poland
- Department of Zoology, Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Martin Zobel
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Maret Gerz
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Mari Moora
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Maarja Öpik
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- C Guillermo Bueno
- Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
- Pyrenean Institute of Ecology, IPE-CSIC, Jaca, Spain
4
Wu Y, Liu H, Liu S, Lou C. Estimate of near-surface NO2 concentrations in Fenwei Plain, China, based on TROPOMI data and random forest model. Environ Monit Assess 2023; 195:1379. [PMID: 37882903 DOI: 10.1007/s10661-023-11993-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Accepted: 10/12/2023] [Indexed: 10/27/2023]
Abstract
Nitrogen dioxide (NO2) concentration is a crucial indicator of ground-level air quality, and elevated concentrations can adversely affect human health and the atmospheric environment. In this study, we used Tropospheric Monitoring Instrument (TROPOMI) tropospheric NO2 vertical column density (VCD) data and multi-source geographic data to build a random forest (RF) regression model that accurately estimates near-surface NO2 concentrations in the Fenwei Plain. The model addresses the inherent limitations of traditional ground-based monitoring and provides data support for analyzing the spatial and temporal characteristics of regional pollution. (1) The RF model based on TROPOMI and geographic data achieves high estimation accuracy, with monthly average coefficients of determination (R2) of 0.949 for model fitting and 0.875 for validation. (2) A complex nonlinear relationship exists between near-surface NO2 concentration and multi-source geographic data. The RF model's estimates reveal clear seasonal and regional variations in near-surface NO2 concentration: concentrations are generally highest in winter, followed by spring and autumn, and lowest in summer. High NO2 concentrations are mainly distributed in low-elevation plains and river valleys with high population density. The estimation results also indicate that the model performs better when NO2 concentrations fluctuate less, and that anthropogenic emission reduction measures significantly affect near-surface NO2 concentrations. (3) The population exposure risk results indicate that most cities in the Fenwei Plain face varying degrees of exposure risk. These findings offer valuable insights for regional NO2 pollution management.
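The fit and validation R2 values above follow the usual definition of the coefficient of determination. A minimal sketch, assuming fit R2 is computed on the training records and validation R2 on held-out records; the abstract does not give the validation scheme or confirm this exact definition, and the function name and values are hypothetical:

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("observed values are constant; R2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical station NO2 values and model estimates (arbitrary units)
obs = [10.0, 20.0, 30.0, 40.0]
est = [15.0, 20.0, 30.0, 35.0]
print(round(r_squared(obs, est), 3))  # 0.9
```

A gap between the two values (0.949 versus 0.875 in the abstract) is the expected pattern when the same statistic is evaluated on data the forest was not trained on.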
Affiliation(s)
- Yarui Wu
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, 710054, Shaanxi, China
- Honglei Liu
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, 710054, Shaanxi, China
- Shuangyue Liu
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, 710054, Shaanxi, China
- Chunhui Lou
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, 710054, Shaanxi, China
5
Young T, Laroche O, Walker SP, Miller MR, Casanovas P, Steiner K, Esmaeili N, Zhao R, Bowman JP, Wilson R, Bridle A, Carter CG, Nowak BF, Alfaro AC, Symonds JE. Prediction of Feed Efficiency and Performance-Based Traits in Fish via Integration of Multiple Omics and Clinical Covariates. Biology (Basel) 2023; 12:1135. [PMID: 37627019 PMCID: PMC10452023 DOI: 10.3390/biology12081135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
Fish aquaculture is a rapidly expanding global industry, set to support growing demands for sources of marine protein. Enhancing feed efficiency (FE) in farmed fish is required to reduce production costs and improve sector sustainability. Because organisms are complex systems whose emerging phenotypes are the product of multiple interacting molecular processes, systems-based approaches are expected to deliver new biological insights into FE and growth performance. Here, we establish 14 diverse layers of multi-omics and clinical covariates to assess their capacity to predict FE and associated performance traits in a fish model (Oncorhynchus tshawytscha) and to uncover the influential variables. Inter-omic relatedness between the different layers revealed several significant concordances, particularly between datasets originating from similar material/tissue and between blood indicators and some of the proteomic (liver), metabolomic (liver), and microbiomic layers. Single- and multi-layer random forest (RF) regression models showed that integrating all data layers provides greater FE prediction power than any single-layer model alone. Although FE was among the most challenging traits we attempted to predict, the mean accuracy of 40 different FE models, expressed as root-mean-square error normalized to a percentage, was 30.4%, supporting RF as a feature selection tool and approach for complex trait prediction. The largest contributions to the integrated FE models came from the proteomic and metabolomic layers, with the lipid composition layer also exerting substantial influence. A correlation matrix of the top 27 variables in the models highlighted associations of FE with faecal bacteria (Serratia spp.), palmitic and nervonic acid moieties in whole-body lipids, levels of free glycerol in muscle, and N-acetylglutamic acid content in liver. In summary, we identified subsets of molecular characteristics for the assessment of commercially relevant performance-based metrics in farmed Chinook salmon.
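The FE accuracy above is a normalized root-mean-square error, so lower values are better. The abstract does not say what the RMSE was normalized by; the sketch below assumes either the range or the mean of the observed values, and the function name and data are hypothetical.

```python
import math
from typing import Sequence


def nrmse_percent(observed: Sequence[float], predicted: Sequence[float],
                  normaliser: str = "range") -> float:
    """Root-mean-square error as a percentage of a scale of the observed trait.

    normaliser="range" divides by max - min of the observations;
    normaliser="mean" divides by their mean. Both choices are assumptions.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
    if normaliser == "range":
        scale = max(observed) - min(observed)
    elif normaliser == "mean":
        scale = sum(observed) / len(observed)
    else:
        raise ValueError("normaliser must be 'range' or 'mean'")
    if scale == 0.0:
        raise ValueError("normalising scale is zero")
    return 100.0 * rmse / scale


# Hypothetical observed vs. predicted trait values for four fish
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 32.0, 38.0]
print(nrmse_percent(obs, pred, "mean"))  # 8.0
```

Because the two normalisers give different percentages for the same errors (8.0 versus about 6.7 here), an NRMSE figure is only comparable across studies when the normaliser is stated.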
Affiliation(s)
- Tim Young
- Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
- The Centre for Biomedical and Chemical Sciences, School of Science, Auckland University of Technology, Private Bag 92006, Auckland 1142, New Zealand
- Matthew R. Miller
- Cawthron Institute, Nelson 7010, New Zealand
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Noah Esmaeili
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Ruixiang Zhao
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- John P. Bowman
- Tasmanian Institute of Agricultural Research, University of Tasmania, Hobart 7005, Australia
- Richard Wilson
- Central Science Laboratory, Research Division, University of Tasmania, Hobart 7001, Australia
- Andrew Bridle
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Chris G. Carter
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Blue Economy Cooperative Research Centre, Launceston 7250, Australia
- Barbara F. Nowak
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Andrea C. Alfaro
- Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
- Jane E. Symonds
- Cawthron Institute, Nelson 7010, New Zealand
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
6
Chen W, Lv X, Cao X, Yuan Z, Wang S, Getachew T, Mwacharo JM, Haile A, Quan K, Li Y, Sun W. Integration of the Microbiome, Metabolome and Transcriptome Reveals Escherichia coli F17 Susceptibility of Sheep. Animals (Basel) 2023; 13:ani13061050. [PMID: 36978593 PMCID: PMC10044122 DOI: 10.3390/ani13061050] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/08/2023] [Revised: 03/09/2023] [Accepted: 03/11/2023] [Indexed: 03/17/2023]
Abstract
Escherichia coli (E. coli) F17 is one of the most common pathogens causing diarrhea in farm livestock. In a previous study, we assessed the transcriptomic and microbiomic profiles of E. coli F17-antagonism (AN) and -sensitive (SE) lambs; however, the biological mechanism underlying E. coli F17 infection has not been fully elucidated. The present study therefore first analyzed metabolite data obtained with UHPLC-MS/MS. A total of 1957 metabolites were profiled, and 11 differential metabolites (e.g., FAHFAs and propionylcarnitine) were identified between E. coli F17 AN and SE lambs. Functional enrichment analyses showed that most of the identified metabolites were related to lipid metabolism. We then applied a machine-learning approach (Random Forest) to integrate the microbiome, metabolome and transcriptome data, which identified subsets of potential biomarkers for E. coli F17 infection (e.g., GlcADG 18:0-18:2, ethylmalonic acid and FBLIM1); furthermore, Pearson correlation coefficients (PCCs) were calculated and an interaction network was constructed to gain insight into the crosstalk between the genes, metabolites and bacteria in E. coli F17 AN/SE lambs. By combining classic statistical approaches and a machine-learning approach, our results revealed subsets of metabolites, genes and bacteria that could potentially be developed as candidate biomarkers for E. coli F17 infection in lambs.
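The PCC-based interaction network can be sketched as follows, assuming PCC denotes the Pearson correlation coefficient and that an edge is drawn between two features (gene, metabolite or bacterium) whenever the absolute correlation of their profiles across animals reaches a cut-off. The cut-off of 0.9, the feature names and the profiles are hypothetical; the abstract does not report the criterion the authors used, nor any significance filtering.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("constant profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def correlation_network(profiles: Dict[str, Sequence[float]],
                        threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Edges (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):  # each unordered pair once
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical abundance profiles measured in the same five lambs
profiles = {
    "gene_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metab_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "bact_C": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_D": [1.0, 3.0, 2.0, 5.0, 4.0],  # |r| = 0.8 with the others: no edge
}
for a, b, r in correlation_network(profiles):
    print(a, b, round(r, 2))
```

Negative edges (such as bact_C with gene_A) are kept because an inverse relationship between a bacterium and a host gene can be as informative as a positive one.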
Affiliation(s)
- Weihao Chen
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
- Xiaoyang Lv
- Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
- International Joint Research Laboratory in Universities of Jiangsu Province of China for Domestic Animal Germplasm Resources and Genetic Improvement, Yangzhou University, Yangzhou 225009, China
- Xiukai Cao
- Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
- Zehu Yuan
- Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
- Shanhe Wang
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
- Tesfaye Getachew
- International Centre for Agricultural Research in the Dry Areas, Addis Ababa 999047, Ethiopia
- Joram M. Mwacharo
- International Centre for Agricultural Research in the Dry Areas, Addis Ababa 999047, Ethiopia
- Aynalem Haile
- International Centre for Agricultural Research in the Dry Areas, Addis Ababa 999047, Ethiopia
- Kai Quan
- College of Animal Science and Technology, Henan University of Animal Husbandry and Economics, Zhengzhou 450046, China
- Yutao Li
- CSIRO Agriculture and Food, 306 Carmody Rd, St Lucia, QLD 4067, Australia
- Wei Sun
- College of Animal Science and Technology, Yangzhou University, Yangzhou 225009, China
- Joint International Research Laboratory of Agriculture and Agri-Product Safety of Ministry of Education of China, Yangzhou University, Yangzhou 225009, China
- International Joint Research Laboratory in Universities of Jiangsu Province of China for Domestic Animal Germplasm Resources and Genetic Improvement, Yangzhou University, Yangzhou 225009, China
- “Innovative China” “Belt and Road” International Agricultural Technology Innovation Institute for Evaluation, Protection, and Improvement on Sheep Genetic Resource, Yangzhou 225009, China
- Correspondence: ; Tel.: +86-13952750912
7
Kircher M, Säurich J, Selle M, Jung K. Assessing Outlier Probabilities in Transcriptomics Data When Evaluating a Classifier. Genes (Basel) 2023; 14:genes14020387. [PMID: 36833313 PMCID: PMC9956321 DOI: 10.3390/genes14020387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 01/27/2023] [Accepted: 01/30/2023] [Indexed: 02/04/2023]
Abstract
Outliers in the training or test set used to fit and evaluate a classifier on transcriptomics data can considerably change the estimated performance of the model. As a result, the reported accuracy may be either too pessimistic or too optimistic, and the estimated performance cannot be reproduced on independent data. It then also becomes doubtful whether the classifier qualifies for clinical use. We estimate classifier performance in simulated gene expression data with artificial outliers and in two real-world datasets. As a new approach, we use two outlier detection methods within a bootstrap procedure to estimate the outlier probability of each sample, and we evaluate classifiers by cross-validation before and after outlier removal. We found that removing outliers changed classification performance notably; for the most part, it improved the classification results. Given that there are various, sometimes unclear reasons for a sample to be an outlier, we strongly advocate always reporting the performance of a transcriptomics classifier with and without outliers in the training and test data. This provides a more complete picture of a classifier's performance and prevents reporting models that later turn out to be inapplicable for clinical diagnosis.
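The bootstrap estimate of a per-sample outlier probability can be sketched as below. The abstract does not name the two outlier detection methods, so this sketch swaps in a single, simple detector (a median/MAD modified z-score with the common 3.5 cut-off) and assumes each sample has already been reduced to one summary score; all names and data are hypothetical. The outlier probability of a sample is the fraction of resamples containing it in which it is flagged.

```python
import random
import statistics
from typing import List, Sequence


def robust_z_flags(scores: Sequence[float], cutoff: float = 3.5) -> List[bool]:
    """Flag scores whose modified z-score (median/MAD based) exceeds cutoff."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    if mad == 0.0:
        # Degenerate resample (too many identical values): flag nothing.
        return [False] * len(scores)
    return [0.6745 * abs(s - med) / mad > cutoff for s in scores]


def bootstrap_outlier_probability(scores: Sequence[float], n_boot: int = 200,
                                  seed: int = 0) -> List[float]:
    """Per sample: flagged resamples / resamples that drew it (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(scores)
    drawn = [0] * n
    flagged = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        flags = robust_z_flags([scores[i] for i in idx])
        # Copies of a sample share its score, hence its flag; count each sample once.
        for i in set(idx):
            drawn[i] += 1
        for i in {i for i, f in zip(idx, flags) if f}:
            flagged[i] += 1
    return [flagged[i] / drawn[i] if drawn[i] else float("nan") for i in range(n)]


# Hypothetical summary scores: 19 ordinary samples and one far-away sample
scores = [i * 0.1 for i in range(19)] + [50.0]
probs = bootstrap_outlier_probability(scores, n_boot=300, seed=1)
print(round(probs[-1], 2), round(max(probs[:-1]), 2))
```

A sample flagged in nearly every resample is a stable outlier, while one flagged only occasionally is borderline; performance can then be reported with and without samples above a chosen probability threshold, as the abstract recommends.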
8
Mahmood U, Li X, Fan Y, Chang W, Niu Y, Li J, Qu C, Lu K. Multi-omics revolution to promote plant breeding efficiency. Front Plant Sci 2022; 13:1062952. [PMID: 36570904 PMCID: PMC9773847 DOI: 10.3389/fpls.2022.1062952] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/06/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
Crop production is the primary goal of agricultural activity. However, global agricultural systems are coming under increasing pressure from the rising food demand of a rapidly growing world population and from a changing climate. To address these issues, improving high-yield and climate-resilience-related traits through crop breeding is an effective strategy. In recent years, advances in omics techniques, including genomics, transcriptomics, proteomics, and metabolomics, have paved the way for accelerating plant/crop breeding to cope with the changing climate and enhance food production. Optimized integration of omics and phenotypic plasticity platforms, exploited by evolving machine learning algorithms, will aid the biological interpretation of complex crop traits. The progressive assembly of desired alleles using precise genome editing approaches and enhanced breeding strategies would enable future crops to excel in combating changing climates. Furthermore, plant breeding and genetic engineering offer a unique approach to developing nutrient-sufficient and climate-resilient crops whose productivity can sustainably and adequately meet the world's food, nutrition, and energy needs. This review provides an overview of how the integration of omics approaches could be exploited to select crop varieties with desired traits.
Affiliation(s)
- Umer Mahmood
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Xiaodong Li
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Yonghai Fan
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Wei Chang
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Yue Niu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Jiana Li
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Academy of Agricultural Sciences, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
- Cunmin Qu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Academy of Agricultural Sciences, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
- Kun Lu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Academy of Agricultural Sciences, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
9
Kao PH, Baiya S, Lai ZY, Huang CM, Jhan LH, Lin CJ, Lai YS, Kao CF. An advanced systems biology framework of feature engineering for cold tolerance genes discovery from integrated omics and non-omics data in soybean. Front Plant Sci 2022; 13:1019709. [PMID: 36247545 PMCID: PMC9562094 DOI: 10.3389/fpls.2022.1019709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 09/06/2022] [Indexed: 06/16/2023]
Abstract
Soybean is sensitive to low temperatures during the crop growing season, and there is an urgent need to breed cold-tolerant cultivars to alleviate the resulting production losses. Cold tolerance is a complex, quantitative trait controlled by multiple genes, environmental factors, and their interactions. In this study, we proposed an advanced systems biology framework of feature engineering for the discovery of cold tolerance genes (CTgenes) from integrated omics and non-omics (OnO) data in soybean. An integrative pipeline was introduced for feature selection and feature extraction from the different layers of the integrated OnO data, using data ensemble methods and non-parametric random forest prioritization to minimize uncertainty and false positives and thereby improve accuracy. In total, 44, 143, and 45 CTgenes were identified in short-, mid-, and long-term cold treatment, respectively, from the corresponding gene pools. These CTgenes outperformed the remaining genes, random genes, and candidate genes identified by other approaches in an independent RNA-seq database. Furthermore, we applied pathway enrichment and crosstalk network analyses to uncover relevant physiological pathways, revealing hormone- and defense-related modules underlying cold tolerance. Our CTgenes were validated using genotype data for 55 SNPs from 56 soybean samples in cold tolerance experiments, suggesting that the CTgenes identified by the proposed framework can effectively distinguish cold-resistant from cold-sensitive lines. This is an important advance in understanding the soybean cold-stress response. The proposed pipelines provide a robust and efficient alternative for biomarker discovery, module discovery, and sample classification underlying a particular trait in plants.
Affiliation(s)
- Pei-Hsiu Kao
- Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Supaporn Baiya
- Department of Resource and Environment, Faculty of Science at Sriracha, Kasetsart University, Sriracha, Thailand
- Zheng-Yuan Lai
- Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Chih-Min Huang
- Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Li-Hsin Jhan
- Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Chian-Jiun Lin
- Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Ya-Syuan Lai
- Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Chung-Feng Kao
- Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
- Advanced Plant Biotechnology Center, National Chung Hsing University, Taichung, Taiwan
10
Liang M, An B, Chang T, Deng T, Du L, Li K, Cao S, Du Y, Xu L, Zhang L, Gao X, Li J, Gao H. Incorporating kernelized multi-omics data improves the accuracy of genomic prediction. J Anim Sci Biotechnol 2022; 13:103. [PMID: 36127743 PMCID: PMC9490992 DOI: 10.1186/s40104-022-00756-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 07/08/2022] [Indexed: 11/18/2022]
Abstract
Background: Genomic selection (GS) has revolutionized animal and plant breeding since its first implementation by enabling early selection before phenotypes are measured. Beyond the genome, transcriptome and metabolome information are increasingly considered as new sources of information for GS. Difficulties in building GS models with multi-omics data and the limited availability of specimens have both delayed the investigation of multi-omics.
Results: We used the cosine kernel to map genomic and transcriptomic data into $n \times n$ symmetric matrices (G matrix and T matrix), combined with best linear unbiased prediction (BLUP) for GS. We defined five kernel-based prediction models: genomic BLUP (GBLUP), transcriptome BLUP (TBLUP), multi-omics BLUP (MBLUP, $\boldsymbol{M} = \mathrm{ratio} \times \boldsymbol{G} + (1 - \mathrm{ratio}) \times \boldsymbol{T}$), multi-omics single-step BLUP (mssBLUP), and weighted multi-omics single-step BLUP (wmssBLUP), to integrate transcribed individuals with the genotyped resource population. Evaluations of predictive accuracy for four traits in the Chinese Simmental beef cattle population showed that (1) MBLUP was far preferable to GBLUP (equivalent to MBLUP with ratio = 1.0); (2) wmssBLUP and mssBLUP improved prediction accuracy over GBLUP by an average of 4.18% and 3.37%, respectively; and (3) the accuracy of wmssBLUP increased with the proportion of transcribed cattle in the whole resource population.
Conclusions: We concluded that including transcriptome data in GS has the potential to improve accuracy. Moreover, wmssBLUP is considered a promising alternative for the common situation in which many individuals are genotyped but fewer are transcribed.
Supplementary Information: The online version contains supplementary material available at 10.1186/s40104-022-00756-6.
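The cosine-kernel mapping and the MBLUP blending formula can be sketched as follows. This is a minimal illustration, assuming the cosine kernel is applied directly to each individual's genotype or expression vector without centering or scaling (the abstract does not describe the preprocessing); the BLUP model that consumes M is not shown, and all names and data are hypothetical. With ratio = 1.0 the blend returns G, so MBLUP reduces to GBLUP.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine_kernel(rows: Sequence[Sequence[float]]) -> Matrix:
    """n x n matrix of cosine similarities between the n individuals (rows)."""
    norms = [math.sqrt(sum(v * v for v in r)) for r in rows]
    if any(nrm == 0.0 for nrm in norms):
        raise ValueError("an all-zero row has no defined cosine similarity")
    n = len(rows)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            k[i][j] = k[j][i] = dot / (norms[i] * norms[j])  # symmetric by construction
    return k


def blend_kernels(g: Matrix, t: Matrix, ratio: float) -> Matrix:
    """M = ratio * G + (1 - ratio) * T, element by element."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return [[ratio * gv + (1.0 - ratio) * tv for gv, tv in zip(g_row, t_row)]
            for g_row, t_row in zip(g, t)]


# Hypothetical data for three animals: SNP dosages (0/1/2) and expression levels
genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
expression = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
G = cosine_kernel(genotypes)
T = cosine_kernel(expression)
M = blend_kernels(G, T, ratio=0.5)
print([round(v, 3) for v in M[0]])
```

Because G and T are both indexed by the same n individuals, this blend only applies when every animal is both genotyped and transcribed; the single-step variants (mssBLUP, wmssBLUP) exist precisely to relax that requirement.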
Affiliation(s)
- Mang Liang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Bingxing An
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Tianpeng Chang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Tianyu Deng
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Lili Du
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Keanning Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Sheng Cao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Yueying Du
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Lingyang Xu
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Lupei Zhang
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Xue Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Junya Li
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
- Huijiang Gao
- Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193, People's Republic of China
11
Yu Z, Wang Z, Jiang Q, Wang J, Zheng J, Zhang T. Analysis of Factors of Productivity of Tight Conglomerate Reservoirs Based on Random Forest Algorithm. ACS OMEGA 2022; 7:20390-20404. [PMID: 35721933 PMCID: PMC9202053 DOI: 10.1021/acsomega.2c02546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Accepted: 05/20/2022] [Indexed: 05/25/2023]
Abstract
The tight conglomerate reservoir of the Baikouquan Formation in the MA 131 well block of the Junggar Basin abounds with petroleum reserves, yet vertical wells in this reservoir have achieved only limited development. Tight conglomerate reservoirs have become an important target for exploration and exploitation. A series of field tests on well pattern and well spacing identified a high-efficiency development scheme based on a small-well-spacing three-dimensional (3D) staggered well pattern, and multistage fracturing of horizontal wells has been demonstrated as the primary development technology. However, the horizontal wells in the MA 131 small-well-spacing demonstration area have achieved markedly different development results, and the major factors controlling high and stable single-well production remain unclear. In this study, we proposed a random forest (RF) machine-learning model for evaluating the major factors controlling productivity in the tight conglomerate reservoir, to provide a reference for oil recovery. Productivity factors were investigated from two aspects: petrophysical facies, which indicate the genetic mechanism of geological sweet spots, and engineering sweet-spot parameters that contribute to forming complex fracture networks. As a result, the reservoir in the MA 131 well block was classified into 12 petrophysical facies according to its sedimentary characteristics and diagenesis, and the mercury injection curves of these facies fall into four reservoir quality types. The RF model was trained on 80% of the data to predict oil well class, using the selected features as primary inputs, while the remaining 20% of the data were held out to test model performance. The RF model performed very well, with only 12 misclassifications across the entire data set of 627 samples, an error of <2%.
The importance scores of the random forest model showed that reservoir type, oil saturation, horizontal stress difference, and gravel content are the four most important indicators, each exceeding 15%. Brittleness and maximum horizontal stress are the least important indicators, each below 5%. Reservoir quality and oil saturation were confirmed as the major controlling factors and the material foundation for high and stable oil-well production. This study also indicates that stress difference and gravel content are the major factors controlling the formation of a complex fracture network.
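The 80/20 evaluation protocol and the reported error rate can be illustrated as follows. This sketch assumes a simple random (unstratified) split, which the abstract does not specify; the function names are hypothetical, and fitting the random forest itself (for example with scikit-learn's `RandomForestClassifier`) is left out.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and hold out `test_fraction` of them for testing.

    A simple random split; whether the original study stratified by well class
    is not stated in the abstract.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)     # reproducible shuffle
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]    # (train indices, test indices)

def error_rate(n_misclassified, n_total):
    """Fraction of samples that were misclassified."""
    return n_misclassified / n_total

train_idx, test_idx = train_test_split_indices(627)
print(len(train_idx), len(test_idx))   # 502 125
print(f"{error_rate(12, 627):.2%}")    # 1.91%
```

With 627 samples, 12 misclassifications correspond to 12/627, roughly 1.9%, which is consistent with the reported error of <2%.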
Affiliation(s)
- Zhichao Yu
- College of Geosciences, China University of Petroleum, Beijing 102249, China
- Zhizhang Wang
- College of Geosciences, China University of Petroleum, Beijing 102249, China
- Qingping Jiang
- Xinjiang Oilfield Company Research Institute of Exploration & Exploitation, Karamay, Xinjiang 834000, China
- Jie Wang
- CNOOC China Limited, Shenzhen Branch, Shenzhen, Guangdong 518064, China
- Jingrong Zheng
- College of Geosciences, China University of Petroleum, Beijing 102249, China
- Tianyou Zhang
- CNOOC Research Institute Co., Ltd., Beijing 100028, China
12
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient, high-quality exploration and exploitation of single-omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, ML approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction. KEY POINTS:
• The key challenges and solutions related to the big data derived from plant system biology are highlighted.
• Different methods of data integration are discussed.
• Challenges and opportunities in applying machine learning to plant system biology are highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada; Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada
13
Rudar J, Porter TM, Wright M, Golding GB, Hajibabaei M. LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data. BMC Bioinformatics 2022; 23:110. [PMID: 35361114 PMCID: PMC8969335 DOI: 10.1186/s12859-022-04631-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/12/2021] [Accepted: 03/07/2022] [Indexed: 11/10/2022]
Abstract
Background Identification of biomarkers, which are measurable characteristics of biological datasets, can be challenging. Although amplicon sequence variants (ASVs) can be considered potential biomarkers, identifying important ASVs in high-throughput sequencing datasets is challenging. Noise, algorithmic failures to account for specific distributional properties, and feature interactions can complicate the discovery of ASV biomarkers. In addition, these issues can impact the replicability of various models and elevate false-discovery rates. Contemporary machine learning approaches can be leveraged to address these issues. Ensembles of decision trees are particularly effective at classifying the types of data commonly generated in high-throughput sequencing (HTS) studies due to their robustness when the number of features in the training data is orders of magnitude larger than the number of samples. In addition, when combined with appropriate model introspection algorithms, machine learning algorithms can also be used to discover and select potential biomarkers. However, the construction of these models could introduce various biases which potentially obfuscate feature discovery. Results We developed a decision tree ensemble, LANDMark, which uses oblique and non-linear cuts at each node. In synthetic and toy tests LANDMark consistently ranked as the best classifier and often outperformed the Random Forest classifier. When trained on the full metabarcoding dataset obtained from Canada’s Wood Buffalo National Park, LANDMark was able to create highly predictive models and achieved an overall balanced accuracy score of 0.96 ± 0.06. The use of recursive feature elimination did not impact LANDMark’s generalization performance and, when trained on data from the BE amplicon, it was able to outperform the Linear Support Vector Machine, Logistic Regression models, and Stochastic Gradient Descent models (p ≤ 0.05). 
Finally, LANDMark distinguishes itself due to its ability to learn smoother non-linear decision boundaries. Conclusions Our work introduces LANDMark, a meta-classifier which blends the characteristics of several machine learning models into a decision tree and ensemble learning framework. To our knowledge, this is the first study to apply this type of ensemble approach to amplicon sequencing data and we have shown that analyzing these datasets using LANDMark can produce highly predictive and consistent models. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04631-z.
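Balanced accuracy, the metric reported for LANDMark, is the mean of the per-class recalls, so a classifier cannot score well simply by predicting the majority class. A minimal pure-Python sketch follows; the function name is hypothetical, and scikit-learn's `balanced_accuracy_score` computes the same quantity in its default (unadjusted) form. How the ± 0.06 spread was obtained (e.g., across cross-validation folds) is not stated in the abstract.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Imbalanced toy example: four samples of class "a", one of class "b".
y_true = ["a", "a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a", "a"]
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In the toy example, plain accuracy would be 0.8, but balanced accuracy is 0.5 because the minority class is never recognized.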
Affiliation(s)
- Josip Rudar
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Teresita M Porter
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Michael Wright
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- G Brian Golding
- Department of Biology, McMaster University, 1280 Main St. West, Hamilton, ON, L8S 4K1, Canada
- Mehrdad Hajibabaei
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
14
Li J, Chen F, Liang H, Yan J. MoNET: an R package for multi-omic network analysis. Bioinformatics 2022; 38:1165-1167. [PMID: 34694378 DOI: 10.1093/bioinformatics/btab722] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2021] [Revised: 08/31/2021] [Accepted: 10/19/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION The increasing availability of multi-omic data has enabled the discovery of disease biomarkers at different scales. Understanding the functional interactions between multi-omic biomarkers is becoming increasingly important because of their great potential to provide insights into the underlying molecular mechanisms. RESULTS Leveraging multiple biological network databases, we integrated the relationships between single nucleotide polymorphisms (SNPs), genes/proteins and metabolites, and developed an R package, Multi-omic Network Explorer Tool (MoNET), for multi-omic network analysis. This new tool enables users not only to track the interactions of SNPs/genes with the metabolome level, but also to trace back potential risk variants/regulators given altered genes/metabolites. MoNET is expected to advance our understanding of multi-omic findings by unveiling their transomic interactions and is likely to generate new hypotheses for further validation. AVAILABILITY AND IMPLEMENTATION The MoNET package is freely available at https://github.com/JW-Yan/MONET. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
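The forward tracking and backward tracing described above can be pictured as reachability over directed SNP → gene → metabolite edges. The sketch below is a hypothetical illustration in Python, not MoNET's R interface; the node identifiers are invented, and real edges would come from the biological network databases the package integrates.

```python
from collections import deque

def reachable(edges, start, reverse=False):
    """Breadth-first search over directed (src, dst) edges.

    reverse=False follows edges forward (e.g., SNP -> gene -> metabolite);
    reverse=True walks them backwards (metabolite -> gene -> SNP).
    Returns the set of nodes reachable from `start`, excluding `start`.
    """
    adj = {}
    for src, dst in edges:
        a, b = (dst, src) if reverse else (src, dst)
        adj.setdefault(a, []).append(b)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

# Hypothetical toy network.
EDGES = [("rs1", "GENE_A"), ("rs2", "GENE_A"), ("GENE_A", "GENE_B"),
         ("GENE_B", "MET_X"), ("rs3", "GENE_C")]
print(sorted(reachable(EDGES, "rs1")))                  # downstream of a SNP
print(sorted(reachable(EDGES, "MET_X", reverse=True)))  # upstream of a metabolite
```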
Affiliation(s)
- Jin Li
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Feng Chen
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Hong Liang
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China
- Jingwen Yan
- Department of BioHealth Informatics, School of Informatics and Computing, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA
15
Meng D, Xu J, Zhao J. Analysis and prediction of hand, foot and mouth disease incidence in China using Random Forest and XGBoost. PLoS One 2021; 16:e0261629. [PMID: 34936688 PMCID: PMC8694472 DOI: 10.1371/journal.pone.0261629] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/03/2021] [Accepted: 12/06/2021] [Indexed: 12/13/2022]
Abstract
Hand, foot and mouth disease (HFMD) is an increasingly serious public health problem, and it has caused outbreaks in China every year since 2008. Predicting the incidence of HFMD and analyzing its influential factors are of great significance for its prevention. Machine learning has shown advantages in infectious disease modeling, but few machine-learning studies of HFMD incidence cover all provinces of mainland China. In this study, we applied two machine learning algorithms, Random Forest and eXtreme Gradient Boosting (XGBoost), to perform our analysis and prediction. We first used Random Forest to examine the association between HFMD incidence and potential influential factors for 31 provinces in mainland China. Next, we established Random Forest and XGBoost prediction models using meteorological and social factors as predictors. Finally, we applied our prediction models to four different regions of mainland China and evaluated their performance. Our results show that: 1) meteorological and social factors jointly affect the incidence of HFMD in mainland China, with average temperature and population density being the two most significant influential factors; 2) population flux has different delayed effects on HFMD incidence in different regions, and from a national perspective, the model using population flux data delayed by one month has better prediction performance; 3) overall, the XGBoost model predicted better than the Random Forest model and is more suitable for predicting the incidence of HFMD in mainland China.
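A one-month-delayed population-flux predictor can be built by shifting the monthly series before pairing it with incidence. This is a minimal sketch, assuming two month-aligned series with hypothetical values and hypothetical helper names; the abstract does not describe the authors' exact feature construction.

```python
def add_lag(series, lag=1):
    """Shift a monthly series forward by `lag` steps, padding the start with None."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    k = min(lag, len(series))
    return [None] * k + list(series[: len(series) - k])

def lagged_pairs(target, predictor, lag=1):
    """Pair each month's target with the predictor value from `lag` months earlier."""
    if len(target) != len(predictor):
        raise ValueError("series must be aligned month by month")
    return [(x, y) for x, y in zip(add_lag(predictor, lag), target) if x is not None]

# Hypothetical monthly values for one region.
flux = [10, 12, 15, 11]       # population flux
incidence = [3, 4, 6, 5]      # HFMD incidence
print(lagged_pairs(incidence, flux, lag=1))  # [(10, 4), (12, 6), (15, 5)]
```

Dropping the first month (which has no earlier flux value) keeps every training row complete; comparing models built with lag = 0, 1, 2, ... is one way to probe the delayed effect described above.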
Affiliation(s)
- Delin Meng
- Complexity Science Institute, Qingdao University, Qingdao, Shandong, China
- Jun Xu
- State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
- Jijun Zhao
- Complexity Science Institute, Qingdao University, Qingdao, Shandong, China
16
De San-Martin BS, Ferreira VG, Bitencourt MR, Pereira PCG, Carrilho E, de Assunção NA, de Carvalho LRS. Metabolomics as a potential tool for the diagnosis of growth hormone deficiency (GHD): a review. ARCHIVES OF ENDOCRINOLOGY AND METABOLISM 2021; 64:654-663. [PMID: 33085993 PMCID: PMC10528619 DOI: 10.20945/2359-3997000000300] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/03/2020] [Accepted: 08/25/2020] [Indexed: 11/23/2022]
Abstract
Metabolomics uses several analytical tools to identify the chemical diversity of metabolites present in organisms. These metabolites are low-molecular-weight molecules (<1500 Da) classified as final or intermediary products of metabolic processes. This omics technology has become prominent for inferring physiological conditions by reporting on the phenotypic state; therefore, the use of metabolomics in clinical studies has grown in recent years owing to its efficiency in discriminating pathophysiological states. In endocrine diseases, there is great interest in assessing comprehensive and individualized physiological scenarios, in particular for growth hormone deficiency (GHD). Current GHD diagnostic tests are laborious and invasive, and there is neither an exam with ideal reproducibility and sensitivity for diagnosis nor a standard GH cut-off point. Therefore, this review focused on articles that applied metabolomics in the search for new biomarkers for GHD. The present work shows that applications of metabolomics in GHD are still limited: the limited complementarity of analytical techniques, low sample numbers, GHD combined with other deficiencies, and idiopathic diagnoses have slowed progress. The research results are relevant and similar to one another; however, they do not yet translate into clinical practice because of a lack of the multidisciplinary actions needed to transfer knowledge produced in the laboratory to the medical setting.
Affiliation(s)
- Breno Sena De San-Martin
- Escola Paulista de Medicina da Universidade Federal de São Paulo (EPM-UNIFESP), São Paulo, SP, Brasil
- Vinícius Guimarães Ferreira
- Instituto de Química de São Carlos da Universidade de São Paulo (IQSC-USP), São Carlos, SP, Brasil
- Instituto Nacional de Ciência e Tecnologia de Bioanalítica - INCTBio, Campinas, SP, Brasil
- Mariana Rechia Bitencourt
- Unidade de Endocrinologia do Desenvolvimento, Laboratório de Hormônios e Genética Molecular LIM42, Disciplina de Endocrinologia, Faculdade de Medicina da Universidade de São Paulo (FMUSP), São Paulo, SP, Brasil
- Paulo Cesar Gonçalves Pereira
- Unidade de Endocrinologia do Desenvolvimento, Laboratório de Hormônios e Genética Molecular LIM42, Disciplina de Endocrinologia, Faculdade de Medicina da Universidade de São Paulo (FMUSP), São Paulo, SP, Brasil
- Emanuel Carrilho
- Instituto de Química de São Carlos da Universidade de São Paulo (IQSC-USP), São Carlos, SP, Brasil
- Instituto Nacional de Ciência e Tecnologia de Bioanalítica - INCTBio, Campinas, SP, Brasil
- Nilson Antônio de Assunção
- Escola Paulista de Medicina da Universidade Federal de São Paulo (EPM-UNIFESP), São Paulo, SP, Brasil
- Departamento de Química, Instituto de Ciências Ambientais, Químicas e Farmacêuticas, Universidade Federal de São Paulo, Diadema, SP, Brasil
- Luciani Renata Silveira de Carvalho
- Departamento de Química, Instituto de Ciências Ambientais, Químicas e Farmacêuticas, Universidade Federal de São Paulo, Diadema, SP, Brasil
17
Jia B, Chen Y, Wu J. Bibliometric Analysis and Research Trend Forecast of Healthy Urban Planning for 40 Years (1981-2020). INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18189444. [PMID: 34574368 PMCID: PMC8464861 DOI: 10.3390/ijerph18189444] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/08/2021] [Revised: 08/26/2021] [Accepted: 09/01/2021] [Indexed: 11/25/2022]
Abstract
The history of healthy city planning can be traced back to the beginning of the 19th century. Since industrialization, harsh urban living conditions and outbreaks of infectious diseases have promoted the coordinated development of urban planning and public health, and people have gradually realized the importance of urban design and planning for residents' health. After searching keywords related to healthy cities and urban planning, and excluding duplicate, non-English, and unrelated papers, this work retrieved 2582 documents as the basic data (timespan 1 January 1981–31 December 2020; retrieval date 28 January 2021). CiteSpace was then used to analyze document co-citation, cooperation networks, and topic co-occurrence, and a random forest algorithm was used to predict the probability of citation. Overall, this work found that the hot spots of healthy urban planning are physical activity, green space, urban green space, and mental health. It also shows the diversification of themes and the trend toward cross-field development in healthy urban planning. In addition, the article found that two factors, namely the average number of citations of the first author and whether the article belongs to the field of environmental research, have a great impact on an article's number of citations. This work is of practical significance to practitioners and researchers because it provides guidance on hot topics and future research directions in healthy urban planning.
18
Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Citation(s) in RCA: 265] [Impact Index Per Article: 88.3] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate workings of biological systems using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data, enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
Affiliation(s)
- Parminder S Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Smarti Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Ewan Pearson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Emanuele Trucco
- VAMPIRE project, Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Emily Jefferson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
19
Pazhamala LT, Kudapa H, Weckwerth W, Millar AH, Varshney RK. Systems biology for crop improvement. THE PLANT GENOME 2021; 14:e20098. [PMID: 33949787 DOI: 10.1002/tpg2.20098] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 12/05/2020] [Accepted: 03/09/2021] [Indexed: 05/19/2023]
Abstract
In recent years, generation of large-scale data from the genome, transcriptome, proteome, metabolome, epigenome, and other sources has become routine in several plant species. Most of these datasets in different crop species, however, were studied independently, and as a result, full insight could not be gained into the molecular basis of complex traits and biological networks. A systems biology approach involving integration of multiple omics data, modeling, and prediction of cellular functions is required to understand the flow of biological information that underlies complex traits. In this context, systems biology with multiomics data integration is crucial and allows a holistic understanding of the dynamic system, in which different levels of biological organization interact with the external environment to produce a phenotype. Here, we present recent progress in various omics studies and in integrative and systems biology approaches, with a special focus on application to crop improvement. We also discuss the challenges and opportunities in multiomics data integration, modeling, and understanding the biology of complex traits underpinning yield and stress tolerance in major cereals and legumes.
Affiliation(s)
- Lekha T Pazhamala
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Himabindu Kudapa
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- Wolfram Weckwerth
- Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
- Vienna Metabolomics Center, University of Vienna, Vienna, Austria
- A Harvey Millar
- ARC Centre of Excellence in Plant Energy Biology and School of Molecular Sciences, The University of Western Australia, Perth, WA, Australia
- Rajeev K Varshney
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324, India
- State Agricultural Biotechnology Centre, Crop Research Innovation Centre, Food Futures Institute, Murdoch University, Murdoch, WA, Australia
20
Kim DY, Kim JM. Multi-omics integration strategies for animal epigenetic studies - A review. Anim Biosci 2021; 34:1271-1282. [PMID: 33902167 PMCID: PMC8255897 DOI: 10.5713/ab.21.0042] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/27/2021] [Accepted: 04/21/2021] [Indexed: 12/15/2022]
Abstract
Genome-wide studies provide considerable insights into the genetic background of animals; however, the inheritance of several heritable factors cannot be elucidated. Epigenetics explains these heritabilities, including those of genes influenced by environmental factors. Knowledge of the mechanisms underlying epigenetics enables understanding the processes of gene regulation through interactions with the environment. Recently developed next-generation sequencing (NGS) technologies help understand the interactional changes in epigenetic mechanisms. There are large sets of NGS data available; however, the integrative data analysis approaches still have limitations with regard to reliably interpreting the epigenetic changes. This review focuses on the epigenetic mechanisms and profiling methods and multi-omics integration methods that can provide comprehensive biological insights in animal genetic studies.
Affiliation(s)
- Do-Young Kim
- Department of Animal Science and Technology, Chung-Ang University, Anseong, Gyeonggi 17546, Korea
- Jun-Mo Kim
- Department of Animal Science and Technology, Chung-Ang University, Anseong, Gyeonggi 17546, Korea
21
Singh G, Papoutsoglou EA, Keijts-Lalleman F, Vencheva B, Rice M, Visser RG, Bachem CW, Finkers R. Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait. BMC PLANT BIOLOGY 2021; 21:198. [PMID: 33894758 PMCID: PMC8070292 DOI: 10.1186/s12870-021-02943-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 03/29/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and can therefore be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and we investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes. RESULTS We trained an NLP model on a manually annotated corpus of 34 full-text potato articles to recognize relevant biological entities (genes, proteins, metabolites and traits) and the relationships between them in text. This model detected biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics-based articles focusing on four major Solanaceous crops (tomato, potato, eggplant and capsicum) and determined that the networks contained both previously known and contemporaneously unknown leads to subsequently discovered biological phenomena relating to flesh color. A novel time-based analysis of these networks indicated a connection between our trait and a candidate gene (zeaxanthin epoxidase) two years before that connection was explicitly stated in the literature. CONCLUSIONS Our time-based analysis indicates that network-assisted hypothesis generation shows promise for knowledge discovery, data integration and hypothesis generation in scientific research.
Affiliation(s)
- Gurnoor Singh
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
- Mark Rice
- IBM Netherlands, Amsterdam, The Netherlands
- Richard G.F. Visser
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
- Christian W.B. Bachem
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
- Richard Finkers
- Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
22
Xu Y, Zhao Y, Wang X, Ma Y, Li P, Yang Z, Zhang X, Xu C, Xu S. Incorporation of parental phenotypic data into multi-omic models improves prediction of yield-related traits in hybrid rice. Plant Biotechnol J 2021; 19:261-272. [PMID: 32738177 PMCID: PMC7868986 DOI: 10.1111/pbi.13458] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/03/2019] [Revised: 06/14/2020] [Accepted: 07/22/2020] [Indexed: 05/15/2023]
Abstract
Hybrid breeding has been shown to effectively increase rice productivity. However, identifying desirable hybrids out of numerous potential combinations is a daunting challenge. Genomic selection holds great promise for accelerating hybrid breeding by enabling early selection before phenotypes are measured. With recent advances in multi-omic technologies, hybrid prediction based on transcriptomic and metabolomic data has received increasing attention. However, current omic-based hybrid prediction has ignored parental phenotypic information, which is of fundamental importance in plant breeding. In this study, we integrated parental phenotypic information into various multi-omic prediction models applied in hybrid breeding of rice and compared the predictabilities of 15 combinations of four sets of parental predictors, that is, genome, transcriptome, metabolome and phenome. The predictability of each combination was evaluated using best linear unbiased prediction and a modified fast HAT method. We found significant interactions between predictors and traits in predictability, but for each trait investigated, joint prediction with various combinations of the predictors significantly improved predictability relative to prediction from any single omic data source. Incorporating parental phenotypic data into various omic predictors increased predictability by an average of 13.6%, 54.5%, 19.9% and 8.3% for grain yield, number of tillers per plant, number of grains per panicle and 1000-grain weight, respectively. Among the nine models incorporating parental traits, the AD-All model was the most effective. This novel strategy of incorporating parental phenotypic data into multi-omic prediction is expected to accelerate progress in hybrid breeding, especially with the development of high-throughput phenotyping technologies.
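HAT-type methods evaluate predictability without refitting the model for every held-out observation. The sketch below is not the authors' modified fast HAT method; under stated assumptions it illustrates the general hat-matrix shortcut for leave-one-out prediction with ridge regression (closely related to BLUP of marker effects when the variance ratio is held fixed). For a fixed penalty, the leave-one-out residual equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii is the i-th diagonal element of the hat matrix. The helper names, the data and the penalty lam are hypothetical, and the brute-force refit is included only to check the shortcut.

```python
def mat_inv(a):
    """Invert a small square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(a)
    # Augment a with the identity matrix.
    m = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ridge_fit(X, y, lam):
    """Hypothetical helper: ridge coefficients and (X^T X + lam*I)^-1."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    a_inv = mat_inv(xtx)
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    beta = [sum(a_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    return beta, a_inv


def hat_loo_residuals(X, y, lam):
    """Leave-one-out residuals from a single fit: e_i / (1 - h_ii)."""
    beta, a_inv = ridge_fit(X, y, lam)
    p = len(beta)
    out = []
    for row, yi in zip(X, y):
        # Leverage h_ii = x_i^T (X^T X + lam*I)^-1 x_i.
        h_ii = sum(row[i] * a_inv[i][j] * row[j] for i in range(p) for j in range(p))
        out.append((yi - dot(row, beta)) / (1.0 - h_ii))
    return out


def brute_force_loo_residuals(X, y, lam):
    """Refit n times, each time predicting the held-out observation."""
    out = []
    for k in range(len(y)):
        beta, _ = ridge_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], lam)
        out.append(y[k] - dot(X[k], beta))
    return out


def predictability(y, loo_residuals):
    """R^2-style predictability: 1 - PRESS / total sum of squares."""
    mean = sum(y) / len(y)
    total_ss = sum((v - mean) ** 2 for v in y)
    return 1.0 - sum(e * e for e in loo_residuals) / total_ss


if __name__ == "__main__":
    # Hypothetical data; the first column is an intercept, penalized too for simplicity.
    X = [[1.0, 0.2], [1.0, -1.1], [1.0, 0.7], [1.0, 1.5], [1.0, -0.3], [1.0, 0.9]]
    y = [1.1, -0.8, 0.9, 2.0, 0.1, 1.2]
    fast = hat_loo_residuals(X, y, lam=0.5)
    slow = brute_force_loo_residuals(X, y, lam=0.5)
    print(max(abs(a - b) for a, b in zip(fast, slow)))  # ~0, up to rounding
    print(round(predictability(y, fast), 3))
```

The shortcut needs one fit instead of n, which is what makes this family of methods attractive for large breeding populations.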
Affiliation(s)
- Yang Xu
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Yue Zhao
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Xin Wang
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Ying Ma
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Pengcheng Li
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Zefeng Yang
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Xuecai Zhang
- International Maize and Wheat Improvement Center (CIMMYT), Mexico, DF, Mexico
- Chenwu Xu
- Jiangsu Key Laboratory of Crop Genetics and Physiology, Key Laboratory of Plant Functional Genomics of Ministry of Education, Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding, Co-Innovation Center for Modern Production Technology of Grain Crops, Agricultural College of Yangzhou University, Yangzhou, China
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
23
Proteome-wide Systems Genetics to Identify Functional Regulators of Complex Traits. Cell Syst 2021; 12:5-22. [PMID: 33476553 DOI: 10.1016/j.cels.2020.10.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/22/2020] [Revised: 09/15/2020] [Accepted: 10/07/2020] [Indexed: 02/08/2023]
Abstract
Proteomic technologies now enable the rapid quantification of thousands of proteins across genetically diverse samples. Integration of these data with systems-genetics analyses is a powerful approach to identify new regulators of economically important or disease-relevant phenotypes in various populations. In this review, we summarize the latest proteomic technologies and discuss technical challenges for their use in population studies. We demonstrate how the analysis of correlation structure and loci mapping can be used to identify genetic factors regulating functional protein networks and complex traits. Finally, we provide an extensive summary of the use of proteome-wide systems genetics across the fungal, plant, and animal kingdoms and discuss the power of this approach to identify candidate regulators and drug targets in large human consortium studies.
24
Acharjee A, Larkman J, Xu Y, Cardoso VR, Gkoutos GV. A random forest based biomarker discovery and power analysis framework for diagnostics research. BMC Med Genomics 2020; 13:178. [PMID: 33228632 PMCID: PMC7685541 DOI: 10.1186/s12920-020-00826-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 04/03/2020] [Accepted: 11/15/2020] [Indexed: 11/25/2022]
Abstract
Background Biomarker identification is one of the major goals of functional genomics and translational medicine studies. Large-scale omics data are increasingly being accumulated and can provide vital means for identifying biomarkers for the early diagnosis of complex diseases and/or for advanced patient or disease stratification. These tasks are clearly interlinked, and it is essential that an unbiased and stable methodology is applied to address them. Although many biomarker identification approaches, primarily based on machine learning, have recently been developed, exploring potential associations between biomarker identification and the design of future experiments remains a challenge. Methods In this study, using both simulated and published experimentally derived datasets, we assessed the performance of several state-of-the-art Random Forest (RF)-based decision approaches, namely the Boruta method, the permutation-based feature selection without correction method, the permutation-based feature selection with correction method, and the backward-elimination-based feature selection method. Moreover, we conducted a power analysis to estimate the number of samples required for potential future studies. Results We present a number of different RF-based stable feature selection methods and compare their performances using simulated as well as published, experimentally derived datasets. Across all of the scenarios considered, we found the Boruta method to be the most stable methodology, whilst the Permutation (Raw) approach offered the largest number of relevant features when allowed to stabilise over a number of iterations. Finally, we developed and made available a web interface (https://joelarkman.shinyapps.io/PowerTools/) to streamline power calculations, thereby aiding the design of potential future studies within a translational medicine context.
Conclusions We developed an RF-based biomarker discovery framework and provide a web interface for our framework, termed PowerTools, that supports the design of appropriate and cost-effective follow-up omics studies.
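Two ideas from this abstract can be sketched with the Python standard library: permutation-based feature importance (the drop in accuracy when one feature column is shuffled) and a sample-size calculation for a power analysis. This is a hedged illustration, not the PowerTools implementation: a fixed threshold rule stands in for a fitted random forest, no correction step or Boruta shadow features are applied, and the sample size uses the textbook normal approximation n = 2 (z_{1-α/2} + z_{1-β})² / d² per group for a standardized effect size d. All function names and data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical data: the label depends only on feature 0; feature 1 is noise.
    data_rng = random.Random(1)
    X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
    y = [int(row[0] > 0.5) for row in X]

    def model(row):
        # Stands in for a fitted random forest (assumption for illustration).
        return int(row[0] > 0.5)

    print(permutation_importance(model, X, y))  # feature 1 gets exactly 0.0
    print(samples_per_group(0.5))  # 63 per group at alpha=0.05, power=0.8
```

An exact t-test calculation gives a slightly larger n than the normal approximation, so the formula is best read as a lower bound.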
Affiliation(s)
- Animesh Acharjee
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham, B15 2WB, UK
- Joseph Larkman
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, UK
- Yuanwei Xu
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, UK
- Victor Roth Cardoso
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, UK; MRC Health Data Research UK (HDR UK), London, UK
- Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, University Hospital Birmingham, Birmingham, B15 2WB, UK; MRC Health Data Research UK (HDR UK), London, UK; NIHR Experimental Cancer Medicine Centre, Birmingham, B15 2TT, UK; NIHR Biomedical Research Centre, University Hospital Birmingham, Birmingham, B15 2TT, UK
25
Statistical and Machine-Learning Analyses in Nutritional Genomics Studies. Nutrients 2020; 12:nu12103140. [PMID: 33066636 PMCID: PMC7602401 DOI: 10.3390/nu12103140] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 09/11/2020] [Revised: 10/08/2020] [Accepted: 10/10/2020] [Indexed: 12/18/2022]
Abstract
Nutritional compounds may influence different OMICs levels, including genomics, epigenomics, transcriptomics, proteomics, metabolomics, and metagenomics. The integration of OMICs data is challenging but may provide new knowledge to explain the mechanisms involved in the metabolism of nutrients and diseases. Traditional statistical analyses play an important role in description and data association; however, these statistical procedures are not sufficiently powered to interpret large integrated multiple-OMICs (multi-OMICs) datasets. Machine learning (ML) approaches can play a major role in the interpretation of multi-OMICs data in nutrition research. Specifically, ML can be used for data mining, sample clustering, and classification to produce predictive models and algorithms for integrating multi-OMICs data in response to dietary intake. The objective of this review was to investigate the strategies used for the analysis of multi-OMICs data in nutrition studies. Sixteen recent studies that aimed to understand the association between dietary intake and multi-OMICs data are summarized. Multivariate analysis is the approach used most commonly in multi-OMICs nutrition studies. Overall, as nutrition research incorporates multi-OMICs data, novel analytical approaches such as ML need to complement traditional statistical analyses to fully explain the impact of nutrition on health and disease.
26
Balmant KM, Noble JD, C Alves F, Dervinis C, Conde D, Schmidt HW, Vazquez AI, Barbazuk WB, Campos GDL, Resende MFR, Kirst M. Xylem systems genetics analysis reveals a key regulator of lignin biosynthesis in Populus deltoides. Genome Res 2020; 30:1131-1143. [PMID: 32817237 PMCID: PMC7462072 DOI: 10.1101/gr.261438.120] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/20/2020] [Accepted: 07/13/2020] [Indexed: 02/01/2023]
Abstract
Despite the growing resources and tools for high-throughput characterization and analysis of genomic information, the discovery of the genetic elements that regulate complex traits remains a challenge. Systems genetics is an emerging field that aims to understand the flow of biological information that underlies complex traits from genotype to phenotype. In this study, we used a systems genetics approach to identify and evaluate regulators of the lignin biosynthesis pathway in Populus deltoides by combining genome, transcriptome, and phenotype data from a population of 268 unrelated individuals of P. deltoides. The discovery of lignin regulators began with the quantitative genetic analysis of the xylem transcriptome and resulted in the detection of 6706 and 4628 significant local- and distant-eQTL associations, respectively. Among the locally regulated genes, we identified the R2R3-MYB transcription factor MYB125 (Potri.003G114100) as a putative trans-regulator of the majority of genes in the lignin biosynthesis pathway. The expression of MYB125 in a diverse population was positively correlated with lignin content. Furthermore, overexpression of MYB125 in transgenic poplar resulted in increased lignin content, as well as altered expression of genes in the lignin biosynthesis pathway. Altogether, our findings indicate that MYB125 is involved in the control of a transcriptional coexpression network of lignin biosynthesis genes during secondary cell wall formation in P. deltoides.
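Local and distant eQTL are commonly distinguished by the physical distance between the associated SNP and the regulated gene. A minimal sketch, assuming a fixed 1 Mb window (a hypothetical threshold; the abstract does not state the study's definition) and hypothetical coordinates and field names:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One significant SNP-gene expression association (hypothetical fields)."""
    snp_chrom: str
    snp_pos: int
    gene_chrom: str
    gene_start: int
    gene_end: int


def classify_eqtl(a: Association, window: int = 1_000_000) -> str:
    """'local' if the SNP lies within `window` bp of the gene, else 'distant'.

    The 1 Mb default window is an assumption for illustration.
    """
    if a.snp_chrom != a.gene_chrom:
        return "distant"  # different chromosome: always distant
    if a.gene_start <= a.snp_pos <= a.gene_end:
        distance = 0  # SNP falls inside the gene body
    else:
        distance = min(abs(a.snp_pos - a.gene_start), abs(a.snp_pos - a.gene_end))
    return "local" if distance <= window else "distant"


def tally(associations, window=1_000_000):
    """Count local and distant associations."""
    return Counter(classify_eqtl(a, window) for a in associations)


if __name__ == "__main__":
    # Hypothetical associations for one gene on Chr03.
    hits = [
        Association("Chr03", 12_500_000, "Chr03", 12_400_000, 12_403_000),
        Association("Chr10", 5_000_000, "Chr03", 12_400_000, 12_403_000),
        Association("Chr03", 30_000_000, "Chr03", 12_400_000, 12_403_000),
    ]
    print(tally(hits))
```

Counts such as the 6706 local and 4628 distant associations above depend on the window chosen, so the threshold is worth reporting alongside them.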
Affiliation(s)
- Kelly M Balmant
- School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, USA
- Jerald D Noble
- Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, Florida 32611, USA
- Filipe C Alves
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824, USA
- Christopher Dervinis
- School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, USA
- Daniel Conde
- School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, USA
- Henry W Schmidt
- School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, USA
- Ana I Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824, USA
- William B Barbazuk
- Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, Florida 32611, USA
- Department of Biology, University of Florida, Gainesville, Florida 32611, USA
- Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
- Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824, USA
- Statistics Department, Michigan State University, East Lansing, Michigan 48824, USA
- Marcio F R Resende
- Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, Florida 32611, USA
- Horticulture Sciences Department, University of Florida, Gainesville, Florida 32611, USA
- Matias Kirst
- School of Forest Resources and Conservation, University of Florida, Gainesville, Florida 32611, USA
- Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, Florida 32611, USA
- Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
27
Shi WJ, Zhuang Y, Russell PH, Hobbs BD, Parker MM, Castaldi PJ, Rudra P, Vestal B, Hersh CP, Saba LM, Kechris K. Unsupervised discovery of phenotype-specific multi-omics networks. Bioinformatics 2020; 35:4336-4343. [PMID: 30957844 DOI: 10.1093/bioinformatics/btz226] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/12/2018] [Revised: 02/01/2019] [Accepted: 04/05/2019] [Indexed: 12/15/2022]
Abstract
MOTIVATION Complex diseases often involve a wide spectrum of phenotypic traits. Better understanding of the biological mechanisms relevant to each trait promotes understanding of the etiology of the disease and the potential for targeted and effective treatment plans. There have been many efforts towards omics data integration and network reconstruction, but limited work has examined the incorporation of relevant (quantitative) phenotypic traits. RESULTS We propose a novel technique, sparse multiple canonical correlation network analysis (SmCCNet), for integrating multiple omics data types along with a quantitative phenotype of interest, and for constructing multi-omics networks that are specific to the phenotype. As a case study, we focus on miRNA-mRNA networks. Through simulations, we demonstrate that SmCCNet has better overall prediction performance compared to popular gene expression network construction and integration approaches under realistic settings. Applying SmCCNet to studies on chronic obstructive pulmonary disease (COPD) and breast cancer, we found enrichment of known relevant pathways (e.g. the Cadherin pathway for COPD and the interferon-gamma signaling pathway for breast cancer) as well as less known omics features that may be important to the diseases. Although those applications focus on miRNA-mRNA co-expression networks, SmCCNet is applicable to a variety of omics and other data types. It can also be easily generalized to incorporate multiple quantitative phenotypes simultaneously. The versatility of SmCCNet suggests that the approach has great potential in many areas. AVAILABILITY AND IMPLEMENTATION The SmCCNet algorithm is written in R, and is freely available on the web at https://cran.r-project.org/web/packages/SmCCNet/index.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- W Jenny Shi
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Yonghua Zhuang
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Pamela H Russell
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Brian D Hobbs
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Margaret M Parker
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Peter J Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Pratyaydipta Rudra
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA; Department of Statistics, Oklahoma State University, Stillwater, OK
- Brian Vestal
- Center for Genes, Environment & Health, National Jewish Health, Denver, CO, USA
- Craig P Hersh
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Laura M Saba
- Department of Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA
- Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
28
Jamil IN, Remali J, Azizan KA, Nor Muhammad NA, Arita M, Goh HH, Aizat WM. Systematic Multi-Omics Integration (MOI) Approach in Plant Systems Biology. Front Plant Sci 2020; 11:944. [PMID: 32754171 PMCID: PMC7371031 DOI: 10.3389/fpls.2020.00944] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 03/05/2020] [Accepted: 06/10/2020] [Indexed: 05/03/2023]
Abstract
Across all facets of biology, rapid progress in high-throughput data generation has enabled multi-omics systems biology research. Transcriptomics, proteomics, and metabolomics data can independently answer targeted biological questions regarding the expression of transcripts, proteins, and metabolites, but systematic multi-omics integration (MOI) can comprehensively assimilate, annotate, and model these large data sets. Previous MOI studies and reviews have detailed its usage and practicality in various organisms, including humans, animals, microbes, and plants. Plants are especially challenging because of their large, poorly annotated genomes, multiple organelles, and diverse secondary metabolites. Hence, constructive methodological guidelines on how to perform MOI for plants are needed, particularly for researchers newly embarking on this topic. In this review, we thoroughly classify multi-omics studies on plants and verify workflows to ensure successful omics integration with accurate data representation. We also propose three levels of MOI, namely element-based (level 1), pathway-based (level 2), and mathematical-based integration (level 3). These MOI levels are described in relation to recent publications and tools to highlight their practicality and function. The drawbacks and limitations of these MOI levels are also discussed to guide future improvement toward more amenable strategies in plant systems biology.
Affiliation(s)
- Ili Nadhirah Jamil
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Juwairiah Remali
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Kamalrul Azlan Azizan
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Nor Azlan Nor Muhammad
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Masanori Arita
- Bioinformation & DDBJ Center, National Institute of Genetics (NIG), Mishima, Japan
- Metabolome Informatics Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Hoe-Han Goh
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Wan Mohd Aizat
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
29
Moreira FF, Oliveira HR, Volenec JJ, Rainey KM, Brito LF. Integrating High-Throughput Phenotyping and Statistical Genomic Methods to Genetically Improve Longitudinal Traits in Crops. Front Plant Sci 2020; 11:681. [PMID: 32528513 PMCID: PMC7264266 DOI: 10.3389/fpls.2020.00681] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/26/2020] [Accepted: 04/30/2020] [Indexed: 05/28/2023]
Abstract
The rapid development of remote sensing in agronomic research allows the dynamic nature of longitudinal traits to be adequately described, which may enhance the genetic improvement of crop efficiency. For traits such as light interception, biomass accumulation, and responses to stressors, the data generated by the various high-throughput phenotyping (HTP) methods require appropriate statistical techniques to evaluate phenotypic records over time. As a consequence, information about plant functioning and gene activation, as well as the interaction of gene networks at different stages of plant development and in response to environmental stimuli, can be exploited. In this review, we outline the current analytical approaches in quantitative genetics that are applied to longitudinal traits in crops throughout development, describe the advantages and pitfalls of each approach, and indicate future research directions and opportunities.
Affiliation(s)
- Fabiana F. Moreira
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Hinayah R. Oliveira
- Department of Animal Sciences, Purdue University, West Lafayette, IN, United States
- Jeffrey J. Volenec
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Katy M. Rainey
- Department of Agronomy, Purdue University, West Lafayette, IN, United States
- Luiz F. Brito
- Department of Animal Sciences, Purdue University, West Lafayette, IN, United States
30
Eicher T, Kinnebrew G, Patt A, Spencer K, Ying K, Ma Q, Machiraju R, Mathé EA. Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources. Metabolites 2020; 10:E202. [PMID: 32429287 PMCID: PMC7281435 DOI: 10.3390/metabo10050202] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 04/15/2020] [Revised: 05/07/2020] [Accepted: 05/13/2020] [Indexed: 02/06/2023]
Abstract
As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. Metabolomics researchers therefore need a better understanding of the computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for evaluating model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend our review to serve as a guide for metabolomics researchers choosing effective techniques for the multi-omics analyses relevant to their field of study.
Affiliation(s)
- Tara Eicher
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
- Garrett Kinnebrew
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Bioinformatics Shared Resource Group, The Ohio State University, Columbus, OH 43210, USA
- Andrew Patt
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Kyle Spencer
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Biomedical Sciences Graduate Program, The Ohio State University, Columbus, OH 43210, USA
- Nationwide Children’s Research Hospital, Columbus, OH 43210, USA
- Kevin Ying
- Comprehensive Cancer Center, The Ohio State University and James Cancer Hospital, Columbus, OH 43210, USA
- Molecular, Cellular and Developmental Biology Program, The Ohio State University, Columbus, OH 43210, USA
- Qin Ma
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Raghu Machiraju
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Computer Science and Engineering Department, The Ohio State University College of Engineering, Columbus, OH 43210, USA
- Department of Pathology, Wexner Medical Center, The Ohio State University, Columbus, OH 43210, USA
- Translational Data Analytics Institute, The Ohio State University, Columbus, OH 43210, USA
- Ewy A. Mathé
- Biomedical Informatics Department, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, NIH, 9800 Medical Center Dr., Rockville, MD, 20892, USA
31
Zhang X, Yang S, Srivastava G, Chen MY, Cheng X. Hybridization of cognitive computing for food services. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.106051] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
32
Clinical-learning versus machine-learning for transdiagnostic prediction of psychosis onset in individuals at-risk. Transl Psychiatry 2019; 9:259. [PMID: 31624229 PMCID: PMC6797779 DOI: 10.1038/s41398-019-0600-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 02/09/2019] [Revised: 05/03/2019] [Accepted: 05/31/2019] [Indexed: 02/08/2023]
Abstract
Predicting the onset of psychosis in individuals at-risk relies on robust prognostic model-building methods, using either a priori clinical knowledge (also termed clinical-learning) to preselect predictors or machine-learning methods to select predictors automatically. To date, there is no empirical research comparing the prognostic accuracy of these two methods for the prediction of psychosis onset. In a first experiment, no improved performance was observed when machine-learning methods (LASSO and RIDGE) were applied, using the same predictors, to an individualised, transdiagnostic, clinically based risk calculator previously developed on the basis of clinical-learning (predictors: age, gender, age by gender, ethnicity, ICD-10 diagnostic spectrum) and externally validated twice. In a second experiment, two refined versions of the published model, which expanded the granularity of the ICD-10 diagnosis, were introduced: ICD-10 diagnostic categories and ICD-10 diagnostic subdivisions. Although these refined versions showed an increase in apparent performance, their external performance was similar to that of the original model. In a third experiment, the three refined models were analysed under machine-learning and clinical-learning with varying events-per-variable (EPV) ratios. The best performing model under low EPVs was obtained through machine-learning approaches. The development of prognostic models on the basis of a priori clinical knowledge, large samples and adequate events per variable is a robust clinical prediction method to forecast psychosis onset in patients at-risk, and is comparable to machine-learning methods, which are more difficult to interpret and implement. Machine-learning methods should be preferred for high-dimensional data when no a priori knowledge is available.
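A simplified case helps show how LASSO and RIDGE differ. With an orthonormal design (X^T X = I) and the objectives ½‖y − Xβ‖² + λ‖β‖₁ (LASSO) and ½‖y − Xβ‖² + (λ/2)‖β‖² (ridge), LASSO soft-thresholds each ordinary least-squares coefficient, setting small ones exactly to zero, whereas ridge shrinks all of them by the same factor. The sketch below assumes that idealized design, which real clinical predictors do not satisfy, and also computes the events-per-variable ratio as outcome events divided by candidate predictor parameters. All function names and numbers are hypothetical.

```python
import math


def lasso_orthonormal(beta_ols, lam):
    """LASSO coefficients when X^T X = I (assumption): soft-threshold each OLS coefficient by lam."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta_ols]


def ridge_orthonormal(beta_ols, lam):
    """Ridge coefficients when X^T X = I (assumption): scale every OLS coefficient by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta_ols]


def events_per_variable(n_events, n_parameters):
    """Outcome events divided by candidate predictor parameters.

    A categorical predictor with k levels typically contributes k - 1 parameters.
    """
    return n_events / n_parameters


if __name__ == "__main__":
    beta_ols = [2.0, -0.3, 0.8]  # hypothetical OLS estimates
    print(lasso_orthonormal(beta_ols, 0.5))  # the small coefficient becomes exactly zero
    print(ridge_orthonormal(beta_ols, 0.5))  # all coefficients shrink, none reach zero
    print(events_per_variable(n_events=100, n_parameters=10))  # 10.0
```

The zeroing behaviour is what lets LASSO act as a predictor-selection method, which is why a finer ICD-10 coding (more parameters, lower EPV) interacts with the choice of method.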
33
Zhuang YY, Liu HJ, Song X, Ju Y, Peng H. A Linear Regression Predictor for Identifying N6-Methyladenosine Sites Using Frequent Gapped K-mer Pattern. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 18:673-680. [PMID: 31707204 PMCID: PMC6849367 DOI: 10.1016/j.omtn.2019.10.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/09/2019] [Revised: 08/19/2019] [Accepted: 10/03/2019] [Indexed: 01/07/2023]
Abstract
N6-methyladenosine (m6A) is one of the most common and abundant modifications in RNA and is related to many biological processes in humans. Abnormal RNA modifications are often associated with a series of diseases, including tumors, neurogenic diseases, and embryonic retardation. Identifying m6A sites is therefore of paramount importance in the post-genomic age. Although many lab-based methods have been proposed to annotate m6A sites, they are time-consuming and costly. Given these drawbacks of experimental methods for RNA sequence recognition, computational methods have been suggested as a supplement for identifying m6A sites. In this study, we develop a novel feature extraction algorithm based on the frequent gapped k-mer pattern (FGKP) and apply linear regression to construct the prediction model. The new predictor is used to identify m6A sites in the Saccharomyces cerevisiae database. 10-fold cross-validation shows that its performance is better than that of recent methods. Comparative results indicate that our model has great potential to become a useful and effective tool for genome analysis and to provide more insight into locating m6A sites.
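The abstract does not define the FGKP features precisely. The following Python sketch shows one common reading of gapped k-mer features, under stated assumptions: a pattern keeps k informative nucleotides (always including both ends) inside a fixed window and replaces the rest with `_` wildcards, and a pattern counts as "frequent" when it occurs in at least `min_support` training sequences. The function names, thresholds and toy sequences are hypothetical; the resulting count vectors would be the inputs to a linear regression model.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq: str, k: int, window: int) -> Counter:
    """Count gapped k-mers of span `window`: the first and last positions are
    always kept, k - 2 interior positions are chosen, and the rest become '_'."""
    if k < 2 or window < k:
        raise ValueError("require 2 <= k <= window")
    counts: Counter = Counter()
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        for interior in combinations(range(1, window - 1), k - 2):
            keep = {0, window - 1, *interior}
            pattern = "".join(c if i in keep else "_" for i, c in enumerate(chunk))
            counts[pattern] += 1
    return counts


def frequent_patterns(seqs: list[str], k: int, window: int, min_support: int) -> list[str]:
    """Patterns that occur in at least `min_support` of the sequences."""
    support: Counter = Counter()
    for seq in seqs:
        support.update(set(gapped_kmer_counts(seq, k, window)))
    return sorted(p for p, n in support.items() if n >= min_support)


def feature_vector(seq: str, patterns: list[str], k: int, window: int) -> list[int]:
    """Per-sequence counts of the selected frequent patterns."""
    counts = gapped_kmer_counts(seq, k, window)
    return [counts[p] for p in patterns]


training = ["GGACU", "AGACU", "GGACA"]  # toy sequences, not study data
patterns = frequent_patterns(training, k=2, window=3, min_support=2)
print(patterns)  # ['A_A', 'A_U', 'G_A', 'G_C']
print(feature_vector("GGACU", patterns, k=2, window=3))  # [0, 1, 1, 1]
```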
Affiliation(s)
- Y Y Zhuang
- School of Informatics, Xiamen University, Xiamen 361005, China
- H J Liu
- College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines
- X Song
- School of Computer and Information Technology, Nanyang Normal University, Nanyang 473000, China.
- Y Ju
- School of Informatics, Xiamen University, Xiamen 361005, China
- H Peng
- School of Informatics, Xiamen University, Xiamen 361005, China
34
Krautenbacher N, Flach N, Böck A, Laubhahn K, Laimighofer M, Theis FJ, Ankerst DP, Fuchs C, Schaub B. A strategy for high-dimensional multivariable analysis classifies childhood asthma phenotypes from genetic, immunological, and environmental factors. Allergy 2019; 74:1364-1373. [PMID: 30737985 PMCID: PMC6767756 DOI: 10.1111/all.13745] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/12/2018] [Revised: 12/22/2018] [Accepted: 01/06/2019] [Indexed: 12/14/2022]
Abstract
Background Associations between childhood asthma phenotypes and genetic, immunological, and environmental factors have been previously established. Yet, strategies to integrate high‐dimensional risk factors from multiple distinct data sets, and thereby increase the statistical power of analyses, have been hampered by a preponderance of missing data and a lack of methods to accommodate them. Methods We assembled questionnaire, diagnostic, genotype, microarray, RT‐qPCR, flow cytometry, and cytokine data (referred to as data modalities) to use as input factors for a classifier that could distinguish healthy children, mild‐to‐moderate allergic asthmatics, and nonallergic asthmatics. Based on data from 260 German children aged 4‐14 from our university outpatient clinic, we built a novel multilevel prediction approach for asthma outcome that could deal with the complex missing-data structure present. Results The optimal learning method was boosting based on all data sets, achieving an area under the receiver operating characteristic curve (AUC) for three classes of phenotypes of 0.81 (95%‐confidence interval (CI): 0.65‐0.94) using leave‐one‐out cross‐validation. Besides improving the AUC, our integrative multilevel learning approach led to tighter CIs than using smaller complete predictor data sets (AUC = 0.82 [0.66‐0.94] for boosting). The most important variables for classifying childhood asthma phenotypes comprised newly identified genes, namely PKN2 (protein kinase N2), PTK2 (protein tyrosine kinase 2), and ALPP (alkaline phosphatase, placental). Conclusion Our combination of several data modalities using a novel strategy improved classification of childhood asthma phenotypes but requires validation in external populations. The generic approach is applicable to other multilevel data‐based risk prediction settings, which typically suffer from incomplete data.
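Leave-one-out cross-validation, the validation scheme named in the Results, can be sketched in a few lines of Python. As a simplifying assumption, a nearest-centroid classifier stands in for the boosting model used in the study, and plain accuracy replaces the three-class AUC; the toy feature vectors and class labels are hypothetical.

```python
def nearest_centroid_fit(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Mean feature vector per class (stand-in model, not boosting)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
            counts[yi] = 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids: dict[str, list[float]], x: list[float]) -> str:
    """Class whose centroid has the smallest squared Euclidean distance to x."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def leave_one_out_accuracy(X: list[list[float]], y: list[str]) -> float:
    """Refit on all samples except one, predict the held-out sample, repeat."""
    correct = 0
    for i in range(len(X)):
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += nearest_centroid_predict(model, X[i]) == y[i]
    return correct / len(X)


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [20.0, 0.0], [20.0, 1.0]]
y = ["healthy", "healthy", "allergic", "allergic", "nonallergic", "nonallergic"]
print(leave_one_out_accuracy(X, y))  # 1.0 on these well-separated toy clusters
```

With n samples the model is refitted n times, which is affordable for a cohort of a few hundred children but grows costly for large data sets.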
Affiliation(s)
- Norbert Krautenbacher
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Nicolai Flach
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Andreas Böck
- Department of Pulmonary and Allergy, Dr. von Hauner Children's Hospital, LMU Munich, Germany
- Kristina Laubhahn
- Department of Pulmonary and Allergy, Dr. von Hauner Children's Hospital, LMU Munich, Germany
- Member of German Lung Centre (DZL), CPC Munich, Germany
- Michael Laimighofer
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Donna P. Ankerst
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- University of Texas Health Science Center at San Antonio, San Antonio, Texas
- Christiane Fuchs
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health GmbH, Neuherberg, Germany
- Technische Universität München, Center for Mathematics, Chair of Mathematical Modeling of Biological Systems, Garching, Germany
- Faculty of Business Administration and Economics, Bielefeld University, Bielefeld, Germany
- Bianca Schaub
- Department of Pulmonary and Allergy, Dr. von Hauner Children's Hospital, LMU Munich, Germany
- Member of German Lung Centre (DZL), CPC Munich, Germany
35
Ajjolli Nagaraja A, Fontaine N, Delsaut M, Charton P, Damour C, Offmann B, Grondin-Perez B, Cadet F. Flux prediction using artificial neural network (ANN) for the upper part of glycolysis. PLoS One 2019; 14:e0216178. [PMID: 31067238 PMCID: PMC6505829 DOI: 10.1371/journal.pone.0216178] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/28/2019] [Accepted: 04/15/2019] [Indexed: 01/08/2023]
Abstract
The selection of optimal enzyme concentrations in multienzyme cascade reactions for the highest product yield is, in practice, a very expensive and time-consuming process. Modelling biological pathways is difficult because of the complexity of the system. Mathematical modelling of the system using an analytical approach depends on many enzyme parameters, which rely on tedious and expensive experiments. The artificial neural network (ANN) method has been successfully applied in different fields of science to perform complex functions. In this study, ANN models were trained to predict the flux for the upper part of glycolysis, as inferred by NADH consumption, using four enzyme concentrations, i.e., phosphoglucoisomerase, phosphofructokinase, fructose-bisphosphate-aldolase and triose-phosphate-isomerase. Out of three ANN algorithms, the neuralnet package was implemented with two activation functions, “logistic” and “tanh”. The prediction of the flux was very efficient: using a cross-validation procedure, RMSE and R² were 0.847 and 0.93 for the logistic function, and 0.804 and 0.94 for the tanh function. This study showed that a systemic approach such as ANN could be used for accurate prediction of the flux through the metabolic pathway. This could save considerable time and cost, particularly from an industrial perspective. The R-code is available at: https://github.com/DSIMB/ANN-Glycolysis-Flux-Prediction.
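The two activation functions and the two reported metrics can be made concrete with a small Python sketch. It assumes a network with one hidden layer and a linear output unit (a common regression set-up; the abstract does not state the architecture), and the weights and flux values are hypothetical placeholders, not the fitted model from the linked repository.

```python
import math
from typing import Callable, Sequence


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(x: Sequence[float], w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
            w_out: Sequence[float], b_out: float, act: Callable[[float], float]) -> float:
    """One hidden layer with activation `act`, then a linear output unit.
    Here `x` would hold the four enzyme concentrations."""
    hidden = [act(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical weights: 4 inputs -> 2 hidden units -> 1 output.
w_hidden = [[0.2, -0.1, 0.4, 0.0], [-0.3, 0.5, 0.1, 0.2]]
b_hidden = [0.0, 0.1]
w_out = [1.5, -0.7]
x = [1.0, 0.5, 2.0, 0.8]
print(forward(x, w_hidden, b_hidden, w_out, 0.3, logistic))
print(forward(x, w_hidden, b_hidden, w_out, 0.3, math.tanh))

observed = [2.0, 3.5, 5.0, 6.5]   # hypothetical flux values
predicted = [2.2, 3.3, 5.1, 6.1]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

Since tanh(x) = 2·logistic(2x) - 1, the two activations are rescaled versions of each other; they mainly differ in the output range of the hidden units (-1 to 1 versus 0 to 1).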
Affiliation(s)
- Anamya Ajjolli Nagaraja
- LE2P, Laboratory of Energy, Electronics and Processes EA 4079, Faculty of Sciences and Technology, University of La Reunion, France
- Mathieu Delsaut
- LE2P, Laboratory of Energy, Electronics and Processes EA 4079, Faculty of Sciences and Technology, University of La Reunion, France
- Philippe Charton
- DSIMB, INSERM, UMR S-1134, Laboratory of Excellence LABEX GR, Faculty of Sciences and Technology, University of La Reunion & University Paris Diderot, Paris, France
- Cedric Damour
- LE2P, Laboratory of Energy, Electronics and Processes EA 4079, Faculty of Sciences and Technology, University of La Reunion, France
- Bernard Offmann
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, chemin de la Houssinière, France
- Brigitte Grondin-Perez
- LE2P, Laboratory of Energy, Electronics and Processes EA 4079, Faculty of Sciences and Technology, University of La Reunion, France
- Frederic Cadet
- DSIMB, INSERM, UMR S-1134, Laboratory of Excellence LABEX GR, Faculty of Sciences and Technology, University of La Reunion & University Paris Diderot, Paris, France
36
Li Z, Gao N, Martini JWR, Simianer H. Integrating Gene Expression Data Into Genomic Prediction. Front Genet 2019; 10:126. [PMID: 30858865 PMCID: PMC6397893 DOI: 10.3389/fgene.2019.00126] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 10/14/2018] [Accepted: 02/04/2019] [Indexed: 01/14/2023]
Abstract
Gene expression profiles potentially hold valuable information for the prediction of breeding values and phenotypes. In this study, the utility of transcriptome data for phenotype prediction was tested with 185 inbred lines of Drosophila melanogaster for nine traits in two sexes. We incorporated the transcriptome data into genomic prediction via two methods: GTBLUP and GRBLUP, both combining single nucleotide polymorphisms (SNPs) and transcriptome data. The genotypic data was used to construct the common additive genomic relationship, which was used in genomic best linear unbiased prediction (GBLUP) or jointly in a linear mixed model with a transcriptome-based linear kernel (GTBLUP), or with a transcriptome-based Gaussian kernel (GRBLUP). We studied the predictive ability of the models and discuss a concept of "omics-augmented broad sense heritability" for the multi-omics era. For most traits, GRBLUP and GBLUP provided similar predictive abilities, but GRBLUP explained more of the phenotypic variance. There was only one trait (olfactory perception to Ethyl Butyrate in females) in which the predictive ability of GRBLUP (0.23) was significantly higher than the predictive ability of GBLUP (0.21). Our results suggest that accounting for transcriptome data has the potential to improve genomic predictions if transcriptome data can be included on a larger scale.
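The difference between the two transcriptome kernels can be sketched in Python. Assumptions: expression values are standardised per gene, the linear kernel (as in GTBLUP) is Z Z^T / p for p genes, and the Gaussian kernel (as in GRBLUP) is exp(-d^2 / h) for squared Euclidean distance d^2 and a bandwidth h. Scaling conventions for h vary between implementations (here h is set to the number of genes purely for illustration), and the expression matrix is a toy example, not the Drosophila data.

```python
import math

Matrix = list[list[float]]


def standardize_columns(X: Matrix) -> Matrix:
    """Centre each column (gene) and scale it to unit population variance."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return Z


def linear_kernel(Z: Matrix) -> Matrix:
    """Transcriptome relationship matrix Z Z^T / p (GTBLUP-style)."""
    p = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in Z] for zi in Z]


def gaussian_kernel(Z: Matrix, bandwidth: float) -> Matrix:
    """K_ij = exp(-||z_i - z_j||^2 / bandwidth) (GRBLUP-style)."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-sq_dist(zi, zj) / bandwidth) for zj in Z] for zi in Z]


# 4 lines x 3 genes, toy values.
expression = [[1.0, 2.0, 0.5], [2.0, 1.0, 0.9], [4.0, 0.5, 0.6], [3.0, 3.0, 0.2]]
Z = standardize_columns(expression)
T = linear_kernel(Z)
K = gaussian_kernel(Z, bandwidth=len(Z[0]))
for row in K:
    print([round(v, 3) for v in row])
```

The linear kernel measures similarity as a scaled inner product, while the Gaussian kernel decays with distance and can therefore capture non-additive similarity; in either model the kernel enters the mixed model as the covariance structure of a random effect.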
Affiliation(s)
- Zhengcao Li
- Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
- Ning Gao
- State Key Laboratory of Biocontrol, Guangzhou Higher Education Mega Center, School of Life Science, Sun Yat-sen University, Guangzhou, China
- Henner Simianer
- Animal Breeding and Genetics Group, Department of Animal Sciences, Center for Integrated Breeding Research, University of Göttingen, Göttingen, Germany
37
Segal JP, Mullish BH, Quraishi MN, Acharjee A, Williams HRT, Iqbal T, Hart AL, Marchesi JR. The application of omics techniques to understand the role of the gut microbiota in inflammatory bowel disease. Therap Adv Gastroenterol 2019; 12:1756284818822250. [PMID: 30719076 PMCID: PMC6348496 DOI: 10.1177/1756284818822250] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 10/04/2018] [Accepted: 11/23/2018] [Indexed: 02/04/2023]
Abstract
The aetiopathogenesis of inflammatory bowel diseases (IBD) involves the complex interaction between a patient's genetic predisposition, environment, gut microbiota and immune system. Currently, however, it is not known whether the distinctive perturbations of the gut microbiota that appear to accompany both Crohn's disease and ulcerative colitis are the cause or the result of the intestinal inflammation that characterizes IBD. With novel systems biology technologies, we can now begin to understand not only compositional changes in the gut microbiota in IBD but, increasingly, also the alterations in microbiota function that accompany them. Technologies such as metagenomics, metataxomics, metatranscriptomics, metaproteomics and metabonomics are therefore giving us a deeper understanding of the role of the microbiota in IBD. Furthermore, the integration of these systems biology technologies through advancing computational and statistical techniques is beginning to reveal the microbiome interactions that contribute to both health and disease states in IBD. This review aims to explore how such systems biology technologies are advancing our understanding of the gut microbiota, and their potential role in delineating the aetiology, development and clinical care of IBD.
Affiliation(s)
- Jonathan P. Segal
- Inflammatory Bowel Disease Department, St Mark’s Hospital, Harrow HA1 3UJ, UK
- Benjamin H. Mullish
- Division of Integrative Systems Medicine and Digestive Disease, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, UK
- Mohammed Nabil Quraishi
- Institute of Immunology and Immunotherapy, University of Birmingham, Department of Gastroenterology, University Hospital, Birmingham, UK
- Animesh Acharjee
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, UK
- Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, UK
- Horace R. T. Williams
- Division of Integrative Systems Medicine and Digestive Disease, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, UK
- Tariq Iqbal
- Institute of Immunology and Immunotherapy, University of Birmingham, Department of Gastroenterology, University Hospital, Birmingham, UK
- Ailsa L. Hart
- Inflammatory Bowel Disease Department, St Mark’s Hospital, Harrow, UK
- Department of Surgery and Cancer, Division of Integrative Systems Medicine and Digestive Disease, Faculty of Medicine, Imperial College, London, UK
- Julian R. Marchesi
- Department of Surgery and Cancer, Division of Integrative Systems Medicine and Digestive Disease, Faculty of Medicine, Imperial College, London, UK
- School of Biosciences, Cardiff University, Cardiff, UK
38
Men H, Jiao Y, Shi Y, Gong F, Chen Y, Fang H, Liu J. Odor Fingerprint Analysis Using Feature Mining Method Based on Olfactory Sensory Evaluation. SENSORS (BASEL, SWITZERLAND) 2018; 18:E3387. [PMID: 30309029 PMCID: PMC6210366 DOI: 10.3390/s18103387] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/15/2018] [Revised: 10/06/2018] [Accepted: 10/08/2018] [Indexed: 12/01/2022]
Abstract
In this paper, we aim to use odor fingerprint analysis to identify and detect various odors. We obtained the olfactory sensory evaluation of eight different brands of Chinese liquor with a lab-developed intelligent nose. We extracted features from both the time domain and the frequency domain to characterize the samples comprehensively. However, combining time-domain and frequency-domain features introduces redundant information that degrades performance. Therefore, we processed the data with Principal Component Analysis (PCA) and Variable Importance in Projection (VIP) to remove redundant information and construct a more precise odor fingerprint. Random Forest (RF) and Probabilistic Neural Network (PNN) models were then built on the resulting features. Results showed that the VIP-based models achieved better classification performance than the PCA-based models. In addition, the VIP-RF model reached a peak classification rate of 92.5%, higher than that of the VIP-PNN model (90%). In conclusion, odor fingerprint analysis using a feature mining method based on olfactory sensory evaluation can be applied to monitor product quality in actual industrial production.
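VIP scores are derived from a fitted partial least squares (PLS) model. The Python sketch below assumes the PLS weight vectors and the Y-variance explained by each component are already available, and applies the standard formula VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a). The weights, the explained variances and the common "VIP > 1" cut-off are illustrative assumptions, not values from the study.

```python
import math


def vip_scores(W: list[list[float]], ss: list[float]) -> list[float]:
    """Variable Importance in Projection for each of the p features.

    W:  p x A PLS weight matrix (rows = features, columns = components).
    ss: length-A list of Y-variance explained by each component.
    """
    p, n_comp = len(W), len(ss)
    # Norm of each component's weight vector.
    norms = [math.sqrt(sum(W[j][a] ** 2 for j in range(p))) for a in range(n_comp)]
    total = sum(ss)
    return [
        math.sqrt(p * sum(ss[a] * (W[j][a] / norms[a]) ** 2 for a in range(n_comp)) / total)
        for j in range(p)
    ]


# Hypothetical weights for 3 sensor features and 2 PLS components.
W = [[0.8, 0.1], [0.5, 0.2], [0.3, 0.97]]
explained = [0.6, 0.2]
vip = vip_scores(W, explained)
selected = [j for j, v in enumerate(vip) if v > 1.0]
print([round(v, 3) for v in vip], selected)
```

Because the squared VIP scores always sum to p, the "greater than 1" rule keeps the features whose importance exceeds the average.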
Affiliation(s)
- Hong Men
- Advanced Sensor Technology Institute, College of Automation Engineering, Northeast Electric Power University, Jilin 132012, China.
- Yanan Jiao
- Advanced Sensor Technology Institute, College of Automation Engineering, Northeast Electric Power University, Jilin 132012, China.
- Yan Shi
- Advanced Sensor Technology Institute, College of Automation Engineering, Northeast Electric Power University, Jilin 132012, China.
- Furong Gong
- Advanced Sensor Technology Institute, College of Automation Engineering, Northeast Electric Power University, Jilin 132012, China.
- Yizhou Chen
- Department of Neurobiology and Behavior, University of California, Irvine, CA 92697, USA.
- Hairui Fang
- Advanced Sensor Technology Institute, College of Automation Engineering, Northeast Electric Power University, Jilin 132012, China.
- Jingjing Liu
- Advanced Sensor Technology Institute, College of Automation Engineering, Northeast Electric Power University, Jilin 132012, China.
39
Darst B, Engelman CD, Tian Y, Lorenzo Bermejo J. Data mining and machine learning approaches for the integration of genome-wide association and methylation data: methodology and main conclusions from GAW20. BMC Genet 2018; 19:76. [PMID: 30255774 PMCID: PMC6157271 DOI: 10.1186/s12863-018-0646-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Multiple layers of genetic and epigenetic variability are being simultaneously explored in an increasing number of health studies. We summarize here the different approaches applied in the Data Mining and Machine Learning group at GAW20 to integrate genome-wide genotype and methylation array data. RESULTS We provide a non-intimidating introduction to some frequently used methods for investigating high-dimensional molecular data and compare the different approaches tried by group members: random forest, deep learning, cluster analysis, mixed models, and gene-set enrichment analysis. Group contributions were quite heterogeneous regarding the investigated data sets (real vs simulated), the data quality control conducted and the phenotypes assessed (eg, metabolic syndrome vs relative differences of log-transformed triglyceride concentrations before and after fenofibrate treatment). However, some common technical issues were detected, leading to practical recommendations. CONCLUSIONS Different sources of correlation were identified by group members, including population stratification, family structure, batch effects, linkage disequilibrium and correlation of methylation values at neighboring cytosine-phosphate-guanine (CpG) sites, and the majority of applied approaches were able to take the identified correlation structures into account. The ability to deal efficiently with high-dimensional omics data, and the model-free nature of approaches that did not require detailed model specifications, were clearly recognized as the main strengths of the applied methods. A limitation of random forest is its sensitivity to highly correlated variables. The parameter setup and the interpretation of results from deep learning methods, in particular deep neural networks, can be extremely challenging. Cluster analysis and mixed models may need some prior dimension reduction based on existing literature, data filtering, and supplementary statistical methods, and gene-set enrichment analysis requires biological insight.
Affiliation(s)
- Burcu Darst
- Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin, 610 Walnut St. 1007 WARF, Madison, WI 53726 USA
- Corinne D. Engelman
- Department of Population Health Sciences, School of Medicine and Public Health, University of Wisconsin, 610 Walnut St. 1007 WARF, Madison, WI 53726 USA
- Ye Tian
- Department of Biochemistry and Medical Genetics, University of Manitoba, 745 Bannatyne Ave, Winnipeg, MB R3E 0J9 Canada
- Department of Electrical and Computer Engineering, University of Manitoba, 745 Bannatyne Ave, Winnipeg, MB R3E 0J9 Canada
- Justo Lorenzo Bermejo
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
40
Berlin R, Gruen R, Best J. Systems Medicine Disease: Disease Classification and Scalability Beyond Networks and Boundary Conditions. Front Bioeng Biotechnol 2018; 6:112. [PMID: 30131956 PMCID: PMC6090066 DOI: 10.3389/fbioe.2018.00112] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/24/2018] [Accepted: 07/18/2018] [Indexed: 12/26/2022]
Abstract
In order to accommodate the forthcoming wealth of health- and disease-related information, from genome to body sensors to population and the environment, the approach to disease description and definition demands re-examination. Traditional classification methods remain trapped by history; to provide the descriptive features required for a comprehensive description of disease, systems science, which captures dynamic processes, adaptive responses, and asynchronous communication channels, must be applied (Wolkenhauer et al., 2013). When Disease is viewed beyond lines and threshold boundaries, its definition is no longer only the result of reductionist, mechanistic categories that reluctantly face re-composition. Disease becomes process and synergy once the characteristics of Systems Biology and Systems Medicine are included. To capture this wealth of information and contribute meaningfully to medical practice and biology research, Disease classification goes beyond a single spatial biological level or static time assignment to include the interface between Disease process and organism response (Bechtel, 2017a; Green et al., 2017).
Affiliation(s)
- Richard Berlin
- Department of Computer Science, University of Illinois, Urbana, IL, United States
- Russell Gruen
- Department of Surgery, Nanyang Institute of Technology in Health and Medicine, Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- James Best
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Imperial College, London, United Kingdom
41
Wang Y, Xia ST, Tang Q, Wu J, Zhu X. A Novel Consistent Random Forest Framework: Bernoulli Random Forests. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:3510-3523. [PMID: 28816676 DOI: 10.1109/tnnls.2017.2729778] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 06/07/2023]
Abstract
Random forests (RFs) are a type of ensemble learning method and are effective for most classification and regression tasks. Despite their impressive empirical performance, the theory of RFs has not yet been fully established. Several theoretically guaranteed RF variants have been presented, but they have been criticized for their poor practical performance. In this paper, a novel RF framework named Bernoulli RFs (BRFs) is proposed, with the aim of resolving the RF dilemma between theoretical consistency and empirical performance. In contrast to the RFs proposed by Breiman, BRF uses two independent Bernoulli distributions to simplify tree construction: one controls the selection of the splitting feature and the other controls the selection of the splitting point. Consequently, BRF is theoretically consistent, i.e., its learning performance is guaranteed to converge to the optimum as the amount of data tends to infinity. Importantly, the proposed BRF is consistent for both classification and regression. BRF achieves the best empirical performance when compared with state-of-the-art theoretical/consistent RFs. The theoretical and experimental studies in this paper verify this advance in RF research toward closing the gap between theory and practice.
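The role of the two Bernoulli draws can be illustrated with a simplified Python sketch of a single node split. Assumptions: with probability `p_feature` the splitting feature is drawn uniformly at random (otherwise the feature with the largest Gini decrease is taken), and with probability `p_point` the threshold is drawn uniformly within the feature's observed range (otherwise the best threshold is taken). This only illustrates the idea; the exact BRF construction, including how the data are handled to obtain consistency, is specified in the paper and is not reproduced here.

```python
import random


def gini(labels: list[int]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    counts: dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(X: list[list[float]], y: list[int], feature: int, threshold: float) -> float:
    left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
    right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
    n = len(y)
    return gini(y) - len(left) / n * gini(left) - len(right) / n * gini(right)


def best_threshold(X: list[list[float]], y: list[int], feature: int) -> tuple[float, float]:
    """(threshold, impurity decrease) of the best midpoint split on `feature`."""
    values = sorted({xi[feature] for xi in X})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        return values[0], 0.0
    return max(((t, impurity_decrease(X, y, feature, t)) for t in candidates),
               key=lambda pair: pair[1])


def bernoulli_split(X: list[list[float]], y: list[int], p_feature: float, p_point: float,
                    rng: random.Random) -> tuple[int, float]:
    n_features = len(X[0])
    # First Bernoulli draw: random feature vs. impurity-optimal feature.
    if rng.random() < p_feature:
        feature = rng.randrange(n_features)
    else:
        feature = max(range(n_features), key=lambda f: best_threshold(X, y, f)[1])
    # Second Bernoulli draw: random split point vs. impurity-optimal split point.
    if rng.random() < p_point:
        lo = min(xi[feature] for xi in X)
        hi = max(xi[feature] for xi in X)
        threshold = rng.uniform(lo, hi)
    else:
        threshold = best_threshold(X, y, feature)[0]
    return feature, threshold


X = [[0.0, 1.0], [1.0, 2.0], [0.0, 8.0], [1.0, 9.0]]  # toy data
y = [0, 0, 1, 1]
rng = random.Random(0)
print(bernoulli_split(X, y, p_feature=0.0, p_point=0.0, rng=rng))  # (1, 5.0): fully greedy
print(bernoulli_split(X, y, p_feature=1.0, p_point=1.0, rng=rng))  # fully random split
```

Setting both probabilities to 0 recovers a greedy split and setting both to 1 gives a fully random split; intermediate values trade off the two.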
42
Liu C, Liu B, Liu L, Zhang EL, Sun BD, Xu G, Chen J, Gao YQ. Arachidonic Acid Metabolism Pathway Is Not Only Dominant in Metabolic Modulation but Associated With Phenotypic Variation After Acute Hypoxia Exposure. Front Physiol 2018; 9:236. [PMID: 29615930 PMCID: PMC5864929 DOI: 10.3389/fphys.2018.00236] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/19/2017] [Accepted: 03/02/2018] [Indexed: 12/22/2022]
Abstract
Background: Modulation of the arachidonic acid (AA) metabolism pathway has been identified among the metabolic alterations after hypoxia exposure, but its biological function is controversial. We aimed to integrate plasma metabolomic and transcriptomic approaches to systematically explore the roles of the AA metabolism pathway in response to acute hypoxia using an acute mountain sickness (AMS) model. Methods: Blood samples were obtained from 53 enrolled subjects before and after exposure to high altitude. Ultra-performance liquid chromatography-quadrupole time-of-flight mass spectrometry and RNA sequencing were performed for metabolomic and transcriptomic profiling, respectively. Influential modules comprising essential metabolites and genes were identified by weighted gene co-expression network analysis (WGCNA) after integrating metabolic information with phenotypic and transcriptomic datasets, respectively. Results: Enrolled subjects exhibited diverse responses to hypoxia. Combined with markedly altered heart rate, oxygen saturation, hemoglobin, and Lake Louise Score (LLS), metabolomic profiling identified 36 metabolites that were highly related to clinical features of the hypoxia response; of these, 27 were upregulated and nine were downregulated, and they mapped significantly to the AA metabolism pathway. Integrated analysis of metabolomic and transcriptomic data revealed that these dominant molecules were strongly associated with genes in the gas transport incapacitation and disorders of hemoglobin metabolism pathways, such as ALAS2 and HEMGN. After a detailed description of the AA metabolism pathway, we found that 15-d-PGJ2, PGA2, PGE2, 12-O-3-OH-LTB4, LTD4, and LTE4 were significantly up-regulated after hypoxia stimuli, particularly in subjects who responded poorly to hypoxia. Further analysis in another cohort showed that genes in the AA metabolism pathway, such as PTGES, PTGS1, GGT1, and TBAS1, were excessively elevated in subjects maladapted to hypoxia. Conclusion: This is the first study to construct a map of the AA metabolism pathway in response to hypoxia and to reveal the crosstalk between phenotypic variation under hypoxia and the AA metabolism pathway. These findings may improve our understanding of the pathophysiological mechanisms of acute hypoxic diseases and provide new insights into the critical roles of the AA metabolism pathway in the development and prevention of these diseases.
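WGCNA builds its network from a soft-thresholded correlation matrix. A minimal Python sketch of the unsigned adjacency a_ij = |cor(x_i, x_j)|^beta and of each node's connectivity follows. The soft-threshold power beta = 6 and the toy profiles are assumptions for illustration (in practice beta is chosen so that the network approximates scale-free topology), and module detection itself is not shown.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(profiles: list[list[float]], beta: float) -> list[list[float]]:
    """Unsigned WGCNA-style adjacency: |correlation| raised to the power beta."""
    m = len(profiles)
    return [[1.0 if i == j else abs(pearson(profiles[i], profiles[j])) ** beta
             for j in range(m)] for i in range(m)]


def connectivity(adj: list[list[float]]) -> list[float]:
    """Sum of each node's adjacencies, excluding the self-connection."""
    return [sum(row) - row[i] for i, row in enumerate(adj)]


# Toy profiles across 4 samples (metabolites or genes), not study data.
profiles = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 4.0]]
adj = soft_threshold_adjacency(profiles, beta=6)
print([round(k, 3) for k in connectivity(adj)])
```

Raising correlations to a power keeps strong links (|r| = 1 stays 1) while pushing moderate ones towards zero (r = 0.8 becomes about 0.26), which is what "soft" thresholding means here.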
Affiliation(s)
- Chang Liu
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China
- Bao Liu
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China; The 12th Hospital of Chinese People's Liberation Army, Kashi, China
- Lu Liu
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China
- Er-Long Zhang
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China
- Bind-da Sun
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China
- Gang Xu
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China
- Jian Chen
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China
- Yu-Qi Gao
- Institute of Medicine and Hygienic Equipment for High Altitude Region, College of High Altitude Military Medicine, Army Medical University, Third Military Medical University, Chongqing, China; Key Laboratory of High Altitude Environmental Medicine, Army Medical University, Third Military Medical University, Ministry of Education, Chongqing, China; Key Laboratory of High Altitude Medicine, People's Liberation Army, Chongqing, China
Collapse
|
43
|
Acharjee A, Chibon PY, Kloosterman B, America T, Renaut J, Maliepaard C, Visser RGF. Genetical genomics of quality related traits in potato tubers using proteomics. BMC Plant Biol 2018; 18:20. [PMID: 29361908 PMCID: PMC5781343 DOI: 10.1186/s12870-018-1229-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/18/2017] [Accepted: 01/09/2018] [Indexed: 05/21/2023]
Abstract
BACKGROUND Recent advances in ~omics technologies such as transcriptomics, metabolomics and proteomics along with genotypic profiling have permitted the genetic dissection of complex traits such as quality traits in non-model species. To get more insight into the genetic factors underlying variation in quality traits related to carbohydrate and starch metabolism and cold sweetening, we determined the protein content and composition in potato tubers using 2D-gel electrophoresis in a diploid potato mapping population. Upon analyzing we made sure that the proteins from the patatin family were excluded to ensure a better representation of the other proteins. RESULTS We subsequently performed pQTL analyses for all other proteins with a sufficient representation in the population and established a relationship between proteins and 26 potato tuber quality traits (e.g. flesh colour, enzymatic discoloration) by co-localization on the genetic map and a direct correlation study of protein abundances and phenotypic traits. Over 1643 unique protein spots were detected in total over the two harvests. We were able to map pQTLs for over 300 different protein spots some of which co-localized with traits such as starch content and cold sweetening. pQTLs were observed on every chromosome although not evenly distributed over the chromosomes. The largest number of pQTLs was found for chromosome 8 and the lowest for chromosome number 10. For some 20 protein spots multiple QTLs were observed. CONCLUSIONS From this analysis, hotspot areas for protein QTLs were identified on chromosomes three, five, eight and nine. The hotspot on chromosome 3 coincided with a QTL previously identified for total protein content and had more than 23 pQTLs in the region from 70 to 80 cM. Some of the co-localizing protein spots associated with some of the most interesting tuber quality traits were identified, albeit far less than we had anticipated at the onset of the experiments.
Affiliation(s)
- Animesh Acharjee
- Graduate School Experimental Plant Sciences, Wageningen, The Netherlands
- Plant Breeding, Wageningen University and Research, PO Box 386, 6700 AJ Wageningen, The Netherlands
- Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT UK
- Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TT UK
- Pierre-Yves Chibon
- Graduate School Experimental Plant Sciences, Wageningen, The Netherlands
- Plant Breeding, Wageningen University and Research, PO Box 386, 6700 AJ Wageningen, The Netherlands
- Bjorn Kloosterman
- Plant Breeding, Wageningen University and Research, PO Box 386, 6700 AJ Wageningen, The Netherlands
- Present address: Keygene NV, PO Box 216, 6700 AE Wageningen, The Netherlands
- Twan America
- Centre for BioSystems Genomics, P.O. Box 98, 6700 AA Wageningen, The Netherlands
- Business Unit Biosciences, Wageningen University and Research, P.O. Box 16, 6700 AA Wageningen, The Netherlands
- Jenny Renaut
- Centre de Recherche Public - Gabriel Lippmann, Department of Environment and Agrobiotechnologies (EVA), 41, rue du Brill, L-4422 Belvaux, Luxembourg
- Chris Maliepaard
- Plant Breeding, Wageningen University and Research, PO Box 386, 6700 AJ Wageningen, The Netherlands
- Richard G. F. Visser
- Plant Breeding, Wageningen University and Research, PO Box 386, 6700 AJ Wageningen, The Netherlands
- Centre for BioSystems Genomics, P.O. Box 98, 6700 AA Wageningen, The Netherlands

44
Kim M, Tagkopoulos I. Data integration and predictive modeling methods for multi-omics datasets. Mol Omics 2018; 14:8-25. [DOI: 10.1039/c7mo00051k] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 12/24/2022]
Abstract
We provide an overview of opportunities and challenges in multi-omics predictive analytics with particular emphasis on data integration and machine learning methods.
Affiliation(s)
- Minseung Kim
- Department of Computer Science, University of California, Davis, USA
- Genome Center
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, USA
- Genome Center

45
Voelckel C, Gruenheit N, Lockhart P. Evolutionary Transcriptomics and Proteomics: Insight into Plant Adaptation. Trends Plant Sci 2017; 22:462-471. [PMID: 28365131 DOI: 10.1016/j.tplants.2017.03.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 10/10/2016] [Revised: 02/21/2017] [Accepted: 03/01/2017] [Indexed: 06/07/2023]
Abstract
Comparative transcriptomics and proteomics (T&P) have brought biological insight into development, gene function, and physiological stress responses. However, RNA-seq and high-throughput proteomics remain underutilised in studies of plant adaptation. These methodologies have created discovery tools with the potential to significantly advance our understanding of adaptive diversification. We outline experimental recommendations for their application. We discuss analysis models and approaches that accelerate the identification of adaptive gene sets and integrate transcriptome, proteome, phenotypic, and environmental data. Finally, we encourage widespread uptake and future developments in T&P that will advance our understanding of evolution and adaptation.
Affiliation(s)
- Nicole Gruenheit
- Faculty of Biology, Health, and Medicine, University of Manchester, Manchester, UK
- Peter Lockhart
- Institute for Fundamental Sciences, Massey University, Palmerston North, New Zealand

46
IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data. Comput Math Methods Med 2017; 2017:7691937. [PMID: 28546826 PMCID: PMC5435977 DOI: 10.1155/2017/7691937] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 01/20/2017] [Accepted: 03/14/2017] [Indexed: 11/29/2022]
Abstract
As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility.
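The penalty-factor idea summarised in this abstract can be illustrated with a small sketch. Assuming a squared-error loss and fixed (not cross-validated) factors, the objective is (1/(2n))·||y − Xβ||² + λ·Σ_m π_m·||β_m||₁, where β_m collects the coefficients of modality m and π_m is that modality's penalty factor. The NumPy function below minimises it by cyclic coordinate descent; it is a hypothetical illustration, not the `ipflasso` R package, and the names `ipf_lasso_cd`, `blocks` and `penalty_factors` are assumptions.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Scalar soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def ipf_lasso_cd(X, y, blocks, lam, penalty_factors, n_iter=1000, tol=1e-8):
    """Hypothetical IPF-LASSO sketch (squared-error loss, fixed penalty factors).

    Minimises (1/(2n)) * ||y - X @ beta||^2 + lam * sum_m pf_m * ||beta_m||_1
    by cyclic coordinate descent. `blocks` holds one index array per modality.
    """
    n, p = X.shape
    # Expand the per-modality factors into one penalty weight per feature.
    weights = np.ones(p)
    for idx, pf in zip(blocks, penalty_factors):
        weights[idx] = pf
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n        # (1/n) * x_j^T x_j
    resid = np.asarray(y, dtype=float).copy()  # y - X @ beta, with beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # Correlation of x_j with the partial residual that excludes feature j.
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))  # two simulated modalities, 5 features each
    y = 3 * X[:, 0] - 2 * X[:, 1] + 0.5 * rng.standard_normal(200)
    blocks = [np.arange(0, 5), np.arange(5, 10)]
    beta = ipf_lasso_cd(X, y, blocks, lam=0.1, penalty_factors=[1.0, 100.0])
    print(np.round(beta, 2))
```

With a large factor on the second (noise-only) block, its coefficients are driven exactly to zero while the signal in the first block is retained. Setting all factors equal recovers the standard lasso; in IPF-LASSO proper the factors would be chosen by cross-validation or practical considerations, as the abstract states.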

47
Kopczynski D, Coman C, Zahedi RP, Lorenz K, Sickmann A, Ahrends R. Multi-OMICS: a critical technical perspective on integrative lipidomics approaches. Biochim Biophys Acta Mol Cell Biol Lipids 2017; 1862:808-811. [PMID: 28193460 DOI: 10.1016/j.bbalip.2017.02.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 12/08/2016] [Revised: 02/03/2017] [Accepted: 02/06/2017] [Indexed: 02/06/2023]
Abstract
During the past decades, high-throughput approaches for analyzing different molecular classes such as nucleic acids, proteins, metabolites, and lipids have grown rapidly. These approaches became powerful tools for getting a fundamental understanding of biological systems. Considering each approach and its results separately, relations and causal connections between these classes have no chance to be revealed, since only separate molecular snapshots are provided. Only a combined approach, not fully established yet, with the integration of the corresponding data, might yield a comprehensive and complete understanding of biological processes, such as crosstalk and interactions in signaling pathways. Taking two or more omics-methods into consideration for analysis is referred to as a multi-omics approach, which is gradually evolving. In this critical note, we briefly discuss the relevance, challenges, current state, and potential of data integration from multi-omics approaches, with a special focus on lipidomics analysis, listing the advantages and gaps in this field. This article is part of a Special Issue entitled: BBALIP_Lipidomics Opinion Articles edited by Sepp Kohlwein.
Affiliation(s)
- Dominik Kopczynski
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Str. 6b, Dortmund, Germany
- Cristina Coman
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Str. 6b, Dortmund, Germany
- Rene P Zahedi
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Str. 6b, Dortmund, Germany
- Kristina Lorenz
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Str. 6b, Dortmund, Germany; West German Heart and Vascular Center Essen, University Hospital Essen, Essen, Germany
- Albert Sickmann
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Str. 6b, Dortmund, Germany; Medizinische Fakultät, Medizinische Proteom-Center (MPC), Ruhr-Universität Bochum, Bochum, Germany; Department of Chemistry, College of Physical Sciences, University of Aberdeen, Aberdeen, Scotland, UK
- Robert Ahrends
- Leibniz-Institut für Analytische Wissenschaften-ISAS-e.V., Otto-Hahn-Str. 6b, Dortmund, Germany

48
Acharjee A, Prentice P, Acerini C, Smith J, Hughes IA, Ong K, Griffin JL, Dunger D, Koulman A. The translation of lipid profiles to nutritional biomarkers in the study of infant metabolism. Metabolomics 2017; 13:25. [PMID: 28190990 PMCID: PMC5272886 DOI: 10.1007/s11306-017-1166-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 11/22/2016] [Accepted: 01/12/2017] [Indexed: 02/02/2023]
Abstract
INTRODUCTION Links between early life exposures and later health outcomes may, in part, be due to nutritional programming in infancy. This hypothesis is supported by observed long-term benefits associated with breastfeeding, such as better cognitive development in childhood, and lower risks of obesity and high blood pressure in later life. However, the possible underlying mechanisms are expected to be complex and may be difficult to disentangle due to the lack of understanding of the metabolic processes that differentiate breastfed infants compared to those receiving just formula feed. OBJECTIVE Our aim was to investigate the relationships between infant feeding and the lipid profiles and to validate specific lipids in separate datasets so that a small set of lipids can be used as nutritional biomarkers. METHOD We utilized a direct infusion high-resolution mass spectrometry method to analyse the lipid profiles of 3.2 mm dried blood spot samples collected at age 3 months from the Cambridge Baby Growth Study (CBGS-1), which formed the discovery cohort. For validation two sample sets were profiled: Cambridge Baby Growth Study (CBGS-2) and Pregnancy Outcome Prediction Study (POPS). Lipidomic profiles were compared between infant groups who were either exclusively breastfed, exclusively formula-fed or mixed-fed at various levels. Data analysis included supervised Random Forest method with combined classification and regression mode. Selection of lipids was based on an iterative backward elimination procedure without compromising the class error in the classification mode. CONCLUSION From this study, we were able to identify and validate three lipids: PC(35:2), SM(36:2) and SM(39:1) that can be used collectively as biomarkers for infant nutrition during early development. These biomarkers can be used to determine whether young infants (3-6 months) are breast-fed or receive formula milk.
Affiliation(s)
- Animesh Acharjee
- MRC Elsie Widdowson Laboratory, Cambridge, UK
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- Philippa Prentice
- Department of Paediatrics, University of Cambridge, Cambridge, UK
- Carlo Acerini
- Department of Paediatrics, University of Cambridge, Cambridge, UK
- James Smith
- MRC Elsie Widdowson Laboratory, Cambridge, UK
- School of Food Science and Nutrition, University of Leeds, Leeds, UK
- Ieuan A. Hughes
- Department of Paediatrics, University of Cambridge, Cambridge, UK
- Ken Ong
- Department of Paediatrics, University of Cambridge, Cambridge, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
- Julian L. Griffin
- MRC Elsie Widdowson Laboratory, Cambridge, UK
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- David Dunger
- Department of Paediatrics, University of Cambridge, Cambridge, UK
- Albert Koulman
- MRC Elsie Widdowson Laboratory, Cambridge, UK
- NIHR BRC Clinical Metabolomics and Lipidomics Laboratory, Level 4, Laboratory Block, Cambridge University Hospitals, University of Cambridge, Hills Road, Cambridge, CB2 0QQ UK

49
Fabres PJ, Collins C, Cavagnaro TR, Rodríguez López CM. A Concise Review on Multi-Omics Data Integration for Terroir Analysis in Vitis vinifera. Front Plant Sci 2017; 8:1065. [PMID: 28676813 PMCID: PMC5477006 DOI: 10.3389/fpls.2017.01065] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 02/08/2017] [Accepted: 06/02/2017] [Indexed: 05/19/2023]
Abstract
Vitis vinifera (grapevine) is one of the most important fruit crops, both for fresh consumption and wine and spirit production. The term terroir is frequently used in viticulture and the wine industry to relate wine sensory attributes to its geographic origin. Although, it can be cultivated in a wide range of environments, differences in growing conditions have a significant impact on fruit traits that ultimately affect wine quality. Understanding how fruit quality and yield are controlled at a molecular level in grapevine in response to environmental cues has been a major driver of research. Advances in the area of genomics, epigenomics, transcriptomics, proteomics and metabolomics, have significantly increased our knowledge on the abiotic regulation of yield and quality in many crop species, including V. vinifera. The integrated analysis of multiple 'omics' can give us the opportunity to better understand how plants modulate their response to different environments. However, 'omics' technologies provide a large amount of biological data and its interpretation is not always straightforward, especially when different 'omic' results are combined. Here we examine the current strategies used to integrate multi-omics, and how these have been used in V. vinifera. In addition, we also discuss the importance of including epigenomics data when integrating omics data as epigenetic mechanisms could play a major role as an intermediary between the environment and the genome.
Affiliation(s)
- Pastor Jullian Fabres
- Environmental Epigenetics and Genetics Group, Plant Research Centre, School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, Australia
- Cassandra Collins
- The Waite Research Institute, The School of Agriculture, Food and Wine, The University of Adelaide, Glen Osmond, SA, Australia
- Timothy R. Cavagnaro
- The Waite Research Institute, The School of Agriculture, Food and Wine, The University of Adelaide, Glen Osmond, SA, Australia
- Carlos M. Rodríguez López
- Environmental Epigenetics and Genetics Group, Plant Research Centre, School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, Australia
- Correspondence: Carlos M. Rodríguez López

50
Bhattacharjee B, Shafi M, Acharjee A. Investigating the Influence Relationship Models for Stocks in Indian Equity Market: A Weighted Network Modelling Study. PLoS One 2016; 11:e0166087. [PMID: 27846251 PMCID: PMC5113066 DOI: 10.1371/journal.pone.0166087] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/29/2016] [Accepted: 10/22/2016] [Indexed: 11/18/2022]
Abstract
The socio-economic systems today possess high levels of both interconnectedness and interdependencies, and such system-level relationships behave very dynamically. In such situations, it is all around perceived that influence is a perplexing power that has an overseeing part in affecting the dynamics and behaviours of involved ones. As a result of the force & direction of influence, the transformative change of one entity has a cogent aftereffect on the other entities in the system. The current study employs directed weighted networks for investigating the influential relationship patterns existent in a typical equity market as an outcome of inter-stock interactions happening at the market level, the sectorial level and the industrial level. The study dataset is derived from 335 constituent stocks of 'Standard & Poor Bombay Stock Exchange 500 index' and study period is 1st June 2005 to 30th June 2015. The study identifies the set of most dynamically influential stocks & their respective temporal pattern at three hierarchical levels: the complete equity market, different sectors, and constituting industry segments of those sectors. A detailed influence relationship analysis is performed for the sectorial level network of the construction sector, and it was found that stocks belonging to the cement industry possessed high influence within this sector. Also, the detailed network analysis of construction sector revealed that it follows scale-free characteristics and power law distribution. In the industry specific influence relationship analysis for cement industry, methods based on threshold filtering and minimum spanning tree were employed to derive a set of sub-graphs having temporally stable high-correlation structure over this ten years period.
Affiliation(s)
- Biplab Bhattacharjee
- School of Management Studies, National Institute of Technology, Calicut, Kerala, India
- Muhammad Shafi
- School of Management Studies, National Institute of Technology, Calicut, Kerala, India
- Animesh Acharjee
- School of Management Studies, National Institute of Technology, Calicut, Kerala, India
- Department of Biochemistry, Sanger Building, University of Cambridge, Cambridge, United Kingdom