1
Kim R, Lin T, Pang G, Liu Y, Tungate AS, Hendry PL, Kurz MC, Peak DA, Jones J, Rathlev NK, Swor RA, Domeier R, Velilla MA, Lewandowski C, Datner E, Pearson C, Lee D, Mitchell PM, McLean SA, Linnstaedt SD. Derivation and validation of risk prediction for posttraumatic stress symptoms following trauma exposure. Psychol Med 2023; 53:4952-4961. [PMID: 35775366 DOI: 10.1017/s003329172200191x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
BACKGROUND Posttraumatic stress symptoms (PTSS) are common following traumatic stress exposure (TSE). Identification of individuals with PTSS risk in the early aftermath of TSE is important to enable targeted administration of preventive interventions. In this study, we used baseline survey data from two prospective cohort studies to identify the most influential predictors of substantial PTSS. METHODS Self-identifying black and white American women and men (n = 1546) presenting to one of 16 emergency departments (EDs) within 24 h of motor vehicle collision (MVC) TSE were enrolled. Individuals with substantial PTSS (⩾33, Impact of Events Scale - Revised) 6 months after MVC were identified via follow-up questionnaire. Sociodemographic, pain, general health, event, and psychological/cognitive characteristics were collected in the ED and used in prediction modeling. Ensemble learning methods and Monte Carlo cross-validation were used for feature selection and to determine prediction accuracy. External validation was performed on a hold-out sample (30% of total sample). RESULTS Twenty-five percent (n = 394) of individuals reported PTSS 6 months following MVC. Regularized linear regression was the top performing learning method. The top 30 factors together showed good reliability in predicting PTSS in the external sample (Area under the curve = 0.79 ± 0.002). Top predictors included acute pain severity, recovery expectations, socioeconomic status, self-reported race, and psychological symptoms. CONCLUSIONS These analyses add to a growing literature indicating that influential predictors of PTSS can be identified and risk for future PTSS estimated from characteristics easily available/assessable at the time of ED presentation following TSE.
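The Monte Carlo cross-validation mentioned in the methods can be illustrated with a minimal sketch of repeated random train/test splitting. The function name `monte_carlo_splits`, the 20% test fraction, and the number of repeats are assumptions chosen for illustration; the abstract does not state the settings the authors used.

```python
import random


def monte_carlo_splits(n_samples, test_fraction=0.2, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits.

    Unlike k-fold cross-validation, each repeat draws a fresh random
    test set, so a sample may appear in several test sets or in none.
    """
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # First n_test shuffled indices form the test set, the rest train.
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


# Hypothetical usage with the cohort size reported in the abstract.
splits = list(monte_carlo_splits(n_samples=1546, test_fraction=0.2, n_repeats=3))
```

In practice a model would be fitted on each training set and scored on the matching test set, and the scores would be summarized as a mean and spread across repeats.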
Affiliation(s)
- Raphael Kim
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
  - Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
- Tina Lin
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
- Gehao Pang
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
- Yufeng Liu
  - Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA
  - Department of Genetics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
- Andrew S Tungate
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
- Phyllis L Hendry
  - Department of Emergency Medicine, University of Florida College of Medicine, Jacksonville, FL, USA
- Michael C Kurz
  - Department of Emergency Medicine, University of Alabama, Birmingham, AL, USA
- David A Peak
  - Department of Emergency Medicine, Massachusetts General Hospital, Boston, MA, USA
- Jeffrey Jones
  - Department of Emergency Medicine, Spectrum Health Butterworth Campus, Grand Rapids, MI, USA
- Niels K Rathlev
  - Department of Emergency Medicine, Baystate State Health System, Springfield, MA, USA
- Robert A Swor
  - Department of Emergency Medicine, Beaumont Hospital, Royal Oak, MI, USA
- Robert Domeier
  - Department of Emergency Medicine, St Joseph Mercy Health System, Ann Arbor, MI, USA
- Elizabeth Datner
  - Department of Emergency Medicine, Albert Einstein Medical Center, Philadelphia, PA, USA
- Claire Pearson
  - Department of Emergency Medicine, Detroit Receiving, Detroit, MI, USA
- David Lee
  - Department of Emergency Medicine, North Shore University Hospital, Manhasset, NY, USA
- Patricia M Mitchell
  - Department of Emergency Medicine, Boston University School of Medicine, Boston, MA, USA
- Samuel A McLean
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
  - Department of Emergency Medicine, University of North Carolina, Chapel Hill, NC, USA
- Sarah D Linnstaedt
  - Institute for Trauma Recovery, University of North Carolina, Chapel Hill, NC, USA
  - Department of Anesthesiology, University of North Carolina, Chapel Hill, NC, USA
2
Role of artificial intelligence and machine learning in interventional cardiology. Curr Probl Cardiol 2023; 48:101698. [PMID: 36921654 DOI: 10.1016/j.cpcardiol.2023.101698] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/07/2023] [Accepted: 03/08/2023] [Indexed: 03/16/2023]
Abstract
Driven by two decades of technological progress and remodeling, the dynamic quality of healthcare data combined with advances in computational power has allowed for rapid progress in artificial intelligence (AI). In interventional cardiology, AI has shown potential in providing data interpretation and automated analysis from electrocardiogram (ECG), echocardiography, computed tomography angiography (CTA), magnetic resonance imaging (MRI), and electronic patient data. Clinical decision support has the potential to assist in improving patient safety and making prognostic and diagnostic conjectures in interventional cardiology procedures. Robot-assisted percutaneous coronary intervention (R-PCI), along with functional and quantitative assessment of coronary artery ischemia and plaque burden on intravascular ultrasound (IVUS), are the major applications of AI. Machine learning (ML) algorithms are used in these applications, and they have the potential to bring a paradigm shift in intervention. Recently, deep learning (DL) has emerged as an efficient branch of ML for numerous cardiovascular (CV) applications. However, the impact of DL on the future of cardiology practice is not clear. Predictive models based on DL have several limitations, including low generalizability and decision processing in cardiac anatomy.
3
Hsu W, Warren JR, Riddle PJ. Medication adherence prediction through temporal modelling in cardiovascular disease management. BMC Med Inform Decis Mak 2022; 22:313. [DOI: 10.1186/s12911-022-02052-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 11/16/2022] [Indexed: 11/30/2022] Open
Abstract
Background
Chronic conditions place a considerable burden on modern healthcare systems. Within New Zealand and worldwide, cardiovascular disease (CVD) affects a significant proportion of the population and is the leading cause of death. Like other chronic diseases, the course of cardiovascular disease is usually prolonged and its management necessarily long-term. Although long-term medication is highly effective in reducing CVD risk, non-adherence to it continues to be a longstanding challenge in healthcare delivery. The study investigates the benefits of integrating patient history and assesses the contribution of explicitly temporal models to medication adherence prediction in the context of lipid-lowering therapy.
Methods
Data from a CVD risk assessment tool are linked to routinely collected national and regional data sets including pharmaceutical dispensing, hospitalisation, lab test results and deaths. The study extracts a sub-cohort from 564,180 patients who had primary CVD risk assessment for analysis. Based on community pharmaceutical dispensing records, a proportion of days covered (PDC) ≥ 80 is used as the threshold for adherence. Two years (8 quarters) of patient history before their CVD risk assessment are used as the observation window to predict patient adherence in the subsequent 5 years (20 quarters). The predictive performance of the temporal deep learning models, long short-term memory (LSTM) and simple recurrent neural networks (Simple RNN), is compared against that of the non-temporal models: multilayer perceptron (MLP), ridge classifier (RC) and logistic regression (LR). Further, the study investigates the effect of lengthening the observation window on the task of adherence prediction.
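The adherence threshold described above can be illustrated with a minimal sketch of a proportion-of-days-covered calculation. The abstract does not give the exact computation, so the percentage scale, the `(start_day, days_supply)` fill format, and the function names below are assumptions made for illustration.

```python
def proportion_of_days_covered(fills, period_days):
    """Return PDC as a percentage for one observation period.

    fills: iterable of (start_day, days_supply) tuples, with start_day
    counted from 0 at the beginning of the period (assumed format).
    Overlapping fills are counted once per calendar day.
    """
    covered = set()
    for start_day, days_supply in fills:
        for day in range(start_day, start_day + days_supply):
            if 0 <= day < period_days:
                covered.add(day)
    return 100.0 * len(covered) / period_days


def is_adherent(pdc, threshold=80.0):
    """Apply the PDC >= 80 rule stated in the abstract."""
    return pdc >= threshold


# Hypothetical dispensing record for a single 90-day quarter.
quarter_fills = [(0, 30), (25, 30), (60, 15)]
pdc = proportion_of_days_covered(quarter_fills, period_days=90)
```

Here 70 of the 90 days are covered, so the hypothetical patient falls just below the threshold for that quarter.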
Results
Temporal models that use sequential data outperform non-temporal models, with LSTM producing the best predictive performance, achieving a ROC AUC of 0.805. A performance gap is observed between models that can discover non-linear interactions between predictor variables and their linear counterparts, with neural network (NN) based models significantly outperforming linear models. Additionally, the predictive advantage of temporal models becomes more pronounced when the length of the observation window is increased.
Conclusion
The findings of the study provide evidence that using deep temporal models to integrate patient history in adherence prediction is advantageous. In particular, the RNN architecture LSTM significantly outperforms all other model comparators.
4
Comparison of artificial intelligence algorithms and their ranking for the prediction of genetic merit in sheep. Sci Rep 2022; 12:18726. [PMID: 36333409 PMCID: PMC9636184 DOI: 10.1038/s41598-022-23499-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/25/2022] [Accepted: 11/01/2022] [Indexed: 11/06/2022] Open
Abstract
As the amount of data on farms grows, it is important to evaluate the potential of artificial intelligence for making farming predictions. This study was therefore undertaken to evaluate various machine learning (ML) algorithms using 52 years of data for sheep. Data preparation was done before analysis. Breeding values were estimated using Best Linear Unbiased Prediction. Twelve ML algorithms were evaluated for their ability to predict the breeding values. The variance inflation factor for all features selected through principal component analysis (PCA) was 1. The correlation coefficients between true and predicted values for artificial neural networks, Bayesian ridge regression, classification and regression trees, gradient boosting algorithm, K nearest neighbours, multivariate adaptive regression splines (MARS) algorithm, polynomial regression, principal component regression (PCR), random forests, support vector machines, and XGBoost algorithm were 0.852, 0.742, 0.869, 0.915, 0.781, 0.746, 0.742, 0.746, 0.917, 0.777, and 0.915, respectively, for breeding value prediction. Random forests had the highest correlation coefficient. Among the prediction equations generated using OLS, the highest coefficient of determination was 0.569. A total of 12 machine learning models were developed for the prediction of breeding values in sheep in the present study. It may be said that machine learning techniques can perform predictions with reasonable accuracies and can thus be viable alternatives to conventional strategies for breeding value prediction.
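The reported variance inflation factor of 1 follows from the fact that principal components are mutually uncorrelated. A minimal sketch of this, restricted to the two-feature case where VIF = 1 / (1 - r^2), is shown below; the toy vectors and function names are illustrative assumptions, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_features(x1, x2):
    """VIF of x1 given one other feature x2: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Two uncorrelated toy columns standing in for principal component scores.
pc1 = [1.0, -1.0, 1.0, -1.0]
pc2 = [1.0, 1.0, -1.0, -1.0]
# A column that is strongly correlated with pc1, for contrast.
correlated = [1.0, -1.0, 1.0, -0.5]

vif_uncorrelated = vif_two_features(pc1, pc2)
vif_correlated = vif_two_features(pc1, correlated)
```

The same `pearson_r` helper also corresponds to the correlation between true and predicted breeding values that the abstract uses to compare algorithms.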
5
Cheng B, Zhou P, Chen Y. Machine-learning algorithms based on personalized pathways for a novel predictive model for the diagnosis of hepatocellular carcinoma. BMC Bioinformatics 2022; 23:248. [PMID: 35739471 PMCID: PMC9219178 DOI: 10.1186/s12859-022-04805-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/30/2022] [Accepted: 06/20/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND At present, the ability to diagnose hepatocellular carcinoma (HCC) based on serum alpha-fetoprotein level is limited. Finding markers that can effectively distinguish cancerous and non-cancerous tissues is important for improving the diagnostic efficiency of HCC. RESULTS In this study, we developed a predictive model for HCC diagnosis using personalized biological pathways combined with a machine learning algorithm based on regularized regression and carried out relevant examinations. In two training sets, the overall cross-study-validated area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC) and the Brier score of the diagnostic model were 0.987 [95% confidence interval (CI): 0.979-0.996], 0.981 and 0.091, respectively. In addition, the model showed good transferability in the external validation set. In the TCGA-LIHC cohort, the AUROC, AUPRC and Brier score were 0.992 (95% CI: 0.985-0.998), 0.967 and 0.112, respectively. The diagnostic model performed very well in distinguishing HCC from non-cancerous liver tissues. Moreover, we further analyzed the extracted biological pathways to explore molecular features and prognostic factors. The risk score generated from a 12-gene signature extracted from the characteristic pathways was correlated with some immune-related pathways and served as an independent prognostic factor for HCC. CONCLUSION We used personalized biological pathway analysis and a machine learning algorithm to construct a highly accurate HCC diagnostic model. The excellent interpretable performance and good transferability of this model give it great potential for personalized medicine, where it can assist clinicians in diagnosing HCC patients.
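Two of the metrics reported above, the AUROC and the Brier score, can be sketched in a few lines. This is a plain illustration of the standard definitions on made-up labels and probabilities; it is not the authors' evaluation code, and the function names are assumptions.

```python
def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This pairwise form is O(n^2) and is meant for small examples only.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = tumour, 0 = non-tumour) and predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.3, 0.6, 0.5, 0.1]

example_auroc = auroc(probs, labels)
example_brier = brier_score(probs, labels)
```

A lower Brier score indicates better calibrated probabilities, while a higher AUROC indicates better ranking of positives above negatives, so the two metrics capture different aspects of model quality.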
Affiliation(s)
- Binglin Cheng
  - Department of Radiation Oncology, Nanfang Hospital, Southern Medical University, 1838 Guangzhou Avenue North, Baiyun District, Guangzhou, 510515, Guangdong Province, China
  - The First School of Clinical Medicine, Southern Medical University, Guangzhou, Guangdong Province, China
- Peitao Zhou
  - Department of Radiation Oncology, Nanfang Hospital, Southern Medical University, 1838 Guangzhou Avenue North, Baiyun District, Guangzhou, 510515, Guangdong Province, China
- Yuhan Chen
  - Department of Radiation Oncology, Nanfang Hospital, Southern Medical University, 1838 Guangzhou Avenue North, Baiyun District, Guangzhou, 510515, Guangdong Province, China
6
Patiyal S, Dhall A, Raghava GPS. Prediction of risk-associated genes and high-risk liver cancer patients from their mutation profile: Benchmarking of mutation calling techniques. Biol Methods Protoc 2022; 7:bpac012. [PMID: 35734767 PMCID: PMC9204470 DOI: 10.1093/biomethods/bpac012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Revised: 05/20/2022] [Accepted: 05/20/2022] [Indexed: 11/12/2022] Open
Abstract
Identification of somatic mutations with high precision is one of the major challenges in the prediction of high-risk liver-cancer patients. In the past, a number of mutation-calling techniques have been developed, including MuTect2, MuSE, Varscan2, and SomaticSniper. In this study, an attempt has been made to benchmark the potential of these techniques in predicting the prognostic biomarkers for liver cancer. Initially, we extracted somatic mutations in liver cancer patients using Variant Call Format (VCF) and Mutation Annotation Format (MAF) files from The Cancer Genome Atlas. In terms of size, the MAF files are 42 times smaller than the VCF files and contain only high-quality somatic mutations. Further, machine learning based models have been developed for predicting high-risk cancer patients using mutations obtained from different techniques. The performance of the different techniques and data files has been compared based on their potential to discriminate high- and low-risk liver-cancer patients. Based on correlation analysis, we selected 80 genes having a significant negative correlation with the overall survival of liver cancer patients. The univariate survival analysis revealed the prognostic role of highly mutated genes. Single-gene based analysis showed that the MuTect2 technique based MAF file achieved the maximum hazard ratio (HR for LAMC3) of 9.25 with p-value 1.78E-06. Further, we developed various prediction models using the risk-associated top-10 genes for each technique. Our results indicate that MuTect2 technique based VCF files outperform all other methods, with a maximum Area Under the Receiver-Operating Characteristic (AUROC) curve of 0.765 and HR 4.50 (p-value 3.83E-15). Overall, the VCF file generated using the MuTect2 technique performs better than the other mutation calling techniques for the prediction of high-risk liver cancer patients.
We hope that our findings will provide a useful and comprehensive comparison of various mutation calling techniques for the prognostic analysis of cancer patients. In order to serve the scientific community, we have provided a Python-based pipeline to develop the prediction models using mutation profiles (VCF/MAF) of cancer patients. It is available on GitHub at https://github.com/raghavagps/mutation_bench.
Affiliation(s)
- Sumeet Patiyal
  - Indraprastha Institute of Information Technology, Department of Computational Biology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
  - Indraprastha Institute of Information Technology, Department of Computational Biology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
  - Indraprastha Institute of Information Technology, Department of Computational Biology, Okhla Phase 3, New Delhi-110020, India
7
Ma C, Wu M, Ma S. Analysis of cancer omics data: a selective review of statistical techniques. Brief Bioinform 2022; 23:6510158. [PMID: 35039832 DOI: 10.1093/bib/bbab585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 11/13/2022] Open
Abstract
Cancer is an omics disease. The development of high-throughput profiling has fundamentally changed cancer research and clinical practice. Compared with clinical, demographic and environmental data, the analysis of omics data, which has higher dimensionality, weaker signals and more complex distributional properties, is much more challenging. Developments in the literature are often 'scattered', with individual studies focused on one or a few closely related methods. The goal of this review is to assist cancer researchers with limited statistical expertise in establishing the 'overall framework' of cancer omics data analysis. To facilitate understanding, we mainly focus on intuition, concepts and key steps, and refer readers to the original publications for mathematical details. This review broadly covers unsupervised and supervised analysis, as well as individual-gene-based, gene-set-based and gene-network-based analysis. We also briefly discuss 'special topics' including interaction analysis, multi-datasets analysis and multi-omics analysis.
Affiliation(s)
- Chenjin Ma
  - College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
8
Evaluation and Prediction on the Effect of Ionic Properties of Solvent Extraction Performance of Oily Sludge Using Machine Learning. Molecules 2021; 26:molecules26247551. [PMID: 34946635 PMCID: PMC8708711 DOI: 10.3390/molecules26247551] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2021] [Revised: 12/06/2021] [Accepted: 12/08/2021] [Indexed: 11/20/2022] Open
Abstract
Oily sludge produced in the process of petroleum exploitation and utilization is a kind of hazardous waste that urgently needs to be dealt with in the petrochemical industry. The oil content of oily sludge is generally between 15% and 50%, which gives it great potential for oil resource utilization. However, its composition is complex, and the asphaltene it contains is highly viscous and difficult to separate. In this study, the oily sludge was extracted with toluene as solvent, supplemented by three kinds of ionic liquids (1-ethyl-3-methylimidazole tetrafluoroborate ([EMIM] [BF4]), 1-ethyl-3-methylimidazole trifluoro-acetate ([EMIM] [TA]), 1-ethyl-3-methylimidazole dicyanamide ([EMIM] [N(CN)2])) and three kinds of deep eutectic solutions (choline chloride/urea (ChCl/U), choline chloride/ethylene glycol (ChCl/EG), and choline chloride/malonic acid (ChCl/MA)). This experiment investigates the effect of the physicochemical properties of the solvents on oil recovery, and three machine learning methods (ridge regression, multilayer perceptron, and support vector regression) are used to predict the association between them. Based on the linear correlation of the variables, it is found that the conductivity of the ionic liquid is the key characteristic affecting the extraction treatment in this system.
9
Tatsumi K, Igarashi N, Mengxue X. Prediction of plant-level tomato biomass and yield using machine learning with unmanned aerial vehicle imagery. PLANT METHODS 2021; 17:77. [PMID: 34266447 PMCID: PMC8281694 DOI: 10.1186/s13007-021-00761-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/18/2021] [Accepted: 06/04/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND The objective of this study is twofold. First, ascertain the important variables that predict tomato yields from plant height (PH) and vegetation index (VI) maps. The maps were derived from images taken by unmanned aerial vehicles (UAVs). Second, examine the accuracy of predictions of tomato fresh shoot masses (SM), fruit weights (FW), and the number of fruits (FN) from multiple machine learning algorithms using selected variable sets. To realize our objective, ultra-high-resolution RGB and multispectral images were collected by a UAV on ten days in 2020's tomato growing season. From these images, 756 total variables, including first- (e.g., average, standard deviation, skewness, range, and maximum) and second-order (e.g., gray-level co-occurrence matrix features and growth rates of PH and VIs) statistics for each plant, were extracted. Several selection algorithms (i.e., Boruta, DALEX, genetic algorithm, least absolute shrinkage and selection operator, and recursive feature elimination) were used to select the variable sets useful for predicting SM, FW, and FN. Random forests, ridge regressions, and support vector machines were used to predict the yield using the top five selected variable sets. RESULTS First-order statistics of PH and VIs collected during the early to mid-fruit formation periods, about one month prior to harvest, were important variables for predicting SM. Similar to the case for SM, variables collected approximately one month prior to harvest were important for predicting FW and FN. Furthermore, variables related to PH were unimportant for prediction. Compared with predictions obtained using only first-order statistics, those obtained using the second-order statistics of VIs were more accurate for FW and FN. The prediction accuracy of SM, FW, and FN by models constructed from all variables (rRMSE = 8.8-28.1%) was better than that from first-order statistics (rRMSE = 10.0-50.1%). 
CONCLUSIONS In addition to basic statistics (e.g., average and standard deviation), we derived second-order statistics of PH and VIs at the plant level using the ultra-high-resolution UAV images. Our findings indicated that our variable selection method reduced the number of variables needed for tomato yield prediction, improving the efficiency of phenotypic data collection and assisting with the selection of high-yield lines within breeding programs.
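The relative root-mean-square error (rRMSE) used to report accuracy above can be sketched as follows. The abstract does not define the normalization, so dividing the RMSE by the mean of the observed values is an assumption (a common convention), and the numbers are made up for illustration.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent: 100 * RMSE / mean(observed).

    Normalizing by the observed mean is an assumed convention; other
    definitions divide by the range or by the standard deviation.
    """
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


# Made-up per-plant fruit weights (g) and model predictions.
observed_fw = [100.0, 200.0, 300.0]
predicted_fw = [110.0, 190.0, 310.0]
example_rrmse = rrmse(observed_fw, predicted_fw)
```

Because the error is expressed relative to the observed mean, rRMSE values can be compared across targets with different units, such as shoot mass, fruit weight, and fruit number.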
Affiliation(s)
- Kenichi Tatsumi
  - Department of Environmental and Agricultural Engineering, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Noa Igarashi
  - Faculty of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Xiao Mengxue
  - Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
10
Tatsumi K, Igarashi N, Mengxue X. Prediction of plant-level tomato biomass and yield using machine learning with unmanned aerial vehicle imagery. PLANT METHODS 2021; 17:77. [PMID: 34266447 DOI: 10.21203/rs.3.rs-344860/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/18/2021] [Accepted: 06/04/2021] [Indexed: 05/21/2023]
11
Scott MF, Fradgley N, Bentley AR, Brabbs T, Corke F, Gardner KA, Horsnell R, Howell P, Ladejobi O, Mackay IJ, Mott R, Cockram J. Limited haplotype diversity underlies polygenic trait architecture across 70 years of wheat breeding. Genome Biol 2021; 22:137. [PMID: 33957956 PMCID: PMC8101041 DOI: 10.1186/s13059-021-02354-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 08/04/2020] [Accepted: 04/16/2021] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Selection has dramatically shaped genetic and phenotypic variation in bread wheat. We can assess the genomic basis of historical phenotypic changes, and the potential for future improvement, using experimental populations that attempt to undo selection through the randomizing effects of recombination. RESULTS We bred the NIAB Diverse MAGIC multi-parent population comprising over 500 recombinant inbred lines, descended from sixteen historical UK bread wheat varieties released between 1935 and 2004. We sequence the founders’ genes and promoters by capture, and the MAGIC population by low-coverage whole-genome sequencing. We impute 1.1 M high-quality SNPs that are over 99% concordant with array genotypes. Imputation accuracy only marginally improves when including the founders’ genomes as a haplotype reference panel. Despite capturing 73% of global wheat genetic polymorphism, 83% of genes cluster into no more than three haplotypes. We phenotype 47 agronomic traits over 2 years and map 136 genome-wide significant associations, concentrated at 42 genetic loci with large and often pleiotropic effects. Around half of these overlap known quantitative trait loci. Most traits exhibit extensive polygenicity, as revealed by multi-locus shrinkage modelling. CONCLUSIONS Our results are consistent with a gene pool of low haplotypic diversity, containing few novel loci of large effect. Most past, and projected future, phenotypic changes arising from existing variation involve fine-scale shuffling of a few haplotypes to recombine dozens of polygenic alleles of small effect. Moreover, extensive pleiotropy means selection on one trait will have unintended consequences, exemplified by the negative trade-off between yield and protein content, unless selection and recombination can break unfavorable trait-trait associations. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s13059-021-02354-7.
Affiliation(s)
- Michael F Scott
- University College London (UCL) Genetics Institute, Gower St, London, WC1E 6BT, UK.,Current address: School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
| | - Nick Fradgley
- National Institute for Agricultural Botany (NIAB), 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
| | - Alison R Bentley
- National Institute for Agricultural Botany (NIAB), 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK. Current address: International Maize and Wheat Improvement Center (CIMMYT), El Batán, Texcoco, Mexico
| | | | - Fiona Corke
- The National Plant Phenomics Centre, Institute of Biological, Rural and Environmental Sciences (IBERS), Aberystwyth University, Gogerddan, Aberystwyth, SY23 3EE, UK
| | - Keith A Gardner
- National Institute for Agricultural Botany (NIAB), 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
| | - Richard Horsnell
- National Institute for Agricultural Botany (NIAB), 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
| | - Phil Howell
- National Institute for Agricultural Botany (NIAB), 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
| | | | - Ian J Mackay
- National Institute for Agricultural Botany (NIAB), 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK. Current address: SRUC, Peter Wilson Building King's Buildings, W Mains Rd, Edinburgh, EH9 3JG, UK
| | - Richard Mott
- University College London (UCL) Genetics Institute, Gower St, London, WC1E 6BT, UK.
| | - James Cockram
- National Institute for Agricultural Botany (NIAB), 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK.
|
12
|
Item response theory as a feature selection and interpretation tool in the context of machine learning. Med Biol Eng Comput 2021; 59:471-482. [PMID: 33534111 DOI: 10.1007/s11517-020-02301-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/26/2020] [Accepted: 12/22/2020] [Indexed: 10/22/2022]
Abstract
Optimizing the number and utility of features to use in a classification analysis has been the subject of many research studies. Most current models use end-classifications as part of the feature reduction process, leading to circularity in the methodology. The approach demonstrated in the present research uses item response theory (IRT) to select features independent of the end-classification results without the biased accuracies that this circularity engenders. Dichotomous and polytomous IRT models were used to analyze 30 histological breast cancer features from 569 patients using the Wisconsin Diagnostic Breast Cancer data set. Based on their characteristics, three features were selected for use in a machine learning classifier. For comparison purposes, two machine learning-based feature selection protocols were run, recursive feature elimination (RFE) and ridge regression, and the three features selected from these analyses were also used in the subsequent learning classifier. Classification results demonstrated that all three selection processes performed comparably. The non-biased nature of the IRT protocol and information provided about the specific characteristics of the features as to why they are of use in classification help to shed light on understanding which attributes of features make them suitable for use in a machine learning context.
|
13
|
Frouin A, Dandine-Roulland C, Pierre-Jean M, Deleuze JF, Ambroise C, Le Floch E. Exploring the Link Between Additive Heritability and Prediction Accuracy From a Ridge Regression Perspective. Front Genet 2020; 11:581594. [PMID: 33329721 PMCID: PMC7672157 DOI: 10.3389/fgene.2020.581594] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/09/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022] Open
Abstract
Genome-Wide Association Studies (GWAS) explain only a small fraction of heritability for most complex human phenotypes. Genomic heritability estimates the variance explained by the SNPs on the whole genome using mixed models and accounts for the many small contributions of SNPs in the explanation of a phenotype. This paper approaches heritability from a machine learning perspective, and examines the close link between mixed models and ridge regression. Our contribution is two-fold. First, we propose estimating genomic heritability using a predictive approach via ridge regression and Generalized Cross Validation (GCV). We show that this is consistent with classical mixed model based estimation. Second, we derive simple formulae that express prediction accuracy as a function of the ratio n/p, where n is the population size and p the total number of SNPs. These formulae clearly show that a high heritability does not imply an accurate prediction when p > n. Both the estimation of heritability via GCV and the prediction accuracy formulae are validated using simulated data and real data from UK Biobank.
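The link between ridge regression and mixed-model heritability described in this abstract can be illustrated with a minimal sketch. It is not the authors' code and not necessarily their exact estimator. It assumes standardized genotype columns, i.i.d. SNP effects, and the textbook correspondence lambda = sigma_e^2 / sigma_u^2, which gives h2 = p / (p + lambda). The function name, the simulated data, and the penalty grid are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_gcv_heritability(Z, y, lambdas):
    """Choose a ridge penalty by GCV and map it to a heritability estimate.

    Assumes the columns of Z (n individuals x p SNPs) are standardized, so that
    under the equivalent mixed model lambda = sigma_e^2 / sigma_u^2 and
    h2 = p * sigma_u^2 / (p * sigma_u^2 + sigma_e^2) = p / (p + lambda).
    """
    n, p = Z.shape
    y = y - y.mean()
    # The ridge hat matrix H = K (K + lambda I)^{-1}, with K = Z Z^T, shares
    # eigenvectors with K, so one eigendecomposition serves every lambda.
    evals, evecs = np.linalg.eigh(Z @ Z.T)
    evals = np.clip(evals, 0.0, None)  # guard against tiny negative round-off
    y_rot = evecs.T @ y

    best_lam, best_gcv = None, np.inf
    for lam in lambdas:
        shrink = evals / (evals + lam)               # eigenvalues of H
        rss = np.sum(((1.0 - shrink) * y_rot) ** 2)  # residual sum of squares
        gcv = n * rss / (n - shrink.sum()) ** 2      # GCV criterion
        if gcv < best_gcv:
            best_lam, best_gcv = lam, gcv
    return best_lam, p / (p + best_lam)


# Simulated example; every value here is an illustrative assumption.
rng = np.random.default_rng(0)
n, p, true_h2 = 300, 150, 0.5
Z = rng.standard_normal((n, p))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)           # standardize SNP columns
u = rng.standard_normal(p) * np.sqrt(true_h2 / p)  # small per-SNP effects
y = Z @ u + rng.standard_normal(n) * np.sqrt(1.0 - true_h2)
lam_hat, h2_hat = ridge_gcv_heritability(Z, y, np.logspace(0, 4, 50))
print(f"lambda = {lam_hat:.1f}, estimated h2 = {h2_hat:.2f}")
```

Working in the eigenbasis of the n x n kernel keeps the cost of scanning the penalty grid low, which is the practical situation when p is much larger than n.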
Affiliation(s)
- Arthur Frouin
- CNRGH, Institut Jacob, CEA - Université Paris-Saclay, Évry, France
| | | | | | - Jean-François Deleuze
- CNRGH, Institut Jacob, CEA - Université Paris-Saclay, Évry, France; Centre d'Etude du Polymorphisme Humain, Fondation Jean Dausset, Paris, France
| | - Christophe Ambroise
- LaMME, Université Paris-Saclay, CNRS, Université d'Évry val d'Essonne, Évry, France
| | - Edith Le Floch
- CNRGH, Institut Jacob, CEA - Université Paris-Saclay, Évry, France
|
14
|
Bernardo R. Reinventing quantitative genetics for plant breeding: something old, something new, something borrowed, something BLUE. Heredity (Edinb) 2020; 125:375-385. [PMID: 32296132 PMCID: PMC7784685 DOI: 10.1038/s41437-020-0312-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 01/23/2020] [Revised: 03/23/2020] [Accepted: 03/23/2020] [Indexed: 01/19/2023] Open
Abstract
The goals of quantitative genetics differ according to its field of application. In plant breeding, the main focus of quantitative genetics is on identifying candidates with the best genotypic value for a target population of environments. Keeping quantitative genetics current requires keeping old concepts that remain useful, letting go of what has become archaic, and introducing new concepts and methods that support contemporary breeding. The core concept of continuous variation being due to multiple Mendelian loci remains unchanged. Because the entirety of germplasm available in a breeding program is not in Hardy-Weinberg equilibrium, classical concepts that assume random mating, such as the average effect of an allele and additive variance, need to be retired in plant breeding. Doing so is feasible because with molecular markers, mixed-model approaches that require minimal genetic assumptions can be used for best linear unbiased estimation (BLUE) and prediction. Plant breeding would benefit from borrowing approaches found useful in other disciplines. Examples include reliability as a new measure of the influence of genetic versus nongenetic effects, and operations research and simulation approaches for designing breeding programs. The genetic entities in such simulations should not be generic but should be represented by the pedigrees, marker data, and phenotypic data for the actual germplasm in a breeding program. Over the years, quantitative genetics in plant breeding has become increasingly empirical and computational and less grounded in theory. This trend will continue as the amount and types of data available in a breeding program increase.
Affiliation(s)
- Rex Bernardo
- Department of Agronomy and Plant Genetics, University of Minnesota, 411 Borlaug Hall, 1991 Buford Circle, Saint Paul, MN, 55108, USA.
|
15
|
CUX2, BRAP and ALDH2 are associated with metabolic traits in people with excessive alcohol consumption. Sci Rep 2020; 10:18118. [PMID: 33093602 PMCID: PMC7583246 DOI: 10.1038/s41598-020-75199-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/13/2019] [Accepted: 10/12/2020] [Indexed: 12/21/2022] Open
Abstract
Molecular mechanisms that prompt or mitigate excessive alcohol consumption could be partly explained by metabolic shifts. This genome-wide association study aims to identify the susceptibility gene loci for excessive alcohol consumption by jointly measuring weekly alcohol consumption and γ-GT levels. We analysed the Taiwan Biobank data of 18,363 Taiwanese people, including 1945 with excessive alcohol use. We found that one or two copies of the G allele in rs671 (ALDH2) increased the risk of excessive alcohol consumption, while one or two copies of the C allele in rs3782886 (BRAP) reduced the risk of excessive alcohol consumption. To minimize the influence of extensive regional linkage disequilibrium, we used the ridge regression. The ridge coefficients of rs7398833, rs671 and rs3782886 were unchanged across different values of the shrinkage parameter. The three variants corresponded to posttranscriptional activity, including cut-like homeobox 2 (a protein coded by CUX2), Glu504Lys of acetaldehyde dehydrogenase 2 (a protein encoded by ALDH2) and Glu4Gly of BRCA1-associated protein (a protein encoded by BRAP). We found that Glu504Lys of ALDH2 and Glu4Gly of BRAP are involved in the negative regulation of excessive alcohol consumption. The mechanism underlying the γ-GT-catalytic metabolic reaction in excessive alcohol consumption is associated with ALDH2, BRAP and CUX2. Further study is needed to clarify the roles of ALDH2, BRAP and CUX2 in the liver–brain endocrine axis connecting metabolic shifts with excessive alcohol consumption.
|
16
|
Zhou J, Qiu Y, Chen S, Liu L, Liao H, Chen H, Lv S, Li X. A Novel Three-Stage Framework for Association Analysis Between SNPs and Brain Regions. Front Genet 2020; 11:572350. [PMID: 33193677 PMCID: PMC7542238 DOI: 10.3389/fgene.2020.572350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/13/2020] [Accepted: 08/17/2020] [Indexed: 12/17/2022] Open
Abstract
Motivation: A number of methods for correlation analysis between SNPs and ROIs have been devised to explore the pathogenic mechanism of Alzheimer's disease. However, these methods have inherent deficiencies, including a lack of statistical efficacy and biological meaning. This study aims to address two issues: insufficient correlation in previous methods (relatively high regression error) and the lack of biological meaning in association analysis. Results: In this paper, a novel three-stage framework for SNP and ROI correlation analysis is proposed. First, a clustering algorithm is applied to remove the potential linkage disequilibrium structure between SNPs. Then, a group sparse model is used to introduce prior information, such as gene structure and linkage disequilibrium structure, to select feature SNPs. After these steps, each SNP has a weight vector corresponding to each ROI, the importance of SNPs can be judged according to the weights in the feature vector, and the feature SNPs can then be selected. Finally, for the selected feature SNPs, a support vector machine regression model is used to predict the ROI phenotype values. Experimental results under multiple performance measures show that the proposed method has better accuracy than other methods.
Affiliation(s)
- Juan Zhou
- School of Software, East China Jiaotong University, Nanchang, China
| | - Yangping Qiu
- School of Software, East China Jiaotong University, Nanchang, China
| | - Shuo Chen
- School of Software, East China Jiaotong University, Nanchang, China
| | - Liyue Liu
- School of Software, East China Jiaotong University, Nanchang, China
| | - Huifa Liao
- School of Software, East China Jiaotong University, Nanchang, China
| | - Hongli Chen
- School of Software, East China Jiaotong University, Nanchang, China
| | - Shanguo Lv
- School of Software, East China Jiaotong University, Nanchang, China
| | - Xiong Li
- School of Software, East China Jiaotong University, Nanchang, China
|
17
|
Thomas M, Sakoda LC, Hoffmeister M, Rosenthal EA, Lee JK, van Duijnhoven FJB, Platz EA, Wu AH, Dampier CH, de la Chapelle A, Wolk A, Joshi AD, Burnett-Hartman A, Gsur A, Lindblom A, Castells A, Win AK, Namjou B, Van Guelpen B, Tangen CM, He Q, Li CI, Schafmayer C, Joshu CE, Ulrich CM, Bishop DT, Buchanan DD, Schaid D, Drew DA, Muller DC, Duggan D, Crosslin DR, Albanes D, Giovannucci EL, Larson E, Qu F, Mentch F, Giles GG, Hakonarson H, Hampel H, Stanaway IB, Figueiredo JC, Huyghe JR, Minnier J, Chang-Claude J, Hampe J, Harley JB, Visvanathan K, Curtis KR, Offit K, Li L, Le Marchand L, Vodickova L, Gunter MJ, Jenkins MA, Slattery ML, Lemire M, Woods MO, Song M, Murphy N, Lindor NM, Dikilitas O, Pharoah PDP, Campbell PT, Newcomb PA, Milne RL, MacInnis RJ, Castellví-Bel S, Ogino S, Berndt SI, Bézieau S, Thibodeau SN, Gallinger SJ, Zaidi SH, Harrison TA, Keku TO, Hudson TJ, Vymetalkova V, Moreno V, Martín V, Arndt V, Wei WQ, Chung W, Su YR, Hayes RB, White E, Vodicka P, Casey G, Gruber SB, Schoen RE, Chan AT, Potter JD, Brenner H, Jarvik GP, Corley DA, Peters U, Hsu L. Genome-wide Modeling of Polygenic Risk Score in Colorectal Cancer Risk. Am J Hum Genet 2020; 107:432-444. [PMID: 32758450 PMCID: PMC7477007 DOI: 10.1016/j.ajhg.2020.07.006] [Citation(s) in RCA: 101] [Impact Index Per Article: 25.3] [Received: 11/27/2019] [Accepted: 07/13/2020] [Indexed: 02/08/2023] Open
Abstract
Accurate colorectal cancer (CRC) risk prediction models are critical for identifying individuals at low and high risk of developing CRC, as they can then be offered targeted screening and interventions to address their risks of developing disease (if they are in a high-risk group) and avoid unnecessary screening and interventions (if they are in a low-risk group). As it is likely that thousands of genetic variants contribute to CRC risk, it is clinically important to investigate whether these genetic variants can be used jointly for CRC risk prediction. In this paper, we derived and compared different approaches to generating predictive polygenic risk scores (PRS) from genome-wide association studies (GWASs) including 55,105 CRC-affected case subjects and 65,079 control subjects of European ancestry. We built the PRS in three ways, using (1) 140 previously identified and validated CRC loci; (2) SNP selection based on linkage disequilibrium (LD) clumping followed by machine-learning approaches; and (3) LDpred, a Bayesian approach for genome-wide risk prediction. We tested the PRS in an independent cohort of 101,987 individuals with 1,699 CRC-affected case subjects. The discriminatory accuracy, calculated by the age- and sex-adjusted area under the receiver operating characteristics curve (AUC), was highest for the LDpred-derived PRS (AUC = 0.654) including nearly 1.2 M genetic variants (the proportion of causal genetic variants for CRC assumed to be 0.003), whereas the PRS of the 140 known variants identified from GWASs had the lowest AUC (AUC = 0.629). Based on the LDpred-derived PRS, we are able to identify 30% of individuals without a family history as having risk for CRC similar to those with a family history of CRC, whereas the PRS based on known GWAS variants identified only the top 10% as having a similar relative risk. About 90% of these individuals have no family history and would have been considered average risk under current screening guidelines, but might benefit from earlier screening. The developed PRS offers a way for risk-stratified CRC screening and other targeted interventions.
Affiliation(s)
- Minta Thomas
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Lori C Sakoda
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
| | - Michael Hoffmeister
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Elisabeth A Rosenthal
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA 98195, USA
| | - Jeffrey K Lee
- Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
| | - Franzel J B van Duijnhoven
- Division of Human Nutrition and Health, Wageningen University & Research, Wageningen 176700, the Netherlands
| | - Elizabeth A Platz
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, and the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287, USA
| | - Anna H Wu
- University of Southern California, Preventative Medicine, Los Angeles, CA 90089, USA
| | - Christopher H Dampier
- Department of Surgery, University of Virginia Health System, Charlottesville, VA 22903, USA
| | - Albert de la Chapelle
- Department of Cancer Biology and Genetics and the Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
| | - Alicja Wolk
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm 17177, Sweden
| | - Amit D Joshi
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | | | - Andrea Gsur
- Institute of Cancer Research, Department of Medicine I, Medical University Vienna, Vienna 1090, Austria
| | - Annika Lindblom
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm 17177, Sweden; Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm 17177, Sweden
| | - Antoni Castells
- Gastroenterology Department, Hospital Clínic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), University of Barcelona, Barcelona 08007, Spain
| | - Aung Ko Win
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Bahram Namjou
- Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Cincinnati VA Medical Center, Cincinnati, OH 45229, USA
| | - Bethany Van Guelpen
- Department of Radiation Sciences, Oncology Unit, Umeå University, Umeå 90187, Sweden; Wallenberg Centre for Molecular Medicine, Umeå University, Umeå 90187, Sweden
| | - Catherine M Tangen
- SWOG Statistical Center, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Qianchuan He
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Christopher I Li
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Clemens Schafmayer
- Department of General Surgery, University Hospital Rostock, Rostock 18051, Germany
| | - Corinne E Joshu
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, and the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287, USA
| | - Cornelia M Ulrich
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT 84112, USA
| | - D Timothy Bishop
- Leeds Institute of Cancer and Pathology, University of Leeds, Leeds LS2 9JT, UK
| | - Daniel D Buchanan
- University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Parkville, VIC 3010, Australia; Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, VIC 3010, Australia; Genomic Medicine and Family Cancer Clinic, Royal Melbourne Hospital, Parkville, VIC 3010, Australia
| | - Daniel Schaid
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA
| | - David A Drew
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
| | - David C Muller
- School of Public Health, Imperial College London, London SW7 2AZ, UK
| | - David Duggan
- Translational Genomics Research Institute - An Affiliate of City of Hope, Phoenix, AZ 85003, USA
| | - David R Crosslin
- Department of Bioinformatics and Medical Education, University of Washington Medical Center, Seattle, WA 98195, USA
| | - Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Edward L Giovannucci
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Nutrition, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02108, USA
| | - Eric Larson
- Kaiser Permanente Washington Research Institute, Seattle, WA 98101, USA
| | - Flora Qu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Frank Mentch
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Graham G Giles
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia; Cancer Epidemiology Division, Cancer Council Victoria, 615 St Kilda Road, Melbourne, VIC 3004, Australia; Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC 3168, Australia
| | - Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Heather Hampel
- Division of Human Genetics, Department of Internal Medicine, The Ohio State University Comprehensive Cancer Center, Columbus, OH 43210, USA
| | - Ian B Stanaway
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA 98195, USA
| | - Jane C Figueiredo
- Department of Medicine, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA; Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Jeroen R Huyghe
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Jessica Minnier
- School of Public Health, Oregon Health & Science University, Portland, OR 97239, USA
| | - Jenny Chang-Claude
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, 69120 Germany; University Medical Centre Hamburg-Eppendorf, University Cancer Centre Hamburg (UCCH), Hamburg 20246, Germany
| | - Jochen Hampe
- Department of Medicine I, University Hospital Dresden, Technische Universität Dresden (TU Dresden), Dresden 01062, Germany
| | - John B Harley
- Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Cincinnati VA Medical Center, Cincinnati, OH 45229, USA
| | - Kala Visvanathan
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, and the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Baltimore, MD 21287, USA
| | - Keith R Curtis
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Kenneth Offit
- Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medical College, NY 10065, USA
| | - Li Li
- Department of Family Medicine, University of Virginia, Charlottesville, VA 22903, USA
| | | | - Ludmila Vodickova
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, 142 20 Prague 4, Czech Republic; Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, 128 00 Prague, Czech Republic; Faculty of Medicine and Biomedical Center in Pilsen, Charles University, 323 00 Pilsen, Czech Republic
| | - Marc J Gunter
- Nutrition and Metabolism Section, International Agency for Research on Cancer, World Health Organization, Lyon 69372, France
| | - Mark A Jenkins
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia
| | - Martha L Slattery
- Department of Internal Medicine, University of Utah, Salt Lake City, UT 84132, USA
| | - Mathieu Lemire
- PanCuRx Translational Research Initiative, Ontario, Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Michael O Woods
- Memorial University of Newfoundland, Discipline of Genetics, St. John's, NL A1B 3R7, Canada
| | - Mingyang Song
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Department of Nutrition, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
| | - Neil Murphy
- Nutrition and Metabolism Section, International Agency for Research on Cancer, World Health Organization, Lyon 69372, France
| | - Noralane M Lindor
- Department of Health Science Research, Mayo Clinic, Scottsdale, AZ 85260, USA
| | - Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN 55905, USA
| | - Paul D P Pharoah
- Department of Public Health and Primary Care, University of Cambridge, Cambridge CB2 0SR, UK
| | - Peter T Campbell
- Behavioral and Epidemiology Research Group, American Cancer Society, Atlanta, GA 30303, USA
| | - Polly A Newcomb
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; School of Public Health, University of Washington, Seattle, WA 98195, USA
| | - Roger L Milne
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia; Cancer Epidemiology Division, Cancer Council Victoria, 615 St Kilda Road, Melbourne, VIC 3004, Australia; Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC 3168, Australia
| | - Robert J MacInnis
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3000, Australia; Cancer Epidemiology Division, Cancer Council Victoria, 615 St Kilda Road, Melbourne, VIC 3004, Australia
| | - Sergi Castellví-Bel
- Gastroenterology Department, Hospital Clínic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), University of Barcelona, Barcelona 08007, Spain
| | - Shuji Ogino
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Program in MPE Molecular Pathological Epidemiology, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Sonja I Berndt
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Stéphane Bézieau
- Service de Génétique Médicale, Centre Hospitalier Universitaire (CHU) Nantes, Nantes 44093, France
| | - Stephen N Thibodeau
- Division of Laboratory Genetics, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN 85054, USA
| | - Steven J Gallinger
- Lunenfeld Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, ON M5G1X5, Canada
| | - Syed H Zaidi
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Tabitha A Harrison
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Temitope O Keku
- Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Thomas J Hudson
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Veronika Vymetalkova
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, 142 20 Prague 4, Czech Republic; Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, 128 00 Prague, Czech Republic; Faculty of Medicine and Biomedical Center in Pilsen, Charles University, 323 00 Pilsen, Czech Republic
| | - Victor Moreno
- Oncology Data Analytics Program, Catalan Institute of Oncology, L'Hospitalet de Llobregat, Barcelona 08908, Spain; CIBER Epidemiología y Salud Pública (CIBERESP), Madrid 28029, Spain; Department of Clinical Sciences, Faculty of Medicine, University of Barcelona, Barcelona 08907, Spain; ONCOBEL Program, Bellvitge Biomedical Research Institute (IDIBELL), L'Hospitalet de Llobregat, Barcelona 08908, Spain
| | - Vicente Martín
- CIBER Epidemiología y Salud Pública (CIBERESP), Madrid 28029, Spain; Biomedicine Institute (IBIOMED), University of León, León 24071, Spain
| | - Volker Arndt
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Wendy Chung
- Office of Research & Development, Department of Veterans Affairs, Washington, DC 20420, USA; Departments of Pediatrics and Medicine, Columbia University Medical Center, New York, NY 10032, USA
| | - Yu-Ru Su
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Richard B Hayes
- Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| | - Emily White
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
| | - Pavel Vodicka
- Department of Molecular Biology of Cancer, Institute of Experimental Medicine of the Czech Academy of Sciences, 142 20 Prague 4, Czech Republic; Institute of Biology and Medical Genetics, First Faculty of Medicine, Charles University, 128 00 Prague, Czech Republic; Faculty of Medicine and Biomedical Center in Pilsen, Charles University, 323 00 Pilsen, Czech Republic
| | - Graham Casey
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
| | - Stephen B Gruber
- Department of Preventive Medicine, USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
| | - Robert E Schoen
- Department of Medicine and Epidemiology, University of Pittsburgh Medical Center, Pittsburgh, PA 15219, USA
| | - Andrew T Chan
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
| | - John D Potter
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Centre for Public Health Research, Massey University, Wellington 6140, New Zealand
| | - Hermann Brenner
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg 69120, Germany; Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg 69120, Germany; German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg 69120, Germany
| | - Gail P Jarvik
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA 98195, USA; Genome Sciences, University of Washington Medical Center, Seattle, WA 98195, USA
| | - Douglas A Corley
- Division of Research, Kaiser Permanente Northern California, Oakland, CA 94612, USA
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA.
| | - Li Hsu
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
|
18
|
Han Y, Adolphs R. Estimating the heritability of psychological measures in the Human Connectome Project dataset. PLoS One 2020; 15:e0235860. [PMID: 32645058 PMCID: PMC7347217 DOI: 10.1371/journal.pone.0235860] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/12/2020] [Accepted: 06/24/2020] [Indexed: 12/03/2022] Open
Abstract
The Human Connectome Project (HCP) is a large structural and functional MRI dataset with a rich array of behavioral and genotypic measures, as well as a biologically verified family structure. This makes it a valuable resource for investigating questions about individual differences, including questions about heritability. While its MRI data have been analyzed extensively in this regard, to our knowledge a comprehensive estimation of the heritability of the behavioral dataset has never been conducted. Using a set of behavioral measures of personality, emotion and cognition, we show that it is possible to re-identify the same individual across two testing times (fingerprinting), and to identify identical twins significantly above chance. Standard heritability estimates of 37 behavioral measures were derived from twin correlations, and machine-learning models (univariate linear model, Ridge classifier and Random Forest model) were trained to classify monozygotic twins and dizygotic twins. Correlations between the standard heritability metric and each set of model weights ranged from 0.36 to 0.7, and questionnaire-based and task-based measures did not differ significantly in their heritability. We further explored the heritability of a smaller number of latent factors extracted from the 37 measures and repeated the heritability estimation; in this case, the correlations between the standard heritability and each set of model weights were lower, ranging from 0.05 to 0.43. One specific discrepancy arose for the general intelligence factor, which all models assigned high importance, but the standard heritability calculation did not. We present a thorough investigation of the heritabilities of the behavioral measures in the HCP as a resource for other investigators, and illustrate the utility of machine-learning methods for qualitative characterization of the differential heritability across diverse measures.
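As a rough illustration of how a heritability estimate can be derived from twin correlations, the sketch below uses Falconer's formula, h² = 2(r_MZ − r_DZ). This is an assumption made for illustration: the abstract does not state which estimator the authors used, and the function names and toy scores are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def falconer_h2(r_mz, r_dz):
    """Falconer's estimate: twice the MZ-DZ difference in twin correlations."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical toy scores for one behavioral measure: (twin 1, twin 2) per pair.
mz_pairs = [(10, 11), (14, 13), (9, 9), (16, 15), (12, 12)]
dz_pairs = [(10, 13), (14, 12), (9, 11), (16, 13), (12, 12)]

r_mz = pearson(*zip(*mz_pairs))
r_dz = pearson(*zip(*dz_pairs))
h2 = falconer_h2(r_mz, r_dz)
print(f"r_MZ={r_mz:.2f}  r_DZ={r_dz:.2f}  h2={h2:.2f}")
```

With small samples the estimate can fall outside the interval [0, 1], so in practice it is usually reported with a confidence interval or bounded.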
Affiliation(s)
- Yanting Han
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States of America
| | - Ralph Adolphs
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States of America
- Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, United States of America
- Chen Neuroscience Institute, California Institute of Technology, Pasadena, CA, United States of America
|
19
|
Hybrid Breeding for MLN Resistance: Heterosis, Combining Ability, and Hybrid Prediction. PLANTS 2020; 9:plants9040468. [PMID: 32276322 PMCID: PMC7238107 DOI: 10.3390/plants9040468] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/20/2020] [Revised: 03/24/2020] [Accepted: 03/25/2020] [Indexed: 11/18/2022]
Abstract
Prior knowledge of heterosis and quantitative genetic parameters for maize lethal necrosis (MLN) can help breeders develop numerous resistant or tolerant hybrids with optimum resources. Our objectives were to (1) estimate the quantitative genetic parameters for MLN disease severity, (2) investigate the efficiency of the prediction of hybrid performance based on parental per se and general combining ability (GCA) effects, and (3) examine the potential of hybrid prediction for MLN resistance or tolerance based on markers. Fifty elite maize inbred lines were selected based on their response to MLN under artificial inoculation. Crosses were made in a half diallel mating design to produce 307 F1 hybrids. All hybrids were evaluated in the MLN quarantine facility in Naivasha, Kenya for two seasons under artificial inoculation. All 50 inbreds were genotyped with genotyping-by-sequencing (GBS) SNPs. The phenotypic variation was significant for all traits and the heritability was moderate to high. We observed that hybrids were superior to the mean performance of the parents for disease severity (−14.57%) and area under disease progress curve (AUDPC) (14.9%). Correlations were significant and moderate between line per se and GCA, and between the mean parental value and hybrid performance, for both disease severity and AUDPC value. A very low and negative correlation was observed between the marker-based genetic distance of the parental lines and heterosis. Nevertheless, the correlation of GCA effects with hybrid performance was very high, which suggests that GCA is a good predictor of MLN resistance. Genomic prediction of hybrid performance for MLN is high for both traits. We therefore conclude that there is potential for prediction of hybrid performance for MLN. Overall, the estimated quantitative genetic parameters suggest that, through a targeted approach, it is possible to develop outstanding lines and hybrids for MLN resistance.
|
20
|
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
| | | | | | - Anita Pandit
- University of Michigan, Department of Biostatistics
| | - Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
| | - Chad Brummett
- University of Michigan, Department of Anesthesiology
| | - Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
|
21
|
|
22
|
Nyaga C, Gowda M, Beyene Y, Muriithi WT, Makumbi D, Olsen MS, Suresh LM, Bright JM, Das B, Prasanna BM. Genome-Wide Analyses and Prediction of Resistance to MLN in Large Tropical Maize Germplasm. Genes (Basel) 2019; 11:genes11010016. [PMID: 31877962 PMCID: PMC7016728 DOI: 10.3390/genes11010016] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/27/2019] [Revised: 12/17/2019] [Accepted: 12/18/2019] [Indexed: 11/16/2022] Open
Abstract
Maize lethal necrosis (MLN), caused by co-infection of maize chlorotic mottle virus and sugarcane mosaic virus, can lead to up to 100% yield loss. Identification and validation of genomic regions can facilitate marker-assisted breeding for resistance to MLN. Our objectives were to identify marker-trait associations using a genome-wide association study (GWAS) and to assess the potential of genomic prediction for MLN resistance in a large panel of diverse maize lines. A set of 1400 diverse tropical maize inbred lines was evaluated for response to MLN under artificial inoculation by measuring disease severity or incidence and area under disease progress curve (AUDPC). All lines were genotyped with genotyping-by-sequencing (GBS) SNPs. The phenotypic variation was significant for all traits and the heritability estimates were moderate to high. GWAS revealed 32 significantly associated SNPs for MLN resistance (at p < 1.0 × 10⁻⁶). For disease severity, these significantly associated SNPs individually explained 3–5% of the total phenotypic variance, whereas for AUDPC they explained 3–12% of the total phenotypic variance. Most of the significant SNPs were consistent with previous studies and help to validate and fine-map the large quantitative trait locus (QTL) regions into a few marker-specific regions. A set of putative candidate genes associated with the significant markers was identified, and their functions were found to be directly or indirectly involved in plant defense responses. Genomic prediction revealed reasonable prediction accuracies. The prediction accuracies increased significantly with increasing marker density and training population size. These results support the view that MLN resistance is a complex trait controlled by a few major-effect and many minor-effect genes.
Affiliation(s)
- Christine Nyaga
- Department of Agricultural Science and Technology, Kenyatta University, Nairobi 43844-00100, Kenya
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
| | - Manje Gowda
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
- Correspondence: Tel.: +254-727-019-454
| | - Yoseph Beyene
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
| | - Wilson T. Muriithi
- Department of Agricultural Science and Technology, Kenyatta University, Nairobi 43844-00100, Kenya
| | - Dan Makumbi
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
| | - Michael S. Olsen
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
| | - L. M. Suresh
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
| | - Jumbo M. Bright
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
| | - Biswanath Das
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
| | - Boddupalli M. Prasanna
- International Maize and Wheat Improvement Centre (CIMMYT), World Agroforestry Centre (ICRAF), United Nations Avenue, Gigiri, Nairobi 1041-00621, Kenya
|
23
|
Li J, Veeranampalayam-Sivakumar AN, Bhatta M, Garst ND, Stoll H, Stephen Baenziger P, Belamkar V, Howard R, Ge Y, Shi Y. Principal variable selection to explain grain yield variation in winter wheat from features extracted from UAV imagery. PLANT METHODS 2019; 15:123. [PMID: 31695728 PMCID: PMC6824016 DOI: 10.1186/s13007-019-0508-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/24/2019] [Accepted: 10/19/2019] [Indexed: 05/23/2023]
Abstract
BACKGROUND Automated phenotyping technologies are continually advancing the breeding process. However, collecting various secondary traits throughout the growing season and processing massive amounts of data still take great effort and time. Selecting a minimum number of secondary traits that have the maximum predictive power has the potential to reduce phenotyping efforts. The objective of this study was to select the principal features extracted from UAV imagery, and the critical growth stages, that contributed the most to explaining winter wheat grain yield. Five dates of multispectral images and seven dates of RGB images were collected by a UAV system during the spring growing season in 2018. Two classes of features (variables), totaling 172 variables, were extracted for each plot from the vegetation index and plant height maps, including pixel statistics and dynamic growth rates. A parametric algorithm, LASSO regression (least absolute shrinkage and selection operator), and a non-parametric algorithm, random forest, were applied for variable selection. The regression coefficients estimated by LASSO and the permutation importance scores provided by random forest were used to determine the ten most important variables influencing grain yield from each algorithm. RESULTS Both selection algorithms assigned the highest importance score to the variables related to plant height around the grain filling stage. Some vegetation-index-related variables were also selected by the algorithms, mainly at early to mid growth stages and during senescence. Compared with yield prediction using all 172 variables derived from measured phenotypes, prediction using the selected variables performed comparably or even better. We also noticed that the prediction accuracy on the adapted NE lines (r = 0.58-0.81) was higher than on the other lines (r = 0.21-0.59) included in this study, which had different genetic backgrounds. CONCLUSIONS With the ultra-high resolution plot imagery obtained by UAS-based phenotyping, we are now able to derive more features, such as the variation of plant height or vegetation indices within a plot rather than just an averaged number, that are potentially very useful for breeding purposes. However, too many features or variables can be derived in this way. The promising results from this study suggest that a selected subset of those variables can achieve prediction accuracies for grain yield comparable to those of the full set, while possibly allowing a better allocation of effort and resources to phenotypic data collection and processing.
Affiliation(s)
- Jiating Li
- Department of Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | | | - Madhav Bhatta
- Department of Agronomy, University of Wisconsin-Madison, Madison, WI 53706 USA
| | - Nicholas D. Garst
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | - Hannah Stoll
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | - P. Stephen Baenziger
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | - Vikas Belamkar
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | - Reka Howard
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | - Yufeng Ge
- Department of Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
| | - Yeyin Shi
- Department of Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, NE 68583 USA
|
24
|
Williams DR, Rhemtulla M, Wysocki AC, Rast P. On Nonregularized Estimation of Psychological Networks. MULTIVARIATE BEHAVIORAL RESEARCH 2019; 54:719-750. [PMID: 30957629 PMCID: PMC6736701 DOI: 10.1080/00273171.2019.1575716] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Indexed: 05/08/2023]
Abstract
An important goal for psychological science is developing methods to characterize relationships between variables. Customary approaches use structural equation models to connect latent factors to a number of observed measurements, or test causal hypotheses between observed variables. More recently, regularized partial correlation networks have been proposed as an alternative approach for characterizing relationships among variables through off-diagonal elements in the precision matrix. While the graphical Lasso (glasso) has emerged as the default network estimation method, it was optimized in fields outside of psychology with very different needs, such as high-dimensional data where the number of variables (p) exceeds the number of observations (n). In this article, we describe the glasso method in the context of the fields where it was developed, and then we demonstrate that the advantages of regularization diminish in settings where psychological networks are often fitted (p ≪ n). We first show that improved properties of the precision matrix, such as eigenvalue estimation, and predictive accuracy with cross-validation are not always appreciable. We then introduce nonregularized methods based on multiple regression and a nonparametric bootstrap strategy, after which we characterize performance with extensive simulations. Our results demonstrate that the nonregularized methods can be used to reduce the false-positive rate, compared to glasso, and they appear to provide consistent performance across sparsity levels, sample composition (p/n), and partial correlation size. We end by reviewing recent findings in the statistics literature that suggest alternative methods often outperform glasso, as well as suggesting areas for future research in psychology. The nonregularized methods have been implemented in the R package GGMnonreg.
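For readers unfamiliar with how a partial correlation network is read off the precision matrix, the standard identity is sketched below. The symbols Σ, Θ, θ and ρ are notation chosen here for illustration and do not come from the abstract.

```latex
% Precision matrix: inverse of the covariance matrix of the p variables
\Theta = \Sigma^{-1}, \qquad \Theta = (\theta_{ij})_{i,j=1}^{p}

% Partial correlation between variables i and j given all remaining variables
\rho_{ij \mid \text{rest}} = -\frac{\theta_{ij}}{\sqrt{\theta_{ii}\,\theta_{jj}}}, \qquad i \neq j
```

An edge between i and j is absent exactly when θ_ij = 0, which is why both regularized and nonregularized estimators focus on deciding which off-diagonal elements are zero.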
Affiliation(s)
- Donald R Williams
- Department of Psychology, University of California, Davis, CA, USA
| | - Mijke Rhemtulla
- Department of Psychology, University of California, Davis, CA, USA
| | - Anna C Wysocki
- Department of Psychology, University of California, Davis, CA, USA
| | - Philippe Rast
- Department of Psychology, University of California, Davis, CA, USA
|
25
|
Li C, Huang Q, Yang R, Dai Y, Zeng Y, Tao L, Li X, Zeng J, Wang Q. Gut microbiota composition and bone mineral loss-epidemiologic evidence from individuals in Wuhan, China. Osteoporos Int 2019; 30:1003-1013. [PMID: 30666372 DOI: 10.1007/s00198-019-04855-5] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 11/26/2018] [Accepted: 01/13/2019] [Indexed: 12/11/2022]
Abstract
UNLABELLED We explored the association between gut microbiota composition and bone mineral loss in Chinese elderly people by high-throughput 16S ribosomal RNA (rRNA) gene sequencing. Compared with controls, a smaller number of operational taxonomic units (OTUs), several taxa with altered abundance, and specific functional pathways were found in individuals with low bone mineral density (BMD). INTRODUCTION Gut microbiota plays important roles in human health and is associated with a number of diseases. However, few studies have explored its association with bone mineral loss in humans. METHODS We collected 102 fecal samples from each eligible individual belonging to low-BMD and control groups for high-throughput 16S rRNA gene sequencing. RESULTS The low-BMD individuals had a smaller number of OTUs and bacterial taxa at each level. At the phylum level, Bacteroidetes were more abundant in the low-BMD group; Firmicutes were enriched in the control group; Firmicutes and Actinobacteria correlated positively and Bacteroidetes correlated negatively with BMD and T-score in all subjects. At the family level, the abundance of Lachnospiraceae was reduced in low-BMD individuals and correlated positively with BMD and T-score; meanwhile, BMD increased with increasing Bifidobacteriaceae. At the genus level, low-BMD individuals had decreased proportions of Roseburia compared with controls (P < 0.05). Roseburia, Bifidobacterium, and Lactobacillus correlated positively with BMD and T-score. Furthermore, BMD increased with rising abundance of Bifidobacterium. Functional prediction revealed that 93 metabolic pathways differed significantly between the two groups (FDR-corrected P < 0.05). Most pathways, especially those related to LPS biosynthesis, were more abundant in low-BMD individuals than in controls. CONCLUSIONS Several taxa with altered abundance and specific functional pathways were discovered in low-BMD individuals. Our findings provide novel epidemiologic evidence to elucidate the underlying microbiota-relevant mechanism in bone mineral loss and osteoporosis.
Affiliation(s)
- C Li
- MOE Key Lab of Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Q Huang
- Department of Rehabilitation Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - R Yang
- Department of Health Checkup, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Y Dai
- Department of Nuclear Medicine, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Y Zeng
- Wuhan NO.1 Hospital, Wuhan, 430030, China
| | - L Tao
- MOE Key Lab of Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - X Li
- MOE Key Lab of Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - J Zeng
- Department of Health Checkup, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
| | - Q Wang
- MOE Key Lab of Environment and Health, Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China.
|
26
|
Chasioti D, Yan J, Nho K, Saykin AJ. Progress in Polygenic Composite Scores in Alzheimer's and Other Complex Diseases. Trends Genet 2019; 35:371-382. [PMID: 30922659 PMCID: PMC6475476 DOI: 10.1016/j.tig.2019.02.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 11/26/2018] [Revised: 02/12/2019] [Accepted: 02/22/2019] [Indexed: 11/25/2022]
Abstract
Advances in high-throughput genotyping and next-generation sequencing (NGS), coupled with larger sample sizes, bring the realization of precision medicine closer than ever. Polygenic approaches incorporating the aggregate influence of multiple genetic variants can contribute to a better understanding of the genetic architecture of many complex diseases and facilitate patient stratification. This review addresses polygenic concepts, methodological developments, hypotheses, and key issues in study design. Polygenic risk scores (PRSs) have been applied to many complex diseases, and here we focus on Alzheimer's disease (AD) as a primary exemplar. This review was designed to serve as a starting point for investigators wishing to use PRSs in their research and those interested in enhancing clinical study designs through enrichment strategies.
Affiliation(s)
- Danai Chasioti
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
| | - Jingwen Yan
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
| | - Kwangsik Nho
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
| | - Andrew J Saykin
- Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
|
27
|
Keaton SA, Madaj ZB, Heilman P, Smart L, Grit J, Gibbons R, Postolache T, Roaten K, Achtyes E, Brundin L. An inflammatory profile linked to increased suicide risk. J Affect Disord 2019; 247:57-65. [PMID: 30654266 PMCID: PMC6860980 DOI: 10.1016/j.jad.2018.12.100] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 09/14/2018] [Revised: 11/25/2018] [Accepted: 12/24/2018] [Indexed: 12/19/2022]
Abstract
BACKGROUND Suicide risk assessments are often challenging for clinicians, and therefore, biological markers are warranted as guiding tools in these assessments. Suicidal patients display increased cytokine levels in peripheral blood, although the composite inflammatory profile in the subjects is still unknown. It is also not yet established whether certain inflammatory changes are specific to suicidal subjects. To address this, we measured 45 immunobiological factors in peripheral blood and identified the biological profiles associated with cross-diagnostic suicide risk and depression, respectively. METHODS Sixty-six women with mood and anxiety disorders underwent computerized adaptive testing for mental health, assessing depression and suicide risk. Weighted correlation network analysis was used to uncover system-level associations between suicide risk, depression, and the immunobiological factors in plasma. Secondary regression models were used to establish the sensitivity of the results to potential confounders, including age, body mass index (BMI), treatment, and symptoms of depression and anxiety. RESULTS The biological profile of patients assessed to be at increased suicide risk differed from that associated with depression. At the system level, a biological cluster containing increased levels of interleukin-6, lymphocytes, monocytes, white blood cell count and polymorphonuclear leukocyte count significantly impacted suicide risk, with the latter two exerting the strongest influence. The cytokine interleukin-8 was independently and negatively associated with increased suicide risk. The results remained after adjusting for confounders. LIMITATIONS This study is cross-sectional and not designed to prove causality. DISCUSSION A unique immunobiological profile was linked to increased suicide risk. The profile was different from that observed in patients with depressive symptoms, and indicates that granulocyte-mediated biological mechanisms could be activated in patients at risk for suicide.
Affiliation(s)
- Sarah A Keaton
- Department of Physiology, Michigan State University, East Lansing, MI, USA; Center for Neurodegenerative Science, Van Andel Research Institute, Grand Rapids, MI, USA
| | - Zachary B Madaj
- Bioinformatics and Biostatistics Core, Van Andel Research Institute, Grand Rapids, MI, USA
| | - Patrick Heilman
- Center for Neurodegenerative Science, Van Andel Research Institute, Grand Rapids, MI, USA
| | - LeAnn Smart
- Pine Rest Christian Mental Health Services, Grand Rapids, MI, USA
| | - Jamie Grit
- Center for Cancer and Cell Biology, Van Andel Research Institute, Grand Rapids, MI, USA
| | - Robert Gibbons
- Center for Health Statistics, Departments of Medicine and Public Health Sciences, University of Chicago, Illinois, USA
| | - Teodor Postolache
- Department of Psychiatry, University of Maryland-Baltimore School of Medicine, Baltimore, MD, USA; Rocky Mountain MIRECC, Denver, CO, USA
| | - Kimberly Roaten
- Department of Psychiatry, University of Texas Southwestern, Dallas, TX, USA
| | - Eric Achtyes
- Pine Rest Christian Mental Health Services, Grand Rapids, MI, USA; Division of Psychiatry & Behavioral Medicine, Michigan State University College of Human Medicine, Grand Rapids, Michigan, USA
| | - Lena Brundin
- Center for Neurodegenerative Science, Van Andel Research Institute, Grand Rapids, MI, USA.
| |
|
28
|
Cherlin S, Plant D, Taylor JC, Colombo M, Spiliopoulou A, Tzanis E, Morgan AW, Barnes MR, McKeigue P, Barrett JH, Pitzalis C, Barton A, Consortium MATURA, Cordell HJ. Prediction of treatment response in rheumatoid arthritis patients using genome-wide SNP data. Genet Epidemiol 2018; 42:754-771. [PMID: 30311271 PMCID: PMC6334178 DOI: 10.1002/gepi.22159] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/27/2018] [Revised: 07/06/2018] [Accepted: 07/28/2018] [Indexed: 01/13/2023]
Abstract
Although a number of treatments are available for rheumatoid arthritis (RA), each of them shows a significant nonresponse rate in patients. Therefore, predicting a priori the likelihood of treatment response would be of great patient benefit. Here, we conducted a comparison of a variety of statistical methods for predicting three measures of treatment response, between baseline and 3 or 6 months, using genome-wide SNP data from RA patients available from the MAximising Therapeutic Utility in Rheumatoid Arthritis (MATURA) consortium. Two different treatments and 11 different statistical methods were evaluated. We used 10-fold cross validation to assess predictive performance, with nested 10-fold cross validation used to tune the model hyperparameters when required. Overall, we found that SNPs added very little prediction information to that obtained using clinical characteristics only, such as baseline trait value. This observation can be explained by the lack of strong genetic effects and the relatively small sample sizes available; in analysis of simulated and real data, with larger effects and/or larger sample sizes, prediction performance was much improved. Overall, methods that were consistent with the genetic architecture of the trait were able to achieve better predictive ability than methods that were not. For treatment response in RA, methods that assumed a complex underlying genetic architecture achieved slightly better prediction performance than methods that assumed a simplified genetic architecture.
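The nested cross-validation scheme mentioned in this abstract can be sketched as follows. This is a minimal illustration only, assuming a one-predictor ridge regression on synthetic data; the function names, the penalty grid, and the data are hypothetical and are not taken from the study. The point it shows is that the inner loop tunes the penalty using the outer training part only, so each outer held-out fold stays untouched until the final error estimate.

```python
import random

def kfold_indices(n, k, seed):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge(xs, ys, lam):
    """One-predictor ridge slope without intercept: sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(beta, xs, ys):
    """Mean squared prediction error of the slope beta."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def split(xs, ys, fold):
    """Return (train_x, train_y, test_x, test_y) for one held-out fold."""
    held = set(fold)
    train = [i for i in range(len(xs)) if i not in held]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in fold], [ys[i] for i in fold])

def cv_error(xs, ys, lam, k, seed):
    """Mean held-out MSE over k folds for a fixed penalty lam."""
    errs = []
    for fold in kfold_indices(len(xs), k, seed):
        xtr, ytr, xte, yte = split(xs, ys, fold)
        errs.append(mse(fit_ridge(xtr, ytr, lam), xte, yte))
    return sum(errs) / k

def nested_cv(xs, ys, grid, k_outer=10, k_inner=10, seed=0):
    """Outer loop estimates prediction error; inner loop picks the penalty."""
    errs, chosen = [], []
    for fold in kfold_indices(len(xs), k_outer, seed):
        xtr, ytr, xte, yte = split(xs, ys, fold)
        # The inner cross-validation only ever sees the outer training part.
        best = min(grid, key=lambda lam: cv_error(xtr, ytr, lam, k_inner, seed + 1))
        errs.append(mse(fit_ridge(xtr, ytr, best), xte, yte))
        chosen.append(best)
    return sum(errs) / k_outer, chosen

# Synthetic data (hypothetical): one predictor with a modest true effect.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]
grid = [0.0, 1.0, 10.0, 100.0]
outer_mse, chosen = nested_cv(xs, ys, grid)
```

Here `chosen` holds the penalty selected in each of the 10 outer folds and `outer_mse` is the averaged held-out error. Tuning the penalty on the same folds used for the final estimate would tend to make that estimate optimistic, which is the usual motivation for nesting.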
Affiliation(s)
- Svetlana Cherlin
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
| | - Darren Plant
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
| | - John C. Taylor
- Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- NIHR Leeds Biomedical Research Centre, Leeds Teaching Hospitals NHS Trust, Leeds, UK
| | - Marco Colombo
- Centre for Population Health Sciences, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
| | - Athina Spiliopoulou
- Centre for Population Health Sciences, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
| | - Evan Tzanis
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London and Barts Health NHS Trust, London, UK
| | - Ann W. Morgan
- NIHR Leeds Biomedical Research Centre, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Leeds Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, UK
| | - Michael R. Barnes
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London and Barts Health NHS Trust, London, UK
| | - Paul McKeigue
- Centre for Population Health Sciences, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
| | - Jennifer H. Barrett
- Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- NIHR Leeds Biomedical Research Centre, Leeds Teaching Hospitals NHS Trust, Leeds, UK
| | - Costantino Pitzalis
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London and Barts Health NHS Trust, London, UK
| | - Anne Barton
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
- Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
| | - MATURA Consortium
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne, UK
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
- Leeds Institute of Cancer and Pathology, University of Leeds, Leeds, UK
- NIHR Leeds Biomedical Research Centre, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- Centre for Population Health Sciences, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK
- Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London and Barts Health NHS Trust, London, UK
- Leeds Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, UK
- Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, The University of Manchester, Manchester, UK
| | - Heather J. Cordell
- NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK
| |
|
29
|
Jiang W, Lakshminarayanan P, Hui X, Han P, Cheng Z, Bowers M, Shpitser I, Siddiqui S, Taylor RH, Quon H, McNutt T. Machine Learning Methods Uncover Radiomorphologic Dose Patterns in Salivary Glands that Predict Xerostomia in Patients with Head and Neck Cancer. Adv Radiat Oncol 2018; 4:401-412. [PMID: 31011686 PMCID: PMC6460328 DOI: 10.1016/j.adro.2018.11.008] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/23/2018] [Accepted: 11/14/2018] [Indexed: 01/06/2023] Open
Abstract
Purpose Patients with head-and-neck cancer (HNC) may experience xerostomia after radiation therapy (RT), which leads to compromised quality of life. The purpose of this study is to explore how the spatial pattern of radiation dose (radiomorphology) in the major salivary glands influences xerostomia in patients with HNC. Methods and materials A data-driven approach using spatially explicit dosimetric predictors, voxel dose (ie, actual radiation dose in voxels in parotid glands [PG] and submandibular glands [SMG]) was used to predict whether patients would develop xerostomia 3 months after RT. Using planned radiation dose data and other nondose covariates including baseline xerostomia grade of 427 patients with HNC in our database, the machine learning methods were used to investigate the influence of dose patterns across subvolumes in PG and SMG on xerostomia. Results Of the 3 supervised learning methods studied, ridge logistic regression yielded the best predictive performance. Ridge logistic regression was also preferred to evaluate the influence pattern of highly correlated dose on xerostomia, which showed a discriminative pattern of influence of doses in the PG and SMG on xerostomia. Moreover, the superior–anterior portion of the contralateral PG and medial portion of the ipsilateral PG were determined to be the most influential regions regarding dose effect on xerostomia. The area under the receiver operating characteristic curve from a 10-fold cross-validation was 0.70 ± 0.04. Conclusions Radiomorphology, combined with machine learning methods, is able to suggest patterns of dose in PG and SMG that are the most influential on xerostomia. The influence pattern identified by this data-driven approach and machine learning methods may help improve RT treatment planning and reduce xerostomia after treatment.
Affiliation(s)
- Wei Jiang
- Department of Civil Engineering, Johns Hopkins System Institute, Johns Hopkins University, Baltimore, Maryland
| | | | - Xuan Hui
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland
| | - Peijin Han
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland
| | - Zhi Cheng
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland
| | - Michael Bowers
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland
| | - Ilya Shpitser
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
| | - Sauleh Siddiqui
- Department of Civil Engineering, Johns Hopkins System Institute, Johns Hopkins University, Baltimore, Maryland
| | - Russell H Taylor
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
| | - Harry Quon
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland
| | - Todd McNutt
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University, Baltimore, Maryland
| |
|
30
|
Coupé C. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape. Front Psychol 2018; 9:513. [PMID: 29713298 PMCID: PMC5911484 DOI: 10.3389/fpsyg.2018.00513] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/15/2017] [Accepted: 03/27/2018] [Indexed: 11/13/2022] Open
Abstract
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. 
Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
Affiliation(s)
- Christophe Coupé
- Laboratory Dynamique du Langage, CNRS and University of Lyon, Lyon, France
| |
|
31
|
Abstract
PURPOSE OF REVIEW Rare large-effect genetic variants underlie monogenic dyslipidemias, whereas common small-effect genetic variants - single nucleotide polymorphisms (SNPs) - have modest influences on lipid traits. Over the past decade, these small-effect SNPs have been shown to cumulatively exert consistent effects on lipid phenotypes under a polygenic framework, which is the focus of this review. RECENT FINDINGS Several groups have reported polygenic risk scores assembled from lipid-associated SNPs, and have applied them to their respective phenotypes. For lipid traits in the normal population distribution, polygenic effects quantified by a score that integrates several common polymorphisms account for about 20-30% of genetic variation. Among individuals at the extremes of the distribution, that is, those with clinical dyslipidemia, the polygenic component includes both rare variants with large effects and common polymorphisms: depending on the trait, 20-50% of susceptibility can be accounted for by this assortment of genetic variants. SUMMARY Accounting for polygenic effects increases the numbers of dyslipidemic individuals who can be explained genetically, but a substantial proportion of susceptibility remains unexplained. Whether documenting the polygenic basis of dyslipidemia will affect outcomes in clinical trials or prospective observational studies remains to be determined.
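A polygenic score of the kind described here is, in its simplest form, a weighted sum of allele counts. The sketch below is illustrative only: the SNP identifiers, the weights, and the genotype are hypothetical, and actual scores reported in the literature may use different weighting or scaling schemes.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per SNP)."""
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)

# Hypothetical per-allele effect sizes for three lipid-associated SNPs.
weights = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}
# Hypothetical genotype: number of effect alleles carried at each SNP.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_score(person, weights)  # 2*0.10 + 1*(-0.05) + 0*0.20
```

Each SNP contributes little on its own, and the score aggregates these small contributions into a single number per individual.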
|
32
|
Schwabe I, Janss L, van den Berg SM. Can We Validate the Results of Twin Studies? A Census-Based Study on the Heritability of Educational Achievement. Front Genet 2017; 8:160. [PMID: 29123543 PMCID: PMC5662588 DOI: 10.3389/fgene.2017.00160] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 07/12/2017] [Accepted: 10/10/2017] [Indexed: 11/13/2022] Open
Abstract
As for most phenotypes, the amount of variance in educational achievement explained by SNPs is lower than the amount of additive genetic variance estimated in twin studies. Twin-based estimates may however be biased because of self-selection and differences in cognitive ability between twins and the rest of the population. Here we compare twin registry based estimates with a census-based heritability estimate, sampling from the same Dutch birth cohort population and using the same standardized measure for educational achievement. Including important covariates (i.e., sex, migration status, school denomination, SES, and group size), we analyzed 893,127 scores from primary school children from the years 2008-2014. For genetic inference, we used pedigree information to construct an additive genetic relationship matrix. Corrected for the covariates, this resulted in an estimate of 85%, which is even higher than based on twin studies using the same cohort and same measure. We therefore conclude that the genetic variance not tagged by SNPs is not an artifact of the twin method itself.
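The abstract does not say how the additive genetic relationship matrix was computed from the pedigree. One standard construction is the tabular method, sketched below under the assumption that parents are listed before their offspring and that founders are unrelated and non-inbred; the function name and the toy pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.

    pedigree: list of (id, parent1, parent2) tuples, parents listed before
    offspring; None marks an unknown (founder) parent.
    """
    ids = [ind for ind, _, _ in pedigree]
    pos = {ind: k for k, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, p1, p2) in enumerate(pedigree):
        s = pos[p1] if p1 is not None else None
        d = pos[p2] if p2 is not None else None
        # Relationship with each earlier individual j is the average of
        # j's relationships with the two parents of i.
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
    return ids, A

# Toy pedigree (hypothetical): two unrelated founders and two full siblings.
pedigree = [
    ("mother", None, None),
    ("father", None, None),
    ("child1", "mother", "father"),
    ("child2", "mother", "father"),
]
ids, A = relationship_matrix(pedigree)
```

For this toy pedigree the founders have relationship 0 with each other, each parent-offspring pair has 0.5, and the two full siblings also have 0.5, which matches the expected share of additive genetic effects.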
Affiliation(s)
- Inga Schwabe
- Department of Research Methodology, Measurement and Data Analysis (OMD), University of Twente, Enschede, Netherlands; Department of Methodology and Statistics, Tilburg University, Tilburg, Netherlands
| | - Luc Janss
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Stéphanie M van den Berg
- Department of Research Methodology, Measurement and Data Analysis (OMD), University of Twente, Enschede, Netherlands
| |
|
33
|
Mazo Lopera MA, Coombes BJ, de Andrade M. An Efficient Test for Gene-Environment Interaction in Generalized Linear Mixed Models with Family Data. Int J Environ Res Public Health 2017; 14:ijerph14101134. [PMID: 28953253 PMCID: PMC5664635 DOI: 10.3390/ijerph14101134] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 08/19/2017] [Revised: 09/20/2017] [Accepted: 09/25/2017] [Indexed: 02/07/2023]
Abstract
Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma (PPARG) gene associated with diabetes.
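The BLUP and ridge equivalence stated in this abstract can be checked numerically in the simplest setting. The sketch below assumes a linear Gaussian model y = Zu + e with u ~ N(0, sigma_u2 I) and e ~ N(0, sigma_e2 I), which is a special case and not the GLMM treated in the paper; all data values and names are hypothetical. Under that assumption the BLUP, sigma_u2 Z'(sigma_u2 ZZ' + sigma_e2 I)^(-1) y, coincides with the ridge estimator (Z'Z + lambda I)^(-1) Z'y when lambda = sigma_e2 / sigma_u2.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical toy data: 5 observations, 2 random-effect columns.
Z = [[1.0, 0.0], [0.5, 1.0], [-1.0, 0.5], [0.0, -1.0], [2.0, 1.0]]
y = [1.2, 0.7, -0.9, -0.4, 2.5]
sigma_u2, sigma_e2 = 2.0, 0.5       # assumed variance components
lam = sigma_e2 / sigma_u2           # implied ridge penalty
n, p = len(Z), len(Z[0])

# Ridge estimator: (Z'Z + lam I)^(-1) Z'y
ZtZ = [[sum(Z[k][i] * Z[k][j] for k in range(n)) + (lam if i == j else 0.0)
        for j in range(p)] for i in range(p)]
Zty = [sum(Z[k][i] * y[k] for k in range(n)) for i in range(p)]
ridge = solve(ZtZ, Zty)

# BLUP: sigma_u2 Z' V^(-1) y with V = sigma_u2 ZZ' + sigma_e2 I
V = [[sigma_u2 * sum(Z[i][m] * Z[j][m] for m in range(p)) + (sigma_e2 if i == j else 0.0)
      for j in range(n)] for i in range(n)]
v = solve(V, y)
blup = [sigma_u2 * sum(Z[k][i] * v[k] for k in range(n)) for i in range(p)]

max_gap = max(abs(a - b) for a, b in zip(ridge, blup))
```

In this setting `max_gap` is zero up to floating-point rounding. The practical reading is that the penalty is determined by the ratio of the two variance components, so estimating those components fixes the penalty without a cross-validation search.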
Affiliation(s)
- Mauricio A Mazo Lopera
- School of Statistics, National University of Colombia, Medellín, Antioquia 050022, Colombia.
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.
| | - Brandon J Coombes
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.
| | - Mariza de Andrade
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.
| |
|
34
|
Incorporating Gene Annotation into Genomic Prediction of Complex Phenotypes. Genetics 2017; 207:489-501. [PMID: 28839043 DOI: 10.1534/genetics.117.300198] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 12/18/2016] [Accepted: 08/16/2017] [Indexed: 11/18/2022] Open
Abstract
Today, genomic prediction (GP) is an established technology in plant and animal breeding programs. Current standard methods are purely based on statistical considerations but do not make use of the abundant biological knowledge, which is easily available from public databases. Major questions that have to be answered before biological prior information can be used routinely in GP approaches are which types of information can be used, and at which points they can be incorporated into prediction methods. In this study, we propose a novel strategy to incorporate gene annotation into GP of complex phenotypes by defining haploblocks according to gene positions. Haplotype effects are then modeled as categorical or as numerical allele dosage variables. The underlying concept of this approach is to build the statistical model on variables representing the biologically functional units. We evaluate the new methods with data from a heterogeneous stock mouse population, the Drosophila Genetic Reference Panel (DGRP), and a rice breeding population from the Rice Diversity Panel. Our results show that using gene annotation to define haploblocks often leads to a comparable, but for some traits to a higher, predictive ability compared to SNP-based models or to haplotype models that do not use gene annotation information. Modeling gene interaction effects can further improve predictive ability. We also illustrate that the additional use of markers that have not been mapped to any gene in a second separate relatedness matrix does in many cases not lead to a relevant additional increase in predictive ability when the first matrix is based on haploblocks defined with gene annotation data, suggesting that intergenic markers only provide redundant information on the considered data sets. Therefore, gene annotation information seems to be appropriate to perceive the importance of DNA segments. 
Finally, we discuss the effects of gene annotation quality, marker density, and linkage disequilibrium on the performance of the new methods. To our knowledge, this is the first work that incorporates epistatic interaction or gene annotation into haplotype-based prediction approaches.
|
35
|
Su YR, Di CZ, Hsu L. A unified powerful set-based test for sequencing data analysis of GxE interactions. Biostatistics 2016; 18:119-131. [PMID: 27474101 DOI: 10.1093/biostatistics/kxw034] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/20/2015] [Revised: 04/27/2016] [Accepted: 06/17/2016] [Indexed: 11/13/2022] Open
Abstract
The development of next-generation sequencing technologies has allowed researchers to study comprehensively the contribution of genetic variation, particularly rare variants, to complex diseases. To date, many sequencing analyses of rare variants have focused on marginal genetic effects and have not explored the potential role environmental factors play in modifying genetic risk. Analysis of gene-environment interaction (GxE) for rare variants poses considerable challenges because of variant rarity and paucity of subjects who carry the variants while being exposed. To tackle this challenge, we propose a hierarchical model to jointly assess the GxE effects of a set of rare variants, for example, in a gene or regulatory region, leveraging the information across the variants. Under this model, GxE is modeled by two components. The first component incorporates variant functional information as weights to calculate the weighted burden of variant alleles across variants, and then assesses their GxE interaction with the environmental factor. Since this information is known a priori, this component is treated as fixed effects in the model. The second component involves residual GxE effects that have not been accounted for by the fixed effects. In this component, the residual GxE effects are postulated to follow an unspecified distribution with mean 0 and variance [Formula: see text]. We develop a novel testing procedure by deriving two independent score statistics for the fixed effects and the variance component separately. We propose two data-adaptive combination approaches for combining these two score statistics and establish the asymptotic distributions. An extensive simulation study shows that the proposed approaches maintain the correct type I error and the power is comparable to or better than existing methods under a wide range of scenarios. Finally, we illustrate the proposed methods by an exome-wide GxE analysis with NSAIDs use in colorectal cancer.
Affiliation(s)
- Yu-Ru Su
- Biostatistics and Biomathematics Program, Public Health Science Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA 98109, USA
| | - Chong-Zhi Di
- Biostatistics and Biomathematics Program, Public Health Science Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA 98109, USA
| | - Li Hsu
- Biostatistics and Biomathematics Program, Public Health Science Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, Seattle, WA 98109, USA
| | | |
|