1
Song S, Li Y, Qiu M, Xu N, Li B, Zhang L, Li L, Chen W, Li J, Wang T, Qiu Y, Gong M, Yu D, Dong H, Xia S, Pan Y, Yuan D, Li L. Structural variations of a new fertility restorer gene, Rf20, underlie the restoration of wild abortive-type cytoplasmic male sterility in rice. Molecular Plant 2024; 17:1272-1288. [PMID: 38956872 DOI: 10.1016/j.molp.2024.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Revised: 06/25/2024] [Accepted: 07/01/2024] [Indexed: 07/04/2024]
Abstract
The discovery of a wild abortive-type (WA) cytoplasmic male sterile (CMS) line and breeding its restorer line have led to the commercialization of three-line hybrid rice, contributing considerably to global food security. However, the molecular mechanisms underlying fertility abortion and the restoration of CMS-WA lines remain largely elusive. In this study, we cloned a restorer gene, Rf20, following a genome-wide association study analysis of the core parent lines of three-line hybrid rice. We found that Rf20 was present in all core parental lines, but different haplotypes and structural variants of its gene resulted in differences in Rf20 expression levels between sterile and restored lines. Rf20 could restore pollen fertility in the CMS-WA line and was found to be responsible for fertility restoration in some CMS lines under high temperatures. In addition, we found that Rf20 encodes a pentatricopeptide repeat protein that competes with WA352 for binding with COX11. This interaction enhances COX11's function as a scavenger of reactive oxygen species, which in turn restores pollen fertility. Collectively, our study suggests a new action mode for pentatricopeptide repeat proteins in the fertility restoration of CMS lines, providing an essential theoretical basis for breeding robust restorer lines and for overcoming high temperature-induced fertility recovery of some CMS lines.
Affiliation(s)
- Shufeng Song
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Yixing Li
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Mudan Qiu
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Na Xu
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Bin Li
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Longhui Zhang
  - College of Tropical Crops, Hainan University, Haikou 570228, China
- Lei Li
  - Longping Branch, College of Biology, Hunan University, Changsha 410125, China
- Weijun Chen
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Jinglei Li
  - Longping Branch, College of Biology, Hunan University, Changsha 410125, China
- Tiankang Wang
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Yingxin Qiu
  - Longping Branch, College of Biology, Hunan University, Changsha 410125, China
- Mengmeng Gong
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Dong Yu
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Hao Dong
  - Longping Branch, College of Biology, Hunan University, Changsha 410125, China
- Siqi Xia
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Yi Pan
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Dingyang Yuan
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; Longping Branch, College of Biology, Hunan University, Changsha 410125, China
- Li Li
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha 410125, China; Longping Branch, College of Biology, Hunan University, Changsha 410125, China
2
Lavanchy E, Weir BS, Goudet J. Detecting inbreeding depression in structured populations. Proc Natl Acad Sci U S A 2024; 121:e2315780121. [PMID: 38687793 PMCID: PMC11087799 DOI: 10.1073/pnas.2315780121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 03/19/2024] [Indexed: 05/02/2024] Open
Abstract
Measuring inbreeding and its consequences on fitness is central for many areas in biology including human genetics and the conservation of endangered species. However, there is no consensus on the best method, neither for quantification of inbreeding itself nor for the model to estimate its effect on specific traits. We simulated traits based on simulated genomes from a large pedigree and empirical whole-genome sequences of human data from populations with various sizes and structures (from the 1,000 Genomes project). We compare the ability of various inbreeding coefficients ([Formula: see text]) to quantify the strength of inbreeding depression: allele-sharing, two versions of the correlation of uniting gametes which differ in the weight they attribute to each locus and two identical-by-descent segments-based estimators. We also compare two models: the standard linear model and a linear mixed model (LMM) including a genetic relatedness matrix (GRM) as random effect to account for the nonindependence of observations. We find LMMs give better results in scenarios with population or family structure. Within the LMM, we compare three different GRMs and show that in homogeneous populations, there is little difference among the different [Formula: see text] and GRM for inbreeding depression quantification. However, as soon as a strong population or family structure is present, the strength of inbreeding depression can be most efficiently estimated only if i) the phenotypes are regressed on [Formula: see text] based on a weighted version of the correlation of uniting gametes, giving more weight to common alleles and ii) with the GRM obtained from an allele-sharing relatedness estimator.
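The contrast between the two "correlation of uniting gametes" estimators mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the formulas, so the per-locus expression below is the commonly used definition of this estimator (genotype `x` coded as 0/1/2 copies of an allele with frequency `p`), and the function names and toy data are hypothetical. The unweighted version averages per-locus ratios, while the weighted version takes a ratio of sums, so loci with common alleles (large 2p(1-p)) contribute more.

```python
def f_uni_unweighted(genotype, freqs):
    """Average of per-locus ratios: every locus counts equally."""
    ratios = []
    for x, p in zip(genotype, freqs):
        het = 2 * p * (1 - p)  # expected heterozygosity at this locus
        ratios.append((x * x - (1 + 2 * p) * x + 2 * p * p) / het)
    return sum(ratios) / len(ratios)


def f_uni_weighted(genotype, freqs):
    """Ratio of sums: loci with common alleles carry more weight."""
    num = sum(x * x - (1 + 2 * p) * x + 2 * p * p
              for x, p in zip(genotype, freqs))
    den = sum(2 * p * (1 - p) for p in freqs)
    return num / den


# Hypothetical toy data: allele frequencies and one individual's genotypes.
freqs = [0.5, 0.1, 0.3, 0.05]
mixed = [2, 0, 1, 0]

print(f_uni_unweighted(mixed, freqs))
print(f_uni_weighted(mixed, freqs))
```

A fully heterozygous individual scores -1 under both versions, and the two diverge mainly when rare-allele loci are homozygous, which is where the choice of weighting matters.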
Affiliation(s)
- Eléonore Lavanchy
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
  - Population Genetics and Genomics group, Swiss Institute of Bioinformatics, University of Lausanne, Lausanne CH-1015, Switzerland
- Bruce S. Weir
  - Department of Biostatistics, University of Washington, Seattle, WA 98195
- Jérôme Goudet
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
  - Population Genetics and Genomics group, Swiss Institute of Bioinformatics, University of Lausanne, Lausanne CH-1015, Switzerland
3
Morris KM, Sutton K, Girma M, Sánchez-Molano E, Solomon B, Esatu W, Dessie T, Vervelde L, Psifidi A, Hanotte O, Banos G. Phenotypic and genomic characterisation of performance of tropically adapted chickens raised in smallholder farm conditions in Ethiopia. Front Genet 2024; 15:1383609. [PMID: 38706792 PMCID: PMC11066160 DOI: 10.3389/fgene.2024.1383609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 04/01/2024] [Indexed: 05/07/2024] Open
Abstract
Background: In sub-Saharan Africa, 80% of poultry production is on smallholder village farms, where chickens are typically reared outdoors in free-ranging conditions. There is limited knowledge on chickens' phenotypic characteristics and genetics under these conditions. Objective: The present large-scale study set out to phenotypically characterise the performance of tropically adapted commercial chickens in typical smallholder farm conditions, and to examine the genetic profile of chicken phenotypes associated with growth, meat production, immunity, and survival. Methods: A total of 2,573 T451A dual-purpose Sasso chickens kept outdoors in emulated free-ranging conditions at the poultry facility of the International Livestock Research Institute in Addis Ababa, Ethiopia, were included in the study. The chickens were raised in five equally sized batches and were individually monitored and phenotyped from the age of 56 days for 8 weeks. Individual chicken data collected included weekly body weight, growth rate, body and breast meat weight at slaughter, Newcastle Disease Virus (NDV) titres and intestinal Immunoglobulin A (IgA) levels recorded at the beginning and the end of the period of study, and survival rate during the same period. Genotyping by sequencing was performed on all chickens using a low-coverage and imputation approach. Chicken phenotypes and genotypes were combined in genomic association analyses. Results: We discovered that the chickens were phenotypically diverse, with extensive variance levels observed in all traits. Batch number and sex of the chicken significantly affected the studied phenotypes. Following quality assurance, genotypes consisted of 2.9 million Single Nucleotide Polymorphism markers that were used in the genomic analyses. Results revealed a largely polygenic mode of genetic control of all phenotypic traits. Nevertheless, 15 distinct markers were identified that were significantly associated with growth, carcass traits, NDV titres, IgA levels, and chicken survival. These markers were located in regions harbouring relevant annotated genes. Conclusion: Results suggest that performance of chickens raised under smallholder farm conditions is amenable to genetic improvement and may inform selective breeding programmes for enhanced chicken productivity in sub-Saharan Africa.
Affiliation(s)
- Katrina M. Morris
  - The Roslin Institute, University of Edinburgh, Midlothian, United Kingdom
- Kate Sutton
  - The Roslin Institute, University of Edinburgh, Midlothian, United Kingdom
- Mekonnen Girma
  - International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
- Bersabhe Solomon
  - International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
- Wondmeneh Esatu
  - International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
- Tadelle Dessie
  - International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
- Lonneke Vervelde
  - The Roslin Institute, University of Edinburgh, Midlothian, United Kingdom
- Androniki Psifidi
  - The Roslin Institute, University of Edinburgh, Midlothian, United Kingdom
  - Royal Veterinary College, Hatfield, United Kingdom
- Olivier Hanotte
  - International Livestock Research Institute (ILRI), Addis Ababa, Ethiopia
  - School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
- Georgios Banos
  - The Roslin Institute, University of Edinburgh, Midlothian, United Kingdom
  - Scotland’s Rural College (SRUC), Animal and Veterinary Sciences, Midlothian, United Kingdom
4
Liu Z, Turkmen AS, Lin S. Bayesian LASSO for population stratification correction in rare haplotype association studies. Stat Appl Genet Mol Biol 2024; 23:sagmb-2022-0034. [PMID: 38235525 PMCID: PMC10794901 DOI: 10.1515/sagmb-2022-0034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 12/19/2023] [Indexed: 01/19/2024]
Abstract
Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.
Affiliation(s)
- Zilu Liu
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
5
Liu Z, Turkmen AS, Lin S. Population stratification correction using Bayesian shrinkage priors for genetic association studies. Ann Hum Genet 2023; 87:302-315. [PMID: 37771252 DOI: 10.1111/ahg.12527] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/18/2022] [Revised: 08/20/2023] [Accepted: 08/24/2023] [Indexed: 09/30/2023]
Abstract
INTRODUCTION Population stratification (PS) is a major source of confounding in population-based genetic association studies of quantitative traits. Principal component regression (PCR) and linear mixed model (LMM) are two commonly used approaches to account for PS in association studies. Previous studies have shown that LMM can be interpreted as including all principal components (PCs) as random-effect covariates. However, including all PCs in LMM may dilute the influence of relevant PCs in some scenarios, while including only a few preselected PCs in PCR may fail to fully capture the genetic diversity. MATERIALS AND METHODS To address these shortcomings, we introduce Bayestrat, a method to detect associated variants with PS correction under the Bayesian LASSO framework. To adjust for PS, Bayestrat accommodates a large number of PCs and utilizes appropriate shrinkage priors to shrink the effects of nonassociated PCs. RESULTS Simulation results show that Bayestrat consistently controls type I error rates and achieves higher power compared to its non-shrinkage counterparts, especially when the number of PCs included in the model is large. As a demonstration of the utility of Bayestrat, we apply it to the Multi-Ethnic Study of Atherosclerosis (MESA). Variants and genes associated with serum triglyceride or HDL cholesterol are identified in our analyses. DISCUSSION The automatic and self-selection features of Bayestrat make it particularly suited to situations with complex underlying PS scenarios, where it is unknown a priori which PCs are potential confounders, yet the number that needs to be considered could be large in order to fully account for PS.
Affiliation(s)
- Zilu Liu
  - Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Asuman S Turkmen
  - Department of Statistics, The Ohio State University, Columbus, Ohio, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, Ohio, USA
6
Devogel N, Auer PL, Manansala R, Wang T. On asymptotic distributions of several test statistics for familial relatedness in linear mixed models. Stat Med 2023; 42:2962-2981. [PMID: 37345498 DOI: 10.1002/sim.9762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Revised: 03/16/2023] [Accepted: 04/26/2023] [Indexed: 06/23/2023]
Abstract
In this study, the asymptotic distributions of the likelihood ratio test (LRT), the restricted likelihood ratio test (RLRT), the F and the sequence kernel association test (SKAT) statistics for testing an additive effect of the expected familial relatedness (FR) in a linear mixed model (LMM) are examined based on an eigenvalue approach. First, the covariance structure for modeling the FR effect in an LMM is presented. Then, the multiplicity of eigenvalues for the log-likelihood and restricted log-likelihood is established under a replicate family setting and extended to a more general replicate family setting (GRFS) as well. After that, the asymptotic null distributions of LRT, RLRT, F and SKAT statistics under GRFS are derived. The asymptotic null distribution of SKAT for testing genetic rare variants is also constructed. In addition, a simple formula for sample size calculation is provided based on the restricted maximum likelihood estimate of the effect size for the expected FR. Finally, a power comparison of these test statistics for the hypothesis test of the expected FR effect is made via simulation. The four test statistics are also applied to a data set from the UK Biobank.
Affiliation(s)
- Nicholas Devogel
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Paul L Auer
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Regina Manansala
  - Centre for Health Economics Research & Modelling Infectious Diseases, Vaccine & Infectious Disease Institute WHO Collaborating Centre, Faculty of Medicine & Health Sciences, University of Antwerp, Antwerp, Belgium
- Tao Wang
  - Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
7
Li Q, Chen J, Faux P, Delgado ME, Bonfante B, Fuentes-Guajardo M, Mendoza-Revilla J, Chacón-Duque JC, Hurtado M, Villegas V, Granja V, Jaramillo C, Arias W, Barquera R, Everardo-Martínez P, Sánchez-Quinto M, Gómez-Valdés J, Villamil-Ramírez H, Silva de Cerqueira CC, Hünemeier T, Ramallo V, Wu S, Du S, Giardina A, Paria SS, Khokan MR, Gonzalez-José R, Schüler-Faccini L, Bortolini MC, Acuña-Alonzo V, Canizales-Quinteros S, Gallo C, Poletti G, Rojas W, Rothhammer F, Navarro N, Wang S, Adhikari K, Ruiz-Linares A. Automatic landmarking identifies new loci associated with face morphology and implicates Neanderthal introgression in human nasal shape. Commun Biol 2023; 6:481. [PMID: 37156940 PMCID: PMC10167347 DOI: 10.1038/s42003-023-04838-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 04/12/2023] [Indexed: 05/10/2023] Open
Abstract
We report a genome-wide association study of facial features in >6000 Latin Americans based on automatic landmarking of 2D portraits and testing for association with inter-landmark distances. We detected significant associations (P-value < 5 × 10⁻⁸) at 42 genome regions, nine of which have been previously reported. In follow-up analyses, 26 of the 33 novel regions replicate in East Asians, Europeans, or Africans, and one mouse homologous region influences craniofacial morphology in mice. The novel region in 1q32.3 shows introgression from Neanderthals and we find that the introgressed tract increases nasal height (consistent with the differentiation between Neanderthals and modern humans). Novel regions include candidate genes and genome regulatory elements previously implicated in craniofacial development, and show preferential transcription in cranial neural crest cells. The automated approach used here should simplify the collection of large study samples from across the world, facilitating a cosmopolitan characterization of the genetics of facial features.
Affiliation(s)
- Qing Li
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Yangpu District, Shanghai, 200438, China
- Jieyi Chen
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Yangpu District, Shanghai, 200438, China
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Pierre Faux
  - Aix-Marseille Université, CNRS, EFS, ADES, Marseille, 13005, France
- Miguel Eduardo Delgado
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Yangpu District, Shanghai, 200438, China
  - División Antropología, Facultad de Ciencias Naturales y Museo, Universidad Nacional de La Plata, La Plata, República Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas, CONICET, Buenos Aires, República Argentina
- Betty Bonfante
  - Aix-Marseille Université, CNRS, EFS, ADES, Marseille, 13005, France
- Macarena Fuentes-Guajardo
  - Departamento de Tecnología Médica, Facultad de Ciencias de la Salud, Universidad de Tarapacá, Arica, 1000000, Chile
- Javier Mendoza-Revilla
  - Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, 31, Perú
  - Unit of Human Evolutionary Genetics, Institut Pasteur, Paris, 75015, France
- J Camilo Chacón-Duque
  - Division of Vertebrates and Anthropology, Department of Earth Sciences, Natural History Museum, London, SW7 5BD, UK
- Malena Hurtado
  - Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, 31, Perú
- Valeria Villegas
  - Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, 31, Perú
- Vanessa Granja
  - Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, 31, Perú
- Claudia Jaramillo
  - GENMOL (Genética Molecular), Universidad de Antioquia, Medellín, 5001000, Colombia
- William Arias
  - GENMOL (Genética Molecular), Universidad de Antioquia, Medellín, 5001000, Colombia
- Rodrigo Barquera
  - Molecular Genetics Laboratory, National School of Anthropology and History, Mexico City, 14050, Mexico, 6600, Mexico
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History (MPI-SHH), Jena, 07745, Germany
- Paola Everardo-Martínez
  - Molecular Genetics Laboratory, National School of Anthropology and History, Mexico City, 14050, Mexico, 6600, Mexico
- Mirsha Sánchez-Quinto
  - Forensic Science, Faculty of Medicine, UNAM (Universidad Nacional Autónoma de México), Mexico City, 06320, Mexico
- Jorge Gómez-Valdés
  - Molecular Genetics Laboratory, National School of Anthropology and History, Mexico City, 14050, Mexico, 6600, Mexico
- Hugo Villamil-Ramírez
  - Unidad de Genomica de Poblaciones Aplicada a la Salud, Facultad de Química, UNAM-Instituto Nacional de Medicina Genómica, Mexico City, 4510, Mexico
- Tábita Hünemeier
  - Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, 05508-090, Brazil
- Virginia Ramallo
  - Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, 90040-060, Brazil
  - Instituto Patagónico de Ciencias Sociales y Humanas, Centro Nacional Patagónico, CONICET, Puerto Madryn, U9129ACD, Argentina
- Sijie Wu
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Yangpu District, Shanghai, 200438, China
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Siyuan Du
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Andrea Giardina
  - School of Mathematics and Statistics, Faculty of Science, Technology, Engineering and Mathematics, The Open University, Milton Keynes, MK7 6AA, United Kingdom
- Soumya Subhra Paria
  - School of Mathematics and Statistics, Faculty of Science, Technology, Engineering and Mathematics, The Open University, Milton Keynes, MK7 6AA, United Kingdom
- Mahfuzur Rahman Khokan
  - School of Mathematics and Statistics, Faculty of Science, Technology, Engineering and Mathematics, The Open University, Milton Keynes, MK7 6AA, United Kingdom
- Rolando Gonzalez-José
  - Instituto Patagónico de Ciencias Sociales y Humanas, Centro Nacional Patagónico, CONICET, Puerto Madryn, U9129ACD, Argentina
- Lavinia Schüler-Faccini
  - Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, 90040-060, Brazil
- Maria-Cátira Bortolini
  - Departamento de Genética, Universidade Federal do Rio Grande do Sul, Porto Alegre, 90040-060, Brazil
- Victor Acuña-Alonzo
  - Molecular Genetics Laboratory, National School of Anthropology and History, Mexico City, 14050, Mexico, 6600, Mexico
- Samuel Canizales-Quinteros
  - Unidad de Genomica de Poblaciones Aplicada a la Salud, Facultad de Química, UNAM-Instituto Nacional de Medicina Genómica, Mexico City, 4510, Mexico
- Carla Gallo
  - Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, 31, Perú
- Giovanni Poletti
  - Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, 31, Perú
- Winston Rojas
  - GENMOL (Genética Molecular), Universidad de Antioquia, Medellín, 5001000, Colombia
- Francisco Rothhammer
  - Instituto de Alta Investigación, Universidad de Tarapacá, Arica, Arica, 1000000, Chile
- Nicolas Navarro
  - Biogéosciences, UMR 6282 CNRS, Université de Bourgogne, Dijon, 21000, France
  - EPHE, PSL University, Paris, 75014, France
- Sijia Wang
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Yangpu District, Shanghai, 200438, China
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Kaustubh Adhikari
  - School of Mathematics and Statistics, Faculty of Science, Technology, Engineering and Mathematics, The Open University, Milton Keynes, MK7 6AA, United Kingdom
  - Department of Genetics, Evolution and Environment, and UCL Genetics Institute, University College London, London, WC1E 6BT, UK
- Andrés Ruiz-Linares
  - Ministry of Education Key Laboratory of Contemporary Anthropology and Collaborative Innovation Center of Genetics and Development, School of Life Sciences and Human Phenome Institute, Fudan University, Yangpu District, Shanghai, 200438, China
  - Aix-Marseille Université, CNRS, EFS, ADES, Marseille, 13005, France
  - Department of Genetics, Evolution and Environment, and UCL Genetics Institute, University College London, London, WC1E 6BT, UK
8
Hou Z, Ochoa A. Genetic association models are robust to common population kinship estimation biases. Genetics 2023; 224:iyad030. [PMID: 36843304 PMCID: PMC10474929 DOI: 10.1093/genetics/iyad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 11/08/2022] [Accepted: 02/17/2023] [Indexed: 02/28/2023] Open
Abstract
Common genetic association models for structured populations, including principal component analysis (PCA) and linear mixed-effects models (LMMs), model the correlation structure between individuals using population kinship matrices, also known as genetic relatedness matrices. However, the most common kinship estimators can have severe biases that were only recently determined. Here we characterize the effect of these kinship biases on genetic association. We employ a large simulated admixed family and genotypes from the 1000 Genomes Project, both with simulated traits, to evaluate key kinship estimators. Remarkably, we find practically invariant association statistics for kinship matrices of different bias types (matching all other features). We then prove using statistical theory and linear algebra that LMM association tests are invariant to these kinship biases, and PCA approximately so. Our proof shows that the intercept and relatedness effect coefficients compensate for the kinship bias, an argument that extends to generalized linear models. As a corollary, association testing is also invariant to changing the reference ancestral population of the kinship matrix. Lastly, we observed that all kinship estimators, except for popkin ratio-of-means, can give improper non-positive semidefinite matrices, which can be problematic although some LMMs handle them surprisingly well, and condition numbers can be used to choose kinship estimators. Overall, we find that existing association studies are robust to kinship estimation bias, and our calculations may help improve association methods by taking advantage of this unexpected robustness, as well as help determine the effects of kinship bias in related problems.
Affiliation(s)
- Zhuoran Hou
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
- Alejandro Ochoa
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27705, USA
  - Duke Center for Statistical Genetics and Genomics, Duke University, Durham, NC 27705, USA
9
Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2023; 11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/04/2023] [Indexed: 01/12/2023] Open
Abstract
Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods, is still lacking. Predictive performance of GP models might depend on a plethora of factors including sample size, number of markers, population structure and genetic architecture. Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different genetic complexity levels determined by the number of Quantitative Trait Loci (QTLs), heritability (h² and h²ₑ), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Results: Decision tree based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods. Conclusions: Overall, this provides insights into the role of ML in GP as well as guidelines for practitioners.
Affiliation(s)
- Muhammad Farooq
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Aalt D.J. van Dijk
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Harm Nijveen
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Shahid Mansoor
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Dick de Ridder
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
|
10
|
Jiang W, Zhang X, Li S, Song S, Zhao H. An unbiased kinship estimation method for genetic data analysis. BMC Bioinformatics 2022; 23:525. [PMID: 36474154 PMCID: PMC9727941 DOI: 10.1186/s12859-022-05082-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022]
Abstract
Accurate estimate of relatedness is important for genetic data analyses, such as heritability estimation and association mapping based on data collected from genome-wide association studies. Inaccurate relatedness estimates may lead to biased heritability estimations and spurious associations. Individual-level genotype data are often used to estimate kinship coefficient between individuals. The commonly used sample correlation-based genomic relationship matrix (scGRM) method estimates kinship coefficient by calculating the average sample correlation coefficient among all single nucleotide polymorphisms (SNPs), where the observed allele frequencies are used to calculate both the expectations and variances of genotypes. Although this method is widely used, a substantial proportion of estimated kinship coefficients are negative, which are difficult to interpret. In this paper, through mathematical derivation, we show that there indeed exists bias in the estimated kinship coefficient using the scGRM method when the observed allele frequencies are regarded as true frequencies. This leads to negative bias for the average estimate of kinship among all individuals, which explains the estimated negative kinship coefficients. Based on this observation, we propose an unbiased estimation method, UKin, which can reduce kinship estimation bias. We justify our improved method with rigorous mathematical proof. We have conducted simulations as well as two real data analyses to compare UKin with scGRM and three other kinship estimating methods: rGRM, tsGRM, and KING. Our results demonstrate that both bias and root mean square error in kinship coefficient estimation could be reduced by using UKin. We further investigated the performance of UKin, KING, and three GRM-based methods in calculating the SNP-based heritability, and show that UKin can improve estimation accuracy for heritability regardless of the scale of SNP panel.
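The scGRM calculation described in this abstract can be illustrated with a minimal sketch. It assumes genotypes coded as allele counts (0, 1, 2), skips monomorphic SNPs, and uses only the Python standard library; the function name `sc_grm` and the toy data are hypothetical and are not taken from the paper or from the UKin software. Because each SNP is centered with the allele frequency observed in the same sample, the entries of the resulting matrix sum to zero, so the off-diagonal average is negative. This is the behavior the abstract attributes to scGRM.

```python
from typing import List


def sc_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Sample correlation-based relationship matrix (illustrative sketch).

    genotypes[i][k] is the allele count (0, 1 or 2) of individual i at SNP k.
    Each SNP is standardized with the allele frequency observed in this
    sample, used for both the expectation (2p) and the variance (2p(1-p)).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    z_cols = []  # one standardized column per polymorphic SNP
    for k in range(m):
        col = [genotypes[i][k] for i in range(n)]
        p = sum(col) / (2 * n)       # observed allele frequency
        var = 2 * p * (1 - p)        # binomial variance of the genotype
        if var == 0:
            continue                 # monomorphic SNP carries no information
        sd = var ** 0.5
        z_cols.append([(g - 2 * p) / sd for g in col])
    if not z_cols:
        raise ValueError("no polymorphic SNPs in the input")
    used = len(z_cols)
    # Average the per-SNP products over all SNPs that were kept.
    return [
        [sum(z[i] * z[j] for z in z_cols) / used for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]  # hypothetical data
    grm = sc_grm(toy)
    n = len(grm)
    off_diag = [grm[i][j] for i in range(n) for j in range(n) if i != j]
    print("mean off-diagonal estimate:", sum(off_diag) / len(off_diag))
```

The sketch reproduces only the baseline estimator; the correction that UKin applies is not described in enough detail in the abstract to be reconstructed here.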
Affiliation(s)
- Wei Jiang
- Department of Biostatistics, School of Public Health, Yale University, New Haven, USA
- Xiangyu Zhang
- Department of Biostatistics, School of Public Health, Yale University, New Haven, USA
- Siting Li
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, USA
- Shuang Song
- Center for Statistical Science, Tsinghua University, Beijing, China
- Department of Industrial Engineering, Tsinghua University, Beijing, China
- Hongyu Zhao
- Department of Biostatistics, School of Public Health, Yale University, New Haven, USA
|
11
|
Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2022; 11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 07/08/2022] [Indexed: 12/15/2022]
Abstract
Background: Many studies have demonstrated the utility of machine learning (ML) methods for genomic prediction (GP) of various plant traits, but a clear rationale for choosing ML over conventionally used, often simpler parametric methods, is still lacking. Predictive performance of GP models might depend on a plethora of factors including sample size, number of markers, population structure and genetic architecture. Methods: Here, we investigate which problem and dataset characteristics are related to good performance of ML methods for genomic prediction. We compare the predictive performance of two frequently used ensemble ML methods (Random Forest and Extreme Gradient Boosting) with parametric methods including genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space regression (RKHS), BayesA and BayesB. To explore problem characteristics, we use simulated and real plant traits under different genetic complexity levels determined by the number of Quantitative Trait Loci (QTLs), heritability (h2 and h2e), population structure and linkage disequilibrium between causal nucleotides and other SNPs. Results: Decision tree based ensemble ML methods are a better choice for nonlinear phenotypes and are comparable to Bayesian methods for linear phenotypes in the case of large effect Quantitative Trait Nucleotides (QTNs). Furthermore, we find that ML methods are susceptible to confounding due to population structure but less sensitive to low linkage disequilibrium than linear parametric methods. Conclusions: Overall, this provides insights into the role of ML in GP as well as guidelines for practitioners.
Affiliation(s)
- Muhammad Farooq
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Aalt D.J. van Dijk
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Harm Nijveen
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
- Shahid Mansoor
- Molecular Virology and Gene Silencing Lab, Agricultural Biotechnology Division, National Institute for Biotechnology and Genetic Engineering (NIBGE), Faisalabad, Punjab, 38000, Pakistan
- Dick de Ridder
- Bioinformatics group, Department of Plant Science, Wageningen University and Research, Wageningen, Gelderland, 6708PB, The Netherlands
|
12
|
Eizenga GC, Kim H, Jung JKH, Greenberg AJ, Edwards JD, Naredo MEB, Banaticla-Hilario MCN, Harrington SE, Shi Y, Kimball JA, Harper LA, McNally KL, McCouch SR. Phenotypic Variation and the Impact of Admixture in the Oryza rufipogon Species Complex (ORSC). FRONTIERS IN PLANT SCIENCE 2022; 13:787703. [PMID: 35769295 PMCID: PMC9235872 DOI: 10.3389/fpls.2022.787703] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/01/2021] [Accepted: 04/13/2022] [Indexed: 06/15/2023]
Abstract
Crop wild relatives represent valuable reservoirs of variation for breeding, but their populations are threatened in natural habitats, are sparsely represented in genebanks, and most are poorly characterized. The focus of this study is the Oryza rufipogon species complex (ORSC), wild progenitor of Asian rice (Oryza sativa L.). The ORSC comprises perennial, annual and intermediate forms which were historically designated as O. rufipogon, O. nivara, and O. sativa f. spontanea (or Oryza spp., an annual form of mixed O. rufipogon/O. nivara and O. sativa ancestry), respectively, based on non-standardized morphological, geographical, and/or ecologically-based species definitions and boundaries. Here, a collection of 240 diverse ORSC accessions, characterized by genotyping-by-sequencing (113,739 SNPs), was phenotyped for 44 traits associated with plant, panicle, and seed morphology in the screenhouse at the International Rice Research Institute, Philippines. These traits included heritable phenotypes often recorded as characterization data by genebanks. Over 100 of these ORSC accessions were also phenotyped in the greenhouse for 18 traits in Stuttgart, Arkansas, and 16 traits in Ithaca, New York, United States. We implemented a Bayesian Gaussian mixture model to infer accession groups from a subset of these phenotypic data and ascertained three phenotype-based group assignments. We used concordance between the genotypic subpopulations and these phenotype-based groups to identify a suite of phenotypic traits that could reliably differentiate the ORSC populations, whether measured in tropical or temperate regions. The traits provide insight into plant morphology, life history (perenniality versus annuality) and mating habit (self- versus cross-pollinated), and are largely consistent with genebank species designations. One phenotypic group contains predominantly O. rufipogon accessions characterized as perennial and largely out-crossing and one contains predominantly O. nivara accessions characterized as annual and largely inbreeding. From these groups, 42 "core" O. rufipogon and 25 "core" O. nivara accessions were identified for domestication studies. The third group, comprising 20% of our collection, has the most accessions identified as Oryza spp. (51.2%) and levels of O. sativa admixture accounting for more than 50% of the genome. This third group is potentially useful as a "pre-breeding" pool for breeders attempting to incorporate novel variation into elite breeding lines.
Affiliation(s)
- Georgia C. Eizenga
- Dale Bumpers National Rice Research Center, USDA-ARS, Stuttgart, AR, United States
- HyunJung Kim
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Janelle K. H. Jung
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jeremy D. Edwards
- Dale Bumpers National Rice Research Center, USDA-ARS, Stuttgart, AR, United States
- Sandra E. Harrington
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Yuxin Shi
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jennifer A. Kimball
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Lisa A. Harper
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Susan R. McCouch
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
|
13
|
Seal S, Vu T, Ghosh T, Wrobel J, Ghosh D. DenVar: density-based variation analysis of multiplex imaging data. BIOINFORMATICS ADVANCES 2022; 2:vbac039. [PMID: 36699398 PMCID: PMC9710661 DOI: 10.1093/bioadv/vbac039] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/04/2022] [Revised: 04/17/2022] [Accepted: 05/18/2022] [Indexed: 02/01/2023]
Abstract
Summary: Multiplex imaging platforms have become popular for studying complex single-cell biology in the tumor microenvironment (TME) of cancer subjects. Studying the intensity of the proteins that regulate important cell-functions becomes extremely crucial for subject-specific assessment of risks. The conventional approach requires selection of two thresholds, one to define the cells of the TME as positive or negative for a particular protein, and the other to classify the subjects based on the proportion of the positive cells. We present a threshold-free approach in which distance between a pair of subjects is computed based on the probability density of the protein in their TMEs. The distance matrix can either be used to classify the subjects into meaningful groups or can directly be used in a kernel machine regression framework for testing association with clinical outcomes. The method gets rid of the subjectivity bias of the thresholding-based approach, enabling easier but interpretable analysis. We analyze a lung cancer dataset, finding the difference in the density of protein HLA-DR to be significantly associated with the overall survival and a triple-negative breast cancer dataset, analyzing the effects of multiple proteins on survival and recurrence. The reliability of our method is demonstrated through extensive simulation studies. Availability and implementation: The associated R package can be found at https://github.com/sealx017/DenVar. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Souvik Seal
- Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA (to whom correspondence should be addressed)
- Thao Vu
- Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
- Tusharkanti Ghosh
- Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
- Julia Wrobel
- Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
- Debashis Ghosh
- Department of Biostatistics and Informatics, University of Colorado CU Anschutz Medical Campus, Aurora, CO, USA
|
14
|
Abstract
Motivated by empirical arguments that are well known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate single nucleotide polymorphism (SNP) in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification for this technique, which trades off velocity against veracity. Second, we investigate how mixed models can correct for confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding (population stratification and environmental confounding factors) and study how different methods that are commonly used in practice trade off these two sources differently.
Affiliation(s)
- Haohan Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Bryon Aragam
- Booth School of Business, University of Chicago, Chicago, Illinois, USA
- Eric P. Xing
- School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
15
|
Yang JJ, Luo X, Trucco EM, Buu A. Polygenic risk prediction based on singular value decomposition with applications to alcohol use disorder. BMC Bioinformatics 2022; 23:28. [PMID: 35012447 PMCID: PMC8744290 DOI: 10.1186/s12859-022-04566-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2021] [Accepted: 01/05/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND/AIM: The polygenic risk score (PRS) shows promise as a potentially effective approach to summarize genetic risk for complex diseases such as alcohol use disorder that is influenced by a combination of multiple variants, each of which has a very small effect. Yet, conventional PRS methods tend to over-adjust confounding factors in the discovery sample and thus have low power to predict the phenotype in the target sample. This study aims to address this important methodological issue. METHODS: This study proposed a new method to construct PRS by (1) approximating the polygenic model using a few principal components selected based on eigen-correlation in the discovery data; and (2) conducting principal component projection on the target data. Secondary data analysis was conducted on two large scale databases: the Study of Addiction: Genetics and Environment (SAGE; discovery data) and the National Longitudinal Study of Adolescent to Adult Health (Add Health; target data) to compare performance of the conventional and proposed methods. RESULT AND CONCLUSION: The results show that the proposed method has higher prediction power and can handle participants from different ancestry backgrounds. We also provide practical recommendations for setting the linkage disequilibrium (LD) and p value thresholds.
Affiliation(s)
- James J. Yang
- Department of Biostatistics and Data Science, University of Texas Health Science Center, Houston, USA
- Xi Luo
- Department of Biostatistics and Data Science, University of Texas Health Science Center, Houston, USA
- Elisa M. Trucco
- Department of Psychology, Florida International University, Miami, USA; Department of Psychiatry, University of Michigan, Ann Arbor, USA
- Anne Buu
- Department of Health Promotion and Behavioral Sciences, University of Texas Health Science Center, Houston, USA
|
16
|
Semagn K, Iqbal M, Alachiotis N, N'Diaye A, Pozniak C, Spaner D. Genetic diversity and selective sweeps in historical and modern Canadian spring wheat cultivars using the 90K SNP array. Sci Rep 2021; 11:23773. [PMID: 34893626 PMCID: PMC8664822 DOI: 10.1038/s41598-021-02666-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/22/2021] [Accepted: 11/22/2021] [Indexed: 12/14/2022]
Abstract
Previous molecular characterization studies conducted in Canadian wheat cultivars shed some light on the impact of plant breeding on genetic diversity, but the number of varieties and markers used was small. Here, we used 28,798 markers of the wheat 90K single nucleotide polymorphisms to (a) assess the extent of genetic diversity, relationship, population structure, and divergence among 174 historical and modern Canadian spring wheat varieties registered from 1905 to 2018 and 22 unregistered lines (hereinafter referred to as cultivars), and (b) identify genomic regions that had undergone selection. About 91% of the pairs of cultivars differed by 20-40% of the scored alleles, but only 7% of the pairs had kinship coefficients of < 0.250, suggesting the presence of a high proportion of redundancy in allelic composition. Although the 196 cultivars represented eight wheat classes, our results from phylogenetic, principal component, and the model-based population structure analyses revealed three groups, with no clear structure among most wheat classes, breeding programs, and breeding periods. FST statistics computed among different categorical variables showed little genetic differentiation (< 0.05) among breeding periods and breeding programs, but a diverse level of genetic differentiation among wheat classes and predicted groups. Diversity indices were the highest and lowest among cultivars registered from 1970 to 1980 and from 2011 to 2018, respectively. Using two outlier detection methods, we identified from 524 to 2314 SNPs and 41 selective sweeps of which some are close to genes with known phenotype, including plant height, photoperiodism, vernalization, gluten strength, and disease resistance.
Affiliation(s)
- Kassa Semagn
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
- Muhammad Iqbal
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
- Nikolaos Alachiotis
- Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, 3230, Enschede, OV, The Netherlands
- Amidou N'Diaye
- Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Curtis Pozniak
- Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon, SK, S7N 5A8, Canada
- Dean Spaner
- Department of Agricultural, Food, and Nutritional Science, 4-10 Agriculture-Forestry Centre, University of Alberta, Edmonton, AB, T6G 2P5, Canada
|
17
|
Khan N, Essemine J, Hamdani S, Qu M, Lyu MJA, Perveen S, Stirbet A, Govindjee G, Zhu XG. Natural variation in the fast phase of chlorophyll a fluorescence induction curve (OJIP) in a global rice minicore panel. PHOTOSYNTHESIS RESEARCH 2021; 150:137-158. [PMID: 33159615 DOI: 10.1007/s11120-020-00794-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/11/2020] [Accepted: 10/26/2020] [Indexed: 06/11/2023]
Abstract
Photosynthesis can be probed through Chlorophyll a fluorescence induction (FI), which provides detailed insight into the electron transfer process in Photosystem II, and beyond. Here, we have systematically studied the natural variation of the fast phase of the FI, i.e. the OJIP phase, in rice. The OJIP phase of the Chl a fluorescence induction curve is referred to as "fast transient" lasting for less than a second; it is obtained after a dark-adapted sample is exposed to saturating light. In the OJIP curve, "O" stands for "origin" (minimal fluorescence), "P" for "peak" (maximum fluorescence), and J and I for inflection points between the O and P levels. Further, Fo is the fluorescence intensity at the "O" level, whereas Fm is the intensity at the P level, and Fv (= Fm - Fo) is the variable fluorescence. We surveyed a set of quantitative parameters derived from the FI curves of 199 rice accessions, grown under both field condition (FC) and growth room condition (GC). Our results show a significant variation between Japonica (JAP) and Indica (IND) subgroups, under both the growth conditions, in almost all the parameters derived from the OJIP curves. The ratio of the variable to the maximum (Fv/Fm) and of the variable to the minimum (Fv/Fo) fluorescence, the performance index (PIabs), as well as the amplitude of the I-P phase (AI-P) show higher values in JAP compared to that in the IND subpopulation. In contrast, the amplitude of the O-J phase (AO-J) and the normalized area above the OJIP curve (Sm) show an opposite trend. The performed genetic analysis shows that plants grown under GC appear much more affected by environmental factors than those grown in the field. We further conducted a genome-wide association study (GWAS) using 11 parameters derived from plants grown in the field. In total, 596 non-unique significant loci based on these parameters were identified by GWAS. 
Several photosynthesis-related proteins were identified to be associated with different OJIP parameters. We found that traits with high correlation are usually associated with similar genomic regions. Specifically, the thermal phase of FI, which includes the amplitudes of the J-I and I-P subphases (AJ-I and AI-P) of the OJIP curve, is, in turn, associated with certain common genomic regions. Our study is the first one dealing with the natural variations in rice, with the aim to characterize potential candidate genes controlling the magnitude and half-time of each of the phases in the OJIP FI curve.
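The fluorescence ratios defined in this abstract follow directly from Fo and Fm, and a minimal sketch makes the arithmetic explicit. The function name `ojip_ratios` and the input values are hypothetical illustrations, not measurements or code from the study; only the relations Fv = Fm - Fo, Fv/Fm, and Fv/Fo come from the text above.

```python
from typing import Dict


def ojip_ratios(fo: float, fm: float) -> Dict[str, float]:
    """Derive variable fluorescence and its two ratios from Fo and Fm.

    fo: fluorescence intensity at the O level (minimal fluorescence)
    fm: fluorescence intensity at the P level (maximum fluorescence)
    """
    if fo <= 0 or fm <= fo:
        raise ValueError("expected 0 < Fo < Fm")
    fv = fm - fo  # variable fluorescence, Fv = Fm - Fo
    return {"Fv": fv, "Fv/Fm": fv / fm, "Fv/Fo": fv / fo}


if __name__ == "__main__":
    # Hypothetical readings in arbitrary units, chosen only for illustration.
    for name, value in ojip_ratios(fo=200.0, fm=1000.0).items():
        print(f"{name} = {value:.3f}")
```

The other parameters mentioned in the abstract (PIabs, Sm, and the phase amplitudes) depend on the full induction curve and are not reconstructed here.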
Affiliation(s)
- Naveed Khan
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Institute of Nutrition and Health, University of Chinese Academy of Science, Chinese Academy of Sciences, Shanghai, 200031, China
- State Key Laboratory for Plant Molecular Genetics and Center of Excellence for Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Jemaa Essemine
- State Key Laboratory for Plant Molecular Genetics and Center of Excellence for Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Saber Hamdani
- State Key Laboratory for Plant Molecular Genetics and Center of Excellence for Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Mingnan Qu
- State Key Laboratory for Plant Molecular Genetics and Center of Excellence for Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Ming-Ju Amy Lyu
- State Key Laboratory for Plant Molecular Genetics and Center of Excellence for Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Shahnaz Perveen
- State Key Laboratory for Plant Molecular Genetics and Center of Excellence for Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Govindjee Govindjee
- Department of Plant Biology, Department of Biochemistry, and Center of Biophysics & Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Xin-Guang Zhu
- State Key Laboratory for Plant Molecular Genetics and Center of Excellence for Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
|
18
|
Fu L, Wang Y, Li T, Hu YQ. A Novel Approach Integrating Hierarchical Clustering and Weighted Combination for Association Study of Multiple Phenotypes and a Genetic Variant. Front Genet 2021; 12:654804. [PMID: 34220938 PMCID: PMC8249926 DOI: 10.3389/fgene.2021.654804] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/16/2021] [Accepted: 04/20/2021] [Indexed: 11/26/2022]
Abstract
As a pivotal research tool, genome-wide association study has successfully identified numerous genetic variants underlying distinct diseases. However, these identified genetic variants only explain a small proportion of the phenotypic variation for certain diseases, suggesting that there are still more genetic signals to be detected. One of the reasons may be that one-phenotype one-variant association study is not so efficient in detecting variants of weak effects. Nowadays, it is increasingly worth noting that joint analysis of multiple phenotypes may boost the statistical power to detect pathogenic variants with weak genetic effects on complex diseases, providing more clues for their underlying biology mechanisms. So a Weighted Combination of multiple phenotypes following Hierarchical Clustering method (WCHC) is proposed for simultaneously analyzing multiple phenotypes in association studies. A series of simulations are conducted, and the results show that WCHC is either the most powerful method or comparable with the most powerful competitor in most of the simulation scenarios. Additionally, we evaluated the performance of WCHC in its application to the obesity-related phenotypes from Atherosclerosis Risk in Communities, and several associated variants are reported.
Affiliation(s)
- Liwan Fu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China; Center for Non-communicable Disease Management, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, China
- Yuquan Wang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Tingting Li
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China; Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
|
19
|
Reisetter AC, Breheny P. Penalized linear mixed models for structured genetic data. Genet Epidemiol 2021; 45:427-444. [PMID: 33998038 DOI: 10.1002/gepi.22384] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/18/2020] [Revised: 03/19/2021] [Accepted: 03/29/2021] [Indexed: 11/12/2022]
Abstract
Many genetic studies that aim to identify genetic variants associated with complex phenotypes are subject to unobserved confounding factors arising from environmental heterogeneity. This poses a challenge to detecting associations of interest and is known to induce spurious associations when left unaccounted for. Penalized linear mixed models (LMMs) are an attractive method to correct for unobserved confounding. These methods correct for varying levels of relatedness and population structure by modeling it as a random effect with a covariance structure estimated from observed genetic data. Despite an extensive literature on penalized regression and LMMs separately, the two are rarely discussed together. The aim of this review is to do so while examining the statistical properties of penalized LMMs in the genetic association setting. Specifically, the ability of penalized LMMs to accurately estimate genetic effects in the presence of environmental confounding has not been well studied. To clarify the important yet subtle distinction between population structure and environmental heterogeneity, we present a detailed review of relevant concepts and methods. In addition, we evaluate the performance of penalized LMMs and competing methods in terms of estimation and selection accuracy in the presence of a number of confounding structures.
Affiliation(s)
- Anna C Reisetter
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
- Patrick Breheny
- Department of Biostatistics, University of Iowa, Iowa City, Iowa, USA
|
20
|
Hoffman GE, Roussos P. Dream: powerful differential expression analysis for repeated measures designs. Bioinformatics 2021; 37:192-201. [PMID: 32730587 DOI: 10.1093/bioinformatics/btaa687] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 01/15/2020] [Revised: 07/13/2020] [Accepted: 07/23/2020] [Indexed: 01/08/2023]
Abstract
SUMMARY: Large-scale transcriptome studies with multiple samples per individual are widely used to study disease biology. Yet, current methods for differential expression are inadequate for cross-individual testing for these repeated measures designs. Most problematic, we observe across multiple datasets that current methods can give reproducible false-positive findings that are driven by genetic regulation of gene expression, yet are unrelated to the trait of interest. Here, we introduce a statistical software package, dream, that increases power, controls the false positive rate, enables multiple types of hypothesis tests, and integrates with standard workflows. In 12 analyses in 6 independent datasets, dream yields biological insight not found with existing software while addressing the issue of reproducible false-positive findings. AVAILABILITY AND IMPLEMENTATION: Dream is available within the variancePartition Bioconductor package at http://bioconductor.org/packages/variancePartition. CONTACT: gabriel.hoffman@mssm.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriel E Hoffman
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Panos Roussos
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.,Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY 10468, USA
|
21
|
Depardieu C, Gérardi S, Nadeau S, Parent GJ, Mackay J, Lenz P, Lamothe M, Girardin MP, Bousquet J, Isabel N. Connecting tree-ring phenotypes, genetic associations and transcriptomics to decipher the genomic architecture of drought adaptation in a widespread conifer. Mol Ecol 2021; 30:3898-3917. [PMID: 33586257 PMCID: PMC8451828 DOI: 10.1111/mec.15846] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 07/01/2020] [Revised: 01/15/2021] [Accepted: 01/27/2021] [Indexed: 01/02/2023]
Abstract
As boreal forests face significant threats from climate change, understanding evolutionary trajectories of coniferous species has become fundamental to adapting management and conservation to a drying climate. We examined the genomic architecture underlying adaptive variation related to drought tolerance in 43 populations of a widespread boreal conifer, white spruce (Picea glauca [Moench] Voss), by combining genotype–environment associations, genotype–phenotype associations, and transcriptomics. Adaptive genetic variation was identified by correlating allele frequencies for 6,153 single nucleotide polymorphisms from 2,606 candidate genes with temperature, precipitation and aridity gradients, and testing for significant associations between genotypes and 11 dendrometric and drought‐related traits (i.e., anatomical, growth response and climate‐sensitivity traits) using a polygenic model. We identified a set of 285 genes significantly associated with a climatic factor or a phenotypic trait, including 110 that were differentially expressed in response to drought under greenhouse‐controlled conditions. The interlinked phenotype–genotype–environment network revealed eight high‐confidence genes involved in white spruce adaptation to drought, of which four were drought‐responsive in the expression analysis. Our findings represent a significant step toward the characterization of the genomic basis of drought tolerance and adaptation to climate in conifers, which is essential to enable the establishment of resilient forests in view of new climate conditions. See also the Perspective by Lars Opgenoorth and Christian Rellstab.
Affiliation(s)
- Claire Depardieu
- Canada Research Chair in Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Québec, QC, Canada
- Centre for Forest Research, Département des sciences du bois et de la forêt, Université Laval, Québec, QC, Canada
- Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Center, Québec, QC, Canada
| | - Sébastien Gérardi
- Canada Research Chair in Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Québec, QC, Canada
- Centre for Forest Research, Département des sciences du bois et de la forêt, Université Laval, Québec, QC, Canada
| | - Simon Nadeau
- Natural Resources Canada, Canadian Forest Service, Canadian Wood Fibre Center, Québec, QC, Canada
| | - Geneviève J. Parent
- Laboratory of Genomics, Maurice‐Lamontagne Institute, Fisheries and Oceans Canada, Mont‐Joli, QC, Canada
| | - John Mackay
- Canada Research Chair in Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Québec, QC, Canada
- Department of Plant Sciences, University of Oxford, Oxford, UK
| | - Patrick Lenz
- Canada Research Chair in Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Québec, QC, Canada
- Natural Resources Canada, Canadian Forest Service, Canadian Wood Fibre Center, Québec, QC, Canada
| | - Manuel Lamothe
- Canada Research Chair in Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Québec, QC, Canada
- Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Center, Québec, QC, Canada
| | - Martin P. Girardin
- Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Center, Québec, QC, Canada
- Centre for Forest Research, Université du Québec à Montréal, Montréal, QC, Canada
| | - Jean Bousquet
- Canada Research Chair in Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Québec, QC, Canada
- Centre for Forest Research, Département des sciences du bois et de la forêt, Université Laval, Québec, QC, Canada
| | - Nathalie Isabel
- Canada Research Chair in Forest Genomics, Institute for Systems and Integrative Biology, Université Laval, Québec, QC, Canada
- Centre for Forest Research, Département des sciences du bois et de la forêt, Université Laval, Québec, QC, Canada
- Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Center, Québec, QC, Canada
|
22
|
Abegaz F, Van Lishout F, Mahachie John JM, Chaichoompu K, Bhardwaj A, Duroux D, Gusareva ES, Wei Z, Hakonarson H, Van Steen K. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min 2021; 14:16. [PMID: 33608043 PMCID: PMC7893746 DOI: 10.1186/s13040-021-00247-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Accepted: 02/07/2021] [Indexed: 12/15/2022] Open
Abstract
Background: In genome-wide association studies, the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular of which is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods: To identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results: Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG consistently control the type I error rate at the nominal level better than MBMDR-GC. Moreover, our three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion: We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity. Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-021-00247-w.
Affiliation(s)
- Fentaw Abegaz
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium.
| | | | | | | | - Archana Bhardwaj
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Diane Duroux
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Elena S Gusareva
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium
| | - Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA.,Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Kristel Van Steen
- GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium.,WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liège, Liège, Belgium
|
23
|
Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
|
24
|
DeVogel N, Auer PL, Manansala R, Rau A, Wang T. A unified linear mixed model for familial relatedness and population structure in genetic association studies. Genet Epidemiol 2020; 45:305-315. [PMID: 33175443 DOI: 10.1002/gepi.22371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2020] [Revised: 09/14/2020] [Accepted: 10/20/2020] [Indexed: 11/10/2022]
Abstract
Familial relatedness (FR) and population structure (PS) are two major sources for genetic correlation. In the human population, both FR and PS can further break down into additive and dominant components to account for potential additive and dominant genetic effects. In this study, besides the classical additive genomic relationship matrix, a dominant genomic relationship matrix is introduced. A link between the additive/dominant genomic relationship matrices and the coancestry (or kinship)/double coancestry coefficients is also established. In addition, a way to separate the FR and PS correlations based on the estimates of coancestry and double coancestry coefficients from the genomic relationship matrices is proposed. A unified linear mixed model is also developed, which can account for both the additive and dominance effects of FR and PS correlations as well as their possible random interactions. Finally, this unified linear mixed model is applied to analyze two study cohorts from UK Biobank.
Affiliation(s)
- Nicholas DeVogel
- Division of Biostatistics, Institute for Health and Equity, Milwaukee, Wisconsin, USA
| | - Paul L Auer
- Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
| | - Regina Manansala
- Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
| | - Andrea Rau
- Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA.,INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas, France
| | - Tao Wang
- Division of Biostatistics, Institute for Health and Equity, Milwaukee, Wisconsin, USA
|
25
|
Wan Y, Wick RR, Zobel J, Ingle DJ, Inouye M, Holt KE. GeneMates: an R package for detecting horizontal gene co-transfer between bacteria using gene-gene associations controlled for population structure. BMC Genomics 2020; 21:658. [PMID: 32972363 PMCID: PMC7513276 DOI: 10.1186/s12864-020-07019-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/09/2020] [Accepted: 08/20/2020] [Indexed: 12/15/2022] Open
Abstract
Background: Horizontal gene transfer contributes to bacterial evolution through mobilising genes across various taxonomical boundaries. It is frequently mediated by mobile genetic elements (MGEs), which may capture, maintain, and rearrange mobile genes and co-mobilise them between bacteria, causing horizontal gene co-transfer (HGcoT). This physical linkage between mobile genes poses a great threat to public health as it facilitates dissemination and co-selection of clinically important genes amongst bacteria. Although rapid accumulation of bacterial whole-genome sequencing (WGS) data since the 2000s enables study of HGcoT at the population level, results based on genetic co-occurrence counts and simple association tests are usually confounded by bacterial population structure when sampled bacteria belong to the same species, leading to spurious conclusions. Results: We have developed a network approach to explore WGS data for evidence of intraspecies HGcoT and have implemented it in the R package GeneMates (github.com/wanyuac/GeneMates). The package takes as input an allelic presence-absence matrix of genes of interest and a matrix of core-genome single-nucleotide polymorphisms, performs association tests with linear mixed models controlled for population structure, produces a network of significantly associated alleles, and identifies clusters within the network as plausible co-transferred alleles. GeneMates users may choose to score consistency of allelic physical distances measured in genome assemblies using a novel approach we have developed and overlay the scores on the network for further evidence of HGcoT. Validation studies of GeneMates on known acquired antimicrobial resistance genes in Escherichia coli and Salmonella Typhimurium show advantages of our network approach over simple association analysis: (1) distinguishing between allelic co-occurrence driven by HGcoT and that driven by clonal reproduction, (2) evaluating effects of population structure on allelic co-occurrence, and (3) establishing direct links between allele clusters in the network and MGEs when physical distances are incorporated. Conclusion: GeneMates offers an effective approach to detection of intraspecies HGcoT using WGS data.
Affiliation(s)
- Yu Wan
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, 3010, Victoria, Australia.
| | - Ryan R Wick
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, 3010, Victoria, Australia.,Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia
| | - Justin Zobel
- School of Computing and Information Systems, University of Melbourne, Parkville, 3010, Victoria, Australia
| | - Danielle J Ingle
- Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Parkville, 3010, Victoria, Australia.,National Centre for Epidemiology and Population Health, Australian National University, Canberra, 2601, Australian Capital Territory, Australia
| | - Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, 3004, Victoria, Australia.,Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, CB1 8RN, England, UK
| | - Kathryn E Holt
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, 3004, Victoria, Australia.,Department of Infection Biology, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
|
26
|
Abegaz F, Chaichoompu K, Génin E, Fardo DW, König IR, Mahachie John JM, Van Steen K. Principals about principal components in statistical genetics. Brief Bioinform 2020; 20:2200-2216. [PMID: 30219892 DOI: 10.1093/bib/bby081] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/18/2018] [Revised: 07/21/2018] [Accepted: 08/12/2018] [Indexed: 12/13/2022] Open
Abstract
Principal components (PCs) are widely used in statistics and refer to a relatively small number of uncorrelated variables, derived from an initial pool of variables, that explain as much of the total variance as possible. Principal component analysis (PCA) is also a popular technique in statistical genetics. To achieve optimal results, a thorough understanding of the different implementations of PCA and of their impact on study results, compared to alternative approaches, is required. In this review, we focus on the possibilities, limitations and role of PCs in ancestry prediction, genome-wide association studies, rare variants analyses, imputation strategies, meta-analysis and epistasis detection. We also describe several variations of classic PCA that deserve increased attention in statistical genetics applications.
|
27
|
Hines O, Diaz-Ordaz K, Vansteelandt S, Jamshidi Y. Causal graphs for the analysis of genetic cohort data. Physiol Genomics 2020; 52:369-378. [PMID: 32687429 DOI: 10.1152/physiolgenomics.00115.2019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022] Open
Abstract
The increasing availability of genetic cohort data has led to many genome-wide association studies (GWAS) successfully identifying genetic associations with an ever-expanding list of phenotypic traits. Association, however, does not imply causation, and therefore methods have been developed to study the issue of causality. Under additional assumptions, Mendelian randomization (MR) studies have proved popular in identifying causal effects between two phenotypes, often using GWAS summary statistics. Given the widespread use of these methods, it is more important than ever to understand, and communicate, the causal assumptions upon which they are based, so that methods are transparent, and findings are clinically relevant. Causal graphs can be used to represent causal assumptions graphically and provide insights into the limitations associated with different analysis methods. Here we review GWAS and MR from a causal perspective, to build up intuition for causal diagrams in genetic problems. We also examine issues of confounding by ancestry and comment on approaches for dealing with such confounding, as well as discussing approaches for dealing with selection biases arising from study design.
Affiliation(s)
- Oliver Hines
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom.,Molecular and Clinical Sciences Institute, St. George's, University of London, London, United Kingdom
| | - Karla Diaz-Ordaz
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Stijn Vansteelandt
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom.,Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Yalda Jamshidi
- Molecular and Clinical Sciences Institute, St. George's, University of London, London, United Kingdom
|
28
|
Low Additive Genetic Variation in a Trait Under Selection in Domesticated Rice. G3-GENES GENOMES GENETICS 2020; 10:2435-2443. [PMID: 32439738 PMCID: PMC7341149 DOI: 10.1534/g3.120.401194] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
Quantitative traits are important targets of both natural and artificial selection. The genetic architecture of these traits and its change during the adaptive process is thus of fundamental interest. The fate of the additive effects of variants underlying a trait receives particular attention because they constitute the genetic variation component that is transferred from parents to offspring and thus governs the response to selection. While estimation of this component of phenotypic variation is challenging, the increasing availability of dense molecular markers puts it within reach. Inbred plant species offer an additional advantage because phenotypes of genetically identical individuals can be measured in replicate. This makes it possible to estimate marker effects separately from the contribution of the genetic background not captured by genotyped loci. We focused on root growth in domesticated rice, Oryza sativa, under normal and aluminum (Al) stress conditions, a trait under recent selection because it correlates with survival under drought. A dense single nucleotide polymorphism (SNP) map is available for all accessions studied. Taking advantage of this map and a set of Bayesian models, we assessed additive marker effects. While total genetic variation accounted for a large proportion of phenotypic variance, marker effects contributed little information, particularly in the Al-tolerant tropical japonica population of rice. We were unable to identify any loci associated with root growth in this population. Models estimating the aggregate effects of all measured genotypes likewise produced low estimates of marker heritability and were unable to predict total genetic values accurately. Our results support the long-standing conjecture that additive genetic variation is depleted in traits under selection. We further provide evidence that this depletion is due to the prevalence of low-frequency alleles that underlie the trait.
|
29
|
Whole-genome genotyping and resequencing reveal the association of a deletion in the complex interferon alpha gene cluster with hypothyroidism in dogs. BMC Genomics 2020; 21:307. [PMID: 32299354 PMCID: PMC7160888 DOI: 10.1186/s12864-020-6700-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/29/2019] [Accepted: 03/24/2020] [Indexed: 12/30/2022] Open
Abstract
Background: Hypothyroidism is a common complex endocrinopathy that typically has an autoimmune etiology, and it affects both humans and dogs. Genetic and environmental factors are both known to play important roles in the disease development. In this study, we sought to identify the genetic risk factors potentially involved in the susceptibility to the disease in the high-risk Giant Schnauzer dog breed. Results: By employing genome-wide association followed by fine-mapping (top variant p-value = 5.7 × 10⁻⁶), integrated with whole-genome resequencing and copy number variation analysis, we detected a ~8.9 kbp deletion strongly associated (p-value = 0.0001) with protection against development of hypothyroidism. The deletion is located between two predicted Interferon alpha (IFNA) genes and it may eliminate functional elements potentially involved in the transcriptional regulation of these genes. Remarkably, type I IFNs have been extensively associated with human autoimmune hypothyroidism and general autoimmunity. Nonetheless, the extreme genomic complexity of the associated region on CFA11 warrants further long-read sequencing and annotation efforts in order to ascribe functions to the identified deletion and to characterize the canine IFNA gene cluster in more detail. Conclusions: Our results expand the current knowledge on genetic determinants of canine hypothyroidism by revealing a significant link with the human counterpart disease, potentially translating into better diagnostic tools across species, and may contribute to improved canine breeding strategies.
|
30
|
Hussain W, Campbell MT, Jarquin D, Walia H, Morota G. Variance heterogeneity genome-wide mapping for cadmium in bread wheat reveals novel genomic loci and epistatic interactions. THE PLANT GENOME 2020; 13:e20011. [PMID: 33016629 DOI: 10.1002/tpg2.20011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/15/2019] [Accepted: 01/22/2020] [Indexed: 06/11/2023]
Abstract
Genome-wide association mapping identifies quantitative trait loci (QTL) that influence the mean differences between the marker genotypes for a given trait. While most loci influence the mean value of a trait (mQTL), certain loci, known as variance heterogeneity QTL (vQTL), determine the variability of the trait instead of the mean trait value. In the present study, we performed a variance heterogeneity genome-wide association study (vGWAS) for grain cadmium (Cd) concentration in bread wheat. We used a double generalized linear model and a hierarchical generalized linear model to identify vQTL associated with grain Cd. We identified novel vQTL regions on chromosomes 2A and 2B that contribute to the Cd variation and loci that affect both mean and variance heterogeneity (mvQTL) on chromosome 5A. In addition, our results demonstrated the presence of epistatic interactions between vQTL and mvQTL, which could explain variance heterogeneity. Overall, we provide novel insights into the genetic architecture of grain Cd concentration and report the first application of vGWAS in wheat. Moreover, our findings indicated that epistasis is an important mechanism underlying natural variation for grain Cd concentration.
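The distinction between a mean effect and a variance effect at a locus can be illustrated with a small sketch. It only summarizes the phenotype mean and sample variance per genotype class and is not the double or hierarchical generalized linear model used in the study; the function names and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance


def summarize_by_genotype(genotypes, phenotypes):
    """Return {genotype: (mean, sample variance)} of the phenotype.

    Each genotype class needs at least two observations, because
    statistics.variance computes the sample (n - 1) variance.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)
    return {g: (mean(ys), variance(ys)) for g, ys in sorted(groups.items())}


def variance_ratio(summary):
    """Ratio of the largest to the smallest per-genotype variance."""
    variances = [var for _, var in summary.values()]
    if min(variances) == 0:
        raise ValueError("a genotype class has zero variance")
    return max(variances) / min(variances)


# Toy marker (hypothetical data): both genotype classes share the same mean,
# but the class coded 2 is far more variable, the pattern a vQTL would show.
toy_genotypes = [0, 0, 0, 0, 2, 2, 2, 2]
toy_phenotypes = [4.0, 5.0, 6.0, 5.0, 1.0, 9.0, 3.0, 7.0]
toy_summary = summarize_by_genotype(toy_genotypes, toy_phenotypes)
print(toy_summary)
print(variance_ratio(toy_summary))
```

A marker whose classes differ in mean but not in variance would correspond to an mQTL in the abstract's terminology, and one differing in both to an mvQTL.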
Affiliation(s)
- Waseem Hussain
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68583, USA
| | - Malachy T Campbell
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
| | - Diego Jarquin
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68583, USA
| | - Harkamal Walia
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68583, USA
| | - Gota Morota
- Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
|
31
|
Lawson DJ, Davies NM, Haworth S, Ashraf B, Howe L, Crawford A, Hemani G, Davey Smith G, Timpson NJ. Is population structure in the genetic biobank era irrelevant, a challenge, or an opportunity? Hum Genet 2020; 139:23-41. [PMID: 31030318 PMCID: PMC6942007 DOI: 10.1007/s00439-019-02014-8] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 11/24/2018] [Accepted: 04/12/2019] [Indexed: 12/11/2022]
Abstract
Replicable genetic association signals have consistently been found through genome-wide association studies in recent years. The recent dramatic expansion of study sizes improves power of estimation of effect sizes, genomic prediction, causal inference, and polygenic selection, but it simultaneously increases susceptibility of these methods to bias due to subtle population structure. Standard methods using genetic principal components to correct for structure might not always be appropriate and we use a simulation study to illustrate when correction might be ineffective for avoiding biases. New methods such as trans-ethnic modeling and chromosome painting allow for a richer understanding of the relationship between traits and population structure. We illustrate the arguments using real examples (stroke and educational attainment) and provide a more nuanced understanding of population structure, which is set to be revisited as a critical aspect of future analyses in genetic epidemiology. We also make simple recommendations for how problems can be avoided in the future. Our results have particular importance for the implementation of GWAS meta-analysis, for prediction of traits, and for causal inference.
Affiliation(s)
- Daniel John Lawson
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK.
| | - Neil Martin Davies
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
| | - Simon Haworth
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
| | - Bilal Ashraf
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
| | - Laurence Howe
- Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, Gower Street, London, WC1E 6BT, UK
| | - Andrew Crawford
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
| | - Gibran Hemani
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
| | - George Davey Smith
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
| | - Nicholas John Timpson
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
|
32
|
Genome-wide association mapping for adult resistance to powdery mildew in common wheat. Mol Biol Rep 2019; 47:1241-1256. [PMID: 31813131 DOI: 10.1007/s11033-019-05225-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/11/2019] [Accepted: 12/04/2019] [Indexed: 12/23/2022]
Abstract
Blumeria graminis f. sp. tritici, the causal agent of wheat powdery mildew disease, can occur at all stages of the crop and constantly threatens wheat production. To identify candidate resistance genes for powdery mildew, we performed a genome-wide association study (GWAS) on a set of 329 wheat varieties of different origins. These wheat materials were genotyped using the wheat 90K SNP array and evaluated for their resistance under either field or glasshouse conditions from 2016 to 2018. Using a mixed linear model, 33 SNP markers, later grouped into 14 quantitative trait loci (QTL), were found to be associated with powdery mildew resistance. Among these, the QTL on chromosomes 3A, 3B, 6D and 7D were considered potentially new. Exploration of candidate genes for the new QTL suggested that these genes encode disease resistance and defence-related proteins and regulate the early immune response to the pathogen. Overall, the results reveal that GWAS can be an effective means of identifying marker-trait associations, though further functional validation and fine-mapping of gene candidates are required before creating opportunities for developing new resistant genotypes.
|
33
|
Zhang W, Dai X, Xu S, Zhao PX. GPU empowered pipelines for calculating genome-wide kinship matrices with ultra-high dimensional genetic variants and facilitating 1D and 2D GWAS. NAR Genom Bioinform 2019; 2:lqz009. [PMID: 33575561 PMCID: PMC7671369 DOI: 10.1093/nargab/lqz009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/31/2019] [Revised: 08/22/2019] [Accepted: 09/25/2019] [Indexed: 12/13/2022] Open
Abstract
Genome-wide association study (GWAS) is a powerful approach that has revolutionized the field of quantitative genetics. Two-dimensional GWAS, which accounts for epistatic genetic effects, must consider the effects of marker pairs, and thus a quadratic number of genetic variants, whereas one-dimensional GWAS accounts for individual genetic variants. Calculating genome-wide kinship matrices in GWAS that account for relationships among individuals represented by ultra-high dimensional genetic variants is computationally challenging. Fortunately, kinship matrix calculation involves pure matrix operations and the algorithms can be parallelized, particularly on graphics processing unit (GPU)-empowered high-performance computing (HPC) architectures. We have devised a new method and two pipelines, KMC1D and KMC2D, for kinship matrix calculation with high-dimensional genetic variants, facilitating 1D and 2D GWAS analyses, respectively. We first divide the ultra-high-dimensional markers and marker pairs into successive blocks. We then calculate the kinship matrix for each block and merge the block-wise kinship matrices to form the genome-wide kinship matrix. All the matrix operations have been parallelized using GPU kernels on our NVIDIA GPU-accelerated server platform. The performance analyses show that KMC1D and KMC2D run 100–400 times faster than conventional CPU-based computing.
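The block-wise strategy described in this abstract can be illustrated with a minimal CPU sketch in NumPy. This is not the KMC1D pipeline itself: the function name `blockwise_kinship`, the mean-centring of markers, and the scaling by the number of markers are assumptions made for illustration, and no GPU kernels are involved.

```python
import numpy as np

def blockwise_kinship(genotypes, block_size):
    """Accumulate a marker-based kinship matrix one marker block at a time.

    genotypes: (n_individuals, n_markers) array of numeric genotype codes.
    block_size: number of markers processed per block (hypothetical parameter).
    """
    n, m = genotypes.shape
    # Centre each marker (an assumed convention, not taken from the paper).
    centred = genotypes - genotypes.mean(axis=0)
    kinship = np.zeros((n, n))
    for start in range(0, m, block_size):
        block = centred[:, start:start + block_size]
        # Each block contributes an n x n partial kinship matrix.
        kinship += block @ block.T
    # Merge step: the sum of block contributions, scaled by marker count.
    return kinship / m
```

Because the kinship matrix is a sum over markers, the result does not depend on the block size, so in this sketch the block size can be chosen purely to fit memory.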
Affiliation(s)
- Wenchao Zhang
  - Noble Research Institute, LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Xinbin Dai
  - Noble Research Institute, LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Shizhong Xu
  - Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Patrick X Zhao
  - Noble Research Institute, LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
34
Bustos‐Korts D, Dawson IK, Russell J, Tondelli A, Guerra D, Ferrandi C, Strozzi F, Nicolazzi EL, Molnar‐Lang M, Ozkan H, Megyeri M, Miko P, Çakır E, Yakışır E, Trabanco N, Delbono S, Kyriakidis S, Booth A, Cammarano D, Mascher M, Werner P, Cattivelli L, Rossini L, Stein N, Kilian B, Waugh R, van Eeuwijk FA. Exome sequences and multi-environment field trials elucidate the genetic basis of adaptation in barley. The Plant Journal: For Cell and Molecular Biology 2019; 99:1172-1191. [PMID: 31108005 PMCID: PMC6851764 DOI: 10.1111/tpj.14414] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/15/2019] [Revised: 04/30/2019] [Accepted: 05/13/2019] [Indexed: 05/25/2023]
Abstract
Broadening the genetic base of crops is crucial for developing varieties to respond to global agricultural challenges such as climate change. Here, we analysed a diverse panel of 371 domesticated lines of the model crop barley to explore the genetics of crop adaptation. We first collected exome sequence data and phenotypes of key life history traits from contrasting multi-environment common garden trials. Then we applied refined statistical methods, including some based on exomic haplotype states, for genotype-by-environment (G×E) modelling. Sub-populations defined from exomic profiles were coincident with barley's biology, geography and history, and explained a high proportion of trial phenotypic variance. Clear G×E interactions indicated adaptation profiles that varied for landraces and cultivars. Exploration of circadian clock-related genes, associated with the environmentally adaptive days to heading trait (crucial for the crop's spread from the Fertile Crescent), illustrated complexities in G×E effect directions, and the importance of latitudinally based genic context in the expression of large-effect alleles. Our analysis supports a gene-level scientific understanding of crop adaption and leads to practical opportunities for crop improvement, allowing the prioritisation of genomic regions and particular sets of lines for breeding efforts seeking to cope with climate change and other stresses.
Affiliation(s)
- Daniela Bustos‐Korts
  - Biometris, Wageningen University and Research Centre, PO Box 16, 6700 AC Wageningen, The Netherlands
- Ian K. Dawson
  - Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
- Joanne Russell
  - Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
- Alessandro Tondelli
  - CREA – Research Centre for Genomics and Bioinformatics, Via S. Protaso 302, 29017 Fiorenzuola d'Arda, Italy
- Davide Guerra
  - CREA – Research Centre for Genomics and Bioinformatics, Via S. Protaso 302, 29017 Fiorenzuola d'Arda, Italy
- Chiara Ferrandi
  - PTP Science Park, Via Einstein, Loc. Cascina Codazza, 26900 Lodi, Italy
- Marta Molnar‐Lang
  - Agricultural Institute, Centre for Agricultural Research, Hungarian Academy of Sciences, 2462 Martonvásár, Hungary
- Hakan Ozkan
  - University of Çukurova, Faculty of Agriculture, Department of Field Crops, 01330 Adana, Turkey
- Maria Megyeri
  - Agricultural Institute, Centre for Agricultural Research, Hungarian Academy of Sciences, 2462 Martonvásár, Hungary
- Peter Miko
  - Agricultural Institute, Centre for Agricultural Research, Hungarian Academy of Sciences, 2462 Martonvásár, Hungary
- Esra Çakır
  - University of Çukurova, Faculty of Agriculture, Department of Field Crops, 01330 Adana, Turkey
- Enes Yakışır
  - Bahri Dagdas International Agricultural Research Institute, Konya, Turkey
- Noemi Trabanco
  - Università degli Studi di Milano – DiSAA, Via Celoria 2, 20133 Milano, Italy
- Stefano Delbono
  - CREA – Research Centre for Genomics and Bioinformatics, Via S. Protaso 302, 29017 Fiorenzuola d'Arda, Italy
- Allan Booth
  - Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
- Davide Cammarano
  - Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, Germany
- Peter Werner
  - KWS UK Ltd, 56 Church Street, Thriplow, Royston, SG8 7RE, UK
- Luigi Cattivelli
  - CREA – Research Centre for Genomics and Bioinformatics, Via S. Protaso 302, 29017 Fiorenzuola d'Arda, Italy
- Laura Rossini
  - Università degli Studi di Milano – DiSAA, Via Celoria 2, 20133 Milano, Italy
- Nils Stein
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, Germany
- Benjamin Kilian
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, Germany
  - Present address: Global Crop Diversity Trust, Platz der Vereinten Nationen 7, 53113 Bonn, Germany
- Robbie Waugh
  - Cell and Molecular Sciences, James Hutton Institute, Invergowrie, Dundee, UK
  - Division of Plant Sciences, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
- Fred A. van Eeuwijk
  - Biometris, Wageningen University and Research Centre, PO Box 16, 6700 AC Wageningen, The Netherlands
35
Guo Y, Wu C, Guo M, Zou Q, Liu X, Keinan A. Combining Sparse Group Lasso and Linear Mixed Model Improves Power to Detect Genetic Variants Underlying Quantitative Traits. Front Genet 2019; 10:271. [PMID: 31024614 PMCID: PMC6469383 DOI: 10.3389/fgene.2019.00271] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/17/2018] [Accepted: 03/12/2019] [Indexed: 11/13/2022]
Abstract
Genome-wide association studies (GWAS), based on testing one single nucleotide polymorphism (SNP) at a time, have revolutionized our understanding of the genetics of complex traits. In GWAS, there is a need to account for confounding effects, such as those due to population structure, and to take groups of SNPs into account simultaneously because of the “polygenic” nature of complex quantitative traits. In this paper, we propose a new approach, SGL-LMM, that combines sparse group lasso (SGL) and a linear mixed model (LMM) for multivariate associations of quantitative traits. LMM, which is often used in GWAS, controls for confounders, while SGL maintains sparsity of the underlying multivariate regression model. SGL-LMM first fixes the fixed effects at zero to learn the parameters of the random effects using LMM, and then estimates the fixed effects using SGL regularization. We present efficient algorithms for hyperparameter tuning and feature selection using stability selection. While controlling for confounders and constraining for sparse solutions, SGL-LMM also provides a natural framework for incorporating prior biological information into the group structure underlying the model. Results based on both simulated and real data show that SGL-LMM outperforms previous approaches in terms of power to detect associations and accuracy of quantitative trait prediction.
Affiliation(s)
- Yingjie Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - Department of Computational Biology, Cornell University, Ithaca, NY, United States
- Chenxi Wu
  - Department of Mathematics, Rutgers University, Piscataway, NJ, United States
- Maozu Guo
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Xiaoyan Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Alon Keinan
  - Department of Computational Biology, Cornell University, Ithaca, NY, United States
  - Cornell Center for Comparative and Population Genomics, Center for Vertebrate Genomics, and Center for Enervating Neuroimmune Disease, Cornell University, Ithaca, NY, United States
36
Gianola D, Fernando RL, Garrick DJ. A certain invariance property of BLUE in a whole-genome regression context. J Anim Breed Genet 2019; 136:113-117. [PMID: 30614572 PMCID: PMC6850311 DOI: 10.1111/jbg.12378] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/22/2018] [Revised: 12/03/2018] [Accepted: 12/06/2018] [Indexed: 11/30/2022]
Abstract
A curious result from mixed linear models applied to genome-wide association studies was expanded. In particular, a model in which one or more markers are considered as fixed but are allowed to contribute to the covariance structure by treating such markers as random as well was examined. The best linear unbiased estimator of marker effects is invariant with respect to whether those markers are employed in constructing a genomic relationship matrix or are ignored, provided marker effects are uncorrelated with those not being tested. Also, the implications of regarding some marker effects as fixed when, in fact, these possess a non-trivial covariance structure with those declared as random were examined.
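The invariance stated in this abstract can be checked numerically with a small sketch. Everything here is an illustrative assumption rather than the authors' setup: the simulated genotypes, the unit variance components, the relationship matrix built as a plain cross-product, and the helper name `gls_estimate`. The sketch compares the generalised least squares estimate of a tested marker's fixed effect when that marker is left out of, or included in, the covariance structure.

```python
import numpy as np

def gls_estimate(X, V, y):
    """Generalised least squares: (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

rng = np.random.default_rng(0)
n, m = 30, 50                                  # hypothetical sample and marker counts
markers = rng.integers(0, 3, size=(n, m)).astype(float)
tested = markers[:, 0]                         # marker fitted as a fixed effect
X = np.column_stack([np.ones(n), tested])      # intercept plus tested marker
y = rng.normal(size=n)                         # simulated phenotype

# Covariance built from the remaining markers only (residual variance set to 1).
others = markers[:, 1:]
V_without = others @ others.T / m + np.eye(n)
# The same covariance with the tested marker also treated as random.
V_with = V_without + np.outer(tested, tested) / m

beta_without = gls_estimate(X, V_without, y)
beta_with = gls_estimate(X, V_with, y)
```

In this sketch the two estimates agree up to floating-point error, because adding a term spanned by a column of `X` to the covariance matrix leaves the generalised least squares solution unchanged.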
Affiliation(s)
- Daniel Gianola
  - Department of Animal Science, Iowa State University, Ames, Iowa
  - Departments of Animal Sciences and Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
- Rohan L Fernando
  - Departments of Animal Sciences and Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
- Dorian J Garrick
  - AL Rae Centre of Genetics and Breeding, Massey University, Palmerston North, New Zealand
37
Wang H, Aragam B, Xing EP. Variable selection in heterogeneous datasets: A truncated-rank sparse linear mixed model with applications to genome-wide association studies. Methods 2018; 145:2-9. [PMID: 29705212 DOI: 10.1016/j.ymeth.2018.04.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/30/2018] [Revised: 04/14/2018] [Accepted: 04/23/2018] [Indexed: 10/17/2022]
Abstract
A fundamental and important challenge in modern datasets of ever increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our method.
Affiliation(s)
- Haohan Wang
  - Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Bryon Aragam
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Eric P Xing
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
38
Zhu H, Zhang S, Sha Q. A novel method to test associations between a weighted combination of phenotypes and genetic variants. PLoS One 2018; 13:e0190788. [PMID: 29329304 PMCID: PMC5766098 DOI: 10.1371/journal.pone.0190788] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/18/2017] [Accepted: 12/20/2017] [Indexed: 11/18/2022]
Abstract
Many complex diseases like diabetes, hypertension, metabolic syndrome, et cetera, are measured by multiple correlated phenotypes. However, most genome-wide association studies (GWAS) focus on one phenotype of interest or study multiple phenotypes separately for identifying genetic variants associated with complex diseases. Analyzing one phenotype or the related phenotypes separately may lose power due to ignoring the information obtained by combining phenotypes, such as the correlation between phenotypes. In order to increase statistical power to detect genetic variants associated with complex diseases, we develop a novel method to test a weighted combination of multiple phenotypes (WCmulP). We perform extensive simulation studies as well as real data (COPDGene) analysis to evaluate the performance of the proposed method. Our simulation results show that WCmulP has correct type I error rates and is either the most powerful test or comparable to the most powerful test among the methods we compared. WCmulP also has an outstanding performance for identifying single-nucleotide polymorphisms (SNPs) associated with COPD-related phenotypes.
Affiliation(s)
- Huanhuan Zhu
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
39
Fonseca PAS, Leal TP, Santos FC, Gouveia MH, Id-Lahoucine S, Rosse IC, Ventura RV, Bruneli FAT, Machado MA, Peixoto MGCD, Tarazona-Santos E, Carvalho MRS. Reducing cryptic relatedness in genomic data sets via a central node exclusion algorithm. Mol Ecol Resour 2017; 18:435-447. [PMID: 29271609 DOI: 10.1111/1755-0998.12746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2017] [Revised: 12/04/2017] [Accepted: 12/14/2017] [Indexed: 11/30/2022]
Abstract
Cryptic relatedness is a confounding factor in genetic diversity and genetic association studies. Development of strategies to reduce cryptic relatedness in a sample is a crucial step for downstream genetic analyses. This study evaluates a node selection algorithm, based on network degree centrality, for its applicability and its impact on the evaluation of genetic diversity and population stratification. A total of 1,036 Guzerá (Bos indicus) females were genotyped using the Illumina Bovine SNP50 v2 BeadChip. Four strategies were compared. The first and second strategies consist of an iterative exclusion of the most related individuals based on the PLINK kinship coefficient (φij) and VanRaden's φij, respectively. The third and fourth strategies were based on a node selection algorithm. The fourth strategy, Network G matrix, preserved the largest number of individuals, with better diversity and representation of the initial sample. Determining the most probable number of populations was directly affected by the kinship metric. Network G matrix was the best strategy for reducing relatedness because it produced a larger sample with more distantly related individuals, a distribution in the MDS plots more similar to that of the full data set, and a better representation of the population structure. Resampling strategies using VanRaden's φij as a relationship metric were better at inferring the relationships among individuals. Moreover, the resampling strategies directly affect the genomic inflation values in genome-wide association studies. The use of the node selection algorithm also implies better selection of the most central individuals to be removed, providing a more representative sample.
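The idea of removing the most central individuals from a relatedness network can be sketched in a few lines of plain Python. This is a greedy simplification written for illustration, not the authors' algorithm: the function name `exclude_central_nodes`, the single kinship `threshold`, and the rule of repeatedly dropping the highest-degree node are all assumptions.

```python
def exclude_central_nodes(kinship, threshold):
    """Greedily drop the most connected individuals until no related pair remains.

    kinship: square list of lists (or 2D array) of pairwise kinship values.
    threshold: pairs with kinship above this value count as related (assumed rule).
    Returns (kept, removed) as lists of individual indices.
    """
    n = len(kinship)
    # Build the relatedness network: one edge per related pair.
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if kinship[i][j] > threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    removed = []
    while neighbours:
        # Degree centrality: the individual with the most related partners.
        node = max(neighbours, key=lambda k: len(neighbours[k]))
        if not neighbours[node]:
            break  # no related pairs are left
        for other in neighbours.pop(node):
            neighbours[other].discard(node)
        removed.append(node)
    return sorted(neighbours), removed
```

For a star-shaped network in which one individual is related to three otherwise unrelated individuals, this sketch removes only the central individual and keeps the other three, whereas excluding one member of each related pair in turn could discard more of the sample.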
Affiliation(s)
- Pablo A S Fonseca
  - Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Thiago P Leal
  - Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Fernanda C Santos
  - Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Mateus H Gouveia
  - Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Samir Id-Lahoucine
  - Center for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada
- Izinara C Rosse
  - Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Ricardo V Ventura
  - Center for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, Canada
  - Beef Improvement Opportunities, Guelph, ON, Canada
- Eduardo Tarazona-Santos
  - Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Maria Raquel S Carvalho
  - Departamento de Biologia Geral, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
40
Wang H, Aragam B, Xing EP. Variable Selection in Heterogeneous Datasets: A Truncated-rank Sparse Linear Mixed Model with Applications to Genome-wide Association Studies. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2017; 2017:431-438. [PMID: 29629235 DOI: 10.1109/bibm.2017.8217687] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/11/2023]
Abstract
A fundamental and important challenge in modern datasets of ever increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of individual relationships in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our model.
Affiliation(s)
- Haohan Wang
  - Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
- Bryon Aragam
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
- Eric P Xing
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
41
Meyers-Wallen VN, Boyko AR, Danko CG, Grenier JK, Mezey JG, Hayward JJ, Shannon LM, Gao C, Shafquat A, Rice EJ, Pujar S, Eggers S, Ohnesorg T, Sinclair AH. XX Disorder of Sex Development is associated with an insertion on chromosome 9 and downregulation of RSPO1 in dogs (Canis lupus familiaris). PLoS One 2017; 12:e0186331. [PMID: 29053721 PMCID: PMC5650465 DOI: 10.1371/journal.pone.0186331] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/05/2017] [Accepted: 09/28/2017] [Indexed: 12/15/2022]
Abstract
Remarkable progress has been achieved in understanding the mechanisms controlling sex determination, yet the cause for many Disorders of Sex Development (DSD) remains unknown. Of particular interest is a rare XX DSD subtype in which individuals are negative for SRY, the testis determining factor on the Y chromosome, yet develop testes or ovotestes, and both of these phenotypes occur in the same family. This is a naturally occurring disorder in humans (Homo sapiens) and dogs (C. familiaris). Phenotypes in the canine XX DSD model are strikingly similar to those of the human XX DSD subtype. The purposes of this study were to identify 1) a variant associated with XX DSD in the canine model and 2) gene expression alterations in canine embryonic gonads that could be informative to causation. Using a genome wide association study (GWAS) and whole genome sequencing (WGS), we identified a variant on C. familiaris autosome 9 (CFA9) that is associated with XX DSD in the canine model and in affected purebred dogs. This is the first marker identified for inherited canine XX DSD. It lies upstream of SOX9 within the canine ortholog for the human disorder, which resides on 17q24. Inheritance of this variant indicates that XX DSD is a complex trait in which breed genetic background affects penetrance. Furthermore, the homozygous variant genotype is associated with embryonic lethality in at least one breed. Our analysis of gene expression studies (RNA-seq and PRO-seq) in embryonic gonads at risk of XX DSD from the canine model identified significant RSPO1 downregulation in comparison to XX controls, without significant upregulation of SOX9 or other known testis pathway genes. Based on these data, a novel mechanism is proposed in which molecular lesions acting upstream of RSPO1 induce epigenomic gonadal mosaicism.
Affiliation(s)
- Vicki N. Meyers-Wallen
  - Baker Institute for Animal Health, Cornell University, Ithaca, NY, United States of America
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, United States of America
- Adam R. Boyko
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, United States of America
- Charles G. Danko
  - Baker Institute for Animal Health, Cornell University, Ithaca, NY, United States of America
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, United States of America
- Jennifer K. Grenier
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, United States of America
- Jason G. Mezey
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, United States of America
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Jessica J. Hayward
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, United States of America
- Laura M. Shannon
  - Department of Biomedical Sciences, Cornell University, Ithaca, NY, United States of America
- Chuan Gao
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, United States of America
- Afrah Shafquat
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, United States of America
- Edward J. Rice
  - Baker Institute for Animal Health, Cornell University, Ithaca, NY, United States of America
- Shashikant Pujar
  - Baker Institute for Animal Health, Cornell University, Ithaca, NY, United States of America
- Stefanie Eggers
  - Murdoch Children’s Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
- Thomas Ohnesorg
  - Murdoch Children’s Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
- Andrew H. Sinclair
  - Murdoch Children’s Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
  - Department of Paediatrics, University of Melbourne, Melbourne, VIC, Australia
42
Ju JH, Shenoy SA, Crystal RG, Mezey JG. An independent component analysis confounding factor correction framework for identifying broad impact expression quantitative trait loci. PLoS Comput Biol 2017; 13:e1005537. [PMID: 28505156 PMCID: PMC5448815 DOI: 10.1371/journal.pcbi.1005537] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/22/2016] [Revised: 05/30/2017] [Accepted: 04/28/2017] [Indexed: 11/19/2022]
Abstract
Genome-wide expression Quantitative Trait Loci (eQTL) studies in humans have provided numerous insights into the genetics of both gene expression and complex diseases. While the majority of eQTL identified in genome-wide analyses impact a single gene, eQTL that impact many genes are particularly valuable for network modeling and disease analysis. To enable the identification of such broad impact eQTL, we introduce CONFETI: Confounding Factor Estimation Through Independent component analysis. CONFETI is designed to address two conflicting issues when searching for broad impact eQTL: the need to account for non-genetic confounding factors that can lower the power of the analysis or produce broad impact eQTL false positives, and the tendency of methods that account for confounding factors to model broad impact eQTL as non-genetic variation. The key advance of the CONFETI framework is the use of Independent Component Analysis (ICA) to identify variation likely caused by broad impact eQTL when constructing the sample covariance matrix used for the random effect in a mixed model. We show that CONFETI has better performance than other mixed model confounding factor methods when considering broad impact eQTL recovery from synthetic data. We also used the CONFETI framework and these same confounding factor methods to identify eQTL that replicate between matched twin pair datasets in the Multiple Tissue Human Expression Resource (MuTHER), the Depression Genes Networks study (DGN), the Netherlands Study of Depression and Anxiety (NESDA), and multiple tissue types in the Genotype-Tissue Expression (GTEx) consortium. These analyses identified both cis-eQTL and trans-eQTL impacting individual genes, and CONFETI had better or comparable performance to other mixed model confounding factor analysis methods when identifying such eQTL. In these analyses, we were able to identify and replicate a few broad impact eQTL although the overall number was small even when applying CONFETI. 
In light of these results, we discuss the broad impact eQTL that have been previously reported from the analysis of human data and suggest that considerable caution should be exercised when making biological inferences based on these reported eQTL.
Affiliation(s)
- Jin Hyun Ju
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, United States of America
- Sushila A. Shenoy
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Ronald G. Crystal
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
- Jason G. Mezey
  - Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, United States of America
  - Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, United States of America
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, United States of America
43
N’Diaye A, Haile JK, Cory AT, Clarke FR, Clarke JM, Knox RE, Pozniak CJ. Single Marker and Haplotype-Based Association Analysis of Semolina and Pasta Colour in Elite Durum Wheat Breeding Lines Using a High-Density Consensus Map. PLoS One 2017; 12:e0170941. [PMID: 28135299 PMCID: PMC5279799 DOI: 10.1371/journal.pone.0170941] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 11/03/2016] [Accepted: 01/12/2017] [Indexed: 12/30/2022]
Abstract
Association mapping is usually performed by testing the correlation between a single marker and phenotypes. However, because patterns of variation within genomes are inherited as blocks, clustering markers into haplotypes for genome-wide scans could be a worthwhile approach to improve statistical power to detect associations. The availability of high-density molecular data allows the possibility to assess the potential of both approaches to identify marker-trait associations in durum wheat. In the present study, we used single marker- and haplotype-based approaches to identify loci associated with semolina and pasta colour in durum wheat, the main objective being to evaluate the potential benefits of haplotype-based analysis for identifying quantitative trait loci. One hundred sixty-nine durum lines were genotyped using the Illumina 90K Infinium iSelect assay, and 12,234 polymorphic single nucleotide polymorphism (SNP) markers were generated and used to assess the population structure and the linkage disequilibrium (LD) patterns. A total of 8,581 SNPs previously localized to a high-density consensus map were clustered into 406 haplotype blocks based on the average LD distance of 5.3 cM. Combining multiple SNPs into haplotype blocks increased the average polymorphism information content (PIC) from 0.27 per SNP to 0.50 per haplotype. The haplotype-based analysis identified 12 loci associated with grain pigment colour traits, including the five loci identified by the single marker-based analysis. Furthermore, the haplotype-based analysis resulted in an increase of the phenotypic variance explained (50.4% on average) and the allelic effect (33.7% on average) when compared to single marker analysis. The presence of multiple allelic combinations within each haplotype locus offers potential for screening the most favorable haplotype series and may facilitate marker-assisted selection of grain pigment colour in durum wheat. 
These results suggest a benefit of haplotype-based analysis over single marker analysis to detect loci associated with colour traits in durum wheat.
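The reported rise in PIC from 0.27 per SNP to 0.50 per haplotype reflects how PIC depends on the number and frequencies of alleles at a locus. The following is a minimal sketch, assuming the commonly used Botstein-style definition PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j² (the abstract does not state which formula was applied). The genotypes and the function name `pic` are hypothetical and only illustrate what happens when two SNPs are merged into one haplotype locus.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphism information content of one locus (Botstein-style formula)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    # Sum of squared allele frequencies (expected homozygosity).
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over all unordered pairs of distinct alleles.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


# Hypothetical toy data: eight inbred lines scored at two SNPs in one LD block.
snp1 = ["A", "A", "A", "A", "G", "G", "G", "G"]
snp2 = ["C", "C", "T", "T", "C", "C", "T", "T"]
# Treat each two-SNP combination as one allele of a haplotype locus.
haplotypes = [a + b for a, b in zip(snp1, snp2)]

print(pic(snp1))        # 0.375
print(pic(haplotypes))  # 0.703125
```

Under this definition a biallelic SNP cannot exceed 0.375, whereas a multi-allelic haplotype locus can, which is consistent with the direction of the increase described in the abstract.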
Affiliation(s)
- Amidou N’Diaye
  - Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Jemanesh K. Haile
  - Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Aron T. Cory
  - Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Fran R. Clarke
  - Semiarid Prairie Agricultural Research Centre, Agriculture and Agri-Food Canada, Swift Current, Saskatchewan, Canada
- John M. Clarke
  - Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Ron E. Knox
  - Semiarid Prairie Agricultural Research Centre, Agriculture and Agri-Food Canada, Swift Current, Saskatchewan, Canada
- Curtis J. Pozniak
  - Department of Plant Sciences and Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
44
Abstract
We give a short but detailed review of the methods used to deal with linear mixed models (restricted likelihood, AIREML algorithm, best linear unbiased predictors, etc.), with a few original points. Then we describe three common applications of the linear mixed model in contemporary human genetics: association testing (pathways analysis or rare variants association tests), genomic heritability estimates, and correction for population stratification in genome-wide association studies. We also consider the performance of best linear unbiased predictors for prediction in this context, through a simulation study for rare variants in a short genomic region, and through a short theoretical development for genome-wide data. For each of these applications, we discuss the relevance and the impact of modeling genetic effects as random effects.
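To make concrete what treating an effect as random does to its estimate, the following is a minimal sketch of best linear unbiased prediction in the simplest mixed model, a random intercept per group: y_ij = μ + u_i + e_ij. It assumes the variance ratio λ = σ_e²/σ_u² is already known, whereas in practice it would be estimated (for example by restricted likelihood, as the abstract mentions). The function name, the data, and the value of λ are hypothetical.

```python
def blup_random_intercepts(groups, lam):
    """BLUPs of group effects in the model y_ij = mu + u_i + e_ij.

    groups: dict mapping a group label to its list of observations.
    lam:    variance ratio sigma_e^2 / sigma_u^2, assumed known here.
    """
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    # Shrinkage weight n_i / (n_i + lam) for each group.
    weights = {g: len(ys) / (len(ys) + lam) for g, ys in groups.items()}
    # Generalized least squares estimate of the fixed intercept mu.
    mu = sum(weights[g] * means[g] for g in groups) / sum(weights.values())
    # BLUP: each group's deviation from mu, shrunk toward zero.
    blups = {g: weights[g] * (means[g] - mu) for g in groups}
    return mu, blups


# Hypothetical data: three groups with unequal sample sizes.
data = {"a": [1.0, 3.0], "b": [5.0, 7.0], "c": [10.0]}
mu_hat, u_hat = blup_random_intercepts(data, lam=2.0)
print(mu_hat, u_hat)
```

The group with a single observation is shrunk most strongly toward zero, which is the practical consequence of modeling effects as random rather than fixed: estimates supported by little data borrow strength from the rest.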
45
dos Santos JPR, Pires LPM, de Castro Vasconcellos RC, Pereira GS, Von Pinho RG, Balestre M. Genomic selection to resistance to Stenocarpella maydis in maize lines using DArTseq markers. BMC Genet 2016; 17:86. [PMID: 27316946 PMCID: PMC4912722 DOI: 10.1186/s12863-016-0392-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 02/22/2016] [Accepted: 06/07/2016] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND The identification of lines resistant to ear diseases is of great importance in maize breeding because such diseases directly interfere with kernel quality and yield. Among these diseases, ear rot is widely relevant because it significantly decreases grain yield. Ear rot may be caused by the fungus Stenocarpella maydis; however, little information about genetic resistance to this pathogen is available in maize, particularly regarding candidate genes in the genome. To exploit this genomic information, we used 23,154 DArTseq markers in 238 lines and applied genome-wide selection to select resistant genotypes. We divided the lines into clusters to identify groups related to resistance to Stenocarpella maydis and used the Bayesian stochastic search variable approach and rr-BLUP methods to compare their selection results. RESULTS Through principal component analysis (PCA) and hierarchical clustering, it was observed that the three main genetic groups (Stiff Stalk Synthetic, Non-Stiff Stalk Synthetic and Tropical) clustered in a consistent manner, and information on the resistance sources could be obtained according to the line of origin, with populations derived from the genetic subgroup Suwan presenting higher levels of resistance. The ridge regression best linear unbiased prediction (rr-BLUP) and Bayesian stochastic search variable (BSSV) models presented equivalent predictive abilities. CONCLUSION Our work showed that it is possible to select maize lines presenting high resistance to Stenocarpella maydis. This claim is based on the acceptable level of predictive accuracy obtained by genome-wide selection (GWS) using different models. Furthermore, the lines related to the Suwan background present a higher level of resistance than lines related to other groups.
Affiliation(s)
- Marcio Balestre
  - Department of Exact Science, Federal University of Lavras, Lavras, MG CP 3037 Brazil
46
Phelan J, Coll F, McNerney R, Ascher DB, Pires DEV, Furnham N, Coeck N, Hill-Cawthorne GA, Nair MB, Mallard K, Ramsay A, Campino S, Hibberd ML, Pain A, Rigouts L, Clark TG. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance. BMC Med 2016; 14:31. [PMID: 27005572 PMCID: PMC4804620 DOI: 10.1186/s12916-016-0575-9] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 12/07/2015] [Accepted: 02/02/2016] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. METHODS To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. RESULTS The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drugs binding sites. CONCLUSIONS Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. 
These approaches could ultimately lead to the identification of novel resistance mutations, improve the design of tuberculosis control measures such as diagnostics, and inform patient management.
Affiliation(s)
- Jody Phelan
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Francesc Coll
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Ruth McNerney
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - University of Cape Town Lung Institute, Lung Infection & Immunity Unit, Old Main Building, Groote Schuur Hospital, Observatory, Cape Town, 7925, South Africa
- David B Ascher
  - Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge, CB2 1GA, UK
- Douglas E V Pires
  - Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz, Avenida Augusto de Lima 1715, Belo Horizonte, 30190-002, Brazil
- Nick Furnham
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Nele Coeck
  - Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
- Grant A Hill-Cawthorne
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Sydney Emerging Infections and Biosecurity Institute and School of Public Health, Sydney Medical School, University of Sydney, Sydney, NSW, 2006, Australia
- Mridul B Nair
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Kim Mallard
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Andrew Ramsay
  - Special Programme for Research and Training in Tropical Diseases (TDR), World Health Organisation, Geneva, Switzerland
- Susana Campino
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Martin L Hibberd
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Arnab Pain
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Leen Rigouts
  - Mycobacteriology Unit, Institute of Tropical Medicine, Antwerp, Belgium
  - Department of Biomedical Sciences, Antwerp University, Antwerp, Belgium
- Taane G Clark
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Department of Pathogen Molecular Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, UK
47
Bianchi M, Dahlgren S, Massey J, Dietschi E, Kierczak M, Lund-Ziener M, Sundberg K, Thoresen SI, Kämpe O, Andersson G, Ollier WER, Hedhammar Å, Leeb T, Lindblad-Toh K, Kennedy LJ, Lingaas F, Rosengren Pielberg G. A Multi-Breed Genome-Wide Association Analysis for Canine Hypothyroidism Identifies a Shared Major Risk Locus on CFA12. PLoS One 2015; 10:e0134720. [PMID: 26261983 PMCID: PMC4532498 DOI: 10.1371/journal.pone.0134720] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/30/2015] [Accepted: 07/13/2015] [Indexed: 01/12/2023] Open
Abstract
Hypothyroidism is a complex clinical condition found in both humans and dogs, thought to be caused by a combination of genetic and environmental factors. In this study we present a multi-breed analysis of predisposing genetic risk factors for hypothyroidism in dogs using three high-risk breeds: the Gordon Setter, Hovawart and the Rhodesian Ridgeback. Using a genome-wide association approach and meta-analysis, we identified a major hypothyroidism risk locus shared by these breeds on chromosome 12 (p = 2.1 × 10⁻¹¹). Further characterisation of the candidate region revealed a shared ~167 kb risk haplotype (4,915,018–5,081,823 bp), tagged by two SNPs in almost complete linkage disequilibrium. This breed-shared risk haplotype includes three genes (LHFPL5, SRPK1 and SLC26A8) and does not extend to the dog leukocyte antigen (DLA) class II gene cluster located in the vicinity. These three genes have not been identified as candidate genes for hypothyroid disease previously, but have functions that could potentially contribute to the development of the disease. Our results implicate the potential involvement of novel genes and pathways for the development of canine hypothyroidism, raising new possibilities for screening, breeding programmes and treatments in dogs. This study may also contribute to our understanding of the genetic etiology of human hypothyroid disease, which is one of the most common endocrine disorders in humans.
Affiliation(s)
- Matteo Bianchi
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Stina Dahlgren
  - Department of Basic Sciences and Aquatic Medicine, Norwegian University of Life Sciences, Oslo, Norway
- Jonathan Massey
  - Centre for Integrated Genomic Medical Research, The University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Elisabeth Dietschi
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Marcin Kierczak
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Martine Lund-Ziener
  - Department of Basic Sciences and Aquatic Medicine, Norwegian University of Life Sciences, Oslo, Norway
- Katarina Sundberg
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Stein Istre Thoresen
  - Department of Basic Sciences and Aquatic Medicine, Norwegian University of Life Sciences, Oslo, Norway
- Olle Kämpe
  - Department of Medicine (Solna), Karolinska Institutet, Stockholm, Sweden
- Göran Andersson
  - Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Uppsala, Sweden
- William E. R. Ollier
  - Centre for Integrated Genomic Medical Research, The University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Åke Hedhammar
  - Department of Clinical Sciences, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Tosso Leeb
  - Institute of Genetics, Vetsuisse Faculty, University of Bern, Bern, Switzerland
- Kerstin Lindblad-Toh
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Lorna J. Kennedy
  - Centre for Integrated Genomic Medical Research, The University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Frode Lingaas
  - Department of Basic Sciences and Aquatic Medicine, Norwegian University of Life Sciences, Oslo, Norway
- Gerli Rosengren Pielberg
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
48
Zhang Y, Pan W. Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? Genet Epidemiol 2014; 39:149-55. [PMID: 25536929 DOI: 10.1002/gepi.21879] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 09/16/2014] [Revised: 11/11/2014] [Accepted: 11/11/2014] [Indexed: 11/10/2022]
Abstract
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
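The PCR side of this comparison can be illustrated with a small sketch: a leading principal component is computed from marker data and included as a fixed covariate when estimating a SNP effect. This is only the PC-adjustment step under simplifying assumptions (one PC, found by power iteration, and a toy data set in which the phenotype is driven purely by subpopulation); it is not the LMM or the hybrid method the authors propose. All names and data are hypothetical.

```python
import math


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_pc(markers, iters=100):
    """Leading principal component across individuals, by power iteration.

    markers: list of marker columns, each holding one genotype (0/1/2)
             per individual.
    """
    cols = [center(c) for c in markers]
    n = len(cols[0])
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        # Multiply v by K = G G^T without forming K explicitly.
        scores = [dot(c, v) for c in cols]
        v = [sum(s * c[i] for s, c in zip(scores, cols)) for i in range(n)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
    return v


def residualize(v, pc):
    """Remove the intercept and the PC from v (the PC has mean zero)."""
    vc = center(v)
    coef = dot(vc, pc) / dot(pc, pc)
    return [x - coef * p for x, p in zip(vc, pc)]


def snp_effect(snp, y, pc=None):
    """Regression slope of y on snp, optionally adjusted for one PC."""
    if pc is None:
        s, r = center(snp), center(y)
    else:
        s, r = residualize(snp, pc), residualize(y, pc)
    return dot(s, r) / dot(s, s)


# Hypothetical toy data: two subpopulations of four individuals each.
structure_markers = [[0, 0, 0, 0, 2, 2, 2, 2]] * 3
test_snp = [0, 0, 0, 2, 0, 2, 2, 2]
phenotype = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0]  # set by subpopulation only

pc1 = top_pc(structure_markers)
naive = snp_effect(test_snp, phenotype)
adjusted = snp_effect(test_snp, phenotype, pc1)
print(naive, adjusted)
```

In this toy case the unadjusted slope is 0.5 even though the SNP has no causal effect, and the PC-adjusted slope is zero up to rounding, because the leading PC captures the subpopulation split that drives the phenotype.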
Affiliation(s)
- Yiwei Zhang
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
49
Widmer C, Lippert C, Weissbrod O, Fusi N, Kadie C, Davidson R, Listgarten J, Heckerman D. Further improvements to linear mixed models for genome-wide association studies. Sci Rep 2014; 4:6874. [PMID: 25387525 PMCID: PMC4230738 DOI: 10.1038/srep06874] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 08/09/2013] [Accepted: 10/14/2014] [Indexed: 11/09/2022] Open
Abstract
We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.
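As an illustration of what a GSM is and how two of them can be combined, the sketch below builds a similarity matrix from standardized genotypes, K = Z Zᵀ / m, and forms a weighted mixture of a GSM from all SNPs and one from a subset. The abstract does not say how the GSM is computed, how the phenotype-predictive SNPs are chosen, or how the mixing weight is set, so the formula, the subset, and the weight of 0.5 are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def standardize(column):
    """Center a marker column and scale it to unit variance."""
    n = len(column)
    mean = sum(column) / n
    var = sum((x - mean) ** 2 for x in column) / n
    if var == 0.0:
        return None  # a monomorphic marker carries no similarity information
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in column]


def gsm(markers):
    """Genetic similarity matrix K = Z Z^T / m from marker columns."""
    cols = [z for z in (standardize(c) for c in markers) if z is not None]
    n, m = len(cols[0]), len(cols)
    return [[sum(z[i] * z[j] for z in cols) / m for j in range(n)]
            for i in range(n)]


def mix(k_all, k_sel, weight):
    """Convex combination (1 - weight) * k_all + weight * k_sel."""
    n = len(k_all)
    return [[(1 - weight) * k_all[i][j] + weight * k_sel[i][j]
             for j in range(n)] for i in range(n)]


# Hypothetical genotypes: 4 individuals, 4 SNPs, stored as marker columns.
snps = [[0, 1, 2, 1], [2, 2, 0, 0], [0, 0, 1, 2], [1, 0, 2, 2]]
selected = [snps[0], snps[2]]  # stand-in for SNPs chosen as phenotype-predictive

k_all = gsm(snps)
k_sel = gsm(selected)
k_mix = mix(k_all, k_sel, weight=0.5)
```

With this scaling the diagonal of each matrix averages to one, so the two component GSMs are on a comparable scale before they are mixed.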
Affiliation(s)
- Christian Widmer
  - eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States
- Christoph Lippert
  - eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States
- Omer Weissbrod
  - Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel
- Nicolo Fusi
  - eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States
- Carl Kadie
  - eScience Group, Microsoft Research, One Microsoft Way, Redmond, WA, 98052, United States
- Robert Davidson
  - eScience Group, Microsoft Research, One Microsoft Way, Redmond, WA, 98052, United States
- Jennifer Listgarten
  - eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States
- David Heckerman
  - eScience Group, Microsoft Research, 1100 Glendon Avenue, Suite PH1, Los Angeles, CA, 90024, United States
50
Hoffman GE, Mezey JG, Schadt EE. lrgpr: interactive linear mixed model analysis of genome-wide association studies with composite hypothesis testing and regression diagnostics in R. Bioinformatics 2014; 30:3134-5. [PMID: 25035399 DOI: 10.1093/bioinformatics/btu435] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/10/2023]
Abstract
The linear mixed model is the state-of-the-art method to account for the confounding effects of kinship and population structure in genome-wide association studies (GWAS). Current implementations test the effect of one or more genetic markers while including prespecified covariates such as sex. Here we develop an efficient implementation of the linear mixed model that allows composite hypothesis tests to consider genotype interactions with variables such as other genotypes, environment, sex or ancestry. Our R package, lrgpr, allows interactive model fitting and examination of regression diagnostics to facilitate exploratory data analysis in the context of the linear mixed model. By leveraging parallel and out-of-core computing for datasets too large to fit in main memory, lrgpr is applicable to large GWAS datasets and next-generation sequencing data. AVAILABILITY AND IMPLEMENTATION lrgpr is an R package available from lrgpr.r-forge.r-project.org.
Affiliation(s)
- Gabriel E Hoffman
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA and Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, USA
- Jason G Mezey
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA and Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, USA
- Eric E Schadt
  - Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA, Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA and Department of Genetic Medicine, Weill Cornell Medical College, New York, NY, USA