1
Zhao L, Batta I, Matloff W, O'Driscoll C, Hobel S, Toga AW. Neuroimaging PheWAS (Phenome-Wide Association Study): A Free Cloud-Computing Platform for Big-Data, Brain-Wide Imaging Association Studies. Neuroinformatics 2021; 19:285-303. [PMID: 32822005; DOI: 10.1007/s12021-020-09486-4]
Abstract
Large-scale, case-control genome-wide association studies (GWASs) have revealed genetic variations associated with diverse neurological and psychiatric disorders. Recent advances in neuroimaging and genomic databases of large healthy and diseased cohorts have empowered studies to characterize effects of the discovered genetic factors on brain structure and function, implicating neural pathways and genetic mechanisms in the underlying biology. However, the unprecedented scale and complexity of the imaging and genomic data require new advanced biomedical data science tools to manage, process and analyze the data. In this work, we introduce Neuroimaging PheWAS (phenome-wide association study): a web-based system for searching over a wide variety of brain-wide imaging phenotypes to discover true system-level gene-brain relationships using a unified genotype-to-phenotype strategy. This design features a user-friendly graphical user interface (GUI) for anonymous data uploading, study definition and management, and interactive result visualizations, as well as a cloud-based computational infrastructure and multiple state-of-the-art methods for statistical association analysis and multiple comparison correction. We demonstrated the potential of Neuroimaging PheWAS with a case study analyzing the influences of the apolipoprotein E (APOE) gene on various brain morphological properties across the brain in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Benchmark tests were performed to evaluate the system's performance using data from UK Biobank. The Neuroimaging PheWAS system is freely available. It simplifies the execution of PheWAS on neuroimaging data and provides an opportunity for imaging genetics studies to elucidate the routes by which specific genetic variants act on diseases in the context of detailed imaging phenotypic data.
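The abstract mentions multiple comparison correction without naming the methods used. Because a PheWAS tests one genotype against many imaging phenotypes, the number of tests grows with the phenotype count. As an illustration only, the Python sketch below implements two widely used corrections, Bonferroni and Benjamini-Hochberg false discovery rate (FDR) adjustment. The function names and p-values are hypothetical, and this is not the platform's own code.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # increase as the raw p-value decreases (step-up monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for one variant tested against four imaging phenotypes.
    raw = [0.01, 0.04, 0.03, 0.005]
    print([round(q, 3) for q in bonferroni(raw)])          # [0.04, 0.16, 0.12, 0.02]
    print([round(q, 3) for q in benjamini_hochberg(raw)])  # [0.02, 0.04, 0.04, 0.02]
```

Bonferroni controls the family-wise error rate and becomes conservative when many correlated brain phenotypes are tested; Benjamini-Hochberg instead controls the expected proportion of false discoveries among the reported associations.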
Affiliation(s)
- Lu Zhao
  - Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Ishaan Batta
  - Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- William Matloff
  - Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Caroline O'Driscoll
  - Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Samuel Hobel
  - Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
- Arthur W Toga
  - Laboratory of Neuro Imaging, USC Mark and Mary Stevens Neuroimaging and Informatics Institute, University of Southern California, Los Angeles, CA, USA
2
Garreta L, Cerón‐Souza I, Palacio MR, Reyes‐Herrera PH. MultiGWAS: An integrative tool for Genome Wide Association Studies in tetraploid organisms. Ecol Evol 2021; 11:7411-7426. [PMID: 34188823; PMCID: PMC8216910; DOI: 10.1002/ece3.7572] [Received: 11/20/2020] [Revised: 03/22/2021] [Accepted: 03/23/2021]
Abstract
Genome-wide association studies (GWASs) are essential to determine the genetic bases of either ecological or economic phenotypic variation across individuals within populations of model and nonmodel organisms. For this research question, it is common to replicate a GWAS with different parameters and models to validate the reproducibility of the results. However, straightforward methodologies that manage both replication and tetraploid data are still missing. To solve this problem, we designed MultiGWAS, a tool that performs GWAS for diploid and tetraploid organisms by executing four software packages in parallel: two designed for polyploid data (GWASpoly and SHEsis) and two designed for diploid data (GAPIT and TASSEL). MultiGWAS has several advantages. It runs either on the command line or in a graphical interface, and it manages different genotype formats, including VCF. Moreover, it allows control for population structure and relatedness and performs several quality control checks on genotype data. In addition, MultiGWAS can test for additive and dominant gene action models and, through a proprietary scoring function, select the best model to report its associations. Finally, it generates several reports that facilitate identifying false associations from both the significant and the best-ranked associated single nucleotide polymorphisms (SNPs) among the four software packages. We tested MultiGWAS with public tetraploid potato data for tuber shape and with several simulated datasets under both additive and dominant models. These tests demonstrated that MultiGWAS is better at detecting reliable associations than using each of the four software packages individually. Moreover, the parallel analysis with polyploid and diploid software, which only MultiGWAS offers, demonstrates its utility in understanding the best genetic model behind the SNP association in tetraploid organisms. Therefore, MultiGWAS proved to be an excellent alternative for wrapping GWAS replication in diploid and tetraploid organisms in a single analysis environment.
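The abstract contrasts additive and dominant gene action models for tetraploid data. As a rough illustration, and not MultiGWAS's proprietary scoring function or the models implemented in GWASpoly, SHEsis, GAPIT or TASSEL, the Python sketch below codes a biallelic tetraploid SNP by its alternate-allele dosage (0 to 4 copies) and compares how well an additive code and a dominant code explain a toy phenotype. The coding scheme, the hypothetical `threshold` parameter and the toy data are assumptions; a real GWAS would also model population structure and relatedness.

```python
from typing import List, Sequence


def additive_code(dosages: Sequence[int]) -> List[float]:
    """Additive model: the effect scales with the number of alternate copies (0-4)."""
    return [float(d) for d in dosages]


def dominant_code(dosages: Sequence[int], threshold: int = 1) -> List[float]:
    """Dominant model (assumed coding): every genotype with at least `threshold`
    alternate copies shares one effect; threshold=1 corresponds to simplex dominance."""
    return [1.0 if d >= threshold else 0.0 for d in dosages]


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or phenotype explains nothing
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    # Toy tetraploid panel: dosages 0..4, phenotype simulated under simplex dominance.
    dosages = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
    phenotype = [5.0 if d == 0 else 10.0 for d in dosages]
    print(round(r_squared(additive_code(dosages), phenotype), 3))  # 0.5
    print(round(r_squared(dominant_code(dosages), phenotype), 3))  # 1.0
```

Because the toy trait is dominant, the dominant code fits perfectly while the additive dosage code leaves half of the variance unexplained. Comparing fits across gene action models in this spirit is the kind of question the abstract describes MultiGWAS answering for each SNP.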
Affiliation(s)
- Luis Garreta
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), CI Tibaitatá, Bogota, Colombia
- Ivania Cerón‐Souza
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), CI Tibaitatá, Bogota, Colombia
- Paula H. Reyes‐Herrera
  - Corporación Colombiana de Investigación Agropecuaria (AGROSAVIA), CI Tibaitatá, Bogota, Colombia
3
Genze N, Bharti R, Grieb M, Schultheiss SJ, Grimm DG. Accurate machine learning-based germination detection, prediction and quality assessment of three grain crops. Plant Methods 2020; 16:157. [PMID: 33353559; PMCID: PMC7754596; DOI: 10.1186/s13007-020-00699-x] [Received: 09/22/2020] [Accepted: 12/11/2020]
Abstract
BACKGROUND Assessment of seed germination is an essential task for seed researchers to measure the quality and performance of seeds. Usually, seed assessments are done manually, which is a cumbersome, time-consuming and error-prone process. Classical image analysis methods are not well suited for large-scale germination experiments, because they often rely on manual adjustments of color-based thresholds. We here propose a machine learning approach using modern artificial neural networks with region proposals for accurate seed germination detection and high-throughput seed germination experiments. RESULTS We generated labeled imaging data of the germination process of more than 2400 seeds for three different crops, Zea mays (maize), Secale cereale (rye) and Pennisetum glaucum (pearl millet), with a total of more than 23,000 images. Different state-of-the-art convolutional neural network (CNN) architectures with region proposals have been trained using transfer learning to automatically identify seeds within petri dishes and to predict whether the seeds germinated or not. Our proposed models achieved a high mean average precision (mAP) on a hold-out test data set of approximately 97.9%, 94.2% and 94.3% for Zea mays, Secale cereale and Pennisetum glaucum, respectively. Further, various single-value germination indices, such as Mean Germination Time and Germination Uncertainty, can be computed more accurately with the predictions of our proposed model than with manual counting. CONCLUSION Our proposed machine learning-based method can help to speed up the assessment of seed germination experiments for different seed cultivars. It has lower error rates and higher performance than conventional and manual methods, leading to more accurate germination indices and quality assessments of seeds.
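The abstract names Mean Germination Time and Germination Uncertainty as indices computed from the model's predictions. The Python sketch below assumes the common textbook definitions MGT = Σ nᵢtᵢ / Σ nᵢ and U = -Σ fᵢ log₂ fᵢ with fᵢ = nᵢ / Σ nⱼ, where nᵢ is the number of seeds that newly germinated at observation time tᵢ. The paper's exact formulas are not given in this abstract, and the helper that turns cumulative per-image counts into increments is a hypothetical convenience.

```python
import math
from typing import List, Sequence


def increments_from_cumulative(cumulative: Sequence[int]) -> List[int]:
    """Convert cumulative germinated counts per observation into new germinations."""
    new, previous = [], 0
    for count in cumulative:
        if count < previous:
            raise ValueError("cumulative counts must not decrease")
        new.append(count - previous)
        previous = count
    return new


def mean_germination_time(times: Sequence[float], new_germinated: Sequence[int]) -> float:
    """MGT = sum(n_i * t_i) / sum(n_i)."""
    total = sum(new_germinated)
    if total == 0:
        raise ValueError("no seeds germinated")
    return sum(t * n for t, n in zip(times, new_germinated)) / total


def germination_uncertainty(new_germinated: Sequence[int]) -> float:
    """U = -sum(f_i * log2(f_i)), with f_i the fraction of germinations at time i."""
    total = sum(new_germinated)
    if total == 0:
        raise ValueError("no seeds germinated")
    u = 0.0
    for n in new_germinated:
        if n > 0:  # observations with no new germination contribute nothing
            f = n / total
            u -= f * math.log2(f)
    return u


if __name__ == "__main__":
    # Hypothetical cumulative counts read off images taken at 24 h, 48 h and 72 h.
    hours = [24, 48, 72]
    new = increments_from_cumulative([10, 30, 40])  # -> [10, 20, 10]
    print(mean_germination_time(hours, new))  # 48.0
    print(germination_uncertainty(new))       # 1.5
```

A lower U means germination is concentrated at fewer observation times (more synchronous); U is 0 when all seeds germinate at a single observation.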
Affiliation(s)
- Nikita Genze
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Schulgasse 22, 94315, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Petersgasse 18, 94315, Straubing, Germany
- Richa Bharti
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Schulgasse 22, 94315, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Petersgasse 18, 94315, Straubing, Germany
- Michael Grieb
  - Technology and Support Centre in the Centre of Excellence for Renewable Resources (TFZ), Schulgasse 20, 94315, Straubing, Germany
- Dominik G Grimm
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Schulgasse 22, 94315, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Petersgasse 18, 94315, Straubing, Germany
  - Department of Informatics, Technical University of Munich, Boltzmannstr. 3, 85748, Garching, Germany
4
Togninalli M, Seren Ü, Freudenthal JA, Monroe JG, Meng D, Nordborg M, Weigel D, Borgwardt K, Korte A, Grimm DG. AraPheno and the AraGWAS Catalog 2020: a major database update including RNA-Seq and knockout mutation data for Arabidopsis thaliana. Nucleic Acids Res 2020; 48:D1063-D1068. [PMID: 31642487; PMCID: PMC7145550; DOI: 10.1093/nar/gkz925] [Received: 09/06/2019] [Revised: 09/26/2019] [Accepted: 10/08/2019]
Abstract
Genome-wide association studies (GWAS) are integral for studying genotype-phenotype relationships and gaining a deeper understanding of the genetic architecture underlying trait variation. A plethora of genetic associations between distinct loci and various traits have been successfully discovered and published for the model plant Arabidopsis thaliana. This success and the free availability of full genomes and phenotypic data for more than 1,000 different natural inbred lines led to the development of several data repositories. AraPheno (https://arapheno.1001genomes.org) serves as a central repository of population-scale phenotypes in A. thaliana, while the AraGWAS Catalog (https://aragwas.1001genomes.org) provides a publicly available, manually curated and standardized collection of marker-trait associations for all available phenotypes from AraPheno. In this major update, we introduce the next generation of both platforms, including new data, features and tools. We included novel results on associations between knockout mutations and all AraPheno traits. Furthermore, AraPheno has been extended to display RNA-Seq data for hundreds of accessions, providing expression information for over 28 000 genes for these accessions. All data, including the imputed genotype matrix used for GWAS, are easily downloadable via the respective databases.
Affiliation(s)
- Matteo Togninalli
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Ümit Seren
  - Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
- Jan A Freudenthal
  - Center for Computational and Theoretical Biology, University Würzburg, Würzburg, Germany
- J Grey Monroe
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
- Dazhe Meng
  - Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
  - Google, Mountain View, USA
- Magnus Nordborg
  - Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
- Detlef Weigel
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
- Karsten Borgwardt
  - Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Basel, Switzerland
- Arthur Korte
  - Center for Computational and Theoretical Biology, University Würzburg, Würzburg, Germany
- Dominik G Grimm
  - Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Bioinformatics, Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, Straubing, Germany
5
Fishman CE, Mohebnasab M, van Setten J, Zanoni F, Wang C, Deaglio S, Amoroso A, Callans L, van Gelder T, Lee S, Kiryluk K, Lanktree MB, Keating BJ. Genome-Wide Study Updates in the International Genetics and Translational Research in Transplantation Network (iGeneTRAiN). Front Genet 2019; 10:1084. [PMID: 31803228; PMCID: PMC6873800; DOI: 10.3389/fgene.2019.01084] [Received: 03/26/2019] [Accepted: 10/09/2019]
Abstract
The prevalence of end-stage renal disease (ESRD) and the number of kidney transplants performed continue to rise every year, straining the procurement of deceased and living kidney allografts and health systems. Genome-wide genotyping and sequencing of diseased populations have uncovered genetic contributors in substantial proportions of ESRD patients. A number of these discoveries are beginning to be utilized in risk stratification and clinical management of patients. Specifically, genetics can provide insight into the primary cause of chronic kidney disease (CKD), the risk of progression to ESRD, and post-transplant outcomes, including various forms of allograft rejection. The International Genetics & Translational Research in Transplantation Network (iGeneTRAiN) is a multi-site consortium that encompasses >45 genetic studies with genome-wide genotyping from over 51,000 transplant samples, including genome-wide data from >30 kidney transplant cohorts (n = 28,015). iGeneTRAiN is statistically powered to capture both rare and common genetic contributions to ESRD and post-transplant outcomes. The primary cause of ESRD is often difficult to ascertain, especially where formal biopsy diagnosis is not performed, and is unavailable in ∼2% to >20% of kidney transplant recipients in iGeneTRAiN studies. We review our current copy number variant (CNV) screening approaches from genome-wide genotyping datasets in iGeneTRAiN, in an attempt to discover and validate genetic contributors to CKD and ESRD. Greater aggregation and analyses of well-phenotyped patients with genome-wide datasets will undoubtedly yield insights into the underlying pathophysiological mechanisms of CKD, leading the way to improved diagnostic precision in nephrology.
Affiliation(s)
- Claire E Fishman
  - Division of Transplantation, Department of Surgery, University of Pennsylvania, Philadelphia, PA, United States
- Maede Mohebnasab
  - Division of Transplantation, Department of Surgery, University of Pennsylvania, Philadelphia, PA, United States
- Jessica van Setten
  - Department of Cardiology, University Medical Center Utrecht, University of Utrecht, Utrecht, Netherlands
- Francesca Zanoni
  - Department of Medicine, Division of Nephrology, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, United States
- Chen Wang
  - Department of Medicine, Division of Nephrology, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, United States
- Silvia Deaglio
  - Immunogenetics and Biology of Transplantation, Città della Salute e della Scienza, University Hospital of Turin, Turin, Italy
  - Medical Genetics, Department of Medical Sciences, University of Turin, Turin, Italy
- Antonio Amoroso
  - Immunogenetics and Biology of Transplantation, Città della Salute e della Scienza, University Hospital of Turin, Turin, Italy
  - Medical Genetics, Department of Medical Sciences, University of Turin, Turin, Italy
- Lauren Callans
  - Division of Transplantation, Department of Surgery, University of Pennsylvania, Philadelphia, PA, United States
- Teun van Gelder
  - Department of Hospital Pharmacy, University Medical Center Rotterdam, Rotterdam, Netherlands
- Sangho Lee
  - Department of Nephrology, Kyung Hee University, Seoul, South Korea
- Krzysztof Kiryluk
  - Department of Medicine, Division of Nephrology, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, United States
- Matthew B Lanktree
  - Division of Nephrology, St. Joseph's Healthcare Hamilton, McMaster University, Hamilton, ON, Canada
- Brendan J Keating
  - Division of Transplantation, Department of Surgery, University of Pennsylvania, Philadelphia, PA, United States
6
Muhamed B, Parks T, Sliwa K. Genetics of rheumatic fever and rheumatic heart disease. Nat Rev Cardiol 2019; 17:145-154. [PMID: 31519994; DOI: 10.1038/s41569-019-0258-2] [Accepted: 08/09/2019]
Abstract
Rheumatic heart disease (RHD) is a complication of group A streptococcal infection that results from a complex interaction between the genetic make-up of the host, the infection itself and several other environmental factors, largely reflecting poverty. RHD is estimated to affect 33.4 million people and results in 10.5 million disability-adjusted life-years lost globally. The disease has long been considered heritable, but little is still known about the host genetic factors that increase or reduce the risk of developing RHD. In the 1980s and 1990s, several reports linked the disease to the human leukocyte antigen (HLA) locus on chromosome 6, followed in the 2000s by reports implicating additional candidate regions elsewhere in the genome. Subsequently, the search for susceptibility loci has been reinvigorated by the use of genome-wide association studies (GWAS), through which millions of variants can be tested for association in thousands of individuals. Early findings implicate not only HLA, particularly the HLA-DQA1 to HLA-DQB1 region, but also the immunoglobulin heavy chain locus, including the IGHV4-61 gene segment, on chromosome 14. In this Review, we assess the emerging role of GWAS in studying RHD, outlining both the advantages and disadvantages of this approach. We also highlight the potential use of large-scale, publicly available data and the value of international collaboration to facilitate comprehensive studies that produce findings with implications for clinical practice.
Affiliation(s)
- Babu Muhamed
  - Hatter Institute for Cardiovascular Diseases Research in Africa, Department of Medicine, University of Cape Town, Cape Town, South Africa
- Tom Parks
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
  - Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Karen Sliwa
  - Hatter Institute for Cardiovascular Diseases Research in Africa, Department of Medicine, University of Cape Town, Cape Town, South Africa