1
O’Farrell F, Jiang X, Aljifri S, Pazoki R. Molecular Alterations Caused by Alcohol Consumption in the UK Biobank: A Mendelian Randomisation Study. Nutrients 2022; 14:2943. [PMID: 35889900 PMCID: PMC9317105 DOI: 10.3390/nu14142943] [Received: 05/16/2022] [Revised: 06/28/2022] [Accepted: 07/14/2022]
Abstract
Alcohol consumption is associated with the development of cardiovascular diseases, cancer, and liver disease. The biological mechanisms are still largely unclear. Here, we aimed to use an agnostic approach to identify phenotypes mediating the effect of alcohol on various diseases. METHODS We performed an agnostic association analysis between alcohol consumption (red and white wine, beer/cider, fortified wine, and spirits) and over 7800 phenotypes from the UK Biobank, comprising 223,728 participants. We performed Mendelian randomisation analysis to infer causality. We additionally performed a phenome-wide association analysis and a mediation analysis with alcohol consumption as the exposure, phenotypes in a causal relationship with alcohol consumption as mediators, and various diseases as the outcome. RESULTS Of 45 phenotypes associated with alcohol consumption, 20 were in a causal relationship with alcohol consumption. Gamma glutamyltransferase (GGT; β = 9.44; 95% CI = 5.94, 12.93; Pfdr = 9.04 × 10⁻⁷), mean sphered cell volume (β = 0.189; 95% CI = 0.11, 0.27; Pfdr = 1.00 × 10⁻⁴), mean corpuscular volume (β = 0.271; 95% CI = 0.19, 0.35; Pfdr = 7.09 × 10⁻¹⁰) and mean corpuscular haemoglobin (β = 0.278; 95% CI = 0.19, 0.36; Pfdr = 1.60 × 10⁻⁶) demonstrated the strongest causal relationships. We also identified GGT and physical inactivity as mediators in the pathway between alcohol consumption, liver cirrhosis and alcohol dependence. CONCLUSION Our study provides evidence of causality between alcohol consumption and 20 phenotypes and a mediation effect for physical activity on health consequences of alcohol consumption.
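The Mendelian randomisation step can be illustrated with a minimal sketch of the fixed-effect inverse-variance weighted (IVW) estimator, one common MR estimator. The abstract does not state which estimators the authors used, and the per-variant effect sizes below are hypothetical summary statistics, not data from the study.

```python
import math


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Single-variant causal estimate: variant-outcome effect / variant-exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Uses first-order weights beta_exp**2 / se_out**2, i.e. each variant's Wald
    ratio is weighted by the inverse of its first-order variance.
    """
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)


if __name__ == "__main__":
    # Hypothetical per-variant summary statistics (not from the UK Biobank study).
    beta_exp = [0.10, 0.20, 0.05]   # variant -> exposure (e.g. alcohol intake)
    beta_out = [0.05, 0.10, 0.025]  # variant -> outcome phenotype
    se_out = [0.01, 0.02, 0.01]     # standard errors of beta_out
    est, se = ivw_estimate(beta_exp, beta_out, se_out)
    print(f"IVW estimate {est:.3f}, 95% CI {est - 1.96 * se:.3f} to {est + 1.96 * se:.3f}")
```

IVW assumes every variant is a valid instrument; in practice it is usually reported alongside pleiotropy-robust estimators.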
Affiliation(s)
- Felix O’Farrell
- Division of Biomedical Sciences, Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
- Xiyun Jiang
- Division of Biomedical Sciences, Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
- Shahad Aljifri
- Division of Biomedical Sciences, Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
- Raha Pazoki
- Division of Biomedical Sciences, Department of Life Sciences, College of Health and Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
- Department of Epidemiology and Biostatistics, School of Public Health, St Mary’s Campus, Norfolk Place, London W2 1PG, UK
2
Bastarache L, Brown JS, Cimino JJ, Dorr DA, Embi PJ, Payne PR, Wilcox AB, Weiner MG. Developing real-world evidence from real-world data: Transforming raw data into analytical datasets. Learn Health Syst 2022; 6:e10293. [PMID: 35036557 PMCID: PMC8753316 DOI: 10.1002/lrh2.10293] [Received: 07/30/2021] [Revised: 09/10/2021] [Accepted: 09/21/2021]
Abstract
Development of evidence-based practice requires practice-based evidence, which can be acquired through analysis of real-world data from electronic health records (EHRs). The EHR contains volumes of information about patients (physical measurements, diagnoses, exposures, and markers of health behavior) that can be used to create algorithms for risk stratification or to gain insight into associations between exposures, interventions, and outcomes. But to transform real-world data into reliable real-world evidence, one must not only choose the correct analytical methods but also understand the quality, detail, provenance, and organization of the underlying source data, and address differences in these characteristics across sites when conducting analyses that span institutions. This manuscript explores the idiosyncrasies inherent in the capture, formatting, and standardization of EHR data and discusses the clinical domain and informatics competencies required to transform raw clinical, real-world data into high-quality, fit-for-purpose analytical data sets used to generate real-world evidence.
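As a small illustration of the cross-site differences described above, the sketch below harmonizes one laboratory measurement that different sites report in different units. The record layout, site names and the `harmonize_glucose` helper are hypothetical; the conversion factor follows from the molar mass of glucose (about 180.16 g/mol), so 1 mmol/L corresponds to about 18.016 mg/dL.

```python
from dataclasses import dataclass

# Factors converting a reported glucose unit to mmol/L.
# 1 mmol/L of glucose is about 18.016 mg/dL (molar mass about 180.16 g/mol).
GLUCOSE_TO_MMOL_PER_L = {"mmol/L": 1.0, "mg/dL": 1.0 / 18.016}


@dataclass
class LabResult:
    site: str        # contributing institution (hypothetical field names)
    patient_id: str
    value: float
    unit: str


def harmonize_glucose(results):
    """Return copies of the results expressed in mmol/L.

    Unknown units raise an error instead of being passed through silently,
    so unmapped site conventions surface while the analytical dataset is built.
    """
    harmonized = []
    for r in results:
        factor = GLUCOSE_TO_MMOL_PER_L.get(r.unit)
        if factor is None:
            raise ValueError(f"unrecognized unit {r.unit!r} from site {r.site}")
        harmonized.append(LabResult(r.site, r.patient_id, round(r.value * factor, 2), "mmol/L"))
    return harmonized


if __name__ == "__main__":
    raw = [
        LabResult("site_A", "p1", 5.5, "mmol/L"),
        LabResult("site_B", "p2", 99.0, "mg/dL"),
    ]
    for r in harmonize_glucose(raw):
        print(r)
```

Failing loudly on an unknown unit is a deliberate choice here: silently mixing units across sites is exactly the kind of provenance error the authors warn about.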
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jeffrey S. Brown
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA
- James J. Cimino
- Informatics Institute, University of Alabama at Birmingham, Birmingham, Alabama, USA
- David A. Dorr
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health Sciences University, Portland, Oregon, USA
- Peter J. Embi
- Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Philip R.O. Payne
- Institute for Informatics, Washington University in St. Louis, St. Louis, Missouri, USA
- Adam B. Wilcox
- Institute for Informatics, Washington University in St. Louis School of Medicine, St. Louis, Missouri, USA
- Mark G. Weiner
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
3
HCLC-FC: A novel statistical method for phenome-wide association studies. PLoS One 2022; 17:e0276646. [PMID: 36350801 PMCID: PMC9645610 DOI: 10.1371/journal.pone.0276646] [Received: 03/30/2022] [Accepted: 10/11/2022]
Abstract
The emergence of genetic data coupled to longitudinal electronic medical records (EMRs) offers the possibility of phenome-wide association studies (PheWAS). In PheWAS, the whole phenome can be divided into numerous phenotypic categories according to the genetic architecture across phenotypes. Currently, statistical analyses for PheWAS are mainly univariate, testing the association between one genetic variant and one phenotype at a time. In this article, we derive a novel and powerful multivariate method for PheWAS. The proposed method involves three steps. In the first step, we apply bottom-up hierarchical clustering to partition a large number of phenotypes into disjoint clusters within each phenotypic category. In the second step, the clustering linear combination method is used to combine test statistics within each category based on the phenotypic clusters and to obtain a p-value for each phenotypic category. In the third step, we propose a new false discovery rate (FDR) control approach. We perform extensive simulation studies to compare the performance of our method with that of other existing methods. The results show that our proposed method controls FDR well and outperforms the other methods compared. We also apply the proposed approach to a set of EMR-based phenotypes across more than 300,000 samples from the UK Biobank. We find that the proposed approach not only controls FDR at the nominal level but also successfully identifies 1,244 significant SNPs that are reported to be associated with some phenotypes in the GWAS catalog. Our open-access tools and instructions on how to implement HCLC-FC are available at https://github.com/XiaoyuLiang/HCLCFC.
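HCLC-FC introduces its own FDR-control procedure, which the abstract does not describe in enough detail to reproduce. For orientation only, the sketch below shows the standard Benjamini-Hochberg (BH) step-up procedure, a widely used baseline for FDR control; it is not the HCLC-FC method, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k whose p-value satisfies p_(k) <= k / m * alpha.
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])


def bh_adjusted(p_values):
    """BH-adjusted p-values: running minimum of p * m / rank from the largest p down, capped at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.045, 0.025, 0.005, 0.5]  # hypothetical per-category p-values
    print("rejected:", benjamini_hochberg(p))
    print("adjusted:", [round(q, 4) for q in bh_adjusted(p)])
```

Rejecting every hypothesis whose adjusted p-value is at most alpha gives the same set as the step-up rule, which is a convenient consistency check.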
4
Abdelrahman D, Hasan W, Da'as SI. Microinjection quality control in zebrafish model for genetic manipulations. MethodsX 2021; 8:101418. [PMID: 34430313 PMCID: PMC8374492 DOI: 10.1016/j.mex.2021.101418] [Received: 03/17/2021] [Accepted: 06/16/2021]
Abstract
Microinjection is one of the essential techniques used widely in zebrafish research. It is used to perform genetic manipulations in the developing zebrafish embryo and to study a wide range of genetic diseases and the roles of genes of interest in early developmental processes. Quality control of microinjection is therefore essential to ensure experimental reproducibility and consistency. In this technical note, in vitro transcribed synthetic mRNA encoding enhanced green fluorescent protein (eGFP) and red fluorescent protein (mCherry), as well as fluorescein and rhodamine fluorescent dyes, were injected into single-cell zebrafish embryos for volume quality control. Given the importance of a quality control system and methodology that yields similar genetic manipulation across zebrafish embryos, we aimed to establish unified delivery of injected material into one-cell-stage zebrafish embryos. We also aimed to establish a consistent injected volume, measured as droplets in mineral oil, to serve as a quality control parameter that ensures the reproducibility of the microinjection technique. Calibrating the microinjection droplet size allowed the fluorescent proteins and dyes to be visualized in the zebrafish embryo, with precise delivered volumes controlled by the needle opening, injection pressure and injection time.
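The droplet-based calibration described above treats the injected bolus in mineral oil as a sphere, so its volume follows from the measured diameter as V = πd³/6. The sketch below assumes the diameter is read in micrometres (for example with a stage micrometer); the function names and the 10% tolerance are illustrative assumptions, not values from the note.

```python
import math

UM3_PER_NL = 1e6  # 1 nL = 1e-12 m^3 = 1e6 cubic micrometres


def droplet_volume_nl(diameter_um: float) -> float:
    """Volume in nL of a spherical droplet with the given diameter in micrometres."""
    return math.pi / 6.0 * diameter_um ** 3 / UM3_PER_NL


def diameter_for_volume_um(volume_nl: float) -> float:
    """Droplet diameter in micrometres that corresponds to a target volume in nL."""
    return (6.0 * volume_nl * UM3_PER_NL / math.pi) ** (1.0 / 3.0)


def within_tolerance(diameter_um: float, target_nl: float, rel_tol: float = 0.10) -> bool:
    """Hypothetical QC check: is the measured droplet within rel_tol of the target volume?"""
    return abs(droplet_volume_nl(diameter_um) - target_nl) <= rel_tol * target_nl


if __name__ == "__main__":
    target = 1.0  # nL, an illustrative injection volume
    print(f"target {target} nL -> droplet diameter {diameter_for_volume_um(target):.1f} um")
    print("100 um droplet:", round(droplet_volume_nl(100.0), 3), "nL")
    print("120 um droplet within 10%?", within_tolerance(120.0, target))
```

Under these assumptions a 1 nL target corresponds to a droplet roughly 124 µm across. Because volume scales with the cube of the diameter, a small diameter error becomes about three times larger as a relative volume error.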
Affiliation(s)
- Doua Abdelrahman
- Integrated Genomics Services, Translational Research, Research Branch, Sidra Medicine, Doha, Qatar
- Department of Human Genetics, Sidra Medicine, Doha, Qatar
- Waseem Hasan
- Integrated Genomics Services, Translational Research, Research Branch, Sidra Medicine, Doha, Qatar
- Department of Human Genetics, Sidra Medicine, Doha, Qatar
- Sahar I. Da'as
- Integrated Genomics Services, Translational Research, Research Branch, Sidra Medicine, Doha, Qatar
- Department of Human Genetics, Sidra Medicine, Doha, Qatar
- College of Health and Life Sciences, Hamad Bin Khalifa University, Doha, Qatar
- Corresponding author at: Integrated Genomics Services, Translational Research, Research Branch, Sidra Medicine, Doha, Qatar.
5
Abstract
Electronic health records (EHRs) are a rich source of data for researchers, but extracting meaningful information out of this highly complex data source is challenging. Phecodes represent one strategy for defining phenotypes for research using EHR data. They are a high-throughput phenotyping tool based on ICD (International Classification of Diseases) codes that can be used to rapidly define the case/control status of thousands of clinically meaningful diseases and conditions. Phecodes were originally developed to conduct phenome-wide association studies to scan for phenotypic associations with common genetic variants. Since then, phecodes have been used to support a wide range of EHR-based phenotyping methods, including the phenotype risk score. This review aims to comprehensively describe the development, validation, and applications of phecodes and suggest some future directions for phecodes and high-throughput phenotyping.
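To make the case/control idea concrete, here is a minimal sketch of how ICD codes might be rolled up to a phecode and turned into case, control or excluded status. The three-entry ICD-10 map is an illustrative fragment, not the official phecode map. The two-occurrence rule for cases and the exclusion of related phecodes from controls are assumptions modelled on common PheWAS practice; real definitions should come from the published phecode maps.

```python
from collections import Counter

# Illustrative fragment only (not the official map): ICD-10 code -> phecode.
ICD10_TO_PHECODE = {
    "E11.9": "250.2",   # type 2 diabetes
    "E11.65": "250.2",
    "I10": "401.1",     # essential hypertension
}


def phecode_status(patient_codes, phecode, exclude_phecodes=(), min_code_count=2):
    """Classify one patient for one phecode as 'case', 'control' or 'excluded'.

    patient_codes: ICD-10 codes recorded for the patient, one entry per occurrence.
    A case needs at least min_code_count mapped occurrences (assumed rule). A patient
    with fewer occurrences, or with a related phecode listed in exclude_phecodes,
    is left out of the control group instead of being counted as a control.
    """
    counts = Counter(ICD10_TO_PHECODE.get(code) for code in patient_codes)
    n = counts[phecode]
    if n >= min_code_count:
        return "case"
    if n > 0 or any(counts[p] for p in exclude_phecodes):
        return "excluded"
    return "control"


if __name__ == "__main__":
    print(phecode_status(["E11.9", "E11.65", "I10"], "250.2"))  # two diabetes codes
    print(phecode_status(["E11.9"], "250.2"))                   # only one occurrence
    print(phecode_status(["I10"], "250.2"))                     # unrelated code only
```

Excluding ambiguous patients rather than labelling them controls keeps the control group cleaner at the cost of some sample size.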
Affiliation(s)
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
6
Wang L, Zhang X, Meng X, Koskeridis F, Georgiou A, Yu L, Campbell H, Theodoratou E, Li X. Methodology in phenome-wide association studies: a systematic review. J Med Genet 2021; 58:720-728. [PMID: 34272311 DOI: 10.1136/jmedgenet-2021-107696] [Received: 01/06/2021] [Accepted: 05/27/2021]
Abstract
Phenome-wide association studies (PheWAS) have been increasingly used to identify novel genetic associations across a wide spectrum of phenotypes. This systematic review aims to summarise the PheWAS methodology, discuss the advantages and challenges of PheWAS, and provide potential implications for future PheWAS studies. Medical Literature Analysis and Retrieval System Online (MEDLINE) and Excerpta Medica Database (EMBASE) databases were searched to identify all published PheWAS studies up until 24 April 2021. The PheWAS methodology, including how to perform a PheWAS analysis and which software or tools could be used, was summarised based on the extracted information. A total of 1035 studies were identified and 195 eligible articles were finally included. Among them, 137 (77.0%) contained 10 000 or more study participants, 164 (92.1%) defined the phenome based on electronic medical records data, 140 (78.7%) used genetic variants as predictors, and 73 (41.0%) conducted replication analysis to validate PheWAS findings, almost all of which (94.5%) obtained consistent results. The methodology applied in these PheWAS studies was dissected into several critical steps, including quality control of the phenome, selecting predictors, phenotyping, statistical analysis, and interpretation and visualisation of PheWAS results, and a workflow for performing a PheWAS was established with detailed instructions on each step. This study provides a comprehensive overview of PheWAS methodology to help practitioners achieve a better understanding of the PheWAS design, to detect understudied or overstudied outcomes, and to direct their research by applying the most appropriate software and online tools for their study data structure.
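The core statistical step the review dissects, testing one predictor against every phenotype and correcting for the number of tests, can be sketched as below. For brevity this assumes a binary carrier status and uses an unadjusted Pearson chi-square test per phenotype with a Bonferroni threshold; many published PheWAS instead fit logistic regression with covariates such as age, sex and genetic principal components. All data and names are hypothetical.

```python
import math


def chi2_2x2_p(a, b, c, d):
    """p-value of the Pearson chi-square test (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no information
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df


def phewas_scan(carrier, phenotypes, alpha=0.05):
    """carrier: person -> bool (carries the variant).
    phenotypes: phecode -> {person -> bool (True = case, False = control)};
    people absent from a phecode's dict (e.g. excluded) are skipped.
    Returns (results sorted by p-value, Bonferroni threshold)."""
    results = []
    for code, status in phenotypes.items():
        people = [p for p in status if p in carrier]
        a = sum(1 for p in people if carrier[p] and status[p])          # carrier cases
        b = sum(1 for p in people if carrier[p] and not status[p])      # carrier controls
        c = sum(1 for p in people if not carrier[p] and status[p])      # non-carrier cases
        d = sum(1 for p in people if not carrier[p] and not status[p])  # non-carrier controls
        results.append((code, chi2_2x2_p(a, b, c, d)))
    threshold = alpha / len(phenotypes)  # Bonferroni correction over phecodes tested
    return sorted(results, key=lambda r: r[1]), threshold


if __name__ == "__main__":
    carrier = {i: i < 50 for i in range(100)}
    phenos = {
        "A": {i: i < 40 or 50 <= i < 60 for i in range(100)},  # enriched in carriers
        "B": {i: i < 25 or 50 <= i < 75 for i in range(100)},  # no association
    }
    results, thr = phewas_scan(carrier, phenos)
    for code, p in results:
        print(code, f"{p:.2e}", "significant" if p < thr else "")
```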
Affiliation(s)
- Lijuan Wang
- School of Public Health and the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Xiaomeng Zhang
- Centre for Global Health, The University of Edinburgh Usher Institute of Population Health Sciences and Informatics, Edinburgh, UK
- Xiangrui Meng
- Vanke School of Public Health, Tsinghua University, Beijing, China
- Fotios Koskeridis
- Department of Hygiene and Epidemiology, University of Ioannina, Ioannina, Epirus, Greece
- Andrea Georgiou
- Department of Hygiene and Epidemiology, University of Ioannina, Ioannina, Epirus, Greece
- Lili Yu
- School of Public Health and the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
- Harry Campbell
- Centre for Global Health, The University of Edinburgh Usher Institute of Population Health Sciences and Informatics, Edinburgh, UK
- Evropi Theodoratou
- Centre for Global Health, The University of Edinburgh Usher Institute of Population Health Sciences and Informatics, Edinburgh, UK
- Cancer Research UK Edinburgh Centre, The University of Edinburgh MRC Institute of Genetics and Molecular Medicine, Edinburgh, UK
- Xue Li
- School of Public Health and the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang, China
7
Graf WD, Shprintzen RJ. "Retrofitting" established genetic disorders and diseases through big data and phenomics. Neurol Clin Pract 2020; 10:375-376. [PMID: 33304644 PMCID: PMC7717638 DOI: 10.1212/cpj.0000000000000784]
Affiliation(s)
- William D Graf
- Connecticut Children's, Farmington
- Robert J Shprintzen
- The Virtual Center for Velo-Cardio-Facial Syndrome, Manlius, NY
8
Lau A, So HC. Turning genome-wide association study findings into opportunities for drug repositioning. Comput Struct Biotechnol J 2020; 18:1639-1650. [PMID: 32670504 PMCID: PMC7334463 DOI: 10.1016/j.csbj.2020.06.015] [Received: 11/10/2019] [Revised: 06/05/2020] [Accepted: 06/05/2020]
Abstract
Drug development is a very costly and lengthy process, whereas repositioned or repurposed drugs could be brought into clinical practice within a shorter time frame and at a much reduced cost. Numerous computational approaches to drug repositioning have been developed, but methods utilizing genome-wide association study (GWAS) data are less explored. The past decade has seen a massive growth in the amount of GWAS data, and the rich information it contains has great potential to guide drug repositioning or discovery. While multiple tools are available for finding the most relevant genes from GWAS hits, searching for top susceptibility genes is only one way to guide repositioning, and it has its own limitations. Here we provide a comprehensive review of different computational approaches that employ GWAS data to guide drug repositioning. These methods include selecting top candidate genes from GWAS as drug targets, deducing drug candidates based on drug-drug and disease-disease similarities, searching for reversed expression profiles between drugs and diseases, pathway-based methods, and approaches based on analysis of biological networks. Each method is illustrated with examples, and their respective strengths and limitations are discussed. We also discuss several areas for future research.
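One of the approaches listed, searching for drugs whose expression profiles reverse a disease signature, can be sketched as ranking drugs by how negatively their gene-level changes correlate with the disease signature. Resources such as the Connectivity Map use enrichment-based connectivity scores rather than a plain correlation, so the Spearman correlation below is a simplified stand-in. The signatures and drug names are hypothetical.

```python
def ranks(values):
    """1-based ranks; ties are broken by position, which is adequate for this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = float(rank)
    return r


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    n = len(x)
    if n < 2:
        return 0.0
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def rank_drugs_by_reversal(disease_sig, drug_sigs):
    """disease_sig and each drug_sigs[drug] map gene -> log fold change.

    Returns (drug, rho) pairs with the most negative rho (strongest reversal) first.
    """
    scores = []
    for drug, sig in drug_sigs.items():
        genes = sorted(set(disease_sig) & set(sig))  # compare shared genes only
        rho = spearman([disease_sig[g] for g in genes], [sig[g] for g in genes])
        scores.append((drug, rho))
    return sorted(scores, key=lambda s: s[1])


if __name__ == "__main__":
    disease = {"G1": 2.0, "G2": 1.0, "G3": -1.0, "G4": -2.0}
    drugs = {
        "drug_Y": {"G1": 1.0, "G2": 0.5, "G3": -0.2, "G4": -0.9},  # mimics the disease
        "drug_X": {"G1": -1.5, "G2": -0.5, "G3": 0.7, "G4": 1.2},  # reverses it
    }
    for drug, rho in rank_drugs_by_reversal(disease, drugs):
        print(drug, round(rho, 3))
```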
Affiliation(s)
- Alexandria Lau
- School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- Hon-Cheong So
- School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, Hong Kong SAR, China
- Department of Psychiatry, The Chinese University of Hong Kong, Hong Kong SAR, China
- Margaret K.L. Cheung Research Centre for Management of Parkinsonism, The Chinese University of Hong Kong, Hong Kong SAR, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
- Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR, China
- Hong Kong Branch of the Chinese Academy of Sciences Center for Excellence in Animal Evolution and Genetics, The Chinese University of Hong Kong, Hong Kong SAR, China
- Corresponding author at: School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China.
9
Genetic contributions to NAFLD: leveraging shared genetics to uncover systems biology. Nat Rev Gastroenterol Hepatol 2020; 17:40-52. [PMID: 31641249 DOI: 10.1038/s41575-019-0212-0] [Accepted: 09/05/2019]
Abstract
Nonalcoholic fatty liver disease (NAFLD) affects around a quarter of the global population, paralleling worldwide increases in obesity and metabolic syndrome. NAFLD arises in the context of systemic metabolic dysfunction that concomitantly amplifies the risk of cardiovascular disease and diabetes. These interrelated conditions have long been recognized to have a heritable component, and advances using unbiased association studies followed by functional characterization have created a paradigm for unravelling the genetic architecture of these conditions. A novel perspective is to characterize the shared genetic basis of NAFLD and other related disorders. This information on shared genetic risks and their biological overlap should in future enable the development of precision medicine approaches through better patient stratification, and enable the identification of preventive and therapeutic strategies. In this Review, we discuss current knowledge of the genetic basis of NAFLD and of possible pleiotropy between NAFLD and other liver diseases as well as other related metabolic disorders. We also discuss evidence of causality in NAFLD and other related diseases and the translational significance of such evidence, and future challenges from the study of genetic pleiotropy.
10
Eslam M, George J. Genetic Insights for Drug Development in NAFLD. Trends Pharmacol Sci 2019; 40:506-516. [PMID: 31160124 DOI: 10.1016/j.tips.2019.05.002] [Received: 02/05/2019] [Revised: 04/10/2019] [Accepted: 05/06/2019]
Abstract
Drug development is a costly, time-consuming, and challenging endeavour, with only a few agents reaching the threshold of approval for clinical use. Therefore, approaches to more efficiently identify targets that are likely to translate to clinical benefit are required. Interrogation of the human genome in large patient cohorts has rapidly advanced our knowledge of the genetic architecture and underlying mechanisms of many diseases, including nonalcoholic fatty liver disease (NAFLD). There are no approved pharmacotherapies for NAFLD currently. Genetic insights provide a powerful and new approach to infer and prioritise candidate drugs, with such selection avoiding myriad pitfalls, while defining likely benefits. In this review, we discuss the prospects and challenges for the optimal utilisation of genetic findings for improving and accelerating the NAFLD drug discovery pipeline.
Affiliation(s)
- Mohammed Eslam
- Storr Liver Centre, Westmead Institute for Medical Research, Westmead Hospital and University of Sydney, Westmead, NSW, Australia.
- Jacob George
- Storr Liver Centre, Westmead Institute for Medical Research, Westmead Hospital and University of Sydney, Westmead, NSW, Australia.
11
Zhang X, Basile AO, Pendergrass SA, Ritchie MD. Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico. BMC Bioinformatics 2019; 20:46. [PMID: 30669967 PMCID: PMC6343276 DOI: 10.1186/s12859-018-2591-6] [Received: 08/29/2018] [Accepted: 12/26/2018]
Abstract
Background The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, little is known about how sample size, case numbers, and the balance of cases versus controls affect both burden-based and dispersion-based rare variant association methods. For example, phenome-wide association studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and when statistical methods are applied to rare variants, it is important to understand the strengths and limitations of the analyses. Results We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance across different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests than a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error in unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases, in addition to the case-to-control ratio, in scenarios with a large control group. Based on our power simulations, we observed that a SKAT analysis with more than 200 cases in unbalanced case-control models yielded over 90% power with relatively well-controlled type I error. To achieve similar power with regression, over 500 cases are needed. Moreover, SKAT showed higher power than regression to detect associations in unbalanced case-control scenarios.
Conclusions Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.
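Burden tests collapse the rare variants in a region into one score per individual before testing. A minimal sketch of the simplest version is shown below: a CAST-style indicator of carrying any variant below a MAF threshold, followed by a 2×2 chi-square test. It is not the regression or SKAT implementation benchmarked in this study, and with few carriers (as in strongly unbalanced designs) the chi-square approximation becomes unreliable. Genotypes, frequencies and the 1% threshold are hypothetical.

```python
import math


def collapse_carriers(genotypes, mafs, maf_threshold=0.01):
    """genotypes: per-person lists of minor-allele counts (0/1/2), one entry per variant.
    Returns 1 for people carrying at least one variant with MAF < maf_threshold, else 0."""
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold]
    return [int(any(g[j] > 0 for j in rare)) for g in genotypes]


def chi2_2x2_p(a, b, c, d):
    """p-value of the Pearson chi-square test (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df


def burden_test(genotypes, mafs, is_case, maf_threshold=0.01):
    """CAST-style collapsing test of rare-variant carrier status against case status."""
    carrier = collapse_carriers(genotypes, mafs, maf_threshold)
    a = sum(1 for k, y in zip(carrier, is_case) if k and y)          # case carriers
    b = sum(1 for k, y in zip(carrier, is_case) if not k and y)      # case non-carriers
    c = sum(1 for k, y in zip(carrier, is_case) if k and not y)      # control carriers
    d = sum(1 for k, y in zip(carrier, is_case) if not k and not y)  # control non-carriers
    return chi2_2x2_p(a, b, c, d)


if __name__ == "__main__":
    mafs = [0.005, 0.002, 0.2, 0.008]  # the third variant is common and is ignored
    rare_carrier, common_only = [0, 1, 0, 0], [0, 0, 1, 0]
    genotypes = [rare_carrier] * 12 + [common_only] * 8 + [rare_carrier] * 2 + [common_only] * 18
    is_case = [True] * 20 + [False] * 20
    print("burden p-value:", burden_test(genotypes, mafs, is_case))
```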
Affiliation(s)
- Xinyuan Zhang
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anna O Basile
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, USA
- Marylyn D Ritchie
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
12
Mochida K, Koda S, Inoue K, Hirayama T, Tanaka S, Nishii R, Melgani F. Computer vision-based phenotyping for improvement of plant productivity: a machine learning perspective. Gigascience 2019; 8:5232233. [PMID: 30520975 PMCID: PMC6312910 DOI: 10.1093/gigascience/giy153] [Received: 06/12/2018] [Revised: 09/06/2018] [Accepted: 11/24/2018]
Abstract
Employing computer vision to extract useful information from images and videos is becoming a key technique for identifying phenotypic changes in plants. Here, we review the emerging aspects of computer vision for automated plant phenotyping. Recent advances in image analysis empowered by machine learning-based techniques, including convolutional neural network-based modeling, have expanded their application to assist high-throughput plant phenotyping. Combinatorial use of multiple sensors to acquire various spectra has allowed us to noninvasively obtain a series of datasets, including those related to the development and physiological responses of plants throughout their life. Automated phenotyping platforms accelerate the elucidation of gene functions associated with traits in model plants under controlled conditions. Remote sensing techniques with image collection platforms, such as unmanned vehicles and tractors, are also emerging for large-scale field phenotyping for crop breeding and precision agriculture. Computer vision-based phenotyping will play significant roles in both the nowcasting and forecasting of plant traits through modeling of genotype/phenotype relationships.
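The review centres on machine-learning models such as convolutional neural networks. As a much simpler point of reference, the sketch below segments plant pixels with the excess green index (ExG = 2g - r - b on chromatic coordinates), a classic colour-index baseline, and converts the pixel count into projected plant area. It is not a method from the review; the threshold, pixel size and toy image are assumptions for illustration.

```python
def excess_green(r, g, b):
    """Excess green index on chromatic coordinates: (2G - R - B) / (R + G + B)."""
    total = r + g + b
    if total == 0:  # black pixel: no colour information
        return 0.0
    return (2 * g - r - b) / total


def projected_plant_area(image, threshold=0.1, pixel_area_mm2=1.0):
    """image: 2-D list of (R, G, B) tuples. Counts pixels whose ExG exceeds the
    threshold (assumed value) and converts the count to an area in mm^2."""
    plant_pixels = sum(
        1 for row in image for (r, g, b) in row if excess_green(r, g, b) > threshold
    )
    return plant_pixels * pixel_area_mm2


if __name__ == "__main__":
    toy_image = [
        [(30, 120, 40), (120, 110, 100)],  # leaf, soil
        [(200, 200, 200), (20, 90, 30)],   # grey background, leaf
    ]
    print("projected area (mm^2):", projected_plant_area(toy_image, pixel_area_mm2=0.25))
```

A fixed colour threshold is brittle under changing illumination and overlapping leaves, which is one reason the learned models discussed in the review have largely replaced such rules.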
Affiliation(s)
- Keiichi Mochida
- Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Microalgae Production Control Technology Laboratory, RIKEN Baton Zone Program, RIKEN Cluster for Science, Technology and Innovation Hub, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Institute of Plant Science and Resources, Okayama University, 2-20-1 Chuo, Kurashiki, Okayama 710-0046, Japan
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka-cho, Totsuka-ku, Yokohama, Kanagawa 244–0813, Japan
- Graduate School of Nanobioscience, Yokohama City University, 22-2 Seto, Kanazawa-ku, Yokohama, Kanagawa 236-0027, Japan
- Satoru Koda
- Graduate School of Mathematics, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Komaki Inoue
- Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Takashi Hirayama
- Institute of Plant Science and Resources, Okayama University, 2-20-1 Chuo, Kurashiki, Okayama 710-0046, Japan
- Shojiro Tanaka
- Hiroshima University of Economics, 5-37-1, Gion, Asaminami, Hiroshima-shi, Hiroshima 731-0138, Japan
- Ryuei Nishii
- Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Farid Melgani
- Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 9, 38123 Trento, Italy
13
|
Mochida K, Koda S, Inoue K, Hirayama T, Tanaka S, Nishii R, Melgani F. Computer vision-based phenotyping for improvement of plant productivity: a machine learning perspective. Gigascience 2019. [PMID: 30520975 DOI: 10.1093/gigascience/giy153/5232233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/04/2023] Open
Abstract
Employing computer vision to extract useful information from images and videos is becoming a key technique for identifying phenotypic changes in plants. Here, we review the emerging aspects of computer vision for automated plant phenotyping. Recent advances in image analysis empowered by machine learning-based techniques, including convolutional neural network-based modeling, have expanded their application to assist high-throughput plant phenotyping. Combinatorial use of multiple sensors to acquire various spectra has allowed us to noninvasively obtain a series of datasets, including those related to the development and physiological responses of plants throughout their life. Automated phenotyping platforms accelerate the elucidation of gene functions associated with traits in model plants under controlled conditions. Remote sensing techniques with image collection platforms, such as unmanned vehicles and tractors, are also emerging for large-scale field phenotyping for crop breeding and precision agriculture. Computer vision-based phenotyping will play significant roles in both the nowcasting and forecasting of plant traits through modeling of genotype/phenotype relationships.
Collapse
Affiliation(s)
- Keiichi Mochida
- Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Microalgae Production Control Technology Laboratory, RIKEN Baton Zone Program, RIKEN Cluster for Science, Technology and Innovation Hub, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Institute of Plant Science and Resources, Okayama University, 2-20-1 Chuo, Kurashiki, Okayama 710-0046, Japan
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka-cho, Totsuka-ku, Yokohama, Kanagawa 244-0813, Japan
- Graduate School of Nanobioscience, Yokohama City University, 22-2 Seto, Kanazawa-ku, Yokohama, Kanagawa 236-0027, Japan
- Satoru Koda
- Graduate School of Mathematics, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Komaki Inoue
- Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Takashi Hirayama
- Institute of Plant Science and Resources, Okayama University, 2-20-1 Chuo, Kurashiki, Okayama 710-0046, Japan
- Shojiro Tanaka
- Hiroshima University of Economics, 5-37-1, Gion, Asaminami, Hiroshima-shi, Hiroshima 731-0138, Japan
- Ryuei Nishii
- Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Farid Melgani
- Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 9, 38123 Trento, Italy
14
Fuentes R, Letelier J, Tajer B, Valdivia LE, Mullins MC. Fishing forward and reverse: Advances in zebrafish phenomics. Mech Dev 2018; 154:296-308. [PMID: 30130581 PMCID: PMC6289646 DOI: 10.1016/j.mod.2018.08.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/04/2018] [Revised: 08/06/2018] [Accepted: 08/17/2018] [Indexed: 12/15/2022]
Abstract
Understanding how the genome instructs the phenotypic characteristics of an organism is one of the major scientific endeavors of our time. Advances in genetics have progressively deciphered the inheritance, identity and biological relevance of genetically encoded information, contributing to the rise of several complementary omic disciplines. One of them is phenomics, an emergent area of biology dedicated to the systematic multi-scale analysis of phenotypic traits. This discipline provides valuable gene function information to the rapidly evolving field of genetics. Current molecular tools enable genome-wide analyses that link gene sequence to function in multi-cellular organisms, illuminating the genome-phenome relationship. Among vertebrates, zebrafish has emerged as an outstanding model organism for high-throughput phenotyping and modeling of human disorders. Advances in both systematic mutagenesis and phenotypic analyses of embryonic and post-embryonic stages in zebrafish have revealed the function of a valuable collection of genes and the general structure of several complex traits. In this review, we summarize multiple large-scale genetic efforts addressing parental, embryonic, and adult phenotyping in the zebrafish. The genetic and quantitative tools available in the zebrafish model, coupled with the broad spectrum of phenotypes that can be assayed, make it a powerful model for phenomics, well suited for the dissection of genotype-phenotype associations in development, physiology, health and disease.
Affiliation(s)
- Ricardo Fuentes
- Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Joaquín Letelier
- Centro Andaluz de Biología del Desarrollo (CSIC/UPO/JA), Seville, Spain; Center for Integrative Biology, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Benjamin Tajer
- Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Leonardo E Valdivia
- Center for Integrative Biology, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
- Mary C Mullins
- Department of Cell and Developmental Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
15
Abstract
Biomedical data science has experienced an explosion of new data over the past decade. Abundant genetic and genomic data are increasingly available in large, diverse data sets due to the maturation of modern molecular technologies. Along with these molecular data, dense, rich phenotypic data are also available in comprehensive clinical data sets from health care provider organizations, clinical trials, population health registries, and epidemiologic studies. The methods and approaches for interrogating these large genetic/genomic and clinical data sets continue to evolve rapidly as our understanding of the questions and challenges continues to emerge. In this review, the state-of-the-art methodologies for genetic/genomic analysis along with complex phenomics will be discussed. This field is changing and adapting to the novel data types made available, as well as to technological advances in computation and machine learning. Thus, I will also discuss the future challenges in this exciting and innovative space. The promises of precision medicine rely heavily on the ability to marry complex genetic/genomic data with clinical phenotypes in meaningful ways.
Affiliation(s)
- Marylyn D. Ritchie
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
16
Verma A, Bradford Y, Dudek S, Lucas AM, Verma SS, Pendergrass SA, Ritchie MD. A simulation study investigating power estimates in phenome-wide association studies. BMC Bioinformatics 2018; 19:120. [PMID: 29618318 PMCID: PMC5885318 DOI: 10.1186/s12859-018-2135-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 03/16/2017] [Accepted: 03/26/2018] [Indexed: 01/01/2023]
Abstract
Background Phenome-wide association studies (PheWAS) are a high-throughput approach to evaluate comprehensive associations between genetic variants and a wide range of phenotypic measures. PheWAS involve varying sample sizes for quantitative traits and variable numbers of cases and controls for binary traits across the many phenotypes of interest, which can affect the statistical power to detect associations. The motivation of this study is to investigate the parameters that affect the estimation of statistical power in PheWAS, including sample size, case-control ratio, minor allele frequency, and disease penetrance. Results We performed a PheWAS simulation study in which we investigated variation in statistical power across parameters such as overall sample size, number of cases, case-control ratio, minor allele frequency, and disease penetrance. The simulation was performed on both binary and quantitative phenotypic measures. Our simulation on binary traits suggests that the number of cases has more impact on statistical power than the case-to-control ratio; we also found that a sample size of 200 cases or more maintains the statistical power to identify associations for common variants. For quantitative traits, a sample size of 1000 or more individuals performed best in the power calculations. We focused on common genetic variants (MAF > 0.01) in this study; in future studies, we will extend this effort to perform similar simulations on rare variants. Conclusions This study provides a series of PheWAS simulation analyses that can be used to estimate statistical power for some potential scenarios. These results can be used to provide guidelines for appropriate study design for future PheWAS analyses.
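The sketch below shows one way to set up the kind of binary-trait power simulation this abstract describes. It is not the authors' code. It estimates power for a single variant and phenotype by Monte Carlo. Genotypes are drawn under Hardy-Weinberg equilibrium, and disease risk follows a multiplicative penetrance model, P(disease | g) = `penetrance_ref` × `risk_ratio`^g. Each simulated sample is then tested with a 1-df allelic chi-square test. The penetrance model, the allelic test, the default parameter values, and the function names (`genotype_freqs`, `allelic_chi2_p`, `simulate_power`) are all illustrative assumptions.

```python
import math
import random


def genotype_freqs(maf, penetrance_ref, risk_ratio):
    """Genotype frequencies (0, 1, 2 risk alleles) among cases and controls.

    Assumes Hardy-Weinberg equilibrium in the source population and a
    multiplicative penetrance model: P(disease | g) = penetrance_ref * risk_ratio**g.
    """
    pop = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    pen = [penetrance_ref * risk_ratio ** g for g in range(3)]
    if max(pen) > 1:
        raise ValueError("penetrance exceeds 1; lower penetrance_ref or risk_ratio")
    case_w = [f * p for f, p in zip(pop, pen)]        # P(g) * P(case | g)
    ctrl_w = [f * (1 - p) for f, p in zip(pop, pen)]  # P(g) * P(control | g)
    return ([w / sum(case_w) for w in case_w],
            [w / sum(ctrl_w) for w in ctrl_w])


def allelic_chi2_p(case_alt, case_n, ctrl_alt, ctrl_n):
    """P-value of the 1-df Pearson chi-square test on a 2x2 allele-count table."""
    a, b = case_alt, 2 * case_n - case_alt   # case risk / non-risk alleles
    c, d = ctrl_alt, 2 * ctrl_n - ctrl_alt   # control risk / non-risk alleles
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic sample: no evidence of association
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def simulate_power(n_cases, n_controls, maf, penetrance_ref=0.05,
                   risk_ratio=1.5, alpha=0.05, reps=500, seed=1):
    """Fraction of simulated case-control samples with p < alpha."""
    rng = random.Random(seed)
    case_f, ctrl_f = genotype_freqs(maf, penetrance_ref, risk_ratio)
    hits = 0
    for _ in range(reps):
        # Each draw is a genotype (0, 1 or 2 risk alleles); summing counts alleles.
        case_alt = sum(rng.choices(range(3), weights=case_f, k=n_cases))
        ctrl_alt = sum(rng.choices(range(3), weights=ctrl_f, k=n_controls))
        if allelic_chi2_p(case_alt, n_cases, ctrl_alt, n_controls) < alpha:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Compare more cases against more controls per case.
    for cases, controls in [(50, 50), (200, 200), (200, 800), (1000, 1000)]:
        print(cases, controls, simulate_power(cases, controls, maf=0.2))
```

Comparing `(200, 200)` with `(200, 800)` against `(1000, 1000)` gives a rough sense of how case count and case-control ratio trade off under these assumed settings. The printed values depend entirely on the assumed penetrance model and are not the study's results. In a PheWAS setting, `alpha` would usually be corrected for the number of phenotypes tested, for example with a Bonferroni threshold of 0.05 divided by the number of phenotypes. Quantitative traits would need a different test, such as linear regression on genotype.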
Affiliation(s)
- Anurag Verma
- Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Yuki Bradford
- Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Scott Dudek
- Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Anastasia M Lucas
- Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA
- Shefali S Verma
- Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Marylyn D Ritchie
- Department of Genetics and Institute for Biomedical Informatics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA, USA; The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, USA