1
|
Albert EA, Kondratieva OA, Baranova EE, Sagaydak OV, Belenikin MS, Zobkova GY, Kuznetsova ES, Deviatkin AA, Zhurov AA, Karpulevich EA, Volchkov PY, Vorontsova MV. Transferability of the PRS estimates for height and BMI obtained from the European ethnic groups to the Western Russian populations. Front Genet 2023; 14:1086709. [PMID: 36726807; PMCID: PMC9885218; DOI: 10.3389/fgene.2023.1086709] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/01/2022] [Accepted: 01/05/2023] [Indexed: 01/17/2023]
Abstract
Genetic data play an increasingly important role in modern medicine. The falling cost of sequencing, the subsequent increase in imputation accuracy, and the accumulation of large amounts of high-quality genetic data enable the creation of polygenic risk scores (PRSs) for genotype-phenotype association. The accuracy of phenotype prediction depends primarily on the overall trait heritability, the cohort size of the genome-wide association study (GWAS), and the similarity of genetic background between the base and target cohorts. Here we used 8,664 high-coverage genomic samples collected across Russia by "Evogen", a Russian biomedical company, to evaluate the predictive power of PRSs based on summary statistics established on cohorts of European ancestry for basic phenotypic traits, namely height and BMI. We demonstrate that the PRSs calculated for the selected traits in three distinct Russian populations recapitulate the predictive power from the original studies. This is evidence that GWAS summary statistics calculated on cohorts of European ancestry are transferable to at least some ethnic groups in Russia.
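As a rough illustration of the PRS idea in this abstract, the sketch below computes a score as the weighted sum of effect-allele dosages and measures predictive power as the squared Pearson correlation between score and phenotype. The function names, the toy numbers, and the choice of squared correlation as the accuracy measure are assumptions for illustration only; the abstract does not describe the authors' pipeline.

```python
import math


def polygenic_score(dosages, effect_sizes):
    """Sum of effect-allele dosages (0..2) weighted by GWAS effect sizes.

    Both arguments are hypothetical per-variant lists in the same order.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def variance_explained(scores, phenotypes):
    """Squared Pearson correlation between scores and observed phenotypes.

    Assumed here as a simple stand-in for 'predictive power'.
    """
    n = len(scores)
    mean_s = sum(scores) / n
    mean_p = sum(phenotypes) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, phenotypes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in phenotypes)
    if var_s == 0 or var_p == 0:
        raise ValueError("scores and phenotypes must both vary")
    return (cov / math.sqrt(var_s * var_p)) ** 2


# Toy data (made up): three variants, four individuals.
effect_sizes = [0.5, -0.2, 0.1]
genotypes = [[2, 1, 0], [0, 1, 2], [1, 0, 1], [2, 2, 2]]
scores = [polygenic_score(g, effect_sizes) for g in genotypes]
```

Comparing `variance_explained` between the base cohort and a target cohort is one simple way to express transferability, under the assumptions stated above.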
Affiliation(s)
- E. A. Albert
  - National Medical Research Center for Endocrinology, Moscow, Russia; Life Sciences Research Center, Moscow Institute of Physics and Technology, Dolgoprudniy, Russia (*Correspondence: E. A. Albert)
- O. A. Kondratieva
  - Department of Information Systems, Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia
- A. A. Deviatkin
  - National Medical Research Center for Endocrinology, Moscow, Russia; Life Sciences Research Center, Moscow Institute of Physics and Technology, Dolgoprudniy, Russia
- A. A. Zhurov
  - National Medical Research Center for Endocrinology, Moscow, Russia
- E. A. Karpulevich
  - Department of Information Systems, Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow, Russia
- P. Y. Volchkov
  - National Medical Research Center for Endocrinology, Moscow, Russia; Life Sciences Research Center, Moscow Institute of Physics and Technology, Dolgoprudniy, Russia
- M. V. Vorontsova
  - National Medical Research Center for Endocrinology, Moscow, Russia
|
2
|
Abstract
The maximal information coefficient (MIC) explores associations between pairs of variables in complex relationships. It approximates the correlation by optimizing the partition of the axes. However, when a relationship is affected by certain kinds of noise, MIC may overestimate the correlation value, so that a noisy relationship is misidentified as a noiseless one. In this article, a novel method, the weighted information coefficient mean (WICM), is proposed to detect unbiased associations in large data sets. First, we mathematically analyze why an abnormal correlation value is assigned to a noisy relationship. Then, the WICM is presented in two core steps: the first detects potential overestimation among relationships with high values, and the second rectifies the overestimation by calculating the information coefficient mean instead of simply selecting the maximum element of the characteristic matrix. Finally, experiments on functional relationships and real-world data relationships show that WICM resolves the overestimation both feasibly and effectively.
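The contrast between taking the maximum and taking a mean over the characteristic matrix can be sketched as follows. This is a simplified illustration, not the WICM algorithm: it assumes equal-frequency binning instead of MIC's optimized partition, and it uses a plain unweighted mean because the abstract does not give the weighting scheme. All function names are hypothetical.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 according to its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(a, b):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a = Counter(a)
    count_b = Counter(b)
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def characteristic_matrix(xs, ys, max_cells):
    """Normalized mutual information for every grid with nx * ny <= max_cells.

    Simplification: bins are equal-frequency, not optimized as in MIC.
    """
    entries = {}
    for nx in range(2, max_cells // 2 + 1):
        for ny in range(2, max_cells // nx + 1):
            mi = mutual_information(
                equal_frequency_bins(xs, nx), equal_frequency_bins(ys, ny)
            )
            entries[(nx, ny)] = mi / math.log(min(nx, ny))
    return entries


def max_coefficient(entries):
    """MIC-style summary: the largest entry of the matrix."""
    return max(entries.values())


def mean_coefficient(entries):
    """Unweighted mean of the entries (the WICM weights are not reproduced)."""
    return sum(entries.values()) / len(entries)


# Toy noiseless linear relationship.
xs = list(range(64))
ys = [2 * x + 1 for x in xs]
matrix = characteristic_matrix(xs, ys, max_cells=16)
```

Because a single grid can score highly by chance, the maximum is the more optimistic of the two summaries; the mean can never exceed it.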
Affiliation(s)
- Chuanlu Liu
  - Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Shuliang Wang
  - Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Institute of E-Government, Beijing Institute of Technology, Beijing, China
- Hanning Yuan
  - Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Xiaojia Liu
  - Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
|
3
|
Pang CNI, Ballouz S, Weissberger D, Thibaut LM, Hamey JJ, Gillis J, Wilkins MR, Hart-Smith G. Analytical Guidelines for co-fractionation Mass Spectrometry Obtained through Global Profiling of Gold Standard Saccharomyces cerevisiae Protein Complexes. Mol Cell Proteomics 2020; 19:1876-1895. [PMID: 32817346; PMCID: PMC7664123; DOI: 10.1074/mcp.ra120.002154] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/02/2020] [Revised: 07/14/2020] [Indexed: 11/06/2022]
Abstract
Co-fractionation MS (CF-MS) is a technique with the potential to characterize endogenous and unmanipulated protein complexes on an unprecedented scale. However, this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To obtain such guidelines, this study thoroughly evaluates novel and published Saccharomyces cerevisiae CF-MS data sets using very high proteome coverage libraries of yeast gold standard complexes. A new method for identifying gold standard complexes in CF-MS data, Reference Complex Profiling, and the Extending 'Guilt-by-Association' by Degree (EGAD) R package are used for these evaluations, which are verified with concurrent analyses of published human data. By evaluating data collection designs, which involve fractionation of cell lysates, it is found that near-maximum recall of complexes can be achieved with fewer samples than were used in published studies. Distributing sample collection across orthogonal fractionation methods, rather than collecting a single high-resolution data set, leads to particularly efficient recall. By evaluating 17 different similarity scoring metrics, which are central to CF-MS data analysis, it is found that two metrics rarely used in past CF-MS studies (Spearman and Kendall correlations) and the recently introduced Co-apex metric frequently maximize recall, whereas a popular metric, Euclidean distance, delivers poor recall. The common practice of integrating external genomic data into CF-MS data analysis is also evaluated, revealing that this practice may improve the precision and recall of known complexes but is generally unsuitable for predicting novel complexes in model organisms. If studying nonmodel organisms using orthologous genomic data, it is found that particular subsets of fractionation profiles (e.g. the lowest abundance quartile) should be excluded to minimize false discovery. These assessments are summarized in a series of universally applicable guidelines for precise, sensitive and efficient CF-MS studies of known complexes, and effective predictions of novel complexes for orthogonal experimental validation.
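To make three of the similarity scoring metrics named in this abstract concrete, the sketch below computes Spearman correlation, Kendall's tau (the tau-a variant, which ignores tie corrections) and Euclidean distance for two fractionation profiles. The profiles are made-up numbers and the helper names are hypothetical; this is not the study's scoring code.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_a(a, b):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(a)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (a[i] - a[j]) * (b[i] - b[j])
            total += (product > 0) - (product < 0)
    return total / (n * (n - 1) / 2)


def euclidean(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical elution profiles: same shape, tenfold abundance difference.
protein_a = [1.0, 4.0, 9.0, 5.0, 2.0]
protein_b = [10.0, 40.0, 90.0, 50.0, 20.0]
```

In this toy case both rank correlations report perfect agreement, while the Euclidean distance is large because it is sensitive to the difference in abundance scale. Whether this effect accounts for the recall differences reported in the study is not something the abstract states.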
Affiliation(s)
- Chi Nam Ignatius Pang
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Sara Ballouz
  - Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia
- Daniel Weissberger
  - School of Chemistry, University of New South Wales, Sydney, New South Wales, Australia
- Loïc M Thibaut
  - School of Mathematics and Statistics, University of New South Wales, Sydney, New South Wales, Australia
- Joshua J Hamey
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Jesse Gillis
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Woodbury, New York, USA
- Marc R Wilkins
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Gene Hart-Smith
  - Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales, Australia; Department of Molecular Sciences, Macquarie University, Sydney, New South Wales, Australia
|
4
|
Curreri F, Graziani S, Xibilia MG. Input selection methods for data-driven Soft sensors design: Application to an industrial process. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.05.028] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 10/24/2022]
|
5
|
Uzma, Halim Z. Optimizing the DNA fragment assembly using metaheuristic-based overlap layout consensus approach. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106256] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
|
6
|
Reshef YA, Reshef DN, Sabeti PC, Mitzenmacher M. Equitability, Interval Estimation, and Statistical Power. Stat Sci 2020. [DOI: 10.1214/19-sts719] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
|