1
Zhurenkov KE, Lobov AA, Bildyug NB, Alexander-Sinclair EI, Darvish DM, Lomert EV, Kriger DV, Zainullina BR, Chabina AS, Khorolskaya JI, Perepletchikova DA, Blinova MI, Mikhailova NA. Focal Adhesion Maturation Responsible for Behavioral Changes in Human Corneal Stromal Fibroblasts on Fibrillar Substrates. Int J Mol Sci 2024; 25:8601. [PMID: 39201288 PMCID: PMC11354758 DOI: 10.3390/ijms25168601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 08/05/2024] [Accepted: 08/05/2024] [Indexed: 09/02/2024] Open
Abstract
The functioning of the human cornea heavily relies on the maintenance of its extracellular matrix (ECM) mechanical properties. Within this context, corneal stromal fibroblasts (CSFs) are essential, as they are responsible for remodeling the corneal ECM. In this study, we used a decellularized human amniotic membrane (dHAM) and a custom fibrillar collagen film (FCF) to explore the effects of fibrillar materials on human CSFs. Our findings indicate that substrates like FCF can enhance the early development of focal adhesions (FAs), leading to the activation and propagation of mechanotransduction signals. This is primarily achieved through FAK autophosphorylation and YAP1 nuclear translocation pathways. Remarkably, inhibiting FAK autophosphorylation negated the observed changes. Proteome analysis further confirmed the central role of FAs in mechanotransduction propagation in CSFs cultured on FCF. This analysis also highlighted complex signaling pathways, including chromatin epigenetic modifications, in response to fibrillar substrates. Overall, our research highlights the potential pathways through which CSFs undergo behavioral changes when exposed to fibrillar substrates, identifying FAs as essential mechanotransducers.
Affiliation(s)
- Kirill E Zhurenkov
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
  - Department of Cytology and Histology, St. Petersburg State University, St. Petersburg 199032, Russia
- Arseniy A Lobov
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Natalya B Bildyug
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Diana M Darvish
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Ekaterina V Lomert
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Daria V Kriger
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Bozhana R Zainullina
  - Centre for Molecular and Cell Technologies, St. Petersburg State University, St. Petersburg 199032, Russia
- Alina S Chabina
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Julia I Khorolskaya
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Miralda I Blinova
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
- Natalia A Mikhailova
  - Institute of Cytology Russian Academy of Sciences, St. Petersburg 194064, Russia
2
Kourakos G, Pauloo R, Harter T. An Imputation Method for Simulating 3D Well Screen Locations from Limited Regional Well Log Data. Ground Water 2024. [PMID: 38934581 DOI: 10.1111/gwat.13424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 04/24/2024] [Accepted: 06/03/2024] [Indexed: 06/28/2024]
Abstract
In groundwater modeling studies, accurate spatial and intensity identification of water sources and sinks is of critical importance. Precise construction data about wells (water sinks) are particularly difficult to obtain. The collection of well log data is expensive and laborious, and government records of historic well log data are often imprecise and incomplete with respect to the precise location or pumping rate. In many groundwater modeling studies, such as groundwater quality assessments, a precise representation of the horizontal and vertical distribution of well screens is required to accurately estimate contaminant breakthrough curves. The number of wells under consideration may be very large, for example, in the assessment of nonpoint source pollution. In this paper, we propose an imputation framework that allows for proper reconstruction of missing well data. Our approach exploits available information and tolerates data gaps and imprecisions. We demonstrate the value of this method for a subregion of the Central Valley aquifer (California, USA). We show that our framework imputes missing values that preserve statistical properties of available data and that remain consistent with the known spatial distribution of well screens and pumping rates in the three-dimensional aquifer system.
Affiliation(s)
- Rich Pauloo
  - Department of Land, Air, and Water Resources, One Shields Avenue, University of California, Davis, CA, 95616-8628, USA
- Thomas Harter
  - Department of Land, Air, and Water Resources, One Shields Avenue, University of California, Davis, CA, 95616-8628, USA
3
Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun 2024; 15:3922. [PMID: 38724498 PMCID: PMC11082229 DOI: 10.1038/s41467-024-47899-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Accepted: 04/16/2024] [Indexed: 05/12/2024] Open
Abstract
Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate optimality has conserved properties. Via machine learning, we confirm optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthews correlation coefficients surpassing 0.84. We introduce an ensemble inference to integrate results from individual top-performing workflows, expanding differential proteome coverage and resolving inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
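The abstract above reports workflow quality in terms of F1 score, Matthews correlation coefficient (MCC), and G-mean. As a minimal illustrative sketch (not the authors' code), these three scores can be derived from confusion-matrix counts. The function name `classification_scores` and the example counts are hypothetical, and G-mean is assumed here to be the geometric mean of sensitivity and specificity, which may differ from the exact definition used in the paper.

```python
import math


def classification_scores(tp, fp, tn, fn):
    """Return F1, MCC and G-mean from confusion-matrix counts.

    Hypothetical helper for illustration only. G-mean is assumed to be
    sqrt(sensitivity * specificity). Undefined ratios fall back to 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    # F1 is the harmonic mean of precision and recall.
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    g_mean = math.sqrt(recall * specificity)
    return {"f1": f1, "mcc": mcc, "g_mean": g_mean}


# Made-up counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
scores = classification_scores(tp=40, fp=10, tn=45, fn=5)
```

MCC and G-mean both take true negatives into account, which is one reason they are often reported alongside F1 when the classes are imbalanced.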
Affiliation(s)
- Hui Peng
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- He Wang
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Weijia Kong
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Jinyan Li
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Wilson Wen Bin Goh
  - Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
  - Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
  - Center of AI in Medicine, Nanyang Technological University, Singapore, Singapore
  - Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK
4
Jonak K, Suppanz I, Bender J, Chacinska A, Warscheid B, Topf U. Ageing-dependent thiol oxidation reveals early oxidation of proteins with core proteostasis functions. Life Sci Alliance 2024; 7:e202302300. [PMID: 38383455 PMCID: PMC10881836 DOI: 10.26508/lsa.202302300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/08/2024] [Accepted: 02/09/2024] [Indexed: 02/23/2024] Open
Abstract
Oxidative post-translational modifications of protein thiols are well recognized as a readily occurring alteration of proteins, which can modify their function and thus control cellular processes. The development of techniques enabling the site-specific assessment of protein thiol oxidation on a proteome-wide scale significantly expanded the number of known oxidation-sensitive protein thiols. However, lagging behind are large-scale data on the redox state of proteins during ageing, a physiological process accompanied by increased levels of endogenous oxidants. Here, we present the landscape of protein thiol oxidation in chronologically aged wild-type Saccharomyces cerevisiae in a time-dependent manner. Our data determine early-oxidation targets in key biological processes governing the de novo production of proteins, protein folding, and degradation, and indicate a hierarchy of cellular responses affected by a reversible redox modification. Comparison with existing datasets in yeast, nematode, fruit fly, and mouse reveals the evolutionary conservation of these oxidation targets. To facilitate accessibility, we integrated the cross-species comparison into the newly developed OxiAge Database.
Affiliation(s)
- Katarzyna Jonak
  - https://ror.org/034tvp782 Laboratory of Molecular Basis of Aging and Rejuvenation, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Ida Suppanz
  - CIBSS Centre for Integrative Biological Signalling Research, University of Freiburg, Freiburg, Germany
- Julian Bender
  - https://ror.org/00fbnyb24 Biochemistry II, Theodor Boveri-Institute, Biocenter, University of Würzburg, Würzburg, Germany
- Bettina Warscheid
  - CIBSS Centre for Integrative Biological Signalling Research, University of Freiburg, Freiburg, Germany
  - https://ror.org/00fbnyb24 Biochemistry II, Theodor Boveri-Institute, Biocenter, University of Würzburg, Würzburg, Germany
- Ulrike Topf
  - https://ror.org/034tvp782 Laboratory of Molecular Basis of Aging and Rejuvenation, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
5
Shao R, Suzuki T, Suyama M, Tsukada Y. The impact of selective HDAC inhibitors on the transcriptome of early mouse embryos. BMC Genomics 2024; 25:143. [PMID: 38317092 PMCID: PMC10840191 DOI: 10.1186/s12864-024-10029-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 01/18/2024] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND: Histone acetylation, which is regulated by histone acetyltransferases (HATs) and histone deacetylases (HDACs), plays a crucial role in the control of gene expression. HDAC inhibitors (HDACi) have shown potential in cancer therapy; however, the specific roles of HDACs in early embryos remain unclear. Moreover, although some pan-HDACi have been used to maintain cellular undifferentiated states in early embryos, the specific mechanisms underlying their effects remain unknown. Thus, there remains a significant knowledge gap regarding the application of selective HDACi in early embryos.
RESULTS: To address this gap, we treated early embryos with two selective HDACi (MGCD0103 and T247). Subsequently, we collected and analyzed their transcriptome data at different developmental stages. Our findings unveiled a significant effect of HDACi treatment during the crucial 2-cell stage of zygotes, leading to a delay in embryonic development after T247 and an arrest at 2-cell stage after MGCD0103 administration. Furthermore, we elucidated the regulatory targets underlying this arrested embryonic development, which pinpointed the G2/M phase as the potential period of embryonic development arrest caused by MGCD0103. Moreover, our investigation provided a comprehensive profile of the biological processes that are affected by HDACi, with their main effects being predominantly localized in four aspects of zygotic gene activation (ZGA): RNA splicing, cell cycle regulation, autophagy, and transcription factor regulation. By exploring the transcriptional regulation and epigenetic features of the genes affected by HDACi, we made inferences regarding the potential main pathways via which HDACs affect gene expression in early embryos. Notably, Hdac7 exhibited a distinct response, highlighting its potential as a key player in early embryonic development.
CONCLUSIONS: Our study conducted a comprehensive analysis of the effects of HDACi on early embryonic development at the transcriptional level. The results demonstrated that HDACi significantly affected ZGA in embryos, elucidated the distinct actions of various selective HDACi, and identified specific biological pathways and mechanisms via which these inhibitors modulated early embryonic development.
Affiliation(s)
- Ruiqi Shao
  - Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, 812-8582, Fukuoka, Japan
- Takayoshi Suzuki
  - SANKEN, Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
- Mikita Suyama
  - Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-ku, 812-8582, Fukuoka, Japan
- Yuichi Tsukada
  - Advanced Biological Information Research Division, INAMORI Frontier Research Center, Kyushu University, 744 Motooka, Nishi-ku, 819-0395, Fukuoka, Japan
6
Remodeling of algal photosystem I through phosphorylation. Biosci Rep 2023; 43:232211. [PMID: 36477263 PMCID: PMC9874419 DOI: 10.1042/bsr20220369] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/07/2022] [Revised: 11/29/2022] [Accepted: 12/07/2022] [Indexed: 12/12/2022] Open
Abstract
Photosystem I (PSI) with its associated light-harvesting system is the most important generator of reducing power in photosynthesis. The PSI core complex is highly conserved, whereas peripheral subunits as well as light-harvesting proteins (LHCI) reveal a dynamic plasticity. Moreover, in green algae, PSI-LHCI complexes are found as monomers, dimers, and state transition complexes, where two LHCII trimers are associated. Herein, we show light-dependent phosphorylation of PSI subunits PsaG and PsaH as well as Lhca6. Potential consequences of the dynamic phosphorylation of PsaG and PsaH are structurally analyzed and discussed in regard to the formation of the monomeric, dimeric, and LHCII-associated PSI-LHCI complexes.
7
Liu H, Xing K, Jiang Y, Liu Y, Wang C, Ding X. Using Machine Learning to Identify Biomarkers Affecting Fat Deposition in Pigs by Integrating Multisource Transcriptome Information. Journal of Agricultural and Food Chemistry 2022; 70:10359-10370. [PMID: 35953074 PMCID: PMC9413214 DOI: 10.1021/acs.jafc.2c03339] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/16/2022] [Revised: 07/27/2022] [Accepted: 07/29/2022] [Indexed: 06/15/2023]
Abstract
Fat deposition in pigs is not only closely related to pig production efficiency and pork quality but also an ideal model for human obesity. Transcriptome sequencing is widely used to study fat deposition. However, due to small sample sizes, high false positive rates, and poor consistency of results from different studies, new strategies are urgently needed. Machine learning, a new analysis method, can effectively fit complex data and accurately identify samples and genes. In this study, 36 samples of adipose tissue, muscle tissue, and liver tissue were collected from Songliao black pigs and Landrace pigs, and the mRNA of all the samples was sequenced. In addition, we collected transcriptome data for 64 samples in the GEO database from four different sources. After standardization and imputation of missing values in the data set comprising 100 samples, traditional differential expression analysis was carried out, and different numbers of expressed genes were selected as features for the training model of eight machine learning methods. In the 1000 replications of fourfold cross validation with 100 samples, AdaBoost performed best, with an average prediction accuracy greater than 93% and the highest mean area under the curve in predicting the high- and low-fat content groups among the eight ML methods. According to their performance-based ranks inferred by AdaBoost, 12 genes related to fat deposition were identified; among them, FASN and APOD were specifically expressed in adipose tissue, and APOA1 was specifically expressed in the liver, which could be important candidate biomarkers affecting fat deposition.
8
Aftab W, Lahiri S, Imhof A. ImShot: An Open-Source Software for Probabilistic Identification of Proteins In Situ and Visualization of Proteomics Data. Mol Cell Proteomics 2022; 21:100242. [PMID: 35569805 PMCID: PMC9194865 DOI: 10.1016/j.mcpro.2022.100242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2021] [Revised: 04/08/2022] [Accepted: 05/10/2022] [Indexed: 11/19/2022] Open
Abstract
Imaging mass spectrometry (IMS) has developed into a powerful tool allowing label-free detection of numerous biomolecules in situ. In contrast to shotgun proteomics, proteins/peptides can be detected directly from biological tissues and correlated to its morphology leading to a gain of crucial clinical information. However, direct identification of the detected molecules is currently challenging for MALDI-IMS, thereby compelling researchers to use complementary techniques and resource intensive experimental setups. Despite these strategies, sufficient information could not be extracted because of lack of an optimum data combination strategy/software. Here, we introduce a new open-source software ImShot that aims at identifying peptides obtained in MALDI-IMS. This is achieved by combining information from IMS and shotgun proteomics (LC-MS) measurements of serial sections of the same tissue. The software takes advantage of a two-group comparison to determine the search space of IMS masses after deisotoping the corresponding spectra. Ambiguity in annotations of IMS peptides is eliminated by introduction of a novel scoring system that identifies the most likely parent protein of a detected peptide in the corresponding IMS dataset. Thanks to its modular structure, the software can also handle LC-MS data separately and display interactive enrichment plots and enriched Gene Ontology terms or cellular pathways. The software has been built as a desktop application with a conveniently designed graphic user interface to provide users with a seamless experience in data analysis. ImShot can run on all the three major desktop operating systems and is freely available under Massachusetts Institute of Technology license.
Affiliation(s)
- Wasim Aftab
  - Biomedical Center, Protein Analysis Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany; Graduate School for Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität Munich, Munich, Germany
- Shibojyoti Lahiri
  - Biomedical Center, Protein Analysis Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Axel Imhof
  - Biomedical Center, Protein Analysis Unit, Faculty of Medicine, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
9
10
Alsaber A, Al-Herz A, Pan J, Al-Sultan AT, Mishra D. Handling missing data in a rheumatoid arthritis registry using random forest approach. Int J Rheum Dis 2021; 24:1282-1293. [PMID: 34382756 DOI: 10.1111/1756-185x.14203] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/14/2021] [Revised: 07/13/2021] [Accepted: 07/23/2021] [Indexed: 12/01/2022]
Abstract
Missing data in clinical epidemiological research violate the intention-to-treat principle, reduce the power of statistical analysis, and can introduce bias if the cause of missing data is related to a patient's response to treatment. Multiple imputation provides a solution to predict the values of missing data. The main objective of this study is to estimate and impute missing values in patient records. The data from the Kuwait Registry for Rheumatic Diseases was used to deal with missing values among patient records. A number of methods were implemented to deal with missing data; however, choosing the best imputation method was judged by the lowest root mean square error (RMSE). Among 1735 rheumatoid arthritis patients, we found missing values vary from 5% to 65.5% of the total observations. The results show that sequential random forest method can estimate these missing values with a high level of accuracy. The RMSE varied between 2.5 and 5.0. missForest had the lowest imputation error for both continuous and categorical variables under each missing data rate (10%, 20%, and 30%) and had the smallest prediction error difference when the models used the imputed laboratory values.
Affiliation(s)
- Ahmad Alsaber
  - Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK
- Adeeba Al-Herz
  - Department of Rheumatology, Al-Amiri Hospital, Kuwait City, Kuwait
- Jiazhu Pan
  - Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK
- Ahmad T Al-Sultan
  - Department of Community Medicine and Behavioral Sciences, Kuwait University, Kuwait City, Kuwait
- Divya Mishra
  - Department of Plant Pathology, Kansas State University, Kansas, MN, USA
-
  - Department of Rheumatology, Al-Amiri Hospital, Kuwait City, Kuwait
11
Alsaber AR, Pan J, Al-Hurban A. Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018). International Journal of Environmental Research and Public Health 2021; 18:ijerph18031333. [PMID: 33540610 PMCID: PMC7908071 DOI: 10.3390/ijerph18031333] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/05/2020] [Revised: 01/26/2021] [Accepted: 01/27/2021] [Indexed: 11/16/2022]
Abstract
In environmental research, missing data are often a challenge for statistical modeling. This paper addressed some advanced techniques to deal with missing values in a data set measuring air quality using a multiple imputation (MI) approach. MCAR, MAR, and NMAR missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is an iterative imputation method, missForest, which is related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait, aggregated to a daily basis. Logarithm transformation was carried out for all pollutant data, in order to normalize their distributions and to minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the other imputation methods and, thus, can be considered to be appropriate for analyzing air quality data.
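The evaluation design described in this abstract (remove known values at fixed missingness levels, impute them, then score the imputations by RMSE and MAE) can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's code: the data are synthetic, only completely-at-random (MCAR) masking is shown, and plain mean imputation stands in for missForest. All function names (`mask_mcar`, `mean_impute`, `imputation_errors`) are hypothetical.

```python
import math
import random


def mask_mcar(values, rate, seed=0):
    """Hide a fraction `rate` of entries completely at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)


def mean_impute(masked):
    """Stand-in imputer (NOT missForest): fill gaps with the observed mean."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def imputation_errors(truth, imputed, missing_idx):
    """RMSE and MAE computed only over the entries that were masked."""
    if not missing_idx:
        raise ValueError("no masked entries to evaluate")
    errs = [imputed[i] - truth[i] for i in missing_idx]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae


# Synthetic "pollutant" series; real monitoring data would replace this.
data_rng = random.Random(1)
truth = [data_rng.gauss(50.0, 10.0) for _ in range(200)]

# The five missingness levels named in the abstract.
results = {}
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    masked, idx = mask_mcar(truth, rate, seed=42)
    results[rate] = imputation_errors(truth, mean_impute(masked), idx)
```

Scoring only the masked positions keeps the error estimate from being diluted by values that were never imputed. Comparing several imputers on the same masks would give a like-for-like ranking.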
Affiliation(s)
- Ahmad R. Alsaber
  - Department of Mathematics and Statistics, University of Strathclyde, Glasgow G1 1XH, UK
- Jiazhu Pan
  - Department of Mathematics and Statistics, University of Strathclyde, Glasgow G1 1XH, UK
- Adeeba Al-Hurban
  - Department of Earth and Environmental Sciences, Faculty of Science, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait
12
Wang S, Li W, Hu L, Cheng J, Yang H, Liu Y. NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses. Nucleic Acids Res 2020; 48:e83. [PMID: 32526036 PMCID: PMC7641313 DOI: 10.1093/nar/gkaa498] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 02/18/2020] [Revised: 04/20/2020] [Accepted: 06/08/2020] [Indexed: 02/05/2023] Open
Abstract
Mass spectrometry (MS)-based quantitative proteomics experiments frequently generate data with missing values, which may profoundly affect downstream analyses. A wide variety of imputation methods have been established to deal with the missing-value issue. To date, however, there is a scarcity of efficient, systematic, and easy-to-handle tools that are tailored for the proteomics community. Herein, we developed a user-friendly and powerful stand-alone software, NAguideR, to enable implementation and evaluation of different missing value methods offered by 23 widely used missing-value imputation algorithms. NAguideR further evaluates data imputation results through classic computational criteria and, unprecedentedly, proteomic empirical criteria, such as quantitative consistency between different charge-states of the same peptide, different peptides belonging to the same proteins, and individual proteins participating in protein complexes and functional interactions. We applied NAguideR to three label-free proteomic datasets featuring peptide-level, protein-level, and phosphoproteomic variables respectively, all generated by data independent acquisition mass spectrometry (DIA-MS) with substantial biological replicates. The results indicate that NAguideR is able to discriminate the optimal imputation methods that facilitate DIA-MS experiments from sub-optimal and low-performance algorithms. NAguideR further provides downloadable tables and figures supporting flexible data analysis and interpretation. NAguideR is freely available at http://www.omicsolution.org/wukong/NAguideR/ and the source code at https://github.com/wangshisheng/NAguideR/.
Affiliation(s)
- Shisheng Wang
  - West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Wenxue Li
  - Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA
- Liqiang Hu
  - West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Jingqiu Cheng
  - West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Hao Yang
  - West China-Washington Mitochondria and Metabolism Research Center; Key Lab of Transplant Engineering and Immunology, MOH, Regenerative Medicine Research Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Yansheng Liu
  - Yale Cancer Biology Institute, Yale University, West Haven, CT 06516, USA; Department of Pharmacology, Yale University School of Medicine, New Haven, CT 06520, USA
13
Frahm G, Nordhausen K, Oja H. M-estimation with incomplete and dependent multivariate data. J MULTIVARIATE ANAL 2020. [DOI: 10.1016/j.jmva.2019.104569] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
14
Zhong C, Pedrycz W, Wang D, Li L, Li Z. Granular data imputation: A framework of Granular Computing. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.05.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 10/21/2022]
15
Van Aelst S. Comments on: Robust estimation of multivariate location and scatter in the presence of cellwise and casewise contamination. TEST-SPAIN 2015. [DOI: 10.1007/s11749-015-0456-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
16
Templ M, Alfons A, Filzmoser P. Exploring incomplete data using visualization techniques. ADV DATA ANAL CLASSI 2011. [DOI: 10.1007/s11634-011-0102-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 10/15/2022]
17
Todorov V, Templ M, Filzmoser P. Detection of multivariate outliers in business survey data with incomplete information. ADV DATA ANAL CLASSI 2010. [DOI: 10.1007/s11634-010-0075-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]
18
Predicting incomplete gene microarray data with the use of supervised learning algorithms. Pattern Recognit Lett 2010. [DOI: 10.1016/j.patrec.2010.05.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
19
Aittokallio T. Dealing with missing values in large-scale studies: microarray data imputation and beyond. Brief Bioinform 2009; 11:253-64. [DOI: 10.1093/bib/bbp059] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022] Open
20
Botella C, Ferré J, Boqué R. Classification from microarray data using probabilistic discriminant partial least squares with reject option. Talanta 2009; 80:321-8. [DOI: 10.1016/j.talanta.2009.06.072] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 02/04/2009] [Revised: 06/18/2009] [Accepted: 06/30/2009] [Indexed: 10/20/2022]