1
|
Iravani S, Conrad TOF. An Interpretable Deep Learning Approach for Biomarker Detection in LC-MS Proteomics Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:151-161. [PMID: 35007196 DOI: 10.1109/tcbb.2022.3141656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Analyzing mass spectrometry-based proteomics data with deep learning (DL) approaches poses several challenges due to the high dimensionality, low sample size, and high level of noise. Additionally, the integration of DL-based workflows into medical settings is often hindered by the lack of interpretable explanations. We present DLearnMS, a DL biomarker detection framework, to address these challenges on proteomics instances of liquid chromatography-mass spectrometry (LC-MS), a well-established tool for quantifying complex protein mixtures. Our DLearnMS framework learns the clinical state of LC-MS data instances using convolutional neural networks. Based on the trained neural networks, we show how biomarkers can be identified using layer-wise relevance propagation. This enables detecting discriminating regions of the data and the design of more robust networks. One of the main advantages over other established methods is that no explicit preprocessing step is needed in our DLearnMS framework. Our evaluation shows that DLearnMS outperforms conventional LC-MS biomarker detection approaches in identifying fewer false positive peaks while maintaining a comparable number of true positive peaks. Code availability: The code is available from the following Git repository: https://github.com/SaharIravani/DlearnMS.
|
2
|
Leonova T, Ihling C, Saoud M, Frolova N, Rennert R, Wessjohann LA, Frolov A. Does filter-aided sample preparation provide sufficient method linearity for quantitative plant shotgun proteomics? FRONTIERS IN PLANT SCIENCE 2022; 13:874761. [PMID: 36507396 PMCID: PMC9728026 DOI: 10.3389/fpls.2022.874761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2022] [Accepted: 10/26/2022] [Indexed: 06/17/2023]
Abstract
Due to its outstanding throughput and analytical resolution, gel-free LC-based shotgun proteomics represents the gold standard of proteome analysis. Thereby, the efficiency of sample preparation dramatically affects the correctness and reliability of protein quantification. Thus, the steps of protein isolation, solubilization, and proteolysis represent the principal bottleneck of shotgun proteomics. The desired performance of the sample preparation protocols can be achieved by the application of detergents. However, these compounds ultimately compromise reverse-phase chromatographic separation and disrupt electrospray ionization. Filter-aided sample preparation (FASP) represents an elegant approach to overcome these limitations. Although this method is comprehensively validated for cell proteomics, its applicability to plants and compatibility with plant-specific protein isolation protocols remain to be confirmed. Thereby, the most important gap is the absence of the data on the linearity of underlying protein quantification methods for plant matrices. To fill this gap, we address here the potential of FASP in combination with two protein isolation protocols for quantitative analysis of pea (Pisum sativum) seed and Arabidopsis thaliana leaf proteomes by the shotgun approach. For this aim, in comprehensive spiking experiments with bovine serum albumin (BSA), we evaluated the linear dynamic range (LDR) of protein quantification in the presence of plant matrices. Furthermore, we addressed the interference of two different plant matrices in quantitative experiments, accomplished with two alternative sample preparation workflows in comparison to conventional FASP-based digestion of cell lysates, considered here as a reference. The spiking experiments revealed high sensitivities (LODs of up to 4 fmol) for spiked BSA and LDRs of at least 0.6 × 10². Thereby, phenol extraction yielded slightly better recoveries, whereas the detergent-based method showed better linearity. Thus, our results indicate the very good applicability of FASP to quantitative plant proteomics with only limited impact of the protein isolation technique on the method's overall performance.
Affiliation(s)
- Tatiana Leonova
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany
  - Department of Biochemistry, St Petersburg State University, St Petersburg, Russia
- Christian Ihling
  - Institute of Pharmacy, Department of Pharmaceutical Chemistry and Bioanalytics, Martin-Luther Universität Halle-Wittenberg, Halle (Saale), Germany
- Mohamad Saoud
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany
- Nadezhda Frolova
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany
  - Department of Biochemistry, St Petersburg State University, St Petersburg, Russia
- Robert Rennert
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany
- Ludger A. Wessjohann
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany
- Andrej Frolov
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Halle (Saale), Germany
  - Department of Biochemistry, St Petersburg State University, St Petersburg, Russia
|
3
|
Chen S, Qin R, Mahal LK. Sweet systems: technologies for glycomic analysis and their integration into systems biology. Crit Rev Biochem Mol Biol 2021; 56:301-320. [PMID: 33820453 DOI: 10.1080/10409238.2021.1908953] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/21/2022]
Abstract
Found in virtually every organism, glycans are essential molecules that play important roles in almost every aspect of biology. The composition of the glycome, the repertoire of glycans in an organism or a biological sample, is often found altered in many diseases, including cancer, infectious diseases, and metabolic and developmental disorders. Understanding glycosylation and glycomic changes enriches our knowledge of the mechanisms of disease progression and sheds light on the development of novel therapeutics. However, the inherent diversity of glycan structures imposes challenges on the experimental characterization of glycomes. Advances in high-throughput glycomic technologies enable glycomic analysis in a rapid and comprehensive manner. In this review, we discuss the analytical methods currently used in high-throughput glycomics, including mass spectrometry, liquid chromatography and lectin microarray. Concomitant with the technical advances is the integration of glycomics into systems biology in recent years. Herein we elaborate on some representative works from this recent trend to underline the important role of glycomics in such integrated approaches to disease.
Affiliation(s)
- Shuhui Chen
  - Department of Chemistry, New York University, New York City, NY, USA
- Rui Qin
  - Department of Chemistry, University of Alberta, Edmonton, AB, Canada
- Lara K Mahal
  - Department of Chemistry, New York University, New York City, NY, USA
  - Department of Chemistry, University of Alberta, Edmonton, AB, Canada
|
4
|
Tariq MU, Haseeb M, Aledhari M, Razzak R, Parizi RM, Saeed F. Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020; 9:5497-5516. [PMID: 33537181 PMCID: PMC7853650 DOI: 10.1109/access.2020.3047588] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/17/2023]
Abstract
Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The combined and integrated analysis of these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to the genomic data, searching experimental MS spectra against a six-frame translation genome database, and automating the process of annotating genome sequences. To date, most such tools have not focused on the scalability issues that are inherent in proteogenomic data analysis, where the size of the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process a small-scale dataset of one million spectra against a genome of 3 GB. In this article, we provide an up-to-date review of tools that can analyze proteogenomic datasets, providing a critical analysis of the techniques' relative merits and potential pitfalls. We also point out potential bottlenecks and recommendations that can be incorporated in the future design of these workflows to ensure scalability with the increasing size of proteogenomic data. Lastly, we make a case for how high-performance computing (HPC) solutions may be the best bet to ensure the scalability of future big data proteogenomic data analysis.
Affiliation(s)
- Muhammad Usman Tariq
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Muhammad Haseeb
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
- Mohammed Aledhari
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Rehma Razzak
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Reza M Parizi
  - College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA
- Fahad Saeed
  - School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA
|
5
|
Jayathirtha M, Dupree EJ, Manzoor Z, Larose B, Sechrist Z, Neagu AN, Petre BA, Darie CC. Mass Spectrometric (MS) Analysis of Proteins and Peptides. Curr Protein Pept Sci 2020; 22:92-120. [PMID: 32713333 DOI: 10.2174/1389203721666200726223336] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/01/2020] [Revised: 05/12/2020] [Accepted: 05/28/2020] [Indexed: 01/09/2023]
Abstract
The human genome is sequenced and comprises ~30,000 genes, making humans just a little bit more complicated than worms or flies. However, the complexity of humans is given by the proteins that these genes code for, because one gene can produce many proteins, mostly through alternative splicing and tissue-dependent expression of particular proteins. In addition, post-translational modifications (PTMs) in proteins greatly increase the number of gene products or protein isoforms. Furthermore, stable and transient interactions between proteins, protein isoforms/proteoforms and PTM-ed proteins (protein-protein interactions, PPI) add yet another level of complexity in humans and other organisms. In the past, all of these proteins were analyzed one at a time. Currently, they are analyzed by a less tedious method, mass spectrometry (MS), for two reasons: 1) because of the complexity of proteins, protein PTMs and PPIs and 2) because MS is the only method that can keep up with such a complex array of features. Here, we discuss the applications of mass spectrometry in protein analysis.
Affiliation(s)
- Madhuri Jayathirtha
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY, United States
- Emmalyn J Dupree
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY, United States
- Zaen Manzoor
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY, United States
- Brianna Larose
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY, United States
- Zach Sechrist
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY, United States
- Anca-Narcisa Neagu
  - Laboratory of Animal Histology, Faculty of Biology, "Alexandru Ioan Cuza" University of Iasi, Iasi, Romania
- Brindusa Alina Petre
  - Laboratory of Biochemistry, Department of Chemistry, Al. I. Cuza University of Iasi, Iasi, Romania
  - Center for Fundamental Research and Experimental Development in Translation Medicine - TRANSCEND, Regional Institute of Oncology, Iasi, Romania
- Costel C Darie
  - Biochemistry & Proteomics Group, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY, United States
|
6
|
Strbenac D, Zhong L, Raftery MJ, Wang P, Wilson SR, Armstrong NJ, Yang JYH. Quantitative Performance Evaluator for Proteomics (QPEP): Web-based Application for Reproducible Evaluation of Proteomics Preprocessing Methods. J Proteome Res 2017; 16:2359-2369. [DOI: 10.1021/acs.jproteome.6b00882] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023]
Affiliation(s)
- Dario Strbenac
  - School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
- Ling Zhong
  - Bioanalytical Mass Spectrometry Facility, University of New South Wales, Sydney, New South Wales 2052, Australia
- Mark J. Raftery
  - Bioanalytical Mass Spectrometry Facility, University of New South Wales, Sydney, New South Wales 2052, Australia
- Penghao Wang
  - School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
- Susan R. Wilson
  - School of Mathematics & Statistics, University of New South Wales, Sydney, New South Wales 2052, Australia
  - Centre for Mathematics and its Applications, Mathematical Sciences Institute, Australian National University, Canberra, Australian Capital Territory 0200, Australia
- Nicola J. Armstrong
  - School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
- Jean Y. H. Yang
  - School of Mathematics and Statistics, University of Sydney, Sydney, New South Wales 2006, Australia
|
7
|
Abstract
Liquid chromatography coupled with mass spectrometry (LC-MS) has been widely used for profiling protein expression levels. This chapter is focused on LC-MS data preprocessing, which is a crucial step in the analysis of LC-MS based proteomics. We provide a high-level overview, highlight associated challenges, and present a step-by-step example for analysis of data from an LC-MS based untargeted proteomic study. Furthermore, key procedures and relevant issues with the subsequent analysis by multiple reaction monitoring (MRM) are discussed.
Affiliation(s)
- Tsung-Heng Tsai
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, 20057, USA
  - Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA, 22203, USA
- Minkun Wang
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, 20057, USA
  - Bradley Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA, 22203, USA
- Habtom W Ressom
  - Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, DC, 20057, USA
|
8
|
Pursiheimo A, Vehmas AP, Afzal S, Suomi T, Chand T, Strauss L, Poutanen M, Rokka A, Corthals GL, Elo LL. Optimization of Statistical Methods Impact on Quantitative Proteomics Data. J Proteome Res 2015; 14:4118-26. [DOI: 10.1021/acs.jproteome.5b00183] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/30/2022]
Affiliation(s)
- Anna Pursiheimo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, FI-20014 Turku, Finland
- Anni P. Vehmas
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Saira Afzal
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Tomi Suomi
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Department of Information Technology, University of Turku, FI-20014 Turku, Finland
- Thaman Chand
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Leena Strauss
  - Department of Physiology and Turku Center for Disease Modeling, Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520 Turku, Finland
- Matti Poutanen
  - Department of Physiology and Turku Center for Disease Modeling, Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520 Turku, Finland
- Anne Rokka
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Garry L. Corthals
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Van’t Hoff Institute for Molecular Sciences, University of Amsterdam, P.O. Box 94157, 1090 GD Amsterdam, The Netherlands
- Laura L. Elo
  - Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
  - Department of Mathematics and Statistics, University of Turku, FI-20014 Turku, Finland
|
9
|
Webb-Robertson BJM, Wiberg HK, Matzke MM, Brown JN, Wang J, McDermott JE, Smith RD, Rodland KD, Metz TO, Pounds JG, Waters KM. Review, evaluation, and discussion of the challenges of missing value imputation for mass spectrometry-based label-free global proteomics. J Proteome Res 2015; 14:1993-2001. [PMID: 25855118 DOI: 10.1021/pr501138h] [Citation(s) in RCA: 161] [Impact Index Per Article: 17.9] [Indexed: 12/14/2022]
Abstract
In this review, we apply selected imputation strategies to label-free liquid chromatography-mass spectrometry (LC-MS) proteomics datasets to evaluate the accuracy with respect to metrics of variance and classification. We evaluate several commonly used imputation approaches for individual merits and discuss the caveats of each approach with respect to the example LC-MS proteomics data. In general, local similarity-based approaches, such as the regularized expectation maximization and least-squares adaptive algorithms, yield the best overall performances with respect to metrics of accuracy and robustness. However, no single algorithm consistently outperforms the remaining approaches, and in some cases, performing classification without imputation yielded the most accurate classification. Thus, because of the complex mechanisms of missing data in proteomics, which also vary from peptide to protein, no individual method is a single solution for imputation. On the basis of the observations in this review, the goal for imputation in the field of computational proteomics should be to develop new approaches that work generically for this data type and new strategies to guide users in the selection of the best imputation for their dataset and analysis objectives.
Affiliation(s)
- Holli K Wiberg
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Melissa M Matzke
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Joseph N Brown
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Jing Wang
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Jason E McDermott
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Richard D Smith
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Karin D Rodland
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Thomas O Metz
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Joel G Pounds
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
- Katrina M Waters
  - Pacific Northwest National Laboratory, PO BOX 999, K7-20, Richland, Washington 99352, United States
|
10
|
Manohar M, Khan H, Sirohi VK, Das V, Agarwal A, Pandey A, Siddiqui WA, Dwivedi A. Alteration in endometrial proteins during early- and mid-secretory phases of the cycle in women with unexplained infertility. PLoS One 2014; 9:e111687. [PMID: 25405865 PMCID: PMC4236019 DOI: 10.1371/journal.pone.0111687] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/04/2014] [Accepted: 10/05/2014] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND Compromised receptivity of the endometrium is a major cause of unexplained infertility, implantation failure and subclinical pregnancy loss. In order to investigate the changes in endometrial protein profile as a cause of unexplained infertility, the current study was undertaken to analyze the differentially expressed proteins of endometrium from early-secretory (LH+2) to mid-secretory phase (LH+7), in women with unexplained infertility. METHODS 2-D gel electrophoresis was performed to analyze the proteomic changes between early- (n = 8) and mid-secretory (n = 8) phase endometrium of women with unexplained infertility. The differentially expressed protein spots were identified by LC-MS analysis and validated by immunoblotting and immuno-histochemical analysis in early- (n = 4) and mid-secretory (n = 4) phase endometrium of infertile women. Validated proteins were also analyzed in early- (n = 4) and mid-secretory (n = 4) phase endometrium of fertile women. RESULTS Nine proteins were found to be differentially expressed between early- and mid-secretory phases of endometrium of infertile women. The expression of Ras-related protein Rap-1b, Protein disulfide isomerase A3, Apolipoprotein-A1 (Apo-A1), Cofilin-1 and RAN GTP-binding nuclear protein (Ran) was found to be significantly increased, whereas Tubulin polymerization promoting protein family member 3, Superoxide dismutase [Cu-Zn], Sorcin, and Proteasome subunit alpha type-5 were significantly decreased in mid-secretory phase endometrium of infertile women as compared to early-secretory phase endometrium of infertile women. Validation of 4 proteins, viz. Sorcin, Cofilin-1, Apo-A1 and Ran, was performed in separate endometrial biopsy samples from infertile women. Up-regulated expression of Sorcin and down-regulated expression of Cofilin-1 and Apolipoprotein-A1 were observed in mid-secretory phase as compared to early-secretory phase in case of fertile women. CONCLUSIONS De-regulation of the expression of Sorcin, Cofilin-1, Apo-A1 and Ran during early- to mid-secretory phase may have physiological significance and may be one of the causes of altered differentiation and/or maturation of endometrium in women with unexplained infertility.
Affiliation(s)
- Murli Manohar
  - Division of Endocrinology, CSIR-Central Drug Research Institute, Lucknow, Uttar Pradesh, India
  - Department of Biochemistry, Jamia Hamdard (Hamdard University), New Delhi, India
- Huma Khan
  - Department of Obstetrics & Gynaecology, King George’s Medical University, Lucknow, Uttar Pradesh, India
- Vijay Kumar Sirohi
  - Division of Endocrinology, CSIR-Central Drug Research Institute, Lucknow, Uttar Pradesh, India
- Vinita Das
  - Department of Obstetrics & Gynaecology, King George’s Medical University, Lucknow, Uttar Pradesh, India
- Anjoo Agarwal
  - Department of Obstetrics & Gynaecology, King George’s Medical University, Lucknow, Uttar Pradesh, India
- Amita Pandey
  - Department of Obstetrics & Gynaecology, King George’s Medical University, Lucknow, Uttar Pradesh, India
- Anila Dwivedi
  - Division of Endocrinology, CSIR-Central Drug Research Institute, Lucknow, Uttar Pradesh, India
|
11
|
Webb-Robertson BJM, Matzke MM, Metz TO, McDermott JE, Walker H, Rodland KD, Pounds JG, Waters KM. Sequential projection pursuit principal component analysis--dealing with missing data associated with new -omics technologies. Biotechniques 2013; 54:165-8. [PMID: 23477384 DOI: 10.2144/000113978] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 09/19/2012] [Accepted: 01/28/2013] [Indexed: 01/22/2023] Open
Abstract
Principal Component Analysis (PCA) is a common exploratory tool used to evaluate large complex data sets. The resulting lower-dimensional representations are often valuable for pattern visualization, clustering, or classification of the data. However, PCA cannot be applied directly to many -omics data sets generated by newer technologies such as label-free mass spectrometry due to large numbers of non-random missing values. Here we present a sequential projection pursuit PCA (sppPCA) method for defining principal components in the presence of missing data. Our results demonstrate that this approach generates robust and informative low-dimensional data representations compared to commonly used imputation approaches.
|
12
|
Genetic Programming for Biomarker Detection in Mass Spectrometry Data. LECTURE NOTES IN COMPUTER SCIENCE 2012. [DOI: 10.1007/978-3-642-35101-3_23] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/12/2023]
|