1
Ali H, Muthudoss P, Ramalingam M, Kanakaraj L, Paudel A, Ramasamy G. Machine Learning-Enabled NIR Spectroscopy. Part 2: Workflow for Selecting a Subset of Samples from Publicly Accessible Data. AAPS PharmSciTech 2023; 24:34. [PMID: 36627410 DOI: 10.1208/s12249-022-02493-5] [Received: 10/20/2022] [Accepted: 12/14/2022]
Abstract
The increasingly large datasets generated across pharmaceutical disciplines are often difficult to interpret. Because machine learning requires high-quality datasets, open-source data can be a good starting point. This work presents a systematic method for choosing representative subsamples from existing studies, together with an extensive set of quality measures and a visualization strategy. The preceding article (Muthudoss et al. in AAPS PharmSciTech 23, 2022) describes a workflow for using near-infrared (NIR) spectroscopy to obtain reliable and robust data on pharmaceutical samples. This study describes a systematic, structured procedure for selecting subsamples from historical data and offers a wide range of in-depth quality measures, diagnostic tools, and visualization techniques. A real-world, well-studied NIR dataset is used to demonstrate the approach. This open-source tablet dataset (http://www.models.life.ku.dk/Tablets) covers different doses (in milligrams), different shapes and sizes of dosage forms, slotted tablets, three manufacturing scales (lab, pilot, production), coating differences (coated vs uncoated), and more. The dataset is well suited to the question at hand: a model is developed at one scale (here, the lab scale), and its transferability can then be tested on new data at pilot or production (full) scale. A literature review indicated that PLS regression models outperform artificial neural network-multilayer perceptron (ANN-MLP) models. This work demonstrates the selection of appropriate hyperparameters and their impact on ANN-MLP model performance, and discusses hyperparameter tuning approaches and performance relative to available references for the data under investigation. Model extension from the lab scale to the pilot/production scale is demonstrated.
HIGHLIGHTS:
• We present comprehensive quality metrics and a visualization strategy for selecting subsamples from existing studies.
• A comprehensive assessment and workflow are demonstrated using historical real-world near-infrared (NIR) datasets.
• Appropriate hyperparameters are selected and their impact on artificial neural network-multilayer perceptron (ANN-MLP) model performance is shown.
• The choice of hyperparameter tuning approaches and performance relative to available references are discussed for the data under investigation.
• Model extension from the lab scale to the pilot scale is successfully demonstrated.
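The abstract does not name a specific selection algorithm. One widely used way to pick a representative calibration subset from historical spectra is the Kennard-Stone max-min rule. It is sketched below in plain Python as an illustration only. It is not necessarily the procedure used in the paper, and the function names and toy data are hypothetical. In practice, the rows passed in would usually be pretreated spectra or their principal-component scores.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (e.g. spectra)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by the Kennard-Stone rule.

    The two most distant samples are taken first; after that, the candidate
    whose distance to its nearest already-selected sample is largest is added,
    so the subset spreads evenly over the measured space.
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Seed the subset with the pair of samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: euclidean(samples[pair[0]], samples[pair[1]]),
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Max-min step: pick the candidate farthest from its closest selected sample.
        best = max(
            remaining,
            key=lambda k: min(euclidean(samples[k], samples[s]) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each step adds the sample farthest from everything already chosen, so the subset covers the edges of the data space first. The repeated pairwise distance search is adequate for a few hundred spectra. For larger sets, a precomputed distance matrix would be the usual optimization.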
Affiliation(s)
- Hussain Ali
- Christ (Deemed to Be University), Bangalore, 560029, Karnataka, India
- Prakash Muthudoss
- A2Z4.0 Research and Analytics Private Limited, Chennai, 600062, Tamilnadu, India
- Manikandan Ramalingam
- Chettinad School of Pharmaceutical Sciences, Chettinad Academy of Research and Education, Chettinad Health City, 603103, Chennai, Tamilnadu, India
- Lakshmi Kanakaraj
- Chettinad School of Pharmaceutical Sciences, Chettinad Academy of Research and Education, Chettinad Health City, 603103, Chennai, Tamilnadu, India
- Amrit Paudel
- Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, 8010, Graz, Austria; Institute of Process and Particle Engineering, Graz University of Technology, Inffeldgasse 13/3, 8010, Graz, Austria
- Gobi Ramasamy
- Christ (Deemed to Be University), Bangalore, 560029, Karnataka, India
2
Quantitative Microscopy: Particle Size/Shape Characterization, Addressing Common Errors Using 'Analytics Continuum' Approach. J Pharm Sci 2020; 110:833-849. [PMID: 32971124 DOI: 10.1016/j.xphs.2020.09.022] [Received: 07/15/2020] [Revised: 08/25/2020] [Accepted: 09/16/2020]
Abstract
Particle size/shape characterization of the active pharmaceutical ingredient (API) is integral to successful product development. It is more of a correlative property than a decision-making measure. Although microscopy is the only technique that provides a direct measure of particle properties, it is often neglected because of poor repeatability and reproducibility. These problems are commonly attributed to (a) fundamental error, (b) segregation error, (c) human error, (d) sample randomness, and (e) sample representativeness, among other factors. Using sucrose as a model sample, we propose an "analytics continuum" approach that integrates optical-microscopy PSD measurements with NIR spectroscopy-based trending analysis, used as a prescreening tool to demonstrate sample randomness and representativeness. A range of statistical tests is then used to infer population statistics. Subsequently, an attribute-based control chart and a bootstrap-based confidence interval were developed to monitor product performance. A flowchart serving as an elementary guideline was developed and then extended to more complex situations involving API crystallized from two different solvent systems. The results show that the methodology can serve as a quantitative procedure for assessing the suitability of API/excipients from different batches or alternate vendors. It can also significantly help in understanding differences between materials, even minor ones.
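The abstract mentions a bootstrap-based confidence interval for monitoring particle-size results but does not specify the statistic or resampling details. A minimal percentile-bootstrap sketch in plain Python follows. The use of the median as the summary statistic, the 2000 resamples, and the function name are illustrative assumptions, not the authors' settings.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.median, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a summary statistic.

    values: particle-size measurements (e.g. equivalent circle diameters).
    stat: statistic to bootstrap; the median is used here as a D50-like summary.
    Returns (point_estimate, (lower, upper)).
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([rng.choice(values) for _ in range(n)]) for _ in range(n_resamples)
    )
    # Symmetric percentile indices, e.g. 2.5th and 97.5th percentiles for level=0.95.
    k = int((1.0 - level) / 2.0 * n_resamples)
    return stat(values), (estimates[k], estimates[n_resamples - 1 - k])
```

The percentile method makes no normality assumption about the size distribution, which is why bootstrapping is attractive for skewed particle-size data. Other interval types (for example BCa) are also common.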
3
Martínez-Valdivieso D, Font R, Del Río-Celestino M. Prediction of Agro-Morphological and Nutritional Traits in Ethiopian Mustard Leaves (Brassica carinata A. Braun) by Visible-Near-Infrared Spectroscopy. Foods 2018; 8:E6. [PMID: 30583550 PMCID: PMC6352040 DOI: 10.3390/foods8010006] [Received: 11/21/2018] [Revised: 12/16/2018] [Accepted: 12/20/2018]
Abstract
The particular characteristics of some of the Ethiopian mustard accessions available from seed banks could be used to increase the production and diversity of products available to consumers and to improve their general quality. The objectives of this study were twofold. The first was to determine the genetic variability among accessions for agro-morphological traits (days to first flowering: DFF; leaf pubescence: LP) and a nutritional trait (total phenolic content: TPC). The second was to evaluate the potential of near-infrared spectroscopy (NIRS) to predict these traits in Ethiopian mustard leaves. Large variation was found for the traits evaluated. The reference values were regressed against different spectral transformations by modified partial least-squares (MPLS) regression. The coefficients of determination in cross-validation (R²cv) of the equations for DFF, LP and TPC were 0.95, 0.63 and 0.99, respectively. The ratios of standard deviation to standard error of cross-validation (RPD) for these traits were 4.52 (DFF), 1.53 (LP) and 24.50 (TPC). These results show that DFF and TPC in Ethiopian mustard leaves can be predicted with sufficient accuracy for screening and quality control, respectively. In addition, the LP equation can be used to classify samples into "low", "medium" and "high" groups. The mean and standard deviation spectra and the regression vectors of the MPLS models indicate that several major cell components contributed strongly to the equations for these traits.
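The RPD quoted above is the standard deviation of the reference values divided by the cross-validation error. A small sketch follows that computes R²cv, the cross-validation error and RPD from paired reference and cross-validated predicted values. The function name and toy numbers are hypothetical. The error here is the root-mean-square of the cross-validated residuals, which may differ slightly from the bias-corrected SECV reported by some chemometrics software.

```python
import math
import statistics


def cv_metrics(reference, predicted):
    """Return (R2cv, cross-validation error, RPD) for cross-validated predictions.

    reference: laboratory reference values; predicted: the cross-validated
    predictions for the same samples.
    """
    n = len(reference)
    residuals = [r - p for r, p in zip(reference, predicted)]
    mean_ref = statistics.fmean(reference)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    r2_cv = 1.0 - ss_res / ss_tot
    # Root-mean-square cross-validation error, used here in place of SECV.
    error_cv = math.sqrt(ss_res / n)
    # RPD: spread of the reference data relative to the prediction error.
    rpd = statistics.stdev(reference) / error_cv
    return r2_cv, error_cv, rpd
```

Because RPD divides the natural spread of the trait by the prediction error, a trait with a wide reference range (such as TPC here) can reach a high RPD even when its absolute error is not small.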
Affiliation(s)
- Damián Martínez-Valdivieso
- Department of Genomics and Biotechnology, IFAPA Center La Mojonera, Camino San Nicolás 1, La Mojonera, 04745 Almería, Spain.
- Rafael Font
- Department of Food Science and Health, IFAPA Center La Mojonera, Camino San Nicolás 1, La Mojonera, 04745 Almería, Spain.
- Mercedes Del Río-Celestino
- Department of Genomics and Biotechnology, IFAPA Center La Mojonera, Camino San Nicolás 1, La Mojonera, 04745 Almería, Spain.
4
Hassan H, Amiruddin MD, Weckwerth W, Ramli US. Deciphering key proteins of oil palm (Elaeis guineensis Jacq.) fruit mesocarp development by proteomics and chemometrics. Electrophoresis 2018; 40:254-265. [DOI: 10.1002/elps.201800232] [Received: 06/11/2018] [Revised: 10/22/2018] [Accepted: 10/25/2018]
Affiliation(s)
- Hasliza Hassan
- Advanced Biotechnology and Breeding Centre (ABBC), Malaysian Palm Oil Board (MPOB), Selangor, Malaysia
- Mohd Din Amiruddin
- Advanced Biotechnology and Breeding Centre (ABBC), Malaysian Palm Oil Board (MPOB), Selangor, Malaysia
- Wolfram Weckwerth
- Department of Ecogenomics and Systems Biology, Faculty of Life Sciences, University of Vienna, Vienna, Austria
- Vienna Metabolomics Center (VIME), University of Vienna, Vienna, Austria
- Umi Salamah Ramli
- Advanced Biotechnology and Breeding Centre (ABBC), Malaysian Palm Oil Board (MPOB), Selangor, Malaysia
5
de Mello CS, Van Dijk JP, Voorhuijzen M, Kok EJ, Arisi ACM. Tuber proteome comparison of five potato varieties by principal component analysis. J Sci Food Agric 2016; 96:3928-3936. [PMID: 26799786 DOI: 10.1002/jsfa.7635] [Received: 12/02/2015] [Revised: 01/08/2016] [Accepted: 01/11/2016]
Abstract
BACKGROUND: Omics data should be analysed with multivariate methods such as principal component analysis (PCA). How the data cluster in PCA is of major importance for developing classification systems based on multivariate analysis, such as soft independent modelling of class analogy (SIMCA). In a previous study, a one-class classifier based on SIMCA was built using microarray data from a set of potatoes, and PCA grouped the transcriptomic data by variety. The present work aimed to use PCA to verify the clustering of the proteomic profiles of the same potato varieties. RESULTS: Proteomic profiles of five potato varieties (Biogold, Fontane, Innovator, Lady Rosetta and Maris Piper) were evaluated by two-dimensional gel electrophoresis (2-DE) using immobilized pH gradient (IPG) strips of two lengths, 13 and 24 cm, both covering pH 4-7. For each strip length, two gels were prepared from each variety, giving ten gels per analysis. For 13 cm strips, 199-320 spots were detected per gel; for 24 cm strips, 365-684 spots. CONCLUSION: All four PCAs performed on these datasets showed clear grouping of the samples by variety. The data presented here show that PCA is applicable to proteomic analysis of potato and can separate the samples by variety. © 2016 Society of Chemical Industry.
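PCA, the method used above to check whether gels group by variety, can be sketched without external libraries. Each spot is mean-centred, and the leading eigenvector of the covariance matrix is then found by power iteration. The toy matrix (rows as gels, columns as spot volumes) and the function name are hypothetical. Practical analyses would typically use an SVD routine from a numerical library and may scale the spots first.

```python
import math
import random


def first_principal_component(data, n_iter=500, seed=0):
    """Return (scores, loading) of PC1 for a samples x variables table.

    Columns are mean-centred, then the leading eigenvector of the
    covariance matrix is found by power iteration.
    """
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in data]
    # Covariance matrix C = X^T X / (n - 1) of the centred data.
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(m)]
        for i in range(m)
    ]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(m)]  # positive random start vector
    for _ in range(n_iter):
        # Multiply by C and renormalise; v converges to the top eigenvector.
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Scores: projection of each centred sample onto the loading vector.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in centred]
    return scores, v
```

The sign of a principal component is arbitrary, so grouping is judged by which samples share a sign or cluster together, not by whether the scores are positive or negative.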
Affiliation(s)
- Carla Souza de Mello
- Food Science and Technology Department, Federal University of Santa Catarina, Rod. Admar Gonzaga 1346, 88034-001, Florianópolis, SC, Brazil
- Jeroen P Van Dijk
- RIKILT, Wageningen University and Research Centre, PO Box 230, NL-6700, AE, Wageningen, The Netherlands
- Marleen Voorhuijzen
- RIKILT, Wageningen University and Research Centre, PO Box 230, NL-6700, AE, Wageningen, The Netherlands
- Esther J Kok
- RIKILT, Wageningen University and Research Centre, PO Box 230, NL-6700, AE, Wageningen, The Netherlands
- Ana Carolina Maisonnave Arisi
- Food Science and Technology Department, Federal University of Santa Catarina, Rod. Admar Gonzaga 1346, 88034-001, Florianópolis, SC, Brazil
6
Balsamo GM, de Mello CS, Arisi ACM. Proteome Comparison of Grains from Two Maize Genotypes, with Colorless Kernel Pericarp (P1-ww) and Red Kernel Pericarp (P1-rr). Food Biotechnol 2016. [DOI: 10.1080/08905436.2016.1166382]
7
Renaut J, Leclercq C, Planchon S. 2DE analysis of forest tree proteins using fluorescent labels and multiplexing. Methods Mol Biol 2014; 1072:141-154. [PMID: 24136520 DOI: 10.1007/978-1-62703-631-3_11]
Abstract
Although proteomists working with gel-free methods regard gels as a thing of the past, gel-based proteomics still has much to offer, and images resolving thousands of spots are still widely acquired. Two-dimensional electrophoresis remains a powerful tool for exploring the plant proteome and unravelling changes in protein abundance between samples. As with any method, it has weak points, for example limited reproducibility and poor detection of low-abundance proteins. Difference gel electrophoresis (DIGE) can help overcome, or at least reduce, these drawbacks. DIGE requires proteins to be labelled with fluorochromes before their separation on 2DE gels. The technique can be applied to a wide array of plant stress studies, including studies of trees. Accurate quantitative results can then be obtained. Proteins of interest for the stress studied are subsequently digested enzymatically (usually with trypsin) and identified using electrospray ionization, matrix-assisted laser desorption/ionization-time-of-flight MS, and/or tandem MS.
Affiliation(s)
- Jenny Renaut
- Department of Environment and Agrobiotechnologies (EVA), Proteomics Platform, Centre de Recherche Public-Gabriel Lippmann, Belvaux, Luxembourg
8
Cozzolino D. Benefits and Limitations of Infrared Technologies in Omics Research and Development of Natural Drugs and Pharmaceutical Products. Drug Dev Res 2012. [DOI: 10.1002/ddr.21043]
Affiliation(s)
- Daniel Cozzolino
- School of Agriculture, Food and Wine, The University of Adelaide, Waite Campus, PMB 1, Glen Osmond, SA 5064, Australia
9
10
Faergestad EM, Rye MB, Nhek S, Hollung K, Grove H. The use of chemometrics to analyse protein patterns from gel electrophoresis. Acta Chromatogr 2011. [DOI: 10.1556/achrom.23.2011.1.1]
11
Tamaki Y, Mazza G. Rapid determination of lignin content of straw using Fourier transform mid-infrared spectroscopy. J Agric Food Chem 2011; 59:504-12. [PMID: 21175187 DOI: 10.1021/jf1036678]
Abstract
To determine lignin content in triticale and wheat straws, calibration models were built using Fourier transform mid-infrared spectroscopy combined with partial least-squares regression. The best models for triticale and wheat straws were built from averaged spectra, with "raw spectrum in spectrum format" and "constant in path length" as spectral pretreatments. The r², root-mean-square error of prediction (RMSEP), and residual predictive deviation (RPD) values for the triticale straw model were 0.935, 0.305, and 3.89, respectively; for the wheat straw model they were 0.985, 0.163, and 8.50. Both models showed good predictive ability. A model built from both triticale and wheat straws gave r², RMSEP, and RPD values of 0.952, 0.27, and 4.63, respectively. This combined model also showed good predictive ability and could predict lignin content in both triticale and wheat straws with high accuracy.
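The calibrations above combine spectra with partial least-squares (PLS) regression. The sketch below illustrates how a PLS1 model (one response, such as lignin content) is fitted and applied, using the NIPALS algorithm in plain Python. The function names and toy data are hypothetical. Spectral pretreatment, choosing the number of components by validation, and RMSEP/RPD evaluation on an independent test set are left out. In practice, a library implementation such as scikit-learn's `PLSRegression` would normally be used.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model (single response) with the NIPALS algorithm.

    X: list of spectra (rows) x wavenumbers (columns); y: reference values.
    Returns the centring constants and, per component, (weights, X loadings, y loading).
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X residuals
    f = [v - y_mean for v in y]  # centred y residuals
    components = []
    for _ in range(n_components):
        # Weight vector w proportional to E^T f, scaled to unit length.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        w_norm = math.sqrt(sum(v * v for v in w))
        w = [v / w_norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate: remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new spectrum x using the fitted model."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        # Project onto the component, add its contribution, then deflate as in fitting.
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat
```

Prediction replays the same projection-and-deflation steps used during fitting. This is equivalent to applying the usual PLS regression vector without forming it explicitly.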
Affiliation(s)
- Yukihiro Tamaki
- Pacific Agri-Food Research Centre, Agriculture and Agri-Food Canada, 4200 Highway 97, Summerland, BC, Canada V0H 1Z0
12
Śmiechowska A, Bartoszek A, Namieśnik J. Determination of Glucosinolates and Their Decomposition Products—Indoles and Isothiocyanates in Cruciferous Vegetables. Crit Rev Anal Chem 2010. [DOI: 10.1080/10408347.2010.490489]
13
Proteomics of extreme freezing tolerance in Siberian spruce (Picea obovata). J Proteomics 2010; 73:965-75. [DOI: 10.1016/j.jprot.2009.12.010] [Received: 03/27/2009] [Revised: 12/15/2009] [Accepted: 12/18/2009]
14
The influence of Fusarium infection and growing location on the quantitative protein composition of (part I) emmer (Triticum dicoccum). Eur Food Res Technol 2010. [DOI: 10.1007/s00217-010-1229-3]
15
Kristiansen LC, Jacobsen S, Jessen F, Jørgensen BM. Using a cross-model loadings plot to identify protein spots causing 2-DE gels to become outliers in PCA. Proteomics 2010; 10:1721-3. [DOI: 10.1002/pmic.200900318]
16
Renaut J. Difference gel electrophoresis as a tool to discover stress-regulated proteins. Methods Mol Biol 2010; 639:207-218. [PMID: 20387048 DOI: 10.1007/978-1-60761-702-0_12]
Abstract
Two-dimensional electrophoresis is a powerful tool for exploring the plant proteome and unravelling changes in protein expression between samples. However, acquiring images that resolve thousands of spots has weak points, such as limited reproducibility, as scientists working with gel-free techniques regularly point out. This drawback can now be bypassed with difference gel electrophoresis (DIGE), which requires proteins to be labelled with fluorochromes before their separation on 2DE gels. The technique can be applied to a wide array of plant stress studies. DIGE provides accurate quantitative results. Differentially abundant spots are then usually subjected to tryptic digestion and identified using electrospray ionization, matrix-assisted laser desorption/ionization-time-of-flight MS and/or tandem MS.
Affiliation(s)
- Jenny Renaut
- Department of Environment and Agrobiotechnologies (EVA), Proteomics Platform, Centre de Recherche Public - Gabriel Lippmann, Belvaux, Luxembourg
17
Identification of oat (Avena sativa) and buckwheat (Fagopyrum esculentum) proteins and their prolamin fractions using two-dimensional polyacrylamide gel electrophoresis. Eur Food Res Technol 2009. [DOI: 10.1007/s00217-009-1143-8]
18
Wedholm A, Møller H, Stensballe A, Lindmark-Månsson H, Karlsson A, Andersson R, Andrén A, Larsen L. Effect of Minor Milk Proteins in Chymosin Separated Whey and Casein Fractions on Cheese Yield as Determined by Proteomics and Multivariate Data Analysis. J Dairy Sci 2008; 91:3787-97. [DOI: 10.3168/jds.2008-1022]
19
Aït Kaddour A, Barron C, Robert P, Cuq B. Physico-chemical description of bread dough mixing using two-dimensional near-infrared correlation spectroscopy and moving-window two-dimensional correlation spectroscopy. J Cereal Sci 2008. [DOI: 10.1016/j.jcs.2007.07.008]
20
Darewicz M, Dziuba J, Minkiewicz P. Celiac Disease—Background, Molecular, Bioinformatics and Analytical Aspects. Food Rev Int 2008. [DOI: 10.1080/87559120802089258]
21
Abstract
After separation by two-dimensional gel electrophoresis (2DE), the abundances of several hundred individual proteins can be quantified in a cell population or tissue sample. Both a good experimental setup and a valid statistical approach are essential for gaining insight into the data and drawing correct conclusions. High-throughput 2DE proteomics yields large, complex datasets in which hundreds of variables far outnumber the limited replicates. However, the most commonly used statistical tests were designed for many replicates and few variables. The use of statistics is also inconsistent across the proteomics community. Two approaches to data analysis can be distinguished: exploratory data analysis and confirmatory data analysis. Currently, most proteomic data are analyzed with an emphasis on confirmatory analysis, neglecting exploratory data analysis. This chapter gives an overview of the typical exploratory and confirmatory statistical tools available and suggests case-specific guidelines for a reliable statistical approach to 2DE analysis. Examples are given for an experimental setup based on classical staining methods as well as for the more advanced difference gel electrophoresis.
22
Jensen KN, Jessen F, Jørgensen BM. Multivariate data analysis of two-dimensional gel electrophoresis protein patterns from few samples. J Proteome Res 2008; 7:1288-96. [PMID: 18237110 DOI: 10.1021/pr700800s]
Abstract
One application of 2D gel electrophoresis is to reveal differences in protein pattern between two or more groups of individuals that are attributable to group membership. Multivariate data analysis methods help pinpoint the spots relevant for discrimination because they consider not only differences in single spots but also the covariance structure between proteins. However, their outcome depends on data scaling. They may also fail to produce valid multivariate models because the gels contain far more "irrelevant" spots than relevant ones. The case in which only a few gels are available and the aim is to find as many group-dependent proteins as possible seems particularly difficult. The present paper investigates such a case, examining how scaling and prefiltering by univariate nonparametric statistics affect spot selection. In addition, a modified 'autoscaling' of the full data set, based on within-group standard deviations, is introduced. It is shown to be advantageous for revealing potential group-dependent proteins beyond those found by prefiltering.
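The modified autoscaling mentioned above replaces the usual total standard deviation with one computed within groups. The abstract does not give the exact formula, so the sketch below assumes one plausible form: each spot is centred on its overall mean and divided by the pooled within-group standard deviation. The function name and toy data are hypothetical.

```python
import math


def within_group_autoscale(data, groups):
    """Autoscale columns using the pooled within-group standard deviation.

    data: rows are gels, columns are spot volumes; groups: one label per row.
    Each column is centred on its overall mean and divided by the pooled
    within-group SD (instead of the total SD used in ordinary autoscaling).
    """
    n, m = len(data), len(data[0])
    labels = sorted(set(groups))
    scaled_columns = []
    for j in range(m):
        column = [row[j] for row in data]
        overall_mean = sum(column) / n
        # Sum of squared deviations from each group's own mean.
        ss_within = 0.0
        for label in labels:
            members = [column[i] for i in range(n) if groups[i] == label]
            group_mean = sum(members) / len(members)
            ss_within += sum((v - group_mean) ** 2 for v in members)
        # Pooled variance uses n - (number of groups) degrees of freedom.
        sd_within = math.sqrt(ss_within / (n - len(labels)))
        if sd_within == 0:
            raise ValueError(f"column {j} has no within-group variation")
        scaled_columns.append([(v - overall_mean) / sd_within for v in column])
    # Transpose the list of scaled columns back to rows x columns.
    return [list(row) for row in zip(*scaled_columns)]
```

Consider a spot measured as 1, 3, 11 and 13 in two groups of two gels. Ordinary autoscaling divides by the total standard deviation (about 5.9), whereas the pooled within-group value is √2 ≈ 1.41. A spot that differs mainly between groups therefore receives larger scaled values and more weight in a subsequent multivariate model.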
Affiliation(s)
- Kristina Nedenskov Jensen
- Danish Institute for Fisheries Research, Department of Seafood Research, Technical University of Denmark, Lyngby, Denmark
23
Jacobsen S, Grove H, Jensen KN, Sørensen HA, Jessen F, Hollung K, Uhlen AK, Jørgensen BM, Faergestad EM, Søndergaard I. Multivariate analysis of 2-DE protein patterns--practical approaches. Electrophoresis 2007; 28:1289-99. [PMID: 17351893 DOI: 10.1002/elps.200600414]
Abstract
Practical approaches to multivariate data analysis of 2-DE protein patterns are demonstrated by applying three independent strategies for image analysis and multivariate analysis to the same set of 2-DE data. Four wheat varieties were selected on the basis of their baking quality. Two varieties had strong baking quality and hard kernels, and two had weak baking quality and soft kernels. Gliadins at different stages of grain development were analyzed by applying multivariate data analysis to images of 2-DE gels. Patterns related to wheat variety, harvest time and quality were detected in the 2-DE protein patterns with all three strategies. The use of multivariate methods was also evaluated in the alignment and matching of 2-DE gels. All three strategies were able to discriminate the samples according to quality, harvest time and variety, although different subsets of protein spots were selected. The exploratory approach of using multivariate data analysis and variable selection on 2-DE gels seems promising as a fast, reliable and convenient way to screen many gel images and transform them into spot quantities.
Affiliation(s)
- Susanne Jacobsen
- BioCentrum-DTU, Technical University of Denmark, Kgs. Lyngby, Denmark.
24
Grove H, Hollung K, Uhlen AK, Martens H, Faergestad EM. Challenges Related to Analysis of Protein Spot Volumes from Two-Dimensional Gel Electrophoresis As Revealed by Replicate Gels. J Proteome Res 2006; 5:3399-410. [PMID: 17137341 DOI: 10.1021/pr0603250]
Abstract
Assumptions that need to be considered before statistical analysis of protein spot volumes from two-dimensional gel electrophoresis (2-DE) data are studied using replicate gels of the same sample. The most important observation is that the data tables of protein spot volumes from 2-DE images contain a large number of missing values. These missing values cannot be explained by the presence or absence of the proteins. This causes both a loss of information and problems for subsequent statistical analysis. Challenges with 2-DE protein spot volumes are discussed in light of multiple gel comparisons and multivariate data analysis.
Affiliation(s)
- Harald Grove
- Matforsk AS, Norwegian Food Research Institute, Osloveien 1, N-1430 Ås, Norway
25
Tuomainen MH, Nunan N, Lehesranta SJ, Tervahauta AI, Hassinen VH, Schat H, Koistinen KM, Auriola S, McNicol J, Kärenlampi SO. Multivariate analysis of protein profiles of metal hyperaccumulator Thlaspi caerulescens accessions. Proteomics 2006; 6:3696-706. [PMID: 16691554 DOI: 10.1002/pmic.200501357]
Abstract
Thlaspi caerulescens is increasingly acknowledged as one of the best models for studying metal hyperaccumulation in plants. To study the mechanisms underlying metal hyperaccumulation, we used proteomic profiling to identify differences in protein intensities among three T. caerulescens accessions. These accessions show pronounced differences in tolerance, uptake and root-to-shoot translocation of Zn and Cd. Proteins were separated by two-dimensional electrophoresis and stained with SYPRO Orange. Intensity values and quality scores were obtained for each spot using PDQuest software. Principal component analysis was used to test the separation of the protein profiles of the three accessions at various metal exposures and to detect groups of proteins responsible for the differences. Spot sets representing individual proteins were analysed with analysis of variance and the non-parametric Kruskal-Wallis test. The clearest differences were seen among the Thlaspi accessions, whereas the effects of metal exposure were less pronounced. The 48 tentatively identified spots represent core metabolic functions (e.g. photosynthesis, nitrogen assimilation, carbohydrate metabolism) as well as putative signalling and regulatory functions. The possible roles of some of these proteins in heavy metal accumulation and tolerance are discussed.
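The spot-wise Kruskal-Wallis test mentioned above compares intensity ranks across accessions rather than raw intensities. A minimal sketch of the H statistic in plain Python follows. It uses average ranks for ties but omits the tie-correction factor. The function name and toy intensities are hypothetical. H is then compared with a chi-square distribution with k − 1 degrees of freedom for k groups; `scipy.stats.kruskal` provides the tie-corrected statistic and its p-value.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for k independent groups (no tie correction).

    Each argument is a list of spot intensities for one accession.
    """
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at i and give each the average rank.
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    # H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1)
    weighted = sum(r_sum ** 2 / len(values) for r_sum, values in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
```

Because only ranks enter the statistic, a few extreme spot intensities (common with gel staining artefacts) cannot dominate the result the way they can in an ANOVA.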
26
Glinski M, Weckwerth W. The role of mass spectrometry in plant systems biology. Mass Spectrom Rev 2006; 25:173-214. [PMID: 16284938 DOI: 10.1002/mas.20063]
Abstract
Large-scale analyses of proteins and metabolites are intimately bound to advances in MS technologies. The aim of these non-targeted "omic" technologies is to extend our understanding beyond the analysis of isolated parts of the system. Metabolomics and proteomics have emerged in parallel with the development of novel mass analyzers and hyphenated techniques. These include gas chromatography coupled to time-of-flight mass spectrometry (GC-TOF-MS) and multidimensional liquid chromatography coupled to mass spectrometry (LC-MS). The analysis of (i) proteins, (ii) phosphoproteins, and (iii) metabolites is discussed in the context of plant physiology and environment, with a focus on novel method developments. Recently published studies measuring dynamic (quantitative) behavior at these levels are summarized. In these studies, the fully sequenced plants Arabidopsis thaliana and Oryza sativa (rice) have been the primary models. Particular emphasis is given to key physiological processes such as metabolism, development, stress, and defense. Attempts to combine spatial, tissue-specific resolution with systematic profiling are also described. Finally, we summarize the initial steps toward characterizing the molecular plant phenotype as a corollary of environment and genotype.
Affiliation(s)
- Mirko Glinski
- Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany
27
Samson RA, Hong SB, Frisvad JC. Old and new concepts of species differentiation in Aspergillus. Med Mycol 2006; 44:S133-S148. [DOI: 10.1080/13693780600913224]
28
Lehesranta SJ, Davies HV, Shepherd LVT, Nunan N, McNicol JW, Auriola S, Koistinen KM, Suomalainen S, Kokko HI, Kärenlampi SO. Comparison of tuber proteomes of potato varieties, landraces, and genetically modified lines. PLANT PHYSIOLOGY 2005; 138:1690-9. [PMID: 15951487 PMCID: PMC1176438 DOI: 10.1104/pp.105.060152] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.8] [Received: 01/25/2005] [Revised: 03/21/2005] [Accepted: 03/21/2005] [Indexed: 05/02/2023]
Abstract
Crop improvement by genetic modification remains controversial, one of the major issues being the potential for unintended effects. Comparative safety assessment includes targeted analysis of key nutrients and antinutritional factors, but broader-scale profiling or "omics" methods could increase the chances of detecting unintended effects. Comparative assessment should consider the extent of natural variation and not simply compare genetically modified (GM) lines and parental controls. In this study, potato (Solanum tuberosum) proteome diversity has been assessed using a range of diverse non-GM germplasm. In addition, a selection of GM potato lines was compared to assess the potential for unintended differences in protein profiles. Clear qualitative and quantitative differences were found in the protein patterns of the varieties and landraces examined, with 1,077 of 1,111 protein spots analyzed showing statistically significant differences. The diploid species Solanum phureja could be clearly differentiated from tetraploid (Solanum tuberosum) genotypes. Many of the proteins apparently contributing to genotype differentiation are involved in disease and defense responses, the glycolytic pathway, and sugar metabolism or protein targeting/storage. Only nine proteins out of 730 showed significant differences between GM lines and their controls. There was much less variation between GM lines and their non-GM controls than between different varieties and landraces. A number of proteins were identified by mass spectrometry and added to a potato tuber two-dimensional protein map.
Affiliation(s)
- Satu J Lehesranta
- Institute of Applied Biotechnology, University of Kuopio, FIN-70211 Kuopio, Finland
29
Font R, del Río-Celestino M, Cartea E, de Haro-Bailón A. Quantification of glucosinolates in leaves of leaf rape (Brassica napus ssp. pabularia) by near-infrared spectroscopy. PHYTOCHEMISTRY 2005; 66:175-85. [PMID: 15652574 DOI: 10.1016/j.phytochem.2004.11.011] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.2] [Received: 08/31/2004] [Revised: 11/12/2004] [Indexed: 05/18/2023]
Abstract
The potential of near-infrared spectroscopy (NIRS) for screening the total glucosinolate (t-GSL) content, as well as the aliphatic glucosinolates gluconapin (GNA), glucobrassicanapin (GBN), progoitrin (PRO), glucoalyssin (GAL), and the indole glucosinolate glucobrassicin (GBS), in leaf rape (Brassica napus L. ssp. pabularia DC) was assessed. This crop is grown for its edible leaves, used both as fodder and for human consumption. In Galicia (northwestern Spain) it is highly valued for human nutrition and is commonly known as "nabicol". A collection of 36 local populations of nabicol was analysed by NIRS for glucosinolate composition. The reference glucosinolate values, obtained by high-performance liquid chromatography on the leaf samples, were regressed against different spectral transformations by modified partial least-squares (MPLS) regression. The coefficients of determination in cross-validation (r2) for the t-GSL, GNA, GBN, PRO, GAL and GBS equations were 0.88, 0.73, 0.81, 0.78, 0.37 and 0.41, respectively. The ratios of standard deviation to standard error of cross-validation for these constituents were: t-GSL, 2.96; GNA, 1.94; GBN, 2.31; PRO, 2.11; GAL, 1.27; and GBS, 1.29. These results show that the equations developed for total glucosinolates, gluconapin, glucobrassicanapin and progoitrin can be used to screen these compounds in the leaves of this species. In addition, the glucoalyssin and glucobrassicin equations can be used to identify samples with low and high contents. Examination of the MPLS loadings of the first three terms of the different equations indicates that major cell components such as protein and cellulose contributed strongly to modelling the glucosinolate equations.
Affiliation(s)
- Rafael Font
- Department of Agronomy and Plant Breeding, Institute of Sustainable Agriculture, (CSIC), Alameda del Obispo s/n, 14080 Córdoba, Spain.
30
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2005. [PMCID: PMC2448604 DOI: 10.1002/cfg.419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]