1
|
Liu Q, Lin J, Wen L, Wang S, Zhou P, Mei L, Shang S. Systematic Modeling, Prediction, and Comparison of Domain-Peptide Affinities: Does it Work Effectively With the Peptide QSAR Methodology? Front Genet 2022; 12:800857. [PMID: 35096016 PMCID: PMC8795790 DOI: 10.3389/fgene.2021.800857] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 10/24/2021] [Accepted: 12/14/2021] [Indexed: 11/17/2022]
Abstract
The protein-protein association in cellular signaling networks (CSNs) often acts as weak, transient, and reversible domain-peptide interaction (DPI), in which a flexible peptide segment on the surface of one protein is recognized and bound by a rigid peptide-recognition domain from another. Reliable modeling and accurate prediction of DPI binding affinities would help to ascertain the diverse biological events involved in CSNs and benefit our understanding of various biological implications underlying DPIs. Traditionally, peptide quantitative structure-activity relationship (pQSAR) has been widely used to model and predict the biological activity of oligopeptides; it employs amino acid descriptors (AADs) to characterize peptide structures at sequence level and then statistically correlates the resulting descriptor vector with observed activity data via regression. However, pQSAR has not yet been widely applied to treat the direct binding behavior of large-scale peptide ligands to their protein receptors. In this work, we attempted to clarify whether the pQSAR methodology can work effectively for modeling and predicting DPI affinities in a high-throughput manner. Over twenty thousand short linear motif (SLiM)-containing peptide segments involved in SH3, PDZ and 14-3-3 domain-mediated CSNs were compiled to define a comprehensive sequence-based data set of DPI affinities, which were represented by the Boehringer light units (BLUs) derived from previous arbitrary light intensity assays following SPOT peptide synthesis. Four sophisticated machine learning methods (MLMs) were then utilized to perform pQSAR modeling on the set described with different AADs to systematically create a variety of linear and nonlinear predictors, which were then verified by rigorous statistical tests.
It is revealed that the genome-wide DPI events can only be modeled qualitatively or semiquantitatively with the traditional pQSAR strategy due to the intrinsic disorder of peptide conformation and the potential interplay between different peptide residues. In addition, the arbitrary BLUs used to characterize DPI affinity values were measured via an indirect approach, which may not be very reliable and may involve strong noise, thus leading to a considerable bias in the modeling. $R_{\mathrm{prd}}^{2} = 0.7$ can be considered the upper limit of the external generalization ability of the pQSAR methodology working on large-scale DPI affinity data.
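The sequence-level encoding and the external predictive R² mentioned in this abstract can be illustrated with a minimal sketch. It rests on assumptions not taken from the paper: the Kyte-Doolittle hydropathy scale stands in for the amino acid descriptors (the abstract does not name the AADs used), the function names are hypothetical, and R² is computed as 1 - SS_res/SS_tot over the test set, which may differ from the authors' exact definition.

```python
# Kyte-Doolittle hydropathy values, used here only as an illustrative
# one-dimensional amino acid descriptor (an assumption, not the paper's AADs).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(sequence):
    """Map a peptide sequence to a descriptor vector, one value per residue."""
    return [HYDROPATHY[residue] for residue in sequence.upper()]


def r_squared(observed, predicted):
    """Coefficient of determination on a held-out set: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only; they are not data from the study.
vector = encode_peptide("PPLP")
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
score = r_squared(observed, predicted)
print(vector, round(score, 3))
```

In a full pQSAR workflow the descriptor vectors would be the regression inputs and the measured affinities the targets; the sketch stops at encoding and scoring to avoid implying a particular regression method.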
Affiliation(s)
- Qian Liu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Jing Lin
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Li Wen
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Shaozhou Wang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Peng Zhou
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC), Chengdu, China
- Li Mei
- Institute of Culinary, Sichuan Tourism University, Chengdu, China
- Shuyong Shang
- Institute of Ecological Environment Protection, Chengdu Normal University, Chengdu, China
|
2
|
Lebanov L, Tedone L, Ghiasvand A, Paull B. Random Forests machine learning applied to gas chromatography – Mass spectrometry derived average mass spectrum data sets for classification and characterisation of essential oils. Talanta 2020; 208:120471. [DOI: 10.1016/j.talanta.2019.120471] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 07/09/2019] [Revised: 10/02/2019] [Accepted: 10/12/2019] [Indexed: 01/24/2023]
|
3
|
Locus-specific Retention Predictor (LsRP): A Peptide Retention Time Predictor Developed for Precision Proteomics. Sci Rep 2017; 7:43959. [PMID: 28303880 PMCID: PMC5356008 DOI: 10.1038/srep43959] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/01/2016] [Accepted: 01/31/2017] [Indexed: 11/08/2022]
Abstract
The precise prediction of peptide retention time (RT) plays an increasingly important role in liquid chromatography-tandem mass spectrometry (LC-MS/MS) based proteomics. Owing to the high reproducibility of liquid chromatography, RT prediction provides promising information for the design of both identification and quantification experiments. In this work, we present a Locus-specific Retention Predictor (LsRP) for precise prediction of peptide RT, which is based on amino acid locus information and the Support Vector Regression (SVR) algorithm. Corresponding to amino acid locus, each peptide sequence was converted to a featured locus vector consisting of zeros and ones. With locus vector information from LC-MS/MS data sets, an SVR computational process was trained and evaluated. LsRP finally provided a prediction correlation coefficient of 0.95 to 0.99. We compared our method with two common predictors. Results showed that LsRP outperforms these methods and tracked up to 30% extra peptides in an extraction RT window of 2 min. A new strategy combining LsRP and the calibration peptide approach was then proposed, which opens up new opportunities for precision proteomics.
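The "locus vector consisting of zeros and ones" described in this abstract can be pictured with a short sketch. The abstract does not specify the encoding, so the position-wise one-hot layout below, the fixed `max_length`, and the name `locus_vector` are all assumptions made for illustration, not the LsRP implementation.

```python
# The 20 standard amino acids in a fixed order (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def locus_vector(sequence, max_length):
    """Encode a peptide as a 0/1 vector with one 20-wide block per position.

    Hypothetical layout: block i marks which residue sits at position i.
    Positions beyond the end of the peptide stay all zeros.
    """
    width = len(AMINO_ACIDS)
    vector = [0] * (max_length * width)
    for position, residue in enumerate(sequence[:max_length]):
        vector[position * width + AMINO_ACIDS.index(residue)] = 1
    return vector


example = locus_vector("AC", max_length=3)
print(len(example), sum(example))
```

Vectors of this kind could then serve as inputs to a support vector regression model; the regression step is omitted here because the abstract gives no kernel or parameter details.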
|
4
|
Prediction of retention in hydrophilic interaction liquid chromatography using solute molecular descriptors based on chemical structures. J Chromatogr A 2017; 1486:59-67. [DOI: 10.1016/j.chroma.2016.12.025] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 10/12/2016] [Revised: 12/07/2016] [Accepted: 12/11/2016] [Indexed: 11/23/2022]
|
5
|
Žuvela P, Macur K, Jay Liu J, Bączek T. Exploiting non-linear relationships between retention time and molecular structure of peptides originating from proteomes and comparing three multivariate approaches. J Pharm Biomed Anal 2016; 127:94-100. [DOI: 10.1016/j.jpba.2016.01.055] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/30/2015] [Revised: 01/11/2016] [Accepted: 01/23/2016] [Indexed: 12/21/2022]
|
6
|
Examining the Spectral Separability of Prosopis glandulosa from Co-Existent Species Using Field Spectral Measurement and Guided Regularized Random Forest. REMOTE SENSING 2016. [DOI: 10.3390/rs8020144] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
|
7
|
Žuvela P, Liu JJ, Macur K, Bączek T. Molecular Descriptor Subset Selection in Theoretical Peptide Quantitative Structure–Retention Relationship Model Development Using Nature-Inspired Optimization Algorithms. Anal Chem 2015; 87:9876-83. [DOI: 10.1021/acs.analchem.5b02349] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 12/25/2022]
Affiliation(s)
- Petar Žuvela
- Department of Chemical Engineering, Pukyong National University, 365 Sinseon-ro, 608-739 Busan, Korea
- J. Jay Liu
- Department of Chemical Engineering, Pukyong National University, 365 Sinseon-ro, 608-739 Busan, Korea
- Katarzyna Macur
- Laboratory of Mass Spectrometry, Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Kładki 24, 80-822 Gdańsk, Poland
- Tomasz Bączek
- Department of Pharmaceutical Chemistry, Medical University of Gdańsk, Hallera 107, 80-416 Gdańsk, Poland
|
8
|
To Determine Biologically Important Mutations in Oxytocin. Int J Pept Res Ther 2014. [DOI: 10.1007/s10989-014-9412-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
9
|
Tan J, Tian F, Lv Y, Liu W, Zhong L, Liu Y, Yang L. Integration of QSAR modelling and QM/MM analysis to investigate functional food peptides with antihypertensive activity. MOLECULAR SIMULATION 2013. [DOI: 10.1080/08927022.2013.788247] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
|
10
|
QSRR Study on Flavor Compounds of Diverse Structures on Different Columns with the Help of New Chemometric Methods. Chromatographia 2012. [DOI: 10.1007/s10337-012-2349-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
|
11
|
Perez-Riverol Y, Audain E, Millan A, Ramos Y, Sanchez A, Vizcaíno JA, Wang R, Müller M, Machado YJ, Betancourt LH, González LJ, Padrón G, Besada V. Isoelectric point optimization using peptide descriptors and support vector machines. J Proteomics 2012; 75:2269-74. [PMID: 22326964 DOI: 10.1016/j.jprot.2012.01.029] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 11/22/2011] [Revised: 01/23/2012] [Accepted: 01/25/2012] [Indexed: 11/24/2022]
Abstract
IPG (Immobilized pH Gradient) based separations are frequently used as the first step in shotgun proteomics methods; they yield an increase in both the dynamic range and resolution of peptide separation prior to the LC-MS analysis. Experimental isoelectric point (pI) values can improve peptide identifications in conjunction with MS/MS information. Thus, accurate estimation of the pI value based on the amino acid sequence becomes critical to perform these kinds of experiments. Nowadays, pI is commonly predicted using the charge-state model [1], and/or the cofactor algorithm [2]. However, none of these methods is capable of calculating the pI value for basic peptides accurately. In this manuscript, we present a new approach that can significantly improve the pI estimation, by using Support Vector Machines (SVM) [3], an experimental amino acid descriptor taken from the AAIndex database [4] and the isoelectric point predicted by the charge-state model. Our results have shown a strong correlation ($R^2 = 0.98$) between the predicted and observed values, with a standard deviation of 0.32 pH units across the complete pH range.
Affiliation(s)
- Yasset Perez-Riverol
- Department of Proteomics, Center for Genetic Engineering and Biotechnology, Ave 31 e/ 158 y 190, Cubanacán, Playa, Ciudad de la Habana, Cuba
|
12
|
Tyrkkö E, Pelander A, Ojanperä I. Prediction of liquid chromatographic retention for differentiation of structural isomers. Anal Chim Acta 2012; 720:142-8. [PMID: 22365132 DOI: 10.1016/j.aca.2012.01.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 11/07/2011] [Revised: 01/13/2012] [Accepted: 01/13/2012] [Indexed: 10/14/2022]
Abstract
A liquid chromatography (LC) retention time prediction software, ACD/ChromGenius, was employed to calculate retention times for structural isomers, which cannot be differentiated by accurate mass measurement techniques alone. For 486 drug compounds included in an in-house database for urine drug screening by liquid chromatography/quadrupole time-of-flight mass spectrometry (LC/Q-TOFMS), a retention time knowledge base was created with the software. ACD/ChromGenius calculated retention times for compounds based on the drawn molecular structure and given chromatographic parameters. The ability of the software to identify compounds was evaluated by calculating the retention order of the 118 isomers, in 50 isomer groups of 2-5 compounds each, included in the database. ACD/ChromGenius predicted the correct elution order for 68% (34) of isomer groups. Of the 16 groups for which the isomer elution order was incorrectly calculated, two were diastereomer pairs and thus difficult to distinguish using the software. Correlation between the calculated and experimental retention times in the knowledge base tested was moderate, $r^2 = 0.8533$. The mean and median absolute errors were 1.12 min and 0.84 min, respectively, and the standard deviation was 1.04 min. The information generated by ACD/ChromGenius, together with other in silico methods employing accurate mass data, makes the identification of substances more reliable. This study demonstrates an approach for tentatively identifying compounds in a large target database without a need for primary reference standards.
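The agreement statistics quoted in this abstract (squared correlation, mean and median absolute error) can be computed as in the following sketch. The function names are hypothetical and the retention times are toy values chosen for illustration; they are not data from the study, and the sketch says nothing about how ACD/ChromGenius itself works.

```python
import statistics


def absolute_errors(observed, predicted):
    """Per-compound absolute difference between measured and calculated values."""
    return [abs(o - p) for o, p in zip(observed, predicted)]


def squared_pearson(observed, predicted):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    mean_o = statistics.fmean(observed)
    mean_p = statistics.fmean(predicted)
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance ** 2 / (var_o * var_p)


# Toy retention times in minutes (illustrative only).
observed_rt = [1.0, 2.0, 3.0, 4.0]
predicted_rt = [1.5, 2.0, 2.0, 4.5]

errors = absolute_errors(observed_rt, predicted_rt)
mean_abs_error = statistics.fmean(errors)
median_abs_error = statistics.median(errors)
r2 = squared_pearson(observed_rt, predicted_rt)
print(mean_abs_error, median_abs_error, round(r2, 4))
```

Reporting the median alongside the mean is useful because a few badly predicted compounds can inflate the mean while leaving the median nearly unchanged.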
Affiliation(s)
- Elli Tyrkkö
- Department of Forensic Medicine, Hjelt Institute, University of Helsinki, Finland.
|
13
|
Identification of human protein complexes from local sub-graphs of protein-protein interaction network based on random forest with topological structure features. Anal Chim Acta 2012; 718:32-41. [PMID: 22305895 DOI: 10.1016/j.aca.2011.12.069] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 10/09/2011] [Revised: 12/28/2011] [Accepted: 12/30/2011] [Indexed: 11/20/2022]
Abstract
In the post-genomic era, one of the most important and challenging tasks is to identify protein complexes and further elucidate their molecular mechanisms in specific biological processes. Previous computational approaches usually identify protein complexes from protein interaction networks based on dense sub-graphs and incomplete prior information. Additionally, these computational approaches pay little attention to the biological properties of proteins, and there is no common evaluation metric for assessing performance. It is therefore necessary to construct a novel method for identifying protein complexes and elucidating their function. In this study, a novel approach is proposed to identify protein complexes using random forest and topological structure. Each protein complex is represented by a graph of interactions, where a descriptor of the protein primary structure is used to characterize the biological properties of each protein and each vertex is weighted by the descriptor. The topological structure features are developed and used to characterize protein complexes. The random forest algorithm is utilized to build a prediction model and identify protein complexes from local sub-graphs instead of dense sub-graphs. As a demonstration, the proposed approach is applied to protein interaction data in human, and satisfactory results are obtained with accuracy of 80.24%, sensitivity of 81.94%, specificity of 80.07%, and Matthews correlation coefficient of 0.4087 in a 10-fold cross-validation test. Some new protein complexes are identified, and analysis based on Gene Ontology shows that the complexes are likely to be true complexes and play important roles in the pathogenesis of some diseases. PCI-RFTS, a corresponding executable program for protein complex identification, can be acquired freely on request from the authors.
|
14
|
Characterization of the binding profile of peptide to transporter associated with antigen processing (TAP) using Gaussian process regression. Comput Biol Med 2011; 41:865-70. [DOI: 10.1016/j.compbiomed.2011.07.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/16/2011] [Revised: 07/10/2011] [Accepted: 07/18/2011] [Indexed: 11/22/2022]
|
15
|
Zhang Y, Jin Q, Wang S, Ren R. Modeling and prediction of peptide drift times in ion mobility spectrometry using sequence-based and structure-based approaches. Comput Biol Med 2011; 41:272-7. [DOI: 10.1016/j.compbiomed.2011.03.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/22/2010] [Revised: 12/29/2010] [Accepted: 03/02/2011] [Indexed: 10/18/2022]
|
16
|
Peng S, Jian-Wei Z, Peng Z, Lin X. QSPR modeling of bioconcentration factor of nonionic compounds using Gaussian processes and theoretical descriptors derived from electrostatic potentials on molecular surface. CHEMOSPHERE 2011; 83:1045-1052. [PMID: 21339002 DOI: 10.1016/j.chemosphere.2011.01.063] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/12/2010] [Revised: 01/21/2011] [Accepted: 01/29/2011] [Indexed: 05/30/2023]
Abstract
In the present study, geometrical structures were constructed and optimized for 122 nonionic organic compounds at the quantum-mechanical HF/6-31G level of theory. The electrostatic potentials and subsequent structural descriptors derived from them were obtained. Gaussian process (GP) and, for comparison, multiple linear regression (MLR) and support vector machine (SVM) were then employed to build the quantitative structure-bioconcentration factor relationships. Systematic validations, including internal leave-one-out cross-validation, validation on an external test set, and a more rigorous Monte Carlo cross-validation, were performed to confirm the reliability of the constructed models. It has been found that the quantities derived from electrostatic potential, $V_{\min}$ and $\sum V_{s,\mathrm{ind}}^{-}$, together with the molecular volume ($V_{\mathrm{mc}}$), dipole moment ($\mu$) and the energy level of the highest occupied molecular orbital ($E_{\mathrm{HOMO}}$), can be used to express the quantitative structure-property relationship of this sample set well. Both linear and nonlinear models can give satisfactory results, and the GP, which is capable of handling linear and nonlinear hybrid relationships through a mixed covariance function, appears to have better fitting and predictive abilities than the other two statistical methods. The coefficient of determination $r_{\mathrm{pred}}^{2}$ and root mean square error of prediction (RMSEP) for the external test set are 0.953 and 0.337, respectively.
Affiliation(s)
- Sang Peng
- Department of Chemistry, Zhejiang University, Hangzhou 310027, China
|
17
|
He P, Wu W, Yang K, Jing T, Liao KL, Zhang W, Wang HD, Hua X. Exploring the activity space of peptides binding to diverse SH3 domains using principal property descriptors derived from amino acid rotamers. Biopolymers 2011; 96:288-301. [DOI: 10.1002/bip.21531] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
|
18
|
Tian F, Zhang C, Fan X, Yang X, Wang X, Liang H. Predicting the Flexibility Profile of Ribosomal RNAs. Mol Inform 2010; 29:707-15. [PMID: 27464014 DOI: 10.1002/minf.201000092] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 08/17/2010] [Accepted: 09/28/2010] [Indexed: 11/06/2022]
Abstract
Flexibility in biomolecules is an important determinant of biological functionality, which can be measured quantitatively by the atomic Debye-Waller factor or B-factor. Although numerous works have addressed theoretical and computational studies of the B-factor profiles of proteins, methods for predicting B-factor values of nucleic acids, especially the complicated ribosomal RNAs (rRNAs), which are functionally very similar to proteins in providing matrix structures and in catalyzing biochemical reactions, still remain unexploited. In this article, we present a quantitative structure-flexibility relationship (QSFR) study aimed at the quantitative prediction of rRNA B-factors based on primary sequences (sequence-based) and advanced structures (structure-based) by using both linear and nonlinear machine learning approaches, including partial least squares regression (PLS), least squares support vector machine (LSSVM), and Gaussian process (GP). By rigorously examining the performance and reliability of the constructed statistical models and by comparing our models in detail to those developed previously for protein B-factors, we demonstrate that (i) rRNA B-factors could be predicted at a level of accuracy similar to that of proteins, (ii) a structure-based approach performed much better than sequence-based methods in modeling rRNA B-factors, and (iii) rRNA flexibility is primarily governed by the local features of nonbonding potential landscapes, such as electrostatic and van der Waals forces.
Affiliation(s)
- Feifei Tian
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China; phone: +86 23 68757411, fax: +86 23 68757404; College of Bioengineering, Chongqing University, Chongqing 400044, China
- Chun Zhang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China; phone: +86 23 68757411, fax: +86 23 68757404
- Xia Fan
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China; phone: +86 23 68757411, fax: +86 23 68757404
- Xue Yang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China; phone: +86 23 68757411, fax: +86 23 68757404
- Xi Wang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China; phone: +86 23 68757411, fax: +86 23 68757404
- Huaping Liang
- State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China; phone: +86 23 68757411, fax: +86 23 68757404
|
19
|
Ren Y, Chen X, Li X, Lai H, Wang Q, Zhou P, Chen G. Quantitative prediction of the thermal motion and intrinsic disorder of protein cofactors in crystalline state: A case study on halide anions. J Theor Biol 2010; 266:291-8. [DOI: 10.1016/j.jtbi.2010.06.038] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/09/2010] [Revised: 06/08/2010] [Accepted: 06/25/2010] [Indexed: 10/19/2022]
|
20
|
Babushok VI, Zenkevich IG. Retention Characteristics of Peptides in RP-LC: Peptide Retention Prediction. Chromatographia 2010. [DOI: 10.1365/s10337-010-1721-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022]
|
21
|
Pan Y, Lv F, Tian F, Luo X, Kong X, Li Y, Yang Q. Prediction of Water′s Mobility and Disorder in Protein Crystals Using Novel Local Hydrophobic Descriptors. Mol Inform 2010; 29:195-201. [DOI: 10.1002/minf.200900058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/27/2009] [Accepted: 12/29/2009] [Indexed: 12/11/2022]
|
22
|
Watkins PJ, Clifford D, Rose G, Allen D, Warner RD, Dunshea FR, Pethick DW. Sheep category can be classified using machine learning techniques applied to fatty acid profiles derivatised as trimethylsilyl esters. ANIMAL PRODUCTION SCIENCE 2010. [DOI: 10.1071/an10034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
Abstract
Eruption of permanent incisors (dentition) is used as a proxy for age for defining meat quality in Australian sheep meat. However, this approach may not be reliable. While not presently available, an objective method could be used to determine sheep age, and thus sheep category, which would then potentially remove any inaccuracies that may occur in classifying sheep meat product. Statistical classification algorithms have been successfully used in bioinformatics. In this paper we review the performance of three algorithms (support vector machines, recursive partitioning and random forests) for determining sheep age. The algorithms were applied to the measured fatty acid profiles of fat samples from 533 carcasses; 254 lamb (<1 year old), 131 hogget (~1–2 years old) and 148 mutton (>2 years old) samples. Three data pretreatments (range transformation, column mean centering and range transformation with mean centering) were also examined to determine their impact on the performance of the algorithms. The random forests algorithm, when applied to mean-centred data, gave 100% predictive accuracy when classifying sheep category. This approach could be used for the development of an objective test for determining sheep age and category.
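The data pretreatments named in this abstract can be sketched as simple column-wise operations. The definitions below are assumptions: "range transformation" is taken to mean scaling each column to the interval [0, 1], and the function names and toy matrix are hypothetical, since the abstract does not give the formulas or data.

```python
def column_mean_center(rows):
    """Subtract each column's mean so that every column averages to zero."""
    count = len(rows)
    means = [sum(column) / count for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def range_transform(rows):
    """Scale each column to [0, 1]; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    spans = [max(column) - min(column) for column in columns]
    return [
        [(value - low) / span if span else 0.0
         for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]


# Toy matrix: rows are samples, columns are fatty acid variables (illustrative).
profiles = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
centered = column_mean_center(profiles)
scaled = range_transform(profiles)
scaled_then_centered = column_mean_center(scaled)
print(centered)
print(scaled)
```

The third pretreatment in the abstract, range transformation with mean centering, corresponds here to applying the two functions in sequence, as in `scaled_then_centered`.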
|