1
Protein Markers for the Identification of Cork Oak Plants Infected with Phytophthora cinnamomi by Applying an (α, β)-k-Feature Set Approach. FORESTS 2022. [DOI: 10.3390/f13060940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Cork oak decline in Mediterranean forests is a complex phenomenon, observed with remarkable frequency in the southern part of the Iberian Peninsula, causing the weakening and death of these woody plants. The defoliation of the canopy, the presence of dry peripheral branches, and exudations on the trunk are visible symptoms used for the prognosis of decline, complemented by the presence of Phytophthora cinnamomi identified in the rhizosphere of the trees and adjacent soils. Recently, a large proteomic dataset obtained from the leaves of cork oak plants inoculated and non-inoculated with P. cinnamomi has become available. We explored it to search for an optimal set of proteins that serve as markers of the biological pattern of interaction with the oomycete. Thus, using published data from the cork oak leaf proteome, we mathematically modelled the problem as an (α, β)-k-Feature Set Problem to select molecular markers. A set of proteins (features) that represent dominant effects on the host metabolism resulting from pathogen action on roots was found. These results contribute to an early diagnosis of biochemical changes occurring in cork oak associated with P. cinnamomi infection. We hypothesize that these markers may be decisive in identifying trees that go into decline due to interactions with the pathogen, assisting the management of cork oak forest ecosystems.
2
Mathieson L, Mendes A, Marsden J, Pond J, Moscato P. Computer-Aided Breast Cancer Diagnosis with Optimal Feature Sets: Reduction Rules and Optimization Techniques. Methods Mol Biol 2017; 1526:299-325. [PMID: 27896749 DOI: 10.1007/978-1-4939-6613-4_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
This chapter introduces a new method for knowledge extraction from databases for the purpose of finding a discriminative set of features that is also a robust set for within-class classification. Our method is generic and we introduce it here in the field of breast cancer diagnosis from digital mammography data. The mathematical formalism is based on a generalization of the k-Feature Set problem called (α, β)-k-Feature Set problem, introduced by Cotta and Moscato (J Comput Syst Sci 67(4):686-690, 2003). This method proceeds in two steps: first, an optimal (α, β)-k-feature set of minimum cardinality is identified and then, a set of classification rules using these features is obtained. We obtain the (α, β)-k-feature set in two phases; first a series of extremely powerful reduction techniques, which do not lose the optimal solution, are employed; and second, a metaheuristic search to identify the remaining features to be considered or disregarded. Two algorithms were tested with a public domain digital mammography dataset composed of 71 malignant and 75 benign cases. Based on the results provided by the algorithms, we obtain classification rules that employ only a subset of these features.
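The (α, β)-k-Feature Set condition mentioned in this abstract can be illustrated with a small feasibility check. The sketch below assumes the usual reading of the problem: every pair of samples from different classes must differ on at least α of the selected features, and every pair from the same class must agree on at least β of them. The function name, the toy binary data, and the class labels are hypothetical; this is not the authors' reduction-rule or metaheuristic code, only a check of whether a given candidate set is feasible.

```python
from itertools import combinations


def is_alpha_beta_feature_set(samples, labels, features, alpha, beta):
    """Check whether `features` (column indices) satisfies the assumed
    (alpha, beta) condition on a table of discrete feature values."""
    for i, j in combinations(range(len(samples)), 2):
        # Count selected features on which the two samples differ.
        differing = sum(samples[i][f] != samples[j][f] for f in features)
        agreeing = len(features) - differing
        if labels[i] != labels[j] and differing < alpha:
            return False  # different classes, not separated strongly enough
        if labels[i] == labels[j] and agreeing < beta:
            return False  # same class, not similar enough
    return True


# Hypothetical toy data: four samples, four binary features.
samples = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
labels = ["malignant", "malignant", "benign", "benign"]

print(is_alpha_beta_feature_set(samples, labels, [0, 2, 3], alpha=2, beta=2))
print(is_alpha_beta_feature_set(samples, labels, [1], alpha=1, beta=0))
```

In this toy table, features 0, 2 and 3 separate the two classes on every pair, whereas feature 1 alone fails to distinguish the first and third samples. Finding a feasible set of minimum cardinality is the hard optimisation step that the chapter addresses with reduction rules and a metaheuristic.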
Affiliation(s)
- Luke Mathieson
  - Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine (CIBM), Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Alexandre Mendes
  - Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine (CIBM), Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, NSW, 2308, Australia
- John Marsden
  - Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine (CIBM), Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Jeffrey Pond
  - Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine (CIBM), Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Pablo Moscato
  - Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine (CIBM), Faculty of Engineering and Built Environment, The University of Newcastle, Callaghan, NSW, 2308, Australia
3
A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study. PLoS One 2015; 10:e0127702. [PMID: 26106884 PMCID: PMC4480358 DOI: 10.1371/journal.pone.0127702] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/03/2014] [Accepted: 04/17/2015] [Indexed: 12/26/2022]
Abstract
Background The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease.
4
Cirillo N. Merging experimental data and in silico analysis: a systems-level approach to autoimmune disease and cancer. Expert Rev Clin Immunol 2012; 8:361-372. [DOI: 10.1586/eci.12.17] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 08/30/2023]
5
Johnstone D, Milward EA, Berretta R, Moscato P. Multivariate protein signatures of pre-clinical Alzheimer's disease in the Alzheimer's disease neuroimaging initiative (ADNI) plasma proteome dataset. PLoS One 2012; 7:e34341. [PMID: 22485168 PMCID: PMC3317783 DOI: 10.1371/journal.pone.0034341] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Received: 08/26/2011] [Accepted: 03/01/2012] [Indexed: 11/18/2022]
Abstract
BACKGROUND Recent Alzheimer's disease (AD) research has focused on finding biomarkers to identify disease at the pre-clinical stage of mild cognitive impairment (MCI), allowing treatment to be initiated before irreversible damage occurs. Many studies have examined brain imaging or cerebrospinal fluid but there is also growing interest in blood biomarkers. The Alzheimer's Disease Neuroimaging Initiative (ADNI) has generated data on 190 plasma analytes in 566 individuals with MCI, AD or normal cognition. We conducted independent analyses of this dataset to identify plasma protein signatures predicting pre-clinical AD. METHODS AND FINDINGS We focused on identifying signatures that discriminate cognitively normal controls (n = 54) from individuals with MCI who subsequently progress to AD (n = 163). Based on p value, apolipoprotein E (APOE) showed the strongest difference between these groups (p = 2.3 × 10⁻¹³). We applied a multivariate approach based on combinatorial optimization ((α,β)-k Feature Set Selection), which retains information about individual participants and maintains the context of interrelationships between different analytes, to identify the optimal set of analytes (signature) to discriminate these two groups. We identified 11-analyte signatures achieving values of sensitivity and specificity between 65% and 86% for both MCI and AD groups, depending on whether APOE was included and other factors. Classification accuracy was improved by considering "meta-features," representing the difference in relative abundance of two analytes, with an 8-meta-feature signature consistently achieving sensitivity and specificity both over 85%. Generating signatures based on longitudinal rather than cross-sectional data further improved classification accuracy, returning sensitivities and specificities of approximately 90%.
CONCLUSIONS Applying these novel analysis approaches to the powerful and well-characterized ADNI dataset has identified sets of plasma biomarkers for pre-clinical AD. While studies of independent test sets are required to validate the signatures, these analyses provide a starting point for developing a cost-effective and minimally invasive test capable of diagnosing AD in its pre-clinical stages.
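The sensitivity and specificity figures quoted in this abstract follow the standard confusion-matrix definitions, which can be sketched as below. The function name, the class labels, and the toy predictions are hypothetical and are not taken from the ADNI analysis.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) for a two-class problem,
    treating `positive` as the class of interest."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # true positive
            else:
                fn += 1  # missed positive
        else:
            if pred == positive:
                fp += 1  # false alarm
            else:
                tn += 1  # true negative
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: four progressing-MCI cases and two controls.
y_true = ["MCI", "MCI", "MCI", "MCI", "control", "control"]
y_pred = ["MCI", "MCI", "MCI", "control", "control", "MCI"]

sens, spec = sensitivity_specificity(y_true, y_pred, positive="MCI")
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

With unbalanced groups such as the 54 controls and 163 progressing MCI cases described above, reporting both values is more informative than overall accuracy alone.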
Affiliation(s)
- Daniel Johnstone
  - Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales, Australia
  - School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, New South Wales, Australia
- Elizabeth A. Milward
  - Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales, Australia
  - School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, New South Wales, Australia
- Regina Berretta
  - Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales, Australia
  - School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, New South Wales, Australia
- Pablo Moscato
  - Priority Research Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales, Australia
  - School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, New South Wales, Australia
6
RNA-seq analysis of prostate cancer in the Chinese population identifies recurrent gene fusions, cancer-associated long noncoding RNAs and aberrant alternative splicings. Cell Res 2012; 22:806-21. [PMID: 22349460 DOI: 10.1038/cr.2012.30] [Citation(s) in RCA: 282] [Impact Index Per Article: 23.5] [Indexed: 12/16/2022]
Abstract
There are remarkable disparities among patients of different races with prostate cancer; however, the mechanism underlying this difference remains unclear. Here, we present a comprehensive landscape of the transcriptome profiles of 14 primary prostate cancers and their paired normal counterparts from the Chinese population using RNA-seq, revealing tremendous diversity across prostate cancer transcriptomes with respect to gene fusions, long noncoding RNAs (long ncRNA), alternative splicing and somatic mutations. Three of the 14 tumors (21.4%) harbored a TMPRSS2-ERG fusion, and the low prevalence of this fusion in Chinese patients was further confirmed in an additional tumor set (10/54=18.5%). Notably, two novel gene fusions, CTAGE5-KHDRBS3 (20/54=37%) and USP9Y-TTTY15 (19/54=35.2%), occurred frequently in our patient cohort. Further systematic transcriptional profiling identified numerous long ncRNAs that were differentially expressed in the tumors. An analysis of the correlation between expression of long ncRNA and genes suggested that long ncRNAs may have functions beyond transcriptional regulation. This study yielded new insights into the pathogenesis of prostate cancer in the Chinese population.
7
Fung DCY, Lo A, Jankova L, Clarke SJ, Molloy M, Robertson GR, Wilkins MR. Classification of cancer patients using pathway analysis and network clustering. Methods Mol Biol 2012; 781:311-36. [PMID: 21877288 DOI: 10.1007/978-1-61779-276-2_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/11/2023]
Abstract
Molecular expression patterns have often been used for patient classification in oncology in an effort to improve prognostic prediction and treatment compatibility. This effort is, however, hampered by the highly heterogeneous data often seen in the molecular analysis of cancer. The lack of overall similarity between expression profiles makes it difficult to partition data using conventional data mining tools. In this chapter, the authors introduce a bioinformatics protocol that uses REACTOME pathways and patient-protein network structure (also called topology) as the basis for patient classification.
Affiliation(s)
- David C Y Fung
  - School of Biotechnology and Biomolecular Sciences, New South Wales Systems Biology Initiative, The University of New South Wales, Sydney, NSW, Australia
8
Tilli TM, Thuler LC, Matos AR, Coutinho-Camillo CM, Soares FA, da Silva EA, Neves AF, Goulart LR, Gimba ER. Expression analysis of osteopontin mRNA splice variants in prostate cancer and benign prostatic hyperplasia. Exp Mol Pathol 2011; 92:13-9. [PMID: 21963599 DOI: 10.1016/j.yexmp.2011.09.014] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 08/01/2011] [Accepted: 09/13/2011] [Indexed: 01/19/2023]
Abstract
Osteopontin splicing isoforms (OPN-SI) present differential expression patterns and specific tumor roles. Our aims were to characterize OPN-SI expression in prostate cancer (PCa) and benign prostate hyperplasia (BPH) tissues, and to evaluate their potential as biomarkers for PCa diagnosis as well as their prognostic implications. Prostatic tissue specimens were obtained from 40 PCa and 30 BPH patients. Quantitative real time PCR (qRT-PCR) was used to measure OPN-SI mRNA expression. Immunohistochemical analysis was performed using an anti-OPNc polyclonal antibody. Biostatistical analyses evaluated the association of OPN-SI and total Prostate Specific Antigen (PSA) serum levels with clinical and pathological data. PCa tissue samples presented significantly higher levels of OPNa, OPNb and OPNc transcripts (p<0.01) than BPH specimens. OPN-SI mRNA expression was positively correlated with Gleason Score (p<0.01). ROC curves and logistic regression analyses demonstrated that OPN-SI and PSA were able to distinguish PCa from BPH patients (p<0.01). The OPNc isoform was the most upregulated variant and the best marker to distinguish the patient groups, presenting sensitivity and specificity of 90% and 100%, respectively. Immunohistochemistry analysis also demonstrated OPNc upregulation in PCa samples as compared to BPH tissues. OPNc protein also stained strongly in PCa tissues presenting a high Gleason Score. Multivariate analysis indicated that OPNc expression levels above the cut-off value were associated with a 4-fold higher chance of PCa occurrence. We conclude that OPN-SI were overexpressed in PCa tissues and strongly associated with PCa occurrence and with tumor cell differentiation. Our results suggest the OPNc splicing isoform as an important biomarker that can contribute to improved PCa diagnosis and prognosis, besides providing insights into early steps of PCa carcinogenesis.
Affiliation(s)
- T M Tilli
  - Programa de Medicina Experimental, Coordenação de Pesquisa-Instituto Nacional de Câncer, Programa de Pós Graduação Stricto Sensu em Oncologia do INCa, Rio de Janeiro-RJ, Brazil
9
Penney KL, Sinnott JA, Fall K, Pawitan Y, Hoshida Y, Kraft P, Stark JR, Fiorentino M, Perner S, Finn S, Calza S, Flavin R, Freedman ML, Setlur S, Sesso HD, Andersson SO, Martin N, Kantoff PW, Johansson JE, Adami HO, Rubin MA, Loda M, Golub TR, Andrén O, Stampfer MJ, Mucci LA. mRNA expression signature of Gleason grade predicts lethal prostate cancer. J Clin Oncol 2011; 29:2391-6. [PMID: 21537050 DOI: 10.1200/jco.2010.32.6421] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.9] [Indexed: 01/08/2023]
Abstract
PURPOSE Prostate-specific antigen screening has led to enormous overtreatment of prostate cancer because of the inability to distinguish potentially lethal disease at diagnosis. We reasoned that by identifying an mRNA signature of Gleason grade, the best predictor of prognosis, we could improve prediction of lethal disease among men with moderate Gleason 7 tumors, the most common grade, and the most indeterminate in terms of prognosis. PATIENTS AND METHODS Using the complementary DNA-mediated annealing, selection, extension, and ligation assay, we measured the mRNA expression of 6,100 genes in prostate tumor tissue in the Swedish Watchful Waiting cohort (n = 358) and Physicians' Health Study (PHS; n = 109). We developed an mRNA signature of Gleason grade comparing individuals with Gleason ≤ 6 to those with Gleason ≥ 8 tumors and applied the model among patients with Gleason 7 to discriminate lethal cases. RESULTS We built a 157-gene signature using the Swedish data that predicted Gleason with low misclassification (area under the curve [AUC] = 0.91); when this signature was tested in the PHS, the discriminatory ability remained high (AUC = 0.94). In men with Gleason 7 tumors, who were excluded from the model building, the signature significantly improved the prediction of lethal disease beyond knowing whether the Gleason score was 4 + 3 or 3 + 4 (P = .006). CONCLUSION Our expression signature and the genes identified may improve our understanding of the de-differentiation process of prostate tumors. Additionally, the signature may have clinical applications among men with Gleason 7, by further estimating their risk of lethal prostate cancer and thereby guiding therapy decisions to improve outcomes and reduce overtreatment.
Affiliation(s)
- Kathryn L Penney
  - Department of Epidemiology, Harvard School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
10
Rocha de Paula M, Gómez Ravetti M, Berretta R, Moscato P. Differences in abundances of cell-signalling proteins in blood reveal novel biomarkers for early detection of clinical Alzheimer's disease. PLoS One 2011; 6:e17481. [PMID: 21479255 PMCID: PMC3063784 DOI: 10.1371/journal.pone.0017481] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 12/12/2010] [Accepted: 02/06/2011] [Indexed: 01/26/2023]
Abstract
BACKGROUND In November 2007 a study published in Nature Medicine proposed a simple test based on the abundance of 18 proteins in blood to predict the onset of clinical symptoms of Alzheimer's Disease (AD) two to six years before these symptoms manifest. Later, another study, published in PLoS ONE, showed that only five proteins (IL-1α, IL-3, EGF, TNF-α and G-CSF) have overall better prediction accuracy. These classifiers are based on the abundance of 120 proteins. Such values were standardised by a Z-score transformation, which means that their values are relative to the average of all others. METHODOLOGY The original datasets from the Nature Medicine paper are further studied using methods from combinatorial optimisation and Information Theory. We expand the original dataset by also including all pair-wise differences of z-score values of the original dataset ("metafeatures"). Using an exact algorithm to solve the resulting Feature Set problem, used to tackle the feature selection problem, we found signatures that contain either only features, only metafeatures, or both, and evaluated their predictive performance on the independent test set. CONCLUSIONS It was possible to show that a specific pattern of cell signalling imbalance in blood plasma has valuable information to distinguish between NDC and AD samples. The obtained signatures were able to predict AD in patients that already had Mild Cognitive Impairment (MCI) with up to 84% sensitivity, while also maintaining a strong prediction accuracy of 90% on an independent dataset with Non Demented Controls (NDC) and AD samples. The novel biomarkers uncovered with this method confirm ANG-2, IL-11, PDGF-BB and CCL15/MIP-1δ, and support the joint measurement of other signalling proteins not previously discussed: GM-CSF, NT-3, IGFBP-2 and VEGF-B.
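The "metafeature" expansion described in this abstract (all pairwise differences of z-score values) can be sketched as follows. The sketch assumes that z-scores are computed per protein across samples using the population standard deviation; the abstract does not specify this detail. The function names, protein names, and abundance values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each column (protein) across rows (samples)."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


def add_metafeatures(names, rows):
    """Append one column per protein pair holding the z-score difference."""
    z = zscore_columns(rows)
    pairs = list(combinations(range(len(names)), 2))
    new_names = list(names) + [f"{names[a]}-{names[b]}" for a, b in pairs]
    new_rows = [row + [row[a] - row[b] for a, b in pairs] for row in z]
    return new_names, new_rows


# Hypothetical abundances for three proteins in two samples.
names = ["A", "B", "C"]
rows = [[1.0, 4.0, 0.0], [3.0, 0.0, 2.0]]

meta_names, meta_rows = add_metafeatures(names, rows)
print(meta_names)
print(meta_rows[0])
```

For n proteins this adds n(n-1)/2 columns, so the 120 proteins mentioned above would yield 7,140 metafeatures in addition to the original features.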
Affiliation(s)
- Mateus Rocha de Paula
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, The University of Newcastle, Callaghan, Australia
- Martín Gómez Ravetti
  - Departamento de Engenharia de Produção, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Regina Berretta
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, The University of Newcastle, Callaghan, Australia
- Pablo Moscato
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, The University of Newcastle, Callaghan, Australia
11
Riveros C, Mellor D, Gandhi KS, McKay FC, Cox MB, Berretta R, Vaezpour SY, Inostroza-Ponta M, Broadley SA, Heard RN, Vucic S, Stewart GJ, Williams DW, Scott RJ, Lechner-Scott J, Booth DR, Moscato P. A transcription factor map as revealed by a genome-wide gene expression analysis of whole-blood mRNA transcriptome in multiple sclerosis. PLoS One 2010; 5:e14176. [PMID: 21152067 PMCID: PMC2995726 DOI: 10.1371/journal.pone.0014176] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 03/25/2010] [Accepted: 10/20/2010] [Indexed: 12/03/2022]
Abstract
Background Several lines of evidence suggest that transcription factors are involved in the pathogenesis of Multiple Sclerosis (MS) but complete mapping of the whole network has been elusive. One of the reasons is that there are several clinical subtypes of MS and transcription factors that may be involved in one subtype may not be in others. We investigate the possibility that this network could be mapped using microarray technologies and contemporary bioinformatics methods on a dataset derived from whole blood in 99 untreated MS patients (36 Relapse Remitting MS, 43 Primary Progressive MS, and 20 Secondary Progressive MS) and 45 age-matched healthy controls. Methodology/Principal Findings We have used two different analytical methodologies: a non-standard differential expression analysis and a differential co-expression analysis, which have converged on a significant number of regulatory motifs that are statistically overrepresented in genes that are either differentially expressed (or differentially co-expressed) in cases and controls (e.g., V$KROX_Q6, p-value <3.31E-6; V$CREBP1_Q2, p-value <9.93E-6, V$YY1_02, p-value <1.65E-5). Conclusions/Significance Our analysis uncovered a network of transcription factors that potentially dysregulate several genes in MS or one or more of its disease subtypes. The most significant transcription factor motifs were for the Early Growth Response EGR/KROX family, ATF2, YY1 (Yin and Yang 1), E2F-1/DP-1 and E2F-4/DP-2 heterodimers, SOX5, and CREB and ATF families. These transcription factors are involved in early T-lymphocyte specification and commitment as well as in oligodendrocyte dedifferentiation and development, both pathways that have significant biological plausibility in MS causation.
Affiliation(s)
- Carlos Riveros
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
- Drew Mellor
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
  - School of Computer Science and Software Engineering, The University of Western Australia, Crawley, Australia
- Kaushal S. Gandhi
  - Westmead Millennium Institute, University of Sydney, Westmead, Australia
- Fiona C. McKay
  - Westmead Millennium Institute, University of Sydney, Westmead, Australia
- Mathew B. Cox
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
  - Hunter Medical Research Institute, Newcastle, Australia
- Regina Berretta
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
- S. Yahya Vaezpour
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
  - Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran
- Mario Inostroza-Ponta
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
  - Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Santiago, Chile
- Simon A. Broadley
  - School of Medicine, Griffith University, Brisbane, Australia
  - Department of Neurology, Gold Coast Hospital, Southport, Australia
- Robert N. Heard
  - Westmead Millennium Institute, University of Sydney, Westmead, Australia
- Stephen Vucic
  - Westmead Millennium Institute, University of Sydney, Westmead, Australia
- Graeme J. Stewart
  - Westmead Millennium Institute, University of Sydney, Westmead, Australia
- Rodney J. Scott
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
- Jeanette Lechner-Scott
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
- David R. Booth
  - Westmead Millennium Institute, University of Sydney, Westmead, Australia
- Pablo Moscato
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, University of Newcastle, and Hunter Medical Research Institute, Newcastle, Australia
  - Australian Research Council Centre of Excellence in Bioinformatics, St Lucia, Australia
12
Berretta R, Moscato P. Cancer biomarker discovery: the entropic hallmark. PLoS One 2010; 5:e12262. [PMID: 20805891 PMCID: PMC2923618 DOI: 10.1371/journal.pone.0012262] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Received: 12/13/2009] [Accepted: 06/26/2010] [Indexed: 12/29/2022]
Abstract
Background It is a commonly accepted belief that cancer cells modify their transcriptional state during the progression of the disease. We propose that the progression of cancer cells towards malignant phenotypes can be efficiently tracked using high-throughput technologies that follow the gradual changes observed in the gene expression profiles by employing Shannon's mathematical theory of communication. Methods based on Information Theory can then quantify the divergence of cancer cells' transcriptional profiles from those of normally appearing cells of the originating tissues. The relevance of the proposed methods can be evaluated using microarray datasets available in the public domain but the method is in principle applicable to other high-throughput methods. Methodology/Principal Findings Using melanoma and prostate cancer datasets we illustrate how it is possible to employ Shannon Entropy and the Jensen-Shannon divergence to trace the progression of transcriptional changes in the disease. We establish how the variations of these two measures correlate with established biomarkers of cancer progression. The Information Theory measures allow us to identify novel biomarkers for both progressive and relatively more sudden transcriptional changes leading to malignant phenotypes. At the same time, the methodology was able to validate a large number of genes and processes that seem to be implicated in the progression of melanoma and prostate cancer. Conclusions/Significance We thus present a quantitative guiding rule, a new unifying hallmark of cancer: the cancer cell's transcriptome changes lead to measurable observed transitions of Normalized Shannon Entropy values (as measured by high-throughput technologies). At the same time, tumor cells increment their divergence from the normal tissue profile, increasing their disorder via creation of states that we might not directly measure.
This unifying hallmark allows us, via the Jensen-Shannon divergence, to identify the arrow of time of the processes from the gene expression profiles, and helps to map the phenotypical and molecular hallmarks of specific cancer subtypes. The deep mathematical basis of the approach allows us to suggest that this principle is, hopefully, of general applicability for other diseases.
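The two Information Theory measures named in this abstract have standard definitions that can be sketched in a few lines. The sketch assumes an expression profile is converted to a probability distribution by dividing each value by the profile total, and uses base-2 logarithms; the paper's exact preprocessing is not given in the abstract, and the function names and toy profiles are hypothetical.

```python
import math


def normalize(values):
    """Turn non-negative expression values into a probability distribution."""
    total = sum(values)
    return [v / total for v in values]


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def normalized_entropy(p):
    """Entropy divided by its maximum, log2(len(p)), giving a value in [0, 1]."""
    return shannon_entropy(p) / math.log2(len(p))


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2, in bits."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Hypothetical four-gene profiles for a normal and a tumour sample.
normal = normalize([10.0, 10.0, 10.0, 10.0])
tumour = normalize([30.0, 5.0, 4.0, 1.0])

print(normalized_entropy(normal), normalized_entropy(tumour))
print(jensen_shannon_divergence(normal, tumour))
```

The divergence is zero for identical profiles and reaches 1 bit for profiles with no overlapping support, which makes it a bounded, symmetric measure of how far a sample has moved from a reference profile.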
Affiliation(s)
- Regina Berretta
  - Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales, Australia
  - Information Based Medicine Program, Hunter Medical Research Institute, John Hunter Hospital, New Lambton Heights, New South Wales, Australia
- Pablo Moscato
  - Centre for Bioinformatics, Biomarker Discovery and Information-Based Medicine, The University of Newcastle, Callaghan, New South Wales, Australia
  - Information Based Medicine Program, Hunter Medical Research Institute, John Hunter Hospital, New Lambton Heights, New South Wales, Australia
  - Australian Research Council Centre of Excellence in Bioinformatics, Callaghan, New South Wales, Australia
13
Gómez Ravetti M, Moscato P. Identification of a 5-protein biomarker molecular signature for predicting Alzheimer's disease. PLoS One 2008; 3:e3111. [PMID: 18769539 PMCID: PMC2518833 DOI: 10.1371/journal.pone.0003111] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Received: 04/21/2008] [Accepted: 08/04/2008] [Indexed: 11/19/2022]
Abstract
Background Alzheimer's disease (AD) is a progressive brain disease with a huge cost to human lives. The impact of the disease is also a growing concern for the governments of developing countries, in particular due to the increasingly high number of elderly citizens at risk. Alzheimer's is the most common form of dementia, a common term for memory loss and other cognitive impairments. There is no current cure for AD, but there are drug and non-drug based approaches for its treatment. In general the drug-treatments are directed at slowing the progression of symptoms. They have proved to be effective in a large group of patients but success is directly correlated with identifying the disease carriers at its early stages. This justifies the need for timely and accurate forms of diagnosis via molecular means. We report here a 5-protein biomarker molecular signature that achieves, on average, a 96% total accuracy in predicting clinical AD. The signature is composed of the abundances of IL-1α, IL-3, EGF, TNF-α and G-CSF. Methodology/Principal Findings Our results are based on a recent molecular dataset that has attracted worldwide attention. Our paper illustrates that improved results can be obtained with the abundance of only five proteins. Our methodology consisted of the application of an integrative data analysis method. This four step process included: a) abundance quantization, b) feature selection, c) literature analysis, d) selection of a classifier algorithm which is independent of the feature selection process. These steps were performed without using any sample of the test datasets. For the first two steps, we used the application of Fayyad and Irani's discretization algorithm for selection and quantization, which in turn creates an instance of the (alpha-beta)-k-Feature Set problem; a numerical solution of this problem led to the selection of only 10 proteins. 
Conclusions/Significance The previous study has provided an extremely useful dataset for the identification of AD biomarkers. However, our subsequent analysis also revealed several important facts worth reporting: 1. A 5-protein signature (which is a subset of the 18-protein signature of Ray et al.) has the same overall performance (when using the same classifier). 2. Using more than 20 different classifiers available in the widely-used Weka software package, our 5-protein signature has, on average, a smaller prediction error, indicating independence from the classifier and the robustness of this set of biomarkers (i.e. 96% accuracy when predicting AD against non-demented control). 3. Using very simple classifiers, like Simple Logistic or Logistic Model Trees, we have achieved the following results on 92 samples: 100 percent success in predicting Alzheimer's Disease and 92 percent in predicting Non Demented Control on the AD dataset.
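The abundance quantization step mentioned in this abstract relies on Fayyad and Irani's discretization algorithm, whose core operation is choosing the cut point that minimises the weighted class entropy of the two resulting intervals. The sketch below shows only that single binary cut. It omits the recursive splitting and the minimum description length stopping criterion of the full method, so it is a simplification rather than the procedure used in the paper. Function names and data values are hypothetical.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (bits) of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, weighted_entropy) for the best single binary split,
    or None if all values are identical."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / n
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        if best is None or score < best[1]:
            best = (cut, score)
    return best


# Hypothetical abundances of one protein in six samples.
values = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
labels = ["NDC", "NDC", "NDC", "AD", "AD", "AD"]

print(best_cut(values, labels))
```

In the full algorithm, a feature for which no cut passes the stopping criterion is discarded, which is how discretization doubles as a first feature-selection filter before the (α, β)-k-Feature Set instance is built.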
Affiliation(s)
- Martín Gómez Ravetti
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, The University of Newcastle, Callaghan, Australia
- Pablo Moscato
  - Centre for Bioinformatics, Biomarker Discovery & Information-Based Medicine, The University of Newcastle, Callaghan, Australia