1
Marques D, Ferreira-Costa LR, Ferreira-Costa LL, Bezerra-Oliveira AB, Correa RDS, Ramos CCDO, Vinasco-Sandoval T, Lopes KDP, Vialle RA, Vidal AF, Silbiger VN, Ribeiro-dos-Santos Â. Role of miRNAs in Sigmoid Colon Cancer: A Search for Potential Biomarkers. Cancers (Basel) 2020; 12:cancers12113311. [PMID: 33182525 PMCID: PMC7697997 DOI: 10.3390/cancers12113311] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/14/2020] [Revised: 09/12/2020] [Accepted: 09/25/2020] [Indexed: 02/07/2023]
Abstract
The aberrant expression of microRNAs is known to play a crucial role in carcinogenesis. Here, we evaluated the miRNA expression profile of sigmoid colon cancer (SCC) compared to adjacent-to-tumor (ADJ) and sigmoid colon healthy (SCH) tissues obtained from colon biopsies of Brazilian patients. Comparisons were performed between each pair of groups separately, with p-values < 0.05 and |log2(fold change)| > 2 considered significant. We found 20 differentially expressed miRNAs (DEmiRNAs) across all comparisons, two of which were shared between SCC vs. ADJ and SCC vs. SCH. We used miRTarBase and miRTargetLink to identify target genes of the differentially expressed miRNAs, and the DAVID and REACTOME databases for gene enrichment analysis. We also used the TCGA and GTEx databases to build miRNA-gene regulatory networks and to check the reproducibility of our results. In addition to miRNAs previously associated with colorectal cancer, we identified three potential novel biomarkers. We showed that the three types of colon tissue could be clearly distinguished using a panel composed of the 20 DEmiRNAs. Additionally, we found enriched pathways related to the carcinogenic process in which miRNAs could be involved, indicating that adjacent-to-tumor tissues may already be altered and cannot be considered healthy tissues. Overall, we expect that these findings may help in the search for biomarkers to prevent cancer progression or, at least, allow its early detection; however, more studies are needed to confirm our results.
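The significance rule stated in this abstract (p-value < 0.05 and |log2 fold change| > 2) can be illustrated with a minimal Python sketch. The record type, function names, and miRNA labels are hypothetical and are not taken from the study; the sketch only applies the two cutoffs to statistics that are assumed to have been computed already.

```python
from typing import List, NamedTuple


class MirnaResult(NamedTuple):
    name: str       # hypothetical miRNA label, for illustration only
    log2_fc: float  # log2 fold change between the two tissue groups
    p_value: float  # p-value of the comparison


def is_differentially_expressed(result: MirnaResult,
                                p_cutoff: float = 0.05,
                                lfc_cutoff: float = 2.0) -> bool:
    """Apply both cutoffs from the abstract; both inequalities are strict."""
    return result.p_value < p_cutoff and abs(result.log2_fc) > lfc_cutoff


def select_demirnas(results: List[MirnaResult]) -> List[str]:
    """Return the names of miRNAs that pass both cutoffs, in input order."""
    return [r.name for r in results if is_differentially_expressed(r)]
```

Because the fold-change test uses the absolute value, strongly down-regulated miRNAs (log2 fold change below -2) are retained alongside up-regulated ones.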
Affiliation(s)
- Diego Marques
- Laboratório de Genética Humana e Médica, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Laboratório de Bioanálise e Biotecnologia Molecular, Universidade Federal do Rio Grande do Norte, Av. Nilo Peçanha, 620, Petrópolis, Natal 59012-300, Brazil
- Layse Raynara Ferreira-Costa
- Laboratório de Bioanálise e Biotecnologia Molecular, Universidade Federal do Rio Grande do Norte, Av. Nilo Peçanha, 620, Petrópolis, Natal 59012-300, Brazil
- Lorenna Larissa Ferreira-Costa
- Laboratório de Bioanálise e Biotecnologia Molecular, Universidade Federal do Rio Grande do Norte, Av. Nilo Peçanha, 620, Petrópolis, Natal 59012-300, Brazil
- Ana Beatriz Bezerra-Oliveira
- Laboratório de Bioanálise e Biotecnologia Molecular, Universidade Federal do Rio Grande do Norte, Av. Nilo Peçanha, 620, Petrópolis, Natal 59012-300, Brazil
- Romualdo da Silva Correa
- Departamento de Cirurgia Oncológica, Liga Norte Riograndense Contra o Câncer, R. Mário Negócio, 2267, Quintas, Natal 59040-000, Brazil
- Carlos Cesar de Oliveira Ramos
- Laboratório de Patologia e Citopatologia, Liga Norte Riograndense Contra o Câncer, R. Mário Negócio, 2267, Quintas, Natal 59040-000, Brazil
- Tatiana Vinasco-Sandoval
- Laboratório de Genética Humana e Médica, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Katia de Paiva Lopes
- Laboratório de Genética Humana e Médica, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Ricardo Assunção Vialle
- Laboratório de Genética Humana e Médica, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Amanda Ferreira Vidal
- Laboratório de Genética Humana e Médica, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Vivian Nogueira Silbiger
- Laboratório de Bioanálise e Biotecnologia Molecular, Universidade Federal do Rio Grande do Norte, Av. Nilo Peçanha, 620, Petrópolis, Natal 59012-300, Brazil
- Corresponding author
- Ândrea Ribeiro-dos-Santos
- Laboratório de Genética Humana e Médica, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Pará, Av. Augusto Corrêa, 01, Guamá, Belém 66.075-110, Brazil
- Núcleo de Pesquisas em Oncologia, Universidade Federal do Pará, R. dos Mundurucus, 4487, Guamá, Belém 66073-000, Brazil
- Corresponding author
2
Carinci F, Lo Muzio L, Piattelli A, Rubini C, Chiesa F, Ionna F, Palmieri A, Maiorano E, Pastore A, Laino G, Dolci M, Pezzetti F. Potential Markers of Tongue Tumor Progression Selected by cDNA Micro Array. Int J Immunopathol Pharmacol 2016; 18:513-24. [PMID: 16164832 DOI: 10.1177/039463200501800311] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 11/17/2022]
Abstract
Squamous cell carcinoma (SCC), the most frequent malignant tumor of the oral cavity, generally exhibits a poor prognosis, and metastases are the main cause of death. This tumor often arises from pre-malignant lesions. To date, it is difficult to predict if and which pre-malignant lesions may progress into oral SCC using traditional methods. For these reasons, several studies are trying to identify markers of the progression of pre-malignant lesions and tumors. To define the genetic expression profile of tongue tumor progression, we compared 9 dysplasias (DS), 8 tumors without metastasis (TWM), 11 metastasizing SCCs (MT) of the tongue, and a baseline of 11 normal tissues using a cDNA microarray containing 19.2 K clones. We initially applied hierarchical agglomerative clustering based on information from all 6026 clones. Results were obtained by performing a two-step analysis: a Significance Analysis of Microarray (SAM) and a Gene Ontology search. As detected by SAM, 105 clones had significantly different expression levels between DS and TWM (FDR < 0.01), whereas 570 genes had significantly different expression levels between TWM and MT (FDR < 0.01). After filtering with FatiGo, only 33 genes were differentially expressed in TWM with respect to DS, whereas 155 genes were differentially expressed in MT with respect to TWM. We detected some genes encoding oncogenes, transcription factors, and cell cycle regulators as potential markers of DS progression. Examples are BAG4, PAX3, and CCNI, respectively. Among potential markers of metastases are some genes related to cell mobility (TSPAN-2 and SNTA1), intercellular adhesion (integrin alpha 7), or extracellular matrix components (ADAMTS2 and cathepsin O). Additionally, under-expressed genes encoded apoptosis-related proteins (PDCD4 and CASP4).
In conclusion, we identified several genes differentially expressed in tumor progression which can potentially help in better classifying premalignant lesions and tongue SCCs.
MESH Headings
- Adult
- Aged
- Aged, 80 and over
- Algorithms
- Biomarkers, Tumor/genetics
- Biomarkers, Tumor/metabolism
- Carcinoma, Squamous Cell/diagnosis
- Carcinoma, Squamous Cell/genetics
- Carcinoma, Squamous Cell/metabolism
- Carcinoma, Squamous Cell/pathology
- Carcinoma, Squamous Cell/surgery
- DNA, Complementary/genetics
- DNA, Neoplasm/genetics
- Disease Progression
- Female
- Gene Expression Profiling
- Gene Expression Regulation, Neoplastic
- Humans
- Male
- Middle Aged
- Neoplasm Metastasis
- Neoplasm Staging
- Oligonucleotide Array Sequence Analysis
- Precancerous Conditions/classification
- Precancerous Conditions/genetics
- Precancerous Conditions/metabolism
- Software
- Tongue/pathology
- Tongue Neoplasms/diagnosis
- Tongue Neoplasms/genetics
- Tongue Neoplasms/metabolism
- Tongue Neoplasms/surgery
Affiliation(s)
- F Carinci
- Section of Maxillofacial Surgery, University of Ferrara, Italy
3
Wang M, Zhang W, Ding W, Dai D, Zhang H, Xie H, Chen L, Guo Y, Xie J. Parallel clustering algorithm for large-scale biological data sets. PLoS One 2014; 9:e91315. [PMID: 24705246 PMCID: PMC3976248 DOI: 10.1371/journal.pone.0091315] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/13/2013] [Accepted: 02/10/2014] [Indexed: 02/06/2023]
Abstract
BACKGROUND The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between data pairs, a similarity matrix, which takes a long time to construct, is required before the algorithm can run. METHODS Two types of parallel architectures are proposed in this paper to accelerate similarity matrix construction and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for the affinity propagation algorithm because of its large memory size and great computing capacity. An appropriate scheme of data partition and reduction is designed to minimize the global communication cost among processes. RESULTS A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene (microarray) data and detecting families in large protein superfamilies.
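The similarity-matrix step described in this abstract can be sketched by splitting the matrix into row blocks and computing each block in a separate worker. This is only an illustration under stated assumptions: the similarity measure (negative squared Euclidean distance, a common choice for affinity propagation), the function names, and the use of a Python thread pool are all assumptions and do not reproduce the paper's shared-memory implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

Point = Sequence[float]


def similarity_rows(points: Sequence[Point], start: int, stop: int) -> List[List[float]]:
    """Compute rows start..stop-1 of the similarity matrix.

    Assumed similarity: negative squared Euclidean distance.
    """
    return [
        [-sum((a - b) ** 2 for a, b in zip(points[i], q)) for q in points]
        for i in range(start, stop)
    ]


def similarity_matrix(points: Sequence[Point], n_workers: int = 2) -> List[List[float]]:
    """Partition the rows into contiguous blocks and compute them in a pool."""
    n = len(points)
    step = max(1, -(-n // n_workers))  # ceiling division: rows per block
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in submission order, so blocks stay in row order.
        blocks = pool.map(lambda b: similarity_rows(points, *b), bounds)
        return [row for block in blocks for row in block]
```

In CPython, threads will not speed up this pure-Python arithmetic; the pool is used here only to show the row-block partitioning, in which each worker needs read access to all points but writes only its own rows.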
Affiliation(s)
- Minchao Wang
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Wu Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- High Performance Computing Center, Shanghai University, Shanghai, P.R.China
- Wang Ding
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Dongbo Dai
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Huiran Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Hao Xie
- College of Stomatology, Wuhan University, Wuhan, P.R.China
- Luonan Chen
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P.R.China
- Yike Guo
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
- Department of Computing, Imperial College London, London, United Kingdom
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, P.R.China
4
Eijssen LMT, Lindsey PJ, Peeters R, Westra RL, van Eijsden RGE, Bolotin-Fukuhara M, Smeets HJM, Vlietinck RFM. A novel stepwise analysis procedure of genome-wide expression profiles identifies transcript signatures of thiamine genes as classifiers of mitochondrial mutants. Yeast 2008; 25:129-40. [PMID: 18081196 DOI: 10.1002/yea.1573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
To extract functional information on genes and processes from large expression datasets, analysis methods are required that can computationally deal with these amounts of data, are tunable to specific research questions, and construct classifiers that are not overspecific to the dataset at hand. To satisfy these requirements, a stepwise procedure that combines elements from principal component analysis and discriminant analysis was developed to specifically retrieve genes involved in processes of interest and classify samples based upon those genes. In a global expression dataset of 300 gene knock-outs in Saccharomyces cerevisiae, the procedure successfully classified samples with similar 'cellular component' Gene Ontology annotations of the knock-out gene by expression signatures of limited numbers of genes. The genes discriminating 'mitochondrion' from the other subgroups were evaluated in more detail. The thiamine pathway turned out to be one of the processes involved and was successfully evaluated in a logistic model to predict whether yeast knock-outs were mitochondrial or not. Further, this pathway is biologically related to the mitochondrial system. This strongly indicates that our approach is effective and efficient in extracting meaningful information from large microarray experiments and assigning functions to as yet uncharacterized genes.
Affiliation(s)
- L M T Eijssen
- Department of Genetics and Cell Biology, Maastricht University, PO Box 616, 6200 MD Maastricht, The Netherlands.
5
Lee G, Rodriguez C, Madabhushi A. Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2008; 5:368-84. [PMID: 18670041 PMCID: PMC2562675 DOI: 10.1109/tcbb.2008.36] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 05/08/2023]
Abstract
The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the 'curse of dimensionality', occurring in situations where the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene- and protein-expression studies. Towards this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
Affiliation(s)
- George Lee
- Department of Biomedical Engineering, Rutgers The State University of New Jersey, 599 Taylor Road, Piscataway, NJ 08854, USA.
6
Prediction of the tissue-specificity of selective estrogen receptor modulators by using a single biochemical method. Proc Natl Acad Sci U S A 2008; 105:7171-6. [PMID: 18474858 DOI: 10.1073/pnas.0710802105] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.6] [Indexed: 02/07/2023]
Abstract
Here, we demonstrate that a single biochemical assay is able to predict the tissue-selective pharmacology of an array of selective estrogen receptor modulators (SERMs). We describe an approach to classify estrogen receptor (ER) modulators based on dynamics of the receptor-ligand complex as probed with hydrogen/deuterium exchange (HDX) mass spectrometry. Differential HDX mapping coupled with cluster and discriminant analysis effectively predicted tissue-selective function in most, but not all, cases tested. We demonstrate that analysis of dynamics of the receptor-ligand complex facilitates binning of ER modulators into distinct groups based on structural dynamics. Importantly, we were able to differentiate small structural changes within ER ligands of the same chemotype. In addition, HDX revealed differentially stabilized regions within the ligand-binding pocket that may contribute to the different pharmacology phenotypes of the compounds independent of helix 12 positioning. In summary, HDX provides a sensitive and rapid approach to classify modulators of the estrogen receptor that correlates with their pharmacological profile.
7
Yoo C, Gernaey KV. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms. JOURNAL OF CHEMICAL ENGINEERING OF JAPAN 2008. [DOI: 10.1252/jcej.08we042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Affiliation(s)
- Changkyoo Yoo
- College of Environment and Applied Chemistry, Green Energy Center/Center for Environmental Studies, Kyung Hee University
- Krist V. Gernaey
- Department of Chemical Engineering, Technical University of Denmark
8
Kaput J, Dawson K. Complexity of type 2 diabetes mellitus data sets emerging from nutrigenomic research: a case for dimensionality reduction? Mutat Res 2007; 622:19-32. [PMID: 17559889 PMCID: PMC1994901 DOI: 10.1016/j.mrfmmm.2007.02.033] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 11/29/2006] [Accepted: 02/13/2007] [Indexed: 02/07/2023]
Abstract
Nutrigenomics promises personalized nutrition and an improvement in preventing, delaying, and reducing the symptoms of chronic diseases such as diabetes. Nutritional genomics is the study of how foods affect the expression of genetic information in an individual and how an individual's genetic makeup affects the metabolism and response to nutrients and other bioactive components in food. The path to those promises has significant challenges, from experimental designs that include analysis of genetic heterogeneity to the complexities of food and environmental factors. One of the more significant complications in developing the knowledge base and potential applications is how to analyze high-dimensional datasets of genetic, nutrient, metabolomic (clinical), and other variables influencing health and disease processes. Type 2 diabetes mellitus (T2DM) is used as an illustration of the challenges in studying complex phenotypes with nutrigenomics concepts and approaches.
Affiliation(s)
- Jim Kaput
- Center of Excellence in Nutritional Genomics, University of California at Davis, Davis, CA 95616, USA.
9
Kamiya Y, Furao S, Hasegawa O. An Incremental Neural Network for Online Supervised Learning and Topology Learning. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2007. [DOI: 10.20965/jaciii.2007.p0087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
A new self-organizing incremental network is designed for online supervised learning. During learning, an adaptive similarity threshold is used to judge whether new nodes are needed when online training data are introduced into the system. Nodes caused by noise are deleted to decrease misclassification. The proposed network, which is robust to noisy training data, suits the following tasks: (1) online or even life-long supervised learning; (2) incremental learning, i.e., learning new information without destroying previously learned information; (3) learning without any predefined optimal condition; (4) representing the topological structure of the online input data; and (5) learning the number of nodes needed to represent every class. Experiments on artificial data and high-dimensional real-world data show that the proposed method can achieve classification with a high recognition ratio, high speed, and low memory.
10
Weeraratna AT, Taub DD. Microarray data analysis: an overview of design, methodology, and analysis. Methods Mol Biol 2007; 377:1-16. [PMID: 17634607 DOI: 10.1007/978-1-59745-390-5_1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Microarray analysis results in the gathering of massive amounts of information concerning gene expression profiles of different cells and experimental conditions. Analyzing these data can often be a quagmire, with endless discussion as to what the appropriate statistical analyses for any given experiment might be. As a result many different methods of data analysis have evolved, the basics of which are outlined in this chapter.
Affiliation(s)
- Ashani T Weeraratna
- Laboratory of Immunology, National Institutes of Health, National Institute on Aging, Gerontology Research Center, Baltimore, MD, USA
11
Mamtani MR, Thakre TP, Kalkonde MY, Amin MA, Kalkonde YV, Amin AP, Kulkarni H. A simple method to combine multiple molecular biomarkers for dichotomous diagnostic classification. BMC Bioinformatics 2006; 7:442. [PMID: 17032455 PMCID: PMC1618410 DOI: 10.1186/1471-2105-7-442] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 05/13/2006] [Accepted: 10/10/2006] [Indexed: 11/29/2022]
Abstract
Background In spite of the recognized diagnostic potential of biomarkers, the quest for squelching noise and wringing in information from a given set of biomarkers continues. Here, we suggest a statistical algorithm that, assuming each molecular biomarker to be a diagnostic test, enriches the diagnostic performance of an optimized set of independent biomarkers employing established statistical techniques. We validated the proposed algorithm using several simulation datasets in addition to four publicly available real datasets that compared i) subjects having cancer with those without; ii) subjects with two different cancers; iii) subjects with two different types of one cancer; and iv) subjects with the same cancer resulting in differential time to metastasis. Results Our algorithm comprises three steps: estimating the area under the receiver operating characteristic curve for each biomarker, identifying a subset of biomarkers using linear regression, and combining the chosen biomarkers using linear discriminant function analysis. Combining these established statistical methods, which are available in most statistical packages, we observed that the diagnostic accuracy of our approach was 100%, 99.94%, 96.67% and 93.92% for the real datasets used in the study. These estimates were comparable to or better than the ones previously reported using alternative methods. In a synthetic dataset, we also observed that all the biomarkers chosen by our algorithm were indeed truly differentially expressed. Conclusion The proposed algorithm can be used for accurate diagnosis in the setting of dichotomous classification of disease states.
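The first of the three steps named in this abstract, estimating the area under the ROC curve (AUC) for a single biomarker, can be sketched as follows. The abstract does not say which estimator the authors used; the rank-based (Mann-Whitney) form below is an assumption, and the function name and inputs are hypothetical. The regression and discriminant-analysis steps are not shown.

```python
from typing import Sequence


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Rank-based AUC estimate for one biomarker.

    Returns the fraction of (case, control) pairs in which the case has the
    higher biomarker value, counting ties as half a win.
    """
    if not cases or not controls:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

A value near 0.5 indicates a biomarker that does not separate the two groups, while values near 0 or 1 indicate strong separation in one direction or the other.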
Affiliation(s)
- Tushar P Thakre
- Lata Medical Research Foundation, Nagpur, India
- University of North Texas Health Science Center, Fort Worth, Texas, USA
- Amit P Amin
- Lata Medical Research Foundation, Nagpur, India
12
Gillis JS. Microarray evidence of glutaminyl cyclase gene expression in melanoma: implications for tumor antigen specific immunotherapy. J Transl Med 2006; 4:27. [PMID: 16820060 PMCID: PMC1557589 DOI: 10.1186/1479-5876-4-27] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/30/2006] [Accepted: 07/04/2006] [Indexed: 12/16/2022]
Abstract
BACKGROUND In recent years encouraging progress has been made in developing vaccine treatments for cancer, particularly with melanoma. However, the overall rate of clinically significant results has remained low. The present research used microarray datasets from previous investigations to examine gene expression patterns in cancer cell lines with the goal of better understanding the tumor microenvironment. METHODS Principal Components Analyses with Promax rotational transformations were carried out with 90 cancer cell lines from 3 microarray datasets, which had been made available on the internet as supplementary information from prior publications. RESULTS In each of the analyses a well defined melanoma component was identified that contained a gene coding for the enzyme, glutaminyl cyclase, which was as highly expressed as genes from a variety of well established biomarkers for melanoma, such as MAGE-3 and MART-1, which have frequently been used in clinical trials of melanoma vaccines. CONCLUSION Since glutaminyl cyclase converts glutamine and glutamic acid into a pyroglutamic form, it may interfere with the tumor destructive process of vaccines using peptides having glutamine or glutamic acid at their N-terminals. Finding ways of inhibiting the activity of glutaminyl cyclase in the tumor microenvironment may help to increase the effectiveness of some melanoma vaccines.
Affiliation(s)
- John Stuart Gillis
- Science and Technology Studies, St. Thomas University, Fredericton, New Brunswick, Canada.
13
Li B, Gallin WJ. Computational identification of residues that modulate voltage sensitivity of voltage-gated potassium channels. BMC STRUCTURAL BIOLOGY 2005; 5:16. [PMID: 16111489 PMCID: PMC1208917 DOI: 10.1186/1472-6807-5-16] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 03/19/2005] [Accepted: 08/19/2005] [Indexed: 01/29/2023]
Abstract
Background Studies of the structure-function relationship in proteins for which no 3D structure is available are often based on inspection of multiple sequence alignments. Many functionally important residues of proteins can be identified because they are conserved during evolution. However, residues that vary can also be critically important if their variation is responsible for diversity of protein function and improved phenotypes. If too few sequences are studied, the support for hypotheses on the role of a given residue will be weak, but analysis of large multiple alignments is too complex for simple inspection. When a large body of sequence and functional data are available for a protein family, mature data mining tools, such as machine learning, can be applied to extract information more easily, sensitively and reliably. We have undertaken such an analysis of voltage-gated potassium channels, a transmembrane protein family whose members play indispensable roles in electrically excitable cells. Results We applied different learning algorithms, combined in various implementations, to obtain a model that predicts the half activation voltage of a voltage-gated potassium channel based on its amino acid sequence. The best result was obtained with a k-nearest neighbor classifier combined with a wrapper algorithm for feature selection, producing a mean absolute error of prediction of 7.0 mV. The predictor was validated by permutation test and evaluation of independent experimental data. Feature selection identified a number of residues that are predicted to be involved in the voltage sensitive conformation changes; these residues are good target candidates for mutagenesis analysis. Conclusion Machine learning analysis can identify new testable hypotheses about the structure/function relationship in the voltage-gated potassium channel family. 
This approach should be applicable to any protein family if the number of training examples and the sequence diversity of the training set that are necessary for robust prediction are empirically validated. The predictor and datasets can be found at the VKCDB web site [1].
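The kind of predictor reported in this abstract, a k-nearest neighbor model scored by mean absolute error, can be sketched in a few lines. Everything specific here is an assumption made for illustration: the Hamming distance over equal-length aligned sequences, the averaging of the neighbors' values, and the toy sequences are hypothetical, and the wrapper feature selection used by the authors is not shown.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: Sequence[Tuple[str, float]], query: str, k: int = 3) -> float:
    """Predict a value as the mean over the k training sequences closest to query."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)


def mean_absolute_error(predicted: List[float], observed: List[float]) -> float:
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
```

With half-activation voltages in mV as the target values, `mean_absolute_error` would be expressed in the same unit as the 7.0 mV figure quoted in the abstract.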
Affiliation(s)
- Bin Li
- Department of Biological Sciences, University of Alberta, Edmonton, Canada T6G 2E9
- Partners AIDS Research Center, Massachusetts General Hospital, Harvard Medical School, 149 13th Street 6th floor, Charlestown MA USA 02129
- Warren J Gallin
- Department of Biological Sciences, University of Alberta, Edmonton, Canada T6G 2E9
- Department of Cell Biology, University of Alberta, Edmonton, Alberta, Canada
14
Hibbs MA, Dirksen NC, Li K, Troyanskaya OG. Visualization methods for statistical analysis of microarray clusters. BMC Bioinformatics 2005; 6:115. [PMID: 15890080 PMCID: PMC1156867 DOI: 10.1186/1471-2105-6-115] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.0] [Received: 12/17/2004] [Accepted: 05/12/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold-standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. RESULTS We present several visualization techniques that incorporate meaningful statistics that are noise-robust for the purpose of analyzing the results of clustering algorithms on microarray data. This includes a rank-based visualization method that is more robust to noise, a difference display method to aid assessments of cluster quality and detection of outliers, and a projection of high dimensional data into a three dimensional space in order to examine relationships between clusters. Our methods are interactive and are dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture is scalable for use on both desktop/laptop screens and large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at http://function.princeton.edu/GeneVAnD. CONCLUSION Incorporating relevant statistical information into data visualizations is key for analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold-standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters.
Affiliation(s)
- Matthew A Hibbs
- Computer Science Department, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton, NJ 08544, USA
- Nathaniel C Dirksen
- Computer Science Department, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Kai Li
- Computer Science Department, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Olga G Troyanskaya
- Computer Science Department, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton, NJ 08544, USA
|
15
|
Zhang N, Xu Y, Akash M, McCouch S, Oard JH. Identification of candidate markers associated with agronomic traits in rice using discriminant analysis. Theor Appl Genet 2005; 110:721-9. [PMID: 15678327 DOI: 10.1007/s00122-004-1898-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 07/19/2004] [Accepted: 12/01/2004] [Indexed: 05/20/2023]
Abstract
Plant genetic mapping strategies routinely utilize marker genotype frequencies obtained from progeny of controlled crosses to declare presence of a quantitative trait locus (QTL) on previously constructed linkage maps. We have evaluated the potential of discriminant analysis (DA), a multivariate statistical procedure, to detect candidate markers associated with agronomic traits among inbred lines of rice (Oryza sativa L.). A total of 218 lines originating from the US and Asia were planted in field plots near Alvin, Texas, in 1996 and 1997. Agronomic data were collected for 12 economically important traits, and DNA profiles of each inbred line were produced using 60 SSR and 114 RFLP markers. Model-based methods revealed population structure among the lines. Marker alleles associated with all traits were identified by DA at high levels of correct percent classification within subpopulations and across all lines. Associated marker alleles pointed to the same and different regions on the rice genetic map when compared to previous QTL mapping experiments. Results from this study suggest that candidate markers associated with agronomic traits can be readily detected among inbred lines of rice using DA combined with other methods described in this report.
Affiliation(s)
- N Zhang
- Department of Agronomy and Environmental Management, LSU AgCenter, Louisiana State University, Baton Rouge, LA 70803, USA
|
16
|
Dennis JL, Oien KA. Hunting the primary: novel strategies for defining the origin of tumours. J Pathol 2005; 205:236-47. [PMID: 15641019 DOI: 10.1002/path.1702] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Abstract
In 1995, two methods of genome-wide expression profiling were first described: expression microarrays and serial analysis of gene expression (SAGE). In the subsequent 10 years, many hundreds of papers have been published describing the application of these technologies to a wide spectrum of biological and clinical questions. Common to all of this research is a basic process of data gathering and analysis. The techniques and statistical and bio-informatic tools involved in this process are reviewed. The processes of class discovery (using clustering and self-organizing maps), class prediction (weighted voting, k nearest neighbour, support vector machines, and artificial neural networks), target identification (fold change, discriminant analysis, and principal component analysis), and target validation (RT-PCR and tissue microarrays) are described. Finally, the diagnostic problem of adenocarcinomas that present as metastases of unknown origin is reviewed, and it is demonstrated how integration of expression profiling techniques promises to throw new light on this important clinical challenge.
Affiliation(s)
- Jayne L Dennis
- Department of Cancer Medicine, Imperial College of Science, Technology and Medicine at Hammersmith Hospital, London, UK
|
17
|
Deshane J, Chaves L, Sarikonda KV, Isbell S, Wilson L, Kirk M, Grubbs C, Barnes S, Meleth S, Kim H. Proteomics analysis of rat brain protein modulations by grape seed extract. J Agric Food Chem 2004; 52:7872-7883. [PMID: 15612770 DOI: 10.1021/jf040407d] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Indexed: 05/24/2023]
Abstract
Dietary supplements such as grape seed extract (GSE) enriched in proanthocyanidins (PA) (oligomeric polyphenols) have been suggested to have multiple health benefits, due to antioxidant and other beneficial activities of the PA. However, a systematic analysis of the molecular basis of these benefits has not been demonstrated. Because the brain is vulnerable to age-related oxidative damage and other insults including inflammation, it was hypothesized that rats ingesting GSE would experience changes in expression or modifications of specific brain proteins that might protect against pathologic events. Normal adult female rats were fed diets supplemented with 5% GSE for 6 weeks. Proteomics analysis (2D electrophoresis and mass spectrometry) of brain homogenates from these animals identified 13 proteins that were altered in amount and/or charge. Because many of these changes were quantitatively in the opposite direction from previous findings for the same proteins in either Alzheimer disease or mouse models of neurodegeneration, the data suggest that these identified proteins may mediate the neuroprotective actions of GSE. This is the first identification and quantitation of specific proteins in mammalian tissues modulated by a dietary supplement, as well as the first to demonstrate links of such proteins with any disease.
Affiliation(s)
- Jessy Deshane
- Department of Pharmacology and Toxicology, University of Alabama at Birmingham Comprehensive Cancer Center, Birmingham, Alabama 35294, USA
|
18
|
Abstract
The "informatics revolution" in both bioinformatics and dental informatics will eventually change the way we practice dentistry. This convergence will play a pivotal role in creating a bridge of opportunity by integrating scientific and clinical specialties to promote the advances in treatment, risk assessment, diagnosis, therapeutics, and oral health-care outcome. Bioinformatics has been an emerging field in the biomedical research community and has been gaining momentum in dental medicine. This area has created a steady stream of large and complex genomic data, which has transformed the way a clinical or basic science researcher approaches genomic research. This application to dental medicine, termed "oral genomics", can aid in the molecular understanding of the genes and proteins, their interactions, pathways, and networks that are responsible for the development and progression of oral diseases and disorders. As the result of the Human Genome Project, new advances have prompted high-throughput technologies, such as DNA microarrays, which have become accepted tools in the biomedical research community. This manuscript reviews the two most commonly used microarray technologies, basic microarray data analysis, and the results from several ongoing oral cancer genomic studies.
Affiliation(s)
- W P Kuo
- Harvard School of Dental Medicine, Department of Oral Medicine, Infection, and Immunity, 188 Longwood Avenue, Boston, MA 02115, USA.
|
19
|
Gunther EC, Stone DJ, Gerwien RW, Bento P, Heyes MP. Prediction of clinical drug efficacy by classification of drug-induced genomic expression profiles in vitro. Proc Natl Acad Sci U S A 2003; 100:9608-13. [PMID: 12869696 PMCID: PMC170965 DOI: 10.1073/pnas.1632587100] [Citation(s) in RCA: 132] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022] Open
Abstract
Assays of drug action typically evaluate biochemical activity. However, accurately matching therapeutic efficacy with biochemical activity is a challenge. High-content cellular assays seek to bridge this gap by capturing broad information about the cellular physiology of drug action. Here, we present a method of predicting the general therapeutic classes into which various psychoactive drugs fall, based on high-content statistical categorization of gene expression profiles induced by these drugs. When we used the classification tree and random forest supervised classification algorithms to analyze microarray data, we derived general "efficacy profiles" of biomarker gene expression that correlate with anti-depressant, antipsychotic and opioid drug action on primary human neurons in vitro. These profiles were used as predictive models to classify naïve in vitro drug treatments with 83.3% (random forest) and 88.9% (classification tree) accuracy. Thus, the detailed information contained in genomic expression data is sufficient to match the physiological effect of a novel drug at the cellular level with its clinical relevance. This capacity to identify therapeutic efficacy on the basis of gene expression signatures in vitro has potential utility in drug discovery and drug target validation.
Affiliation(s)
- Erik C Gunther
- CuraGen Corporation, 322 East Main Street, Branford, CT 06405, USA.
|
20
|
Morrison DA, Ellis JT. The design and analysis of microarray experiments: applications in parasitology. DNA Cell Biol 2003; 22:357-94. [PMID: 12906732 DOI: 10.1089/104454903767650658] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open
Abstract
Microarray experiments can generate enormous amounts of data, but large datasets are usually inherently complex, and the relevant information they contain can be difficult to extract. For the practicing biologist, we provide an overview of what we believe to be the most important issues that need to be addressed when dealing with microarray data. In a microarray experiment we are simply trying to identify which genes are the most "interesting" in terms of our experimental question, and these will usually be those that are either overexpressed or underexpressed (upregulated or downregulated) under the experimental conditions. Analysis of the data to find these genes involves first preprocessing of the raw data for quality control, including filtering of the data (e.g., detection of outlying values) followed by standardization of the data (i.e., making the data uniformly comparable throughout the dataset). This is followed by the formal quantitative analysis of the data, which will involve either statistical hypothesis testing or multivariate pattern recognition. Statistical hypothesis testing is the usual approach to "class comparison," where several experimental groups are being directly compared. The best approach to this problem is to use analysis of variance, although issues related to multiple hypothesis testing and probability estimation still need to be evaluated. Pattern recognition can involve "class prediction," for which a range of supervised multivariate techniques are available, or "class discovery," for which an even broader range of unsupervised multivariate techniques have been developed. Each technique has its own limitations, which need to be kept in mind when making a choice from among them. To put these ideas in context, we provide a detailed examination of two specific examples of the analysis of microarray data, both from parasitology, covering many of the most important points raised.
Affiliation(s)
- David A Morrison
- Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, Uppsala, Sweden
|
21
|
Cook KL, Sayler GS. Environmental application of array technology: promise, problems and practicalities. Curr Opin Biotechnol 2003; 14:311-8. [PMID: 12849785 DOI: 10.1016/s0958-1669(03)00057-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Indexed: 11/25/2022]
Abstract
Array technology has been applied in environmental research using innovative approaches in gene expression, comparative genomics and mixed community analysis. Greater fundamental understanding of sources of experimental and analytical error in array experiments should facilitate the future application of array technology to environmental analysis.
Affiliation(s)
- Kimberly L Cook
- Department of Microbiology, Center for Environmental Biotechnology, 676 Dabney Hall, University of Tennessee, Knoxville, TN 37996, USA.
|
22
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2003. [PMCID: PMC2447381 DOI: 10.1002/cfg.226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
|