1
Budhraja S, Doborjeh M, Singh B, Tan S, Doborjeh Z, Lai E, Merkin A, Lee J, Goh W, Kasabov N. Filter and Wrapper Stacking Ensemble (FWSE): a robust approach for reliable biomarker discovery in high-dimensional omics data. Brief Bioinform 2023; 24:bbad382. [PMID: 37889118 PMCID: PMC10605029 DOI: 10.1093/bib/bbad382] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/09/2023] [Revised: 09/18/2023] [Accepted: 10/03/2023] [Indexed: 10/28/2023]
Abstract
Selecting informative features, such as accurate biomarkers for disease diagnosis, prognosis and response to treatment, is an essential task in bioinformatics. Medical data often contain thousands of features, and identifying potential biomarkers is challenging because of the small number of samples, the dependence of results on the chosen selection method, and poor reproducibility. This paper proposes a novel ensemble feature selection method, named Filter and Wrapper Stacking Ensemble (FWSE), to identify reproducible biomarkers from high-dimensional omics data. In FWSE, filter feature selection methods are run on numerous subsets of the data to eliminate irrelevant features, and wrapper feature selection methods are then applied to rank the top features. The method was validated on four high-dimensional medical datasets related to mental illnesses and cancer. The results indicate that the features selected by FWSE are stable and statistically more significant than those obtained by existing methods, while also demonstrating biological relevance. Furthermore, FWSE is a generic method, applicable to various high-dimensional datasets in the fields of machine intelligence and bioinformatics.
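The filter-then-wrapper idea summarised above can be illustrated with a minimal Python sketch, assuming NumPy is available. This is not the authors' FWSE implementation, which stacks several filter and wrapper methods: here a single Welch t-statistic filter is run on stratified random subsets, features that reach the top ranks often enough are kept, and a single greedy forward-selection wrapper scored by leave-one-out nearest-centroid accuracy ranks the survivors. All function names, thresholds and the simulated data are hypothetical.

```python
import numpy as np


def abs_t_scores(X, y):
    """Absolute Welch t-statistic of every feature (column of X) for binary labels y (0/1)."""
    a, b = X[y == 0], X[y == 1]
    se2 = a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(se2 + 1e-12)


def filter_stage(X, y, n_subsets=50, frac=0.8, top_k=20, min_freq=0.5, seed=0):
    """Keep features ranked in the filter's top_k on at least min_freq of random subsets."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_subsets):
        # Stratified subsample so that both classes are always present.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int(frac * np.sum(y == c)), replace=False)
            for c in (0, 1)
        ])
        scores = abs_t_scores(X[idx], y[idx])
        counts[np.argsort(scores)[-top_k:]] += 1
    return np.flatnonzero(counts / n_subsets >= min_freq)


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier on the columns of X."""
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        pred = int(np.linalg.norm(X[i] - c1) < np.linalg.norm(X[i] - c0))
        correct += int(pred == y[i])
    return correct / len(y)


def wrapper_stage(X, y, candidates, max_features=5):
    """Greedy forward selection over the filtered candidates; returns features in the order added."""
    selected, remaining = [], [int(j) for j in candidates]
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda j: loo_nearest_centroid_accuracy(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


# Simulated data: 40 samples x 200 features; features 0-2 are shifted in class 1 by construction.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))
y = np.repeat([0, 1], 20)
X[y == 1, :3] += 2.0

kept = filter_stage(X, y)
ranked = wrapper_stage(X, y, kept, max_features=3)
print("kept by filter stage:", kept.tolist())
print("ranked by wrapper stage:", ranked)
```

Running the cheap filter many times on subsets before the expensive wrapper keeps the wrapper's search space small, and requiring a feature to recur across subsets is one simple way to favour stable selections.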
Affiliation(s)
- Sugam Budhraja
- Knowledge Engineering and Discovery Research Innovation (KEDRI), School of Engineering Computer and Mathematical Sciences, Auckland University of Technology, 55 Wellesley Street East, 1010 Auckland, New Zealand
- Maryam Doborjeh
- Knowledge Engineering and Discovery Research Innovation (KEDRI), School of Engineering Computer and Mathematical Sciences, Auckland University of Technology, 55 Wellesley Street East, 1010 Auckland, New Zealand
- Balkaran Singh
- Knowledge Engineering and Discovery Research Innovation (KEDRI), School of Engineering Computer and Mathematical Sciences, Auckland University of Technology, 55 Wellesley Street East, 1010 Auckland, New Zealand
- Samuel Tan
- Lee Kong Chian School of Medicine, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
- Zohreh Doborjeh
- School of Population Health, The University of Auckland, Grafton, 1023 Auckland, New Zealand
- Edmund Lai
- Knowledge Engineering and Discovery Research Innovation (KEDRI), School of Engineering Computer and Mathematical Sciences, Auckland University of Technology, 55 Wellesley Street East, 1010 Auckland, New Zealand
- Alexander Merkin
- National Institute for Stroke and Applied Neuroscience, Auckland University of Technology, 55 Wellesley Street East, 1010 Auckland, New Zealand
- Jimmy Lee
- Lee Kong Chian School of Medicine, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
- Institute of Mental Health, 10 Buangkok View, 539747, Singapore
- Wilson Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
- School of Biological Sciences, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore
- Nikola Kasabov
- Knowledge Engineering and Discovery Research Innovation (KEDRI), School of Engineering Computer and Mathematical Sciences, Auckland University of Technology, 55 Wellesley Street East, 1010 Auckland, New Zealand
- Intelligent Systems Research Center, Ulster University, Magee Campus, Derry, BT48 7JL, Ulster, United Kingdom
- Auckland Bioengineering Institute, The University of Auckland, 6/70 Symonds Street, 1010 Auckland, New Zealand
- Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
2
Tumor Nonimmune-Microenvironment-Related Gene Expression Signature Predicts Brain Metastasis in Lung Adenocarcinoma Patients after Surgery: A Machine Learning Approach Using Gene Expression Profiling. Cancers (Basel) 2021; 13:cancers13174468. [PMID: 34503278 PMCID: PMC8430997 DOI: 10.3390/cancers13174468] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Revised: 08/30/2021] [Accepted: 09/02/2021] [Indexed: 12/26/2022]
Abstract
Simple Summary: It is important to be able to predict brain metastasis in lung adenocarcinoma patients; however, research in this area is still lacking. Much of the previous work on tumor microenvironments in lung adenocarcinoma with brain metastasis concerns the tumor immune microenvironment. The importance of the tumor nonimmune microenvironment (extracellular matrix (ECM), epithelial–mesenchymal transition (EMT) features, and angiogenesis) has been overlooked with regard to brain metastasis. We evaluated tumor nonimmune-microenvironment-related gene expression signatures that could predict brain metastasis after the surgical resection of lung adenocarcinoma using a machine learning approach. We identified a tumor nonimmune-microenvironment-related 17-gene expression signature, and this signature showed high brain metastasis predictive power in four machine learning classifiers. The immunohistochemical expression of the top three genes of the 17-gene expression signature yielded similar results to NanoString tests. Our tumor nonimmune-microenvironment-related gene expression signatures are important biological markers that can predict brain metastasis and provide patient-specific treatment options.
Abstract: Using a machine learning approach with a gene expression profile, we discovered a tumor nonimmune-microenvironment-related gene expression signature, including extracellular matrix (ECM) remodeling, epithelial–mesenchymal transition (EMT), and angiogenesis, that could predict brain metastasis (BM) after the surgical resection of 64 lung adenocarcinomas (LUAD). Gene expression profiling identified a tumor nonimmune-microenvironment-related 17-gene expression signature that significantly correlated with BM. Of the 17 genes, 11 were ECM-remodeling-related genes.
The 17-gene expression signature showed high BM predictive power in four machine learning classifiers (areas under the receiver operating characteristic curve = 0.845 for naïve Bayes, 0.849 for support vector machine, 0.858 for random forest, and 0.839 for neural network). Subgroup analysis revealed that the BM predictive power of the 17-gene signature was higher in the early-stage LUAD than in the late-stage LUAD. Pathway enrichment analysis showed that the upregulated differentially expressed genes were mainly enriched in the ECM–receptor interaction pathway. The immunohistochemical expression of the top three genes of the 17-gene expression signature yielded similar results to NanoString tests. The tumor nonimmune-microenvironment-related gene expression signatures found in this study are important biological markers that can predict BM and provide patient-specific treatment options.
3
Zhang X, Jonassen I, Goksøyr A. Machine Learning Approaches for Biomarker Discovery Using Gene Expression Data. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
4
Eckardt JN, Bornhäuser M, Wendt K, Middeke JM. Application of machine learning in the management of acute myeloid leukemia: current practice and future prospects. Blood Adv 2020; 4:6077-6085. [PMID: 33290546 PMCID: PMC7724910 DOI: 10.1182/bloodadvances.2020002997] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 07/20/2020] [Accepted: 10/26/2020] [Indexed: 12/19/2022]
Abstract
Machine learning (ML) is rapidly emerging in several fields of cancer research. ML algorithms can deal with vast amounts of medical data and provide a better understanding of malignant disease. Its ability to process information from different diagnostic modalities, predict prognosis and suggest therapeutic strategies indicates that ML is a promising tool for the future management of hematologic malignancies; acute myeloid leukemia (AML) has served as a model disease in various recent studies. Integrating these ML techniques into various applications in AML management can assure fast and accurate diagnosis as well as precise risk stratification and optimal therapy. Nevertheless, these techniques come with various pitfalls and need a strict regulatory framework to ensure the safe use of ML. This comprehensive review highlights and discusses recent advances in ML techniques in the management of AML as a model disease of hematologic neoplasms, enabling researchers and clinicians alike to critically evaluate this upcoming, potentially practice-changing technology.
Affiliation(s)
- Jan-Niklas Eckardt
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
- Martin Bornhäuser
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
- National Center for Tumor Diseases, Dresden (NCT/UCC), Dresden, Germany
- German Consortium for Translational Cancer Research, DKFZ, Heidelberg, Germany
- Karsten Wendt
- Institute of Circuits and Systems, Technical University Dresden, Dresden, Germany
- Jan Moritz Middeke
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
5
Zhang W, Robbins K, Wang Y, Bertrand K, Rekaya R. A jackknife-like method for classification and uncertainty assessment of multi-category tumor samples using gene expression information. BMC Genomics 2010; 11:273. [PMID: 20429942 PMCID: PMC2876124 DOI: 10.1186/1471-2164-11-273] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/10/2009] [Accepted: 04/29/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The use of gene expression profiling for the classification of human cancer tumors has been widely investigated. Previous studies were successful in distinguishing several tumor types in binary problems. As there are over a hundred types of cancers, and potentially even more subtypes, it is essential to develop multi-category methodologies for molecular classification for any meaningful practical application. RESULTS A jackknife-based supervised learning method called paired-samples test algorithm (PST), coupled with a binary classification model based on linear regression, was proposed and applied to two well-known and challenging datasets consisting of 14 (GCM dataset) and 9 (NCI60 dataset) tumor types. The results showed that the proposed method improved the prediction accuracy of the test samples for the GCM dataset, especially when the t-statistic was used in the primary feature selection. For the NCI60 dataset, the application of PST improved prediction accuracy when the number of genes used was relatively small (100 or 200). These improvements made the binary classification method more robust to the gene selection mechanism and to the number of genes used. The overall prediction accuracies were competitive with the most accurate results obtained by several previous studies on the same datasets and with other methods. Furthermore, the relative confidence R(T) provided unique insight into the sources of the uncertainty shown in the statistical classification and the potential variants within the same tumor type. CONCLUSION We proposed a novel bagging method for the classification and uncertainty assessment of multi-category tumor samples using gene expression information. Its strengths were demonstrated in the application to two benchmark datasets.
Affiliation(s)
- Wensheng Zhang
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
6
Mapping multi-class cancers and clinical outcomes prediction for multiple classifications of microarray gene expression data. Korean J Chem Eng 2010. [DOI: 10.1007/s11814-009-0161-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
7
Array of hope: expression profiling identifies disease biomarkers and mechanism. Biochem Soc Trans 2009; 37:855-62. [PMID: 19614607 DOI: 10.1042/bst0370855] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 02/07/2023]
Abstract
High-throughput, genome-wide analytical technologies are now commonly used in all fields of medical research. The most commonly applied of these technologies, gene expression microarrays, have been shown to be both accurate and precise when properly implemented. For over a decade, microarrays have provided novel insight into many complex human diseases. Microarray-based discovery can be classified into three components, biomarker detection, disease (sub)classification and identification of causal mechanism, in order of accomplishment. Within the respiratory system, the application of microarrays has achieved significant success in all components, particularly with respect to lung cancer. Numerous studies over the last half-decade have applied this technology to the characterization of non-malignant respiratory diseases, animal models of respiratory disease and normal developmental processes. Studies of obstructive lung diseases by many groups, including our own, have yielded not only disease biomarkers, but also some novel putative pathogenic mechanisms. We have successfully used an integrative genomics approach, combining microarray analysis with human genetics, to identify susceptibility genes for COPD (chronic obstructive pulmonary disease). Interestingly, we find that the assessment of quantitative phenotypic variables enhances gene discovery. Our studies contribute to the identification of obstructive lung disease biomarkers, provide data associated with disease phenotypes and support the use of an integrated approach to move beyond marker identification to mechanism discovery.
8
Yang TY. Simple Bayesian binary framework for discovering significant genes and classifying cancer diagnosis. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2008.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
9
Efficient multi-class cancer diagnosis algorithm, using a global similarity pattern. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2008.08.028] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
10
Bhattacharya S, Srisuma S, Demeo DL, Shapiro SD, Bueno R, Silverman EK, Reilly JJ, Mariani TJ. Molecular biomarkers for quantitative and discrete COPD phenotypes. Am J Respir Cell Mol Biol 2008; 40:359-67. [PMID: 18849563 DOI: 10.1165/rcmb.2008-0114oc] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.9] [Indexed: 11/24/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is an inflammatory lung disorder with complex pathological features and largely unknown etiology. The identification of biomarkers for this disease could aid the development of methods to facilitate earlier diagnosis, the classification of disease subtypes, and provide a means to define therapeutic response. To identify gene expression biomarkers, we completed expression profiling of RNA derived from the lung tissue of 56 subjects with varying degrees of airflow obstruction using the Affymetrix U133 Plus 2.0 array. We applied multiple, independent analytical methods to define biomarkers for either discrete or quantitative disease phenotypes. Analysis of differential expression between cases (n = 15) and controls (n = 18) identified a set of 65 discrete biomarkers. Correlation of gene expression with quantitative measures of airflow obstruction (FEV(1)%predicted or FEV(1)/FVC) identified a set of 220 biomarkers. Biomarker genes were enriched in functions related to DNA binding and regulation of transcription. We used this group of biomarkers to predict disease in an unrelated data set, generated from patients with severe emphysema, with 97% accuracy. Our data contribute to the understanding of gene expression changes occurring in the lung tissue of patients with obstructive lung disease and provide additional insight into potential mechanisms involved in the disease process. Furthermore, we present the first gene expression biomarker for COPD validated in an independent data set.
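The quantitative-phenotype step described above, correlating each gene's expression with a measure of airflow obstruction such as FEV1 % predicted, can be sketched as a vectorised per-gene Pearson correlation. This is a minimal sketch assuming NumPy; the function name, the simulated data and the 0.5 cut-off are hypothetical, and the study's actual significance criteria are not reproduced here.

```python
import numpy as np


def gene_phenotype_correlation(expr, phenotype):
    """Pearson r between each gene (row of expr) and a quantitative phenotype vector."""
    centred = expr - expr.mean(axis=1, keepdims=True)
    p = phenotype - phenotype.mean()
    return (centred @ p) / (np.linalg.norm(centred, axis=1) * np.linalg.norm(p))


rng = np.random.default_rng(0)
n_subjects, n_genes = 56, 1000                # 56 subjects as in the abstract; gene count is arbitrary
fev1_pct = rng.uniform(20, 110, n_subjects)   # simulated FEV1 % predicted values
expr = rng.normal(size=(n_genes, n_subjects))
expr[0] += 0.05 * fev1_pct                    # gene 0 tracks airflow obstruction by construction

r = gene_phenotype_correlation(expr, fev1_pct)
candidates = np.flatnonzero(np.abs(r) > 0.5)  # hypothetical cut-off; a real analysis would test significance
print(candidates)
```

Because thousands of genes are tested at once, a fixed correlation cut-off will admit some chance hits; a multiple-testing correction on the corresponding p-values is the usual safeguard.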
Affiliation(s)
- Soumyaroop Bhattacharya
- Division of Neonatology and Center for Pediatric Biomedical Research, Department of Pediatrics, University of Rochester, Box 703, 601 Elmwood Avenue, Rochester, NY 14642, USA
11
Yoo C, Gernaey KV. Classification and Diagnostic Output Prediction of Cancer Using Gene Expression Profiling and Supervised Machine Learning Algorithms. J Chem Eng Jpn 2008. [DOI: 10.1252/jcej.08we042] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Affiliation(s)
- Changkyoo Yoo
- College of Environment and Applied Chemistry, Green Energy Center/Center for Environmental Studies, Kyung Hee University
- Krist V. Gernaey
- Department of Chemical Engineering, Technical University of Denmark
12
Craig FE, Johnson LR, Harvey SAK, Nalesnik MA, Luo JH, Bhattacharya SD, Swerdlow SH. Gene expression profiling of Epstein-Barr virus-positive and -negative monomorphic B-cell posttransplant lymphoproliferative disorders. Diagn Mol Pathol 2007; 16:158-68. [PMID: 17721324 DOI: 10.1097/pdm.0b013e31804f54a9] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Indexed: 02/02/2023]
Abstract
Although most posttransplant lymphoproliferative disorders (PTLD) are related to Epstein-Barr virus (EBV) infection, approximately 20% lack detectable EBV (EBV-). It is uncertain whether the latter cases are truly distinct from EBV+ PTLD or possibly relate to another infectious agent. This study used gene expression profiling to further investigate the relationship between EBV+ and EBV- monomorphic B-cell PTLD, and to search for clues to their pathogenesis. Affymetrix HU133A GeneChips were used to compare 4 EBV+ and 4 EBV- cases of monomorphic B-cell PTLD. Hierarchical clustering successfully distinguished the EBV+ and EBV- groups. Relative to EBV- PTLD, 54 transcripts were over-expressed in EBV+ PTLD. The transcripts identified included IRF7 (a known regulator of EBV LMP1 expression), EBI2 (EBV-induced gene 2), and 3 that are interferon induced (MX1, IFITM1, and IFITM3). In addition, the EBV+ group contained 232 transcripts decreased relative to the EBV- group, including changes concordant with those previously reported after EBV infection of cultured B-cell lines. In summary, in a small group of monomorphic B-cell PTLD, EBV+ cases demonstrated a subset of gene expression changes associated with EBV infection of B cells. By contrast, EBV- PTLD lacked viral-associated changes suggesting that they are biologically distinct.
Affiliation(s)
- Fiona E Craig
- Division of Hematopathology, Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
13
MacDonald TJ, Pollack IF, Okada H, Bhattacharya S, Lyons-Weiler J. Progression-associated genes in astrocytoma identified by novel microarray gene expression data reanalysis. Methods Mol Biol 2007; 377:203-22. [PMID: 17634619 DOI: 10.1007/978-1-59745-390-5_13] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 05/16/2023]
Abstract
Astrocytoma is graded as pilocytic (WHO grade I), diffuse (WHO grade II), anaplastic (WHO grade III), and glioblastoma multiforme (WHO grade IV). The progression from low- to high-grade astrocytoma is associated with distinct molecular changes that vary with patient age, yet the prognosis of high-grade tumors in children and adults is equally dismal. Whether specific gene expression changes are consistently associated with all high-grade astrocytomas, independent of patient age, is not known. To address this question, we reanalyzed two microarray datasets comprising astrocytomas from children and adults, respectively. We identified nine genes consistently dysregulated in high-grade tumors, using four novel tests for identifying differentially expressed genes. Four genes encoding ribosomal proteins (RPS2, RPS8, RPS18, RPL37A) were upregulated, and five genes (APOD, SORL1, SPOCK2, PRSS11, ID3) were downregulated in high-grade tumors by all tests. Expression results were validated using a third astrocytoma dataset. APOD, the most differentially expressed gene, has been shown to inhibit tumor cell and vascular smooth muscle cell proliferation. This suggests that dysregulation of APOD may be critical for malignant astrocytoma formation and that APOD is thus a possible novel universal target for therapeutic intervention. Further investigation is needed to evaluate the role of APOD, as well as the other genes identified, in malignant astrocytoma development.
Affiliation(s)
- Tobey J MacDonald
- Center for Cancer and Immunology Research, Children's Research Institute, Department of Hematology-Oncology, Children's National Medical Center, Washington, DC, USA
14
Pavlidis P, Poirazi P. Individualized markers optimize class prediction of microarray data. BMC Bioinformatics 2006; 7:345. [PMID: 16842618 PMCID: PMC1569876 DOI: 10.1186/1471-2105-7-345] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/11/2006] [Accepted: 07/14/2006] [Indexed: 11/17/2022]
Abstract
Background Identification of molecular markers for the classification of microarray data is a challenging task. Despite the evident dissimilarity in various characteristics of biological samples belonging to the same category, most marker-selection and classification methods do not consider this variability. In general, feature selection methods aim at identifying a common set of genes whose combined expression profiles can accurately predict the category of all samples. Here, we argue that this simplified approach is often unable to capture the complexity of a disease phenotype, and we propose an alternative method that takes into account the individuality of each patient sample. Results Instead of using the same features for the classification of all samples, the proposed technique starts by creating a pool of informative gene features. For each sample, the method selects a subset of these features whose expression profiles are most likely to accurately predict the sample's category. Different subsets are utilized for different samples, and the outcomes are combined in a hierarchical framework for the classification of all samples. Moreover, this approach can innately identify subgroups of samples within a given class that share common feature sets, thus highlighting the effect of individuality on gene expression. Conclusion In addition to high classification accuracy, the proposed method offers a more individualized approach to the identification of biological markers, which may help in better understanding the molecular background of a disease and emphasizes the need for more flexible medical interventions.
Affiliation(s)
- Pavlos Pavlidis
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology-Hellas (FORTH), Vassilika Vouton PO Box 1385, GR-71110, Heraklion, Crete, Greece
- Department of Biology, University of Crete, PO Box 2208, GR-71409, Heraklion, Crete, Greece
- Panayiota Poirazi
- Institute of Molecular Biology and Biotechnology (IMBB), Foundation for Research and Technology-Hellas (FORTH), Vassilika Vouton PO Box 1385, GR-71110, Heraklion, Crete, Greece
15
Abstract
Human cancer is caused by multiple factors, such as genetic predisposition, chronic persistent inflammation, environmental factors, lifestyle, and aging. Dysregulated proliferation, dysregulated adhesion, resistance to apoptosis, resistance to senescence, and resistance to anti-cancer drugs are features of cancer cells. Accumulation of multiple epigenetic changes and genetic alterations of cancer-associated genes during multi-stage carcinogenesis results in more malignant phenotypes. Post-genome science is characterized by omics data related to the genome, transcriptome, proteome, metabolome, interactome, and epigenome, as well as by high-throughput technologies such as whole-genome tiling oligonucleotide arrays, array CGH with 32,433 overlapping BAC clones, transcriptome microarrays, mass spectrometry, tissue-based expression arrays, and cell-based transfection arrays. Benchtop oncology supplies desktop oncology with large amounts of omics data produced by high-throughput technology. Desktop oncology establishes knowledge on cancer-related biomarkers, such as predisposition, diagnostic, prognostic, and therapeutic markers, by using bioinformatics and the human intelligence of experts for data mining and text mining. Bedside oncology applies the knowledge established by desktop oncology to determine therapeutics for cancer patients. Antibody drugs (Trastuzumab/Herceptin, Cetuximab/Erbitux, Bevacizumab/Avastin, et cetera), small-molecule tyrosine kinase inhibitors (Gefitinib/Iressa, Erlotinib/Tarceva, Imatinib/Gleevec, et cetera), conventional cytotoxic drugs, and anti-hormonal drugs are used for cancer chemotherapy. Biomarker monitoring informs the choice of therapeutic options and the determination of drug dosage for cancer patients. Knowledge on biomarkers is fed forward from desktop to bedside in translational research, and biomarker monitoring results are then fed back from bedside to desktop in reverse translational research.
Desktop oncology is indispensable for cancer research in the post-genome era. Combining genetic screening for cancer predisposition in the general population with precise selection of therapeutic options during cancer management could contribute to the realization of personalized prevention and dramatically improve the prognosis of cancer patients in the future.
Affiliation(s)
- Masuko Katoh
- M & M Medical BioInformatics, Hongo 113-0033, Japan
16
Tuck DP, Kluger HM, Kluger Y. Characterizing disease states from topological properties of transcriptional regulatory networks. BMC Bioinformatics 2006; 7:236. [PMID: 16670008 PMCID: PMC1482723 DOI: 10.1186/1471-2105-7-236] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Received: 12/29/2005] [Accepted: 05/02/2006] [Indexed: 11/20/2022]
Abstract
Background High-throughput gene expression experiments yield large amounts of data that can augment our understanding of disease processes, in addition to classifying samples. Here we present new paradigms of data separation based on the construction of transcriptional regulatory networks for normal and abnormal cells using sequence predictions, literature-based data and gene expression studies. We analyzed expression datasets from a number of diseased and normal cells, including different types of acute leukemia, and breast cancer with variable clinical outcome. Results We constructed sample-specific regulatory networks to identify links between transcription factors (TFs) and regulated genes that differentiate between healthy and diseased states. This approach has the advantage of identifying key transcription factor-gene pairs with differential activity between healthy and diseased states, rather than merely using gene expression profiles, thus pointing to processes that may be involved in gene deregulation. We then generalized this approach by studying simultaneous changes in the functionality of multiple regulatory links pointing to a regulated gene or emanating from one TF (that is, changes in gene centrality defined by its in-degree or out-degree, respectively). We found that samples can often be separated more robustly based on these measures of gene centrality than by using individual links. We examined distributions of distances (the number of links needed to traverse the path between each pair of genes) in the transcriptional networks for gene subsets whose collective expression profiles could best separate each dataset into predefined groups. We found that genes that optimally classify samples are concentrated in neighborhoods in the gene regulatory networks. This suggests that genes that are deregulated in diseased states exhibit a remarkable degree of connectivity.
Conclusion Transcription factor-regulated gene links and centrality of genes on transcriptional networks can be used to differentiate between cell types. Transcriptional network blueprints can be used as a basis for further research into gene deregulation in diseased states.
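The in-degree and out-degree measures described above can be illustrated on a toy directed TF → gene edge list. This is a minimal sketch using only the Python standard library; the networks are supplied directly rather than inferred from sequence, literature and expression data as in the study, and the edge lists, node names (TF1, G1, and so on) and function names are hypothetical.

```python
from collections import Counter


def degree_centrality(edges):
    """Return (in_degree, out_degree) Counters for a directed TF -> gene edge list."""
    in_deg, out_deg = Counter(), Counter()
    for tf, gene in edges:
        out_deg[tf] += 1    # regulatory link emanating from a transcription factor
        in_deg[gene] += 1   # regulatory link pointing to a regulated gene
    return in_deg, out_deg


def centrality_shift(edges_a, edges_b):
    """Per-node (in-degree change, out-degree change) from network A to network B."""
    in_a, out_a = degree_centrality(edges_a)
    in_b, out_b = degree_centrality(edges_b)
    nodes = set(in_a) | set(out_a) | set(in_b) | set(out_b)
    # Counter returns 0 for absent keys, so a node missing from one network counts as degree 0.
    return {n: (in_b[n] - in_a[n], out_b[n] - out_a[n]) for n in sorted(nodes)}


# Hypothetical sample-specific networks; TF and gene names are placeholders.
healthy = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G2")]
diseased = [("TF1", "G1"), ("TF2", "G1"), ("TF2", "G2"), ("TF2", "G3")]

shift = centrality_shift(healthy, diseased)
for node, (d_in, d_out) in shift.items():
    print(f"{node}: in-degree change {d_in:+d}, out-degree change {d_out:+d}")
```

Aggregating many links into one degree per node is what makes the measure less sensitive to noise in any single inferred link, which is consistent with the abstract's observation that centrality separated samples more robustly than individual links.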
Affiliation(s)
- David P Tuck
- Department of Pathology, Yale University School of Medicine, New Haven, Connecticut 06510, USA
- Harriet M Kluger
- Department of Internal Medicine, Yale University School of Medicine, New Haven, Connecticut 06510, USA
- Yuval Kluger
- Department of Cell Biology, New York University School of Medicine, New York, New York 10016, USA
17
Abstract
The ability to form tenable hypotheses regarding the neurobiological basis of normative functions as well as mechanisms underlying neurodegenerative and neuropsychiatric disorders is often limited by the highly complex brain circuitry and the cellular and molecular mosaics therein. The brain is an intricate structure with heterogeneous neuronal and nonneuronal cell populations dispersed throughout the central nervous system. Varied and diverse brain functions are mediated through gene expression, and ultimately protein expression, within these cell types and interconnected circuits. Large-scale high-throughput analysis of gene expression in brain regions and individual cell populations using modern functional genomics technologies has enabled the simultaneous quantitative assessment of dozens to hundreds to thousands of genes. Technical and experimental advances in the accession of tissues, RNA amplification technologies, and the refinement of downstream genetic methodologies including microarray analysis and real-time quantitative PCR have generated a wellspring of informative studies pertinent to understanding brain structure and function. In this review, we outline the advantages as well as some of the potential challenges of applying high throughput functional genomics technologies toward a better understanding of brain tissues and diseases using animal models as well as human postmortem tissues.
18
Bhattacharya S, Mariani TJ. Transformation of expression intensities across generations of Affymetrix microarrays using sequence matching and regression modeling. Nucleic Acids Res 2005; 33:e157. [PMID: 16224098 PMCID: PMC1258179 DOI: 10.1093/nar/gni159] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/03/2022]
Abstract
The utility of previously generated microarray data is severely limited owing to small study size, leading to under-powered analysis, and failure of replication. Multiplicity of platforms and various sources of systematic noise limit the ability to compile existing data from similar studies. We present a model for transformation of data across different generations of Affymetrix arrays, developed using previously published datasets describing technical replicates performed with two generations of arrays. The transformation is based upon a probe set-specific regression model, generated from replicate measurements across platforms, performed using correlation coefficients. The model, when applied to the expression intensities of 5069 shared, sequence-matched probe sets in three different generations of Affymetrix Human oligonucleotide arrays, showed significant improvement in inter-generation correlations between sample-wide means and individual probe set pairs. The approach was further validated by an observed reduction in Euclidean distance between signal intensities across generations for the predicted values. Finally, application of the model to independent, but related datasets resulted in improved clustering of samples based upon their biological, as opposed to technical, attributes. Our results suggest that this transformation method is a valuable tool for integrating microarray datasets from different generations of arrays.
Affiliation(s)
- Soumyaroop Bhattacharya
- Division of Pulmonary Medicine, Pulmonary Medicine Thorn 908, Brigham and Women's Hospital, Boston, MA 02115, USA.
19
Baty F, Bihl MP, Perrière G, Culhane AC, Brutsche MH. Optimized between-group classification: a new jackknife-based gene selection procedure for genome-wide expression data. BMC Bioinformatics 2005; 6:239. [PMID: 16191195 PMCID: PMC1261161 DOI: 10.1186/1471-2105-6-239] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/10/2005] [Accepted: 09/28/2005] [Indexed: 11/10/2022]
Abstract
Background A recent publication described a supervised classification method for microarray data: Between Group Analysis (BGA). This method, which is based on performing multivariate ordination of groups, proved to be very efficient for both classification of samples into pre-defined groups and disease class prediction of new unknown samples. Classification and prediction with BGA are classically performed using the whole set of genes, and no variable selection is required. We hypothesize that an optimized selection of highly discriminating genes might improve the prediction power of BGA. Results We propose an optimized between-group classification (OBC) which uses a jackknife-based gene selection procedure. OBC emphasizes classification accuracy rather than feature selection. OBC is a backward optimization procedure that maximizes the percentage of between-group inertia by removing the least influential genes one by one from the analysis. This selects a subset of highly discriminative genes that optimize disease class prediction. We applied OBC to four datasets and compared it to other classification methods. Conclusion OBC considerably improved the classification and predictive accuracy of BGA, when assessed using independent data sets and leave-one-out cross-validation. Availability The R code is freely available [see Additional file 1] as well as supplementary information [see Additional file 2].
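The backward procedure described above can be illustrated with a simplified sketch. It assumes "percentage of between-group inertia" is approximated by the between-group sum of squares divided by the total sum of squares over the retained genes, computed on raw values; real BGA works on an ordination (e.g. correspondence or principal component analysis) of group means, which is omitted here. The function names are hypothetical, and the authors' R code (Additional file 1) remains the reference implementation.

```python
def between_group_ratio(samples, labels, genes):
    """Share of total variance (inertia) explained by group means, over `genes`.

    Simplified stand-in for BGA's between-group inertia: no ordination step.
    """
    n = len(samples)
    groups = {}
    for row, label in zip(samples, labels):
        groups.setdefault(label, []).append(row)
    between = total = 0.0
    for g in genes:
        grand_mean = sum(row[g] for row in samples) / n
        total += sum((row[g] - grand_mean) ** 2 for row in samples)
        for rows in groups.values():
            group_mean = sum(row[g] for row in rows) / len(rows)
            between += len(rows) * (group_mean - grand_mean) ** 2
    return between / total if total > 0 else 0.0


def backward_gene_selection(samples, labels, min_genes=1):
    """Drop genes one at a time, keeping the subset with the highest ratio seen."""
    genes = list(range(len(samples[0])))
    best_genes = list(genes)
    best_ratio = between_group_ratio(samples, labels, genes)
    while len(genes) > min_genes:
        # Leave-one-gene-out: find the gene whose removal hurts the ratio least.
        drop = max(
            genes,
            key=lambda g: between_group_ratio(samples, labels, [x for x in genes if x != g]),
        )
        genes.remove(drop)
        ratio = between_group_ratio(samples, labels, genes)
        if ratio > best_ratio:
            best_genes, best_ratio = list(genes), ratio
    return best_genes, best_ratio
```

Each pass re-scores every remaining gene, so the cost grows roughly quadratically with the number of genes; on genome-wide data a practical version would prefilter genes or remove them in batches.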
Affiliation(s)
- Florent Baty
- Pulmonary Gene Research, University Hospital Basel, CH-4031 Basel, Switzerland
- Michel P Bihl
- Pulmonary Gene Research, University Hospital Basel, CH-4031 Basel, Switzerland
- Guy Perrière
- Laboratoire de Biométrie et de Biologie Évolutive, UMR CNRS 5558, Université Claude Bernard Lyon 1, 43 blvd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
- Aedín C Culhane
- Bioinformatics Conway Institute, University College Dublin, Ireland
- Martin H Brutsche
- Pulmonary Gene Research, University Hospital Basel, CH-4031 Basel, Switzerland
20
Guinn BA, Gilkes AF, Woodward E, Westwood NB, Mufti GJ, Linch D, Burnett AK, Mills KI. Microarray analysis of tumour antigen expression in presentation acute myeloid leukaemia. Biochem Biophys Res Commun 2005; 333:703-13. [PMID: 15963951 DOI: 10.1016/j.bbrc.2005.05.161] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 05/22/2005] [Accepted: 05/25/2005] [Indexed: 10/25/2022]
Abstract
Acute myeloid leukaemia (AML) is a difficult-to-treat disease, especially for patients who have no eligible haematopoietic stem cell (HSC) donor. One of the most promising treatment options for these patients is immunotherapy. To investigate the expression of known tumour antigens in AML, we analysed microarray data from 124 presentation AML patient samples and investigated the present/absent calls of 82 tumour-specific or -associated antigens. We found 11 antigens which were expressed in AML patient samples but not normal donors. Nine of these were cancer-testis (CT) antigens, previously shown to be expressed in tumour cells and immunologically protected sites and at very low levels, if at all, in normal tissues. Expression was confirmed using real-time PCR. We have identified a number of CT antigens with expression in presentation AML samples but not normal donor samples, which may provide effective targets for future immunotherapy treatments early in disease.
Affiliation(s)
- Barbara-Ann Guinn
- Department of Haematological Medicine, Guy's, King's and St. Thomas' School of Medicine, King's College London, London SE5 9NU, UK.
21
Huang S, Luangtongkum T, Morishita TY, Zhang Q. Molecular typing of Campylobacter strains using the cmp gene encoding the major outer membrane protein. Foodborne Pathog Dis 2005; 2:12-23. [PMID: 15992295 DOI: 10.1089/fpd.2005.2.12] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
Thermophilic Campylobacter, particularly Campylobacter jejuni, is one of the major foodborne human pathogens of animal origin. Reliable and sensitive typing tools are required for understanding the epidemiology and ecology of this zoonotic bacterial agent. Currently, several molecular typing methods are available for differentiating Campylobacter strains, but each of them has limitations. Our previous study revealed that considerable sequence polymorphism exists in the cmp gene encoding the major outer membrane protein of Campylobacter and suggested that sequence variation of cmp may be utilized for discrimination of Campylobacter strains. In this study, we evaluated the feasibility of the cmp-based typing tool, using pulsed-field gel electrophoresis (PFGE) as the "gold" standard for comparison. The cmp alleles were sequenced from multiple Campylobacter strains, grouped, and compared with the PFGE profiles of these strains using Bionumerics. Results showed that 43 cmp sequence types and 43 PFGE types existed among the 60 Campylobacter isolates. Typeability of these strains is 100% using either the cmp-based method or PFGE. The discrimination indices are 0.973 for the cmp-based method and 0.969 for PFGE. The cmp sequence types are 77.6% congruent with the PFGE types. These results indicate that cmp-based typing is a simple, yet highly discriminatory approach for molecular differentiation of C. jejuni strains.
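The abstract reports discrimination indices without stating the formula. A common choice in molecular typing is the Hunter-Gaston discriminatory index (Simpson's index of diversity), D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j the number of isolates assigned to type j. Assuming that is the index used, a minimal sketch follows; the function name is hypothetical, and the reported values cannot be reproduced here because the per-type counts are not given.

```python
from collections import Counter


def discrimination_index(type_labels):
    """Hunter-Gaston discriminatory index for a list of per-isolate type labels.

    Returns the probability that two isolates drawn at random without
    replacement are assigned to different types (assumed index, see text).
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
```

For example, four isolates split evenly into two types give 1 - 4/12, about 0.667, while isolates that all fall into distinct types give 1.0.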
Affiliation(s)
- Shouxiong Huang
- Food Animal Health Research Program, Ohio Agricultural Research and Development Center, The Ohio State University, Wooster, Ohio, USA
22
Yoo C, Lee IB, Vanrolleghem PA. Interpreting patterns and analysis of acute leukemia gene expression data by multivariate fuzzy statistical analysis. Comput Chem Eng 2005. [DOI: 10.1016/j.compchemeng.2005.02.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
23
Syed V, Mukherjee K, Lyons-Weiler J, Lau KM, Mashima T, Tsuruo T, Ho SM. Identification of ATF-3, caveolin-1, DLC-1, and NM23-H2 as putative antitumorigenic, progesterone-regulated genes for ovarian cancer cells by gene profiling. Oncogene 2005; 24:1774-87. [PMID: 15674352 DOI: 10.1038/sj.onc.1207991] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.7] [Indexed: 01/07/2023]
Abstract
Although progesterone (P4) has been implicated in protection against ovarian cancer (OCa), little is known of its mechanism of action. The goal of this study was to identify P4-regulated genes that have anti-OCa action. Three immortalized nontumorigenic human ovarian surface epithelial (HOSE) cell lines and three OCa (OVCA) cell lines were subjected to 5 days of P4 treatment. Transcriptional profiling with a cDNA microarray containing approximately 2400 known genes was used to identify genes (1) whose expression was consistently downregulated in OVCA cell lines compared to HOSE cell lines, and (2) whose expression was restored in OCa cell lines by P4 treatment. From the candidates selected, activating transcription factor-3 (ATF-3), caveolin-1, deleted in liver cancer-1 (DLC-1), and nonmetastatic clone 23 (NM23-H2) were chosen for post hoc functional studies based on their previously reported action as tumor suppressors or apoptosis inducers. Semiquantitative RT-PCR analyses confirmed loss of or reduced transcription of these genes in OVCA cells when compared to HOSE cells and their upregulation following P4 treatment. Hormonal specificity was demonstrated by blockade experiments with a progestin antagonist RU 38486. Ectopic expression of caveolin-1, DLC-1, and NM23-H2 caused growth inhibition in OVCA cell cultures, but not in HOSE cell cultures, while forced expression of ATF-3 suppressed growth in both. Overexpression of ATF-3 also enhanced caspase-3 activity in both HOSE and OVCA cells, whereas ectopic expression of caveolin-1 and DLC-1 only activated this enzyme in OCa cells. In contrast, NM23-H2 overexpression was ineffective in activating caspase-3. Overexpression of any of the four genes in OCa cells reduced soft-agar colony formation and cell invasiveness. Taken together, we have identified four new P4-regulated, antitumor genes for OCa. However, their modes of action differ significantly: ATF-3 primarily functions as an apoptosis inducer, NM23-H2 as a suppressor of cell motility, and caveolin-1 and DLC-1 exhibit features of classical tumor suppressors. To the best of our knowledge, except for NM23-H2, this is the first report linking P4 to the regulation of these tumor suppressor/proapoptotic genes, which could serve as future therapeutic targets.
Affiliation(s)
- Viqar Syed
- Department of Surgery, University of Massachusetts Medical School, Worcester, MA 06105, USA
24
Wang H, Huang H. SED, a normalization free method for DNA microarray data analysis. BMC Bioinformatics 2004; 5:121. [PMID: 15345033 PMCID: PMC517708 DOI: 10.1186/1471-2105-5-121] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/26/2004] [Accepted: 09/02/2004] [Indexed: 01/25/2023]
Abstract
BACKGROUND Analysis of DNA microarray data usually begins with a normalization step where intensities of different arrays are adjusted to the same scale so that the intensity levels from different arrays can be compared with one another. Both simple total array intensity-based and more complex "local intensity level"-dependent normalization methods have been developed, some of which are widely used. Much less developed are methods that bypass the normalization step and therefore yield results that are not confounded by potential normalization errors. RESULTS Instead of focusing on the raw intensity levels, we developed a new method for microarray data analysis that maps each gene's expression intensity level to a high-dimensional space of SEDs (Signs of Expression Difference), the signs of the expression intensity difference between a given gene and every other gene on the array. Since SEDs are unchanged under any monotonic transformation of intensity levels, the SED-based method is normalization free. When tested on a multi-class tumor classification problem, simple Naive Bayes and Nearest Neighbor methods using the SED approach gave results comparable with normalized intensity-based algorithms. Furthermore, a high percentage of classifiers based on a single gene's SED gave good classification results, suggesting that SEDs do capture essential information from the intensity levels. CONCLUSION The results of testing this new method on multi-class tumor classification problems suggest that the SED-based, normalization-free method of microarray data analysis is feasible and promising.
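The SED mapping described above is simple to state in code: for one array, each gene becomes the vector of signs of its intensity difference against every other gene. A minimal sketch, with hypothetical function names, shows why the representation needs no normalization: any strictly increasing transform of the intensities (log2, global scaling) preserves every pairwise ordering and therefore every sign. The downstream Naive Bayes and Nearest Neighbor classifiers are not reproduced here.

```python
import math


def sed_vector(intensities, gene):
    """Signs of expression difference between `gene` and every other gene on one array.

    +1 where the gene is brighter, -1 where it is dimmer, 0 on ties.
    """
    x = intensities[gene]
    return [(x > y) - (x < y) for j, y in enumerate(intensities) if j != gene]


def sed_matrix(intensities):
    """SED vectors for every gene on the array (genes indexed by position)."""
    return [sed_vector(intensities, g) for g in range(len(intensities))]


# Any strictly increasing transform (log2, global scaling) leaves the SEDs unchanged.
array = [100.0, 50.0, 200.0, 100.0]
assert sed_matrix(array) == sed_matrix([math.log2(v) for v in array])
assert sed_matrix(array) == sed_matrix([3.7 * v for v in array])
```

The full SED space has one entry per ordered gene pair, so for arrays with thousands of genes it grows quadratically; a practical implementation would compute signs only for the gene pairs a classifier actually uses.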
Affiliation(s)
- Huajun Wang
- Oscient Pharmaceuticals Corporation, 100 Beaver St, Waltham, Massachusetts 02453, USA
- Hui Huang
- Oscient Pharmaceuticals Corporation, 100 Beaver St, Waltham, Massachusetts 02453, USA
25
Gohil K, Chakraborty AA. Applications of microarray and bioinformatics tools to dissect molecular responses of the central nervous system to antioxidant micronutrients. Nutrition 2004; 20:50-5. [PMID: 14698014 DOI: 10.1016/j.nut.2003.09.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
Affiliation(s)
- Kishorchandra Gohil
- Center for Comparative Respiratory Biology and Medicine, Department of Internal Medicine, University of California, Davis, California 95616, USA.
26
Abstract
Genomic responses to nutrients are important determinants of physiological and pathological functions of living systems. Many of these responses are mediated by changes in mRNA concentrations that are primarily regulated by gene transcription. Transcriptional networks that regulate the expression and activities of transcription factors and structural genes in response to nutrients need to be defined. The tools of functional genomics and bioinformatics offer powerful means to address these needs. The application of global mRNA profiling tools to define genome-wide responses to nutrients and micronutrients with a primary focus on in vivo genomic responses of vital organs of laboratory mice is reviewed here. The studies show that major and minor nutrients affect the expression of mRNAs that are related to aging and inflammation, and chemically diverse micronutrients such as polyphenols and tocopherols may exert their effects through modulating the expression of functionally related genes.
Affiliation(s)
- Kishorchandra Gohil
- Center for Comparative Respiratory Biology and Medicine, Department of Internal Medicine, University of California, Davis, CA 95616, USA.
27
A primer on gene expression and microarrays for machine learning researchers. J Biomed Inform 2004; 37:293-303. [DOI: 10.1016/j.jbi.2004.07.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Received: 06/20/2004] [Indexed: 01/09/2023]
28
Borczuk AC, Shah L, Pearson GDN, Walter KL, Wang L, Austin JHM, Friedman RA, Powell CA. Molecular Signatures in Biopsy Specimens of Lung Cancer. Am J Respir Crit Care Med 2004; 170:167-74. [PMID: 15087295 DOI: 10.1164/rccm.200401-066oc] [Citation(s) in RCA: 85] [Impact Index Per Article: 4.3] [Indexed: 01/02/2023]
Abstract
Gene expression profiles of resected tumors may predict treatment response and outcome. We hypothesized that profiles derived from lung tumor biopsies would discriminate tumor-specific gene signatures and provide predictive information about outcome. Lung carcinoma specimens were obtained from 23 patients undergoing computed tomography-guided transthoracic biopsy or endobronchial brushing for undiagnosed nodules. Excess tissue was processed for gene profiling. We built class prediction models for lung cancer histology and for cancer outcome. The histology model used an F test to identify 99 genes that were differentially expressed among lung cancer subtypes. The histology validation set class prediction accuracy rate was 86%. The outcome model used the maximum difference subset algorithm to identify 42 genes associated with high risk for cancer death. The outcome training set class prediction accuracy rate was 87%. In conclusion, gene expression profiles of biopsy specimens of lung cancers identify unique tumoral signatures that provide information about tissue morphology and prognosis. The use of specimens acquired from lung biopsy procedures to identify biomarkers of clinical outcome may have application in the management of patients with lung cancer. The procedures are safe and feasible; the efficacy and utility of this strategy will ultimately be determined by prospective clinical trials.
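The histology model above ranks genes with an F test across lung cancer subtypes. A minimal sketch of that step, assuming a per-gene one-way ANOVA F statistic and a simple top-k cutoff (the abstract does not state the significance threshold that produced 99 genes, so `top_k` is a hypothetical parameter, as are the function names):

```python
import math


def f_statistic(values, labels):
    """One-way ANOVA F statistic for a single gene across sample groups.

    Assumes at least two groups and more samples than groups.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-group spread: infinitely separated unless the group means coincide.
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_genes_by_f(expression, labels, top_k):
    """Rank genes (dict: gene -> values per sample) by F statistic, highest first."""
    scores = {gene: f_statistic(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In practice the F statistic would be converted to a p-value with k - 1 and n - k degrees of freedom and corrected for multiple testing before choosing a gene list; the top-k cutoff here only keeps the sketch dependency-free.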
Affiliation(s)
- Alain C Borczuk
- Department of Pathology, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA
29
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2003. [PMCID: PMC2447285 DOI: 10.1002/cfg.230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]