Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Lee G, Rodriguez C, Madabhushi A. Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies. IEEE/ACM Trans Comput Biol Bioinform 2008;5:368-84. [PMID: 18670041 PMCID: PMC2562675 DOI: 10.1109/tcbb.2008.36] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 05/08/2023]

Cited by Other Article(s)

Binson VA, Thomas S, Subramoniam M, Arun J, Naveen S, Madhu S. A Review of Machine Learning Algorithms for Biomedical Applications. Ann Biomed Eng 2024;52:1159-1183. [PMID: 38383870 DOI: 10.1007/s10439-024-03459-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Accepted: 01/24/2024] [Indexed: 02/23/2024]

M S K, Rajaguru H, Nair AR. Enhancement of Classifier Performance with Adam and RanAdam Hyper-Parameter Tuning for Lung Cancer Detection from Microarray Data-In Pursuit of Precision. Bioengineering (Basel) 2024;11:314. [PMID: 38671736 PMCID: PMC11047746 DOI: 10.3390/bioengineering11040314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 03/18/2024] [Accepted: 03/20/2024] [Indexed: 04/28/2024]

Medina-Ortiz D, Contreras S, Quiroz C, Olivera-Nappa Á. Development of Supervised Learning Predictive Models for Highly Non-linear Biological, Biomedical, and General Datasets. Front Mol Biosci 2020;7:13. [PMID: 32118039 PMCID: PMC7031350 DOI: 10.3389/fmolb.2020.00013] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/09/2019] [Accepted: 01/22/2020] [Indexed: 11/13/2022]

Abstract

In highly non-linear datasets, attributes or features do not allow readily finding visual patterns for identifying common underlying behaviors. Therefore, it is not possible to achieve classification or regression using linear or mildly non-linear hyperspace partition functions. Hence, supervised learning models based on the application of most existing algorithms are limited, and their performance metrics are low. Linear transformations of variables, such as principal components analysis, cannot avoid the problem, and even models based on artificial neural networks and deep learning are unable to improve the metrics. Sometimes, even when features allow classification or regression in reported cases, performance metrics of supervised learning algorithms remain unsatisfactorily low. This problem is recurrent in many areas of study, for example the clinical, biotechnological, and protein engineering areas, where many of the attributes are correlated in an unknown and very non-linear fashion or are categorical and difficult to relate to a target response variable. In such areas, being able to create predictive models would dramatically impact the quality of their outcomes, generating an immediate added value for both the scientific community and the general public. In this manuscript, we present RV-Clustering, a library of unsupervised learning algorithms, and a new methodology designed to find optimum partitions within highly non-linear datasets that allow deconvoluting variables and notably improving performance metrics in supervised learning classification or regression models. The partitions obtained are statistically cross-validated, ensuring correct representativity and no over-fitting. We have successfully tested RV-Clustering in several highly non-linear datasets with different origins. The approach herein proposed has generated classification and regression models with high-performance metrics, which further supports its ability to generate predictive models for highly non-linear datasets. Advantageously, the method does not require significant human input, which guarantees a higher usability in the biological, biomedical, and protein engineering community with no specific knowledge in the machine learning area.

Seok HS. Performance comparison of dimensionality reduction methods on RNA-Seq data from the GTEx project. Genes Genomics 2019;42:225-234. [PMID: 31833048 DOI: 10.1007/s13258-019-00896-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2019] [Accepted: 11/22/2019] [Indexed: 11/25/2022]

Akazawa Y, Mizuno S, Fujinami N, Suzuki T, Yoshioka Y, Ochiya T, Nakamoto Y, Nakatsura T. Usefulness of serum microRNA as a predictive marker of recurrence and prognosis in biliary tract cancer after radical surgery. Sci Rep 2019;9:5925. [PMID: 30976046 PMCID: PMC6459925 DOI: 10.1038/s41598-019-42392-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/25/2018] [Accepted: 03/20/2019] [Indexed: 02/06/2023]

Pfaffl MW, Riedmaier-Sprenzel I. New surveillance concepts in food safety in meat producing animals: the advantage of high throughput 'omics' technologies - A review. Asian-Australasian Journal of Animal Sciences 2018;31:1062-1071. [PMID: 29879820 PMCID: PMC6039326 DOI: 10.5713/ajas.18.0155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/07/2018] [Accepted: 05/23/2018] [Indexed: 12/14/2022]

Abstract

The misuse of anabolic hormones or illegal drugs is a ubiquitous problem in animal husbandry and in food safety. The ban on growth promotants in food producing animals in the European Union is well controlled. However, application regimens that are difficult to detect persist, including newly designed anabolic drugs and complex hormone cocktails. Therefore identification of molecular endogenous biomarkers which are based on the physiological response after the illicit treatment has become a focus of detection methods. The analysis of the ‘transcriptome’ has been shown to have promise to discover the misuse of anabolic drugs, by indirect detection of their pharmacological action in organs or selected tissues. Various studies have measured gene expression changes after illegal drug or hormone application. So-called transcriptomic biomarkers were quantified at the mRNA and/or microRNA level by reverse transcription-quantitative polymerase chain reaction (RT-qPCR) technology or by more modern ‘omics’ and high throughput technologies including RNA-sequencing (RNA-Seq). With the addition of advanced bioinformatical approaches such as hierarchical clustering analysis or dynamic principal components analysis, a valid ‘biomarker signature’ can be established to discriminate between treated and untreated individuals. It has been shown in numerous animal and cell culture studies, that identification of treated animals is possible via our transcriptional biomarker approach. The high throughput sequencing approach is also capable of discovering new biomarker candidates and, in combination with quantitative RT-qPCR, validation and confirmation of biomarkers has been possible. These results from animal production and food safety studies demonstrate that analysis of the transcriptome has high potential as a new screening method using transcriptional ‘biomarker signatures’ based on the physiological response triggered by illegal substances.

Bhargava R, Madabhushi A. Emerging Themes in Image Informatics and Molecular Analysis for Digital Pathology. Annu Rev Biomed Eng 2017;18:387-412. [PMID: 27420575 DOI: 10.1146/annurev-bioeng-112415-114722] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Indexed: 12/17/2022]

Alilou M, Beig N, Orooji M, Rajiah P, Velcheti V, Rakshit S, Reddy N, Yang M, Jacono F, Gilkeson RC, Linden P, Madabhushi A. An integrated segmentation and shape-based classification scheme for distinguishing adenocarcinomas from granulomas on lung CT. Med Phys 2017;44:3556-3569. [PMID: 28295386 DOI: 10.1002/mp.12208] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 02/28/2016] [Revised: 02/20/2017] [Accepted: 02/27/2017] [Indexed: 12/30/2022]

Abstract

PURPOSE

Distinguishing between benign granulomas and adenocarcinomas is confounded by their similar visual appearance on routine CT scans. Unfortunately, owing to the inability to discriminate these lesions radiographically, many patients with benign granulomas are subjected to unnecessary surgical wedge resections and biopsies for pathologic confirmation of cancer presence or absence. This suggests the need for improved computerized characterization of these nodules in order to distinguish between these two classes of lesions on CT scans. While there has been substantial interest in the use of textural analysis for radiomic characterization of lung nodules, relatively less work has been done in shape-based characterization of lung nodules, particularly with respect to granulomas and adenocarcinomas. The primary goal of this study is to evaluate the role of 3D shape features for discrimination of benign granulomas from malignant adenocarcinomas on lung CT images. Towards this end we present an integrated framework for segmentation, feature characterization and classification of these nodules on CT.

METHODS

The nodule segmentation method starts with separation of lung regions from the surrounding lung anatomy. Next, the lung CT scans are projected into and represented in a three-dimensional spectral embedding (SE) space, allowing for better determination of the boundaries of the nodule. This then enables the application of a gradient vector flow active contour (SEGvAC) model for nodule boundary extraction. A set of 24 shape features from both 2D slices and the 3D surface of the segmented nodules is extracted, including features pertaining to angularity, spiculation, elongation and nodule compactness. A feature selection scheme, PCA-VIP, is employed to identify the most discriminating set of features to distinguish granulomas from adenocarcinomas within a learning set of 82 patients. The features thus identified were then combined with a support vector machine classifier and independently validated on a distinct test set comprising 67 patients. The performance of the classifier for both the training and validation cohorts was evaluated by the area under the receiver operating characteristic (ROC) curve.

RESULTS

We used 82 and 67 studies from two different institutions, respectively, for training and independent validation of the model and the shape features. The Dice coefficient between nodules automatically segmented by SEGvAC and the manual delineations by expert radiologists (readers) was 0.84 ± 0.04, whereas inter-reader segmentation agreement was 0.79 ± 0.12. We also identified a set of consistent features (Roughness, Convexity and Sphericity) that were found to be strongly correlated across both manual and automated nodule segmentations (R > 0.80, p < 0.0001) and capture the marginal smoothness and 3D compactness of the nodules. On the independent validation set of 67 studies our classifier yielded a ROC AUC of 0.72 and 0.64 for manually and automatically segmented nodules, respectively. On a subset of 20 studies, the AUCs for the two expert radiologists and 1 pulmonologist were found to be 0.82, 0.68 and 0.58, respectively.

CONCLUSIONS

The major finding of this study was that certain shape features appear to differentially express between granulomas and adenocarcinomas and thus computer extracted shape cues could be used to distinguish these radiographically similar pathologies.
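
The Dice coefficient quoted in the results above is a standard overlap measure between two segmentations: twice the size of their intersection divided by the sum of their sizes. The following is a minimal sketch of that general definition only, not the authors' SEGvAC pipeline; the function name `dice_coefficient` and the toy masks are assumptions made for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same number of voxels")
    # Voxels labelled as nodule in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * intersection / (size_a + size_b)


# Hypothetical flattened masks (1 = nodule voxel), for illustration only.
automated = [0, 1, 1, 1, 0, 0, 1, 0]
manual = [0, 1, 1, 0, 0, 1, 1, 0]
dice = dice_coefficient(automated, manual)
print(f"Dice = {dice:.2f}")
```

In this toy case each mask marks four voxels and three of them coincide, so the overlap is 2 × 3 / (4 + 4).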

Viswanath SE, Tiwari P, Lee G, Madabhushi A. Dimensionality reduction-based fusion approaches for imaging and non-imaging biomedical data: concepts, workflow, and use-cases. BMC Med Imaging 2017;17:2. [PMID: 28056889 PMCID: PMC5217665 DOI: 10.1186/s12880-016-0172-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/21/2016] [Accepted: 12/09/2016] [Indexed: 12/27/2022]

Abstract

BACKGROUND

With a wide array of multi-modal, multi-protocol, and multi-scale biomedical data being routinely acquired for disease characterization, there is a pressing need for quantitative tools to combine these varied channels of information. The goal of these integrated predictors is to combine these varied sources of information, while improving on the predictive ability of any individual modality. A number of application-specific data fusion methods have been previously proposed in the literature which have attempted to reconcile the differences in dimensionalities and length scales across different modalities. Our objective in this paper was to help identify methodological choices that need to be made in order to build a data fusion technique, as it is not always clear which strategy is optimal for a particular problem. As a comprehensive review of all possible data fusion methods was outside the scope of this paper, we have focused on fusion approaches that employ dimensionality reduction (DR).

METHODS

In this work, we quantitatively evaluate 4 non-overlapping existing instantiations of DR-based data fusion, within 3 different biomedical applications comprising over 100 studies. These instantiations utilized different knowledge representation and knowledge fusion methods, allowing us to examine the interplay of these modules in the context of data fusion. The use cases considered in this work involve the integration of (a) radiomics features from T2w MRI with peak area features from MR spectroscopy for identification of prostate cancer in vivo, (b) histomorphometric features (quantitative features extracted from histopathology) with protein mass spectrometry features for predicting 5-year biochemical recurrence in prostate cancer patients, and (c) volumetric measurements on T1w MRI with protein expression features to discriminate between patients with and without Alzheimer's disease.

RESULTS AND CONCLUSIONS

Our preliminary results in these specific use cases indicated that the use of kernel representations in conjunction with DR-based fusion may be most effective, as a weighted multi-kernel-based DR approach resulted in the highest area under the ROC curve of over 0.8. By contrast non-optimized DR-based representation and fusion methods yielded the worst predictive performance across all 3 applications. Our results suggest that when the individual modalities demonstrate relatively poor discriminability, many of the data fusion methods may not yield accurate, discriminatory representations either. In summary, to outperform the predictive ability of individual modalities, methodological choices for data fusion must explicitly account for the sparsity of and noise in the feature space.
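
The area under the ROC curve used as the headline metric above can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. A minimal sketch of that pairwise interpretation follows; it is a generic illustration rather than the evaluation code used in the paper, and the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores and ground-truth labels (1 = disease).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

The quadratic pair loop is chosen for readability; for large cohorts a rank-based formulation gives the same value more efficiently.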

Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data. PLoS One 2016;11:e0159088. [PMID: 27421116 PMCID: PMC4946789 DOI: 10.1371/journal.pone.0159088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/13/2016] [Accepted: 06/27/2016] [Indexed: 11/19/2022]

Buschmann D, Haberberger A, Kirchner B, Spornraft M, Riedmaier I, Schelling G, Pfaffl MW. Toward reliable biomarker signatures in the age of liquid biopsies - how to standardize the small RNA-Seq workflow. Nucleic Acids Res 2016;44:5995-6018. [PMID: 27317696 PMCID: PMC5291277 DOI: 10.1093/nar/gkw545] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 04/12/2016] [Accepted: 06/03/2016] [Indexed: 12/21/2022]

Sparks R, Madabhushi A. Out-of-Sample Extrapolation utilizing Semi-Supervised Manifold Learning (OSE-SSL): Content Based Image Retrieval for Histopathology Images. Sci Rep 2016;6:27306. [PMID: 27264985 PMCID: PMC4893667 DOI: 10.1038/srep27306] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/17/2015] [Accepted: 05/16/2016] [Indexed: 12/22/2022]

Kumar M, Rath NK, Rath SK. Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier. J Biomed Inform 2016;60:395-409. [DOI: 10.1016/j.jbi.2016.03.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/18/2015] [Revised: 02/28/2016] [Accepted: 03/02/2016] [Indexed: 10/22/2022]

Shimomura A, Shiino S, Kawauchi J, Takizawa S, Sakamoto H, Matsuzaki J, Ono M, Takeshita F, Niida S, Shimizu C, Fujiwara Y, Kinoshita T, Tamura K, Ochiya T. Novel combination of serum microRNA for detecting breast cancer in the early stage. Cancer Sci 2016;107:326-34. [PMID: 26749252 PMCID: PMC4814263 DOI: 10.1111/cas.12880] [Citation(s) in RCA: 238] [Impact Index Per Article: 26.4] [Received: 09/02/2015] [Revised: 12/23/2015] [Accepted: 01/03/2016] [Indexed: 12/18/2022]

Affiliation(s)

Akihiko Shimomura: Department of Breast and Medical Oncology, National Cancer Center Hospital, Tokyo, Japan; Department of Medical Oncology and Translational Research, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
Sho Shiino: Department of Breast Surgery, National Cancer Center Hospital, Tokyo, Japan
Junpei Kawauchi: New Frontiers Research Institute, Toray Industries, Kanagawa, Japan
Satoko Takizawa: New Frontiers Research Institute, Toray Industries, Kanagawa, Japan
Hiromi Sakamoto: Division of Genetics, Fundamental Innovative Oncology Core Center, National Cancer Center Research Institute, Tokyo, Japan
Juntaro Matsuzaki: Division of Molecular and Cellular Medicine, Fundamental Innovative Oncology Core Center, National Cancer Center Research Institute, Tokyo, Japan
Makiko Ono: Department of Breast and Medical Oncology, National Cancer Center Hospital, Tokyo, Japan; Division of Molecular and Cellular Medicine, Fundamental Innovative Oncology Core Center, National Cancer Center Research Institute, Tokyo, Japan
Fumitaka Takeshita: Department of Functional Analysis, Fundamental Innovative Oncology Core Center, National Cancer Center Research Institute, Tokyo, Japan
Shumpei Niida: Medical Genome Center, National Center for Geriatrics and Gerontology, Aichi, Japan
Chikako Shimizu: Department of Breast and Medical Oncology, National Cancer Center Hospital, Tokyo, Japan
Yasuhiro Fujiwara: Department of Breast and Medical Oncology, National Cancer Center Hospital, Tokyo, Japan
Takayuki Kinoshita: Department of Breast Surgery, National Cancer Center Hospital, Tokyo, Japan
Kenji Tamura: Department of Breast and Medical Oncology, National Cancer Center Hospital, Tokyo, Japan; Department of Medical Oncology and Translational Research, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, Japan
Takahiro Ochiya: Division of Molecular and Cellular Medicine, Fundamental Innovative Oncology Core Center, National Cancer Center Research Institute, Tokyo, Japan

Ring BZ, Hout DR, Morris SW, Lawrence K, Schweitzer BL, Bailey DB, Lehmann BD, Pietenpol JA, Seitz RS. Generation of an algorithm based on minimal gene sets to clinically subtype triple negative breast cancer patients. BMC Cancer 2016;16:143. [PMID: 26908167 PMCID: PMC4763445 DOI: 10.1186/s12885-016-2198-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 07/31/2015] [Accepted: 02/17/2016] [Indexed: 12/20/2022]

Abstract

BACKGROUND

Recently, a gene expression algorithm, TNBCtype, was developed that can divide triple-negative breast cancer (TNBC) into molecularly-defined subtypes. The algorithm has potential to provide predictive value for TNBC subtype-specific response to various treatments. TNBCtype used in a retrospective analysis of neoadjuvant clinical trial data of TNBC patients demonstrated that TNBC subtype and pathological complete response to neoadjuvant chemotherapy were significantly associated. Herein we describe an expression algorithm reduced to 101 genes with the power to subtype TNBC tumors similar to the original 2188-gene expression algorithm and predict patient outcomes.

METHODS

The new classification model was built using the same expression data sets used for the original TNBCtype algorithm. Gene set enrichment followed by shrunken centroid analysis was used for feature reduction, then elastic-net regularized linear modeling was used to identify genes for a centroid model classifying all subtypes, comprising 101 genes. Both this new "lean" algorithm and the original 2188-gene model were applied to an independent clinical trial cohort of 139 TNBC patients, treated initially with neoadjuvant doxorubicin/cyclophosphamide and then randomized to receive either paclitaxel or ixabepilone, to determine the association of pathologic complete response with the subtypes.

RESULTS

The new 101-gene expression model reproduced the classification provided by the 2188-gene algorithm and was highly concordant in the same set of seven TNBC cohorts used to generate the TNBCtype algorithm (87%), as well as in the independent clinical trial cohort (88%), when cases with significant correlations to multiple subtypes were excluded. Analysis of clinical responses to both neoadjuvant treatment arms found BL2 to be significantly associated with poor response (odds ratio (OR) = 0.12, p = 0.03 for the 2188-gene model; OR = 0.23, p < 0.03 for the 101-gene model). Additionally, while the BL1 subtype trended towards significance in the 2188-gene model (OR = 1.91, p = 0.14), the 101-gene model demonstrated significant association with improved response in patients with the BL1 subtype (OR = 3.59, p = 0.02).

CONCLUSIONS

These results demonstrate that a model using small gene sets can recapitulate the TNBC subtypes identified by the original 2188-gene model and, in the case of standard chemotherapy, retains the ability to predict therapeutic response.

Hu C, Sepulcre J, Johnson KA, Fakhri GE, Lu YM, Li Q. Matched signal detection on graphs: Theory and application to brain imaging data classification. Neuroimage 2016;125:587-600. [PMID: 26481679 DOI: 10.1016/j.neuroimage.2015.10.026] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 04/06/2015] [Revised: 08/11/2015] [Accepted: 10/11/2015] [Indexed: 12/23/2022]

Ginsburg SB, Lee G, Ali S, Madabhushi A. Feature Importance in Nonlinear Embeddings (FINE): Applications in Digital Pathology. IEEE Transactions on Medical Imaging 2016;35:76-88. [PMID: 26186772 DOI: 10.1109/tmi.2015.2456188] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 06/04/2023]

Classification of microarray using MapReduce based proximal support vector machine classifier. Knowl Based Syst 2015. [DOI: 10.1016/j.knosys.2015.09.005] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022]

Sridhar A, Doyle S, Madabhushi A. Content-based image retrieval of digitized histopathology in boosted spectrally embedded spaces. J Pathol Inform 2015;6:41. [PMID: 26167385 PMCID: PMC4498317 DOI: 10.4103/2153-3539.159441] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 09/17/2013] [Accepted: 11/04/2014] [Indexed: 01/07/2023]

Abstract

Context:

Content-based image retrieval (CBIR) systems allow for retrieval of images from within a database that are similar in visual content to a query image. This is useful for digital pathology, where text-based descriptors alone might be inadequate to accurately describe image content. By representing images via a set of quantitative image descriptors, the similarity between a query image with respect to archived, annotated images in a database can be computed and the most similar images retrieved. Recently, non-linear dimensionality reduction methods have become popular for embedding high-dimensional data into a reduced-dimensional space while preserving local object adjacencies, thereby allowing for object similarity to be determined more accurately in the reduced-dimensional space. However, most dimensionality reduction methods implicitly assume, in computing the reduced-dimensional representation, that all features are equally important.

Aims:

In this paper we present boosted spectral embedding (BoSE), which utilizes a boosted distance metric to selectively weight individual features (based on training data) to subsequently map the data into a reduced-dimensional space.

Settings and Design:

BoSE is evaluated against spectral embedding (SE) (which employs equal feature weighting) in the context of CBIR of digitized prostate and breast cancer histopathology images.

Materials and Methods:

The following datasets, which comprised a total of 154 hematoxylin and eosin stained histopathology images, were used: (1) prostate cancer histopathology (benign vs. malignant), (2) estrogen receptor-positive (ER+) breast cancer histopathology (low vs. high grade), and (3) HER2+ breast cancer histopathology (low vs. high levels of lymphocytic infiltration).

Statistical Analysis Used:

We plotted and calculated the area under precision-recall curves (AUPRC) and calculated classification accuracy using the Random Forest classifier.

Results:

BoSE outperformed SE in terms of both CBIR-based (area under the precision-recall curve) and classifier-based (classification accuracy) metrics on average across all of the dimensions tested for all three datasets: (1) prostate cancer histopathology (AUPRC: BoSE = 0.79, SE = 0.63; Accuracy: BoSE = 0.93, SE = 0.80), (2) ER+ breast cancer histopathology (AUPRC: BoSE = 0.79, SE = 0.68; Accuracy: BoSE = 0.96, SE = 0.96), and (3) HER2+ breast cancer histopathology (AUPRC: BoSE = 0.54, SE = 0.44; Accuracy: BoSE = 0.93, SE = 0.91).

Conclusion:

Our results suggest that BoSE could serve as an important tool for CBIR and classification of high-dimensional biomedical data.

Kojima M, Sudo H, Kawauchi J, Takizawa S, Kondou S, Nobumasa H, Ochiai A. MicroRNA markers for the diagnosis of pancreatic and biliary-tract cancers. PLoS One 2015;10:e0118220. [PMID: 25706130 PMCID: PMC4338196 DOI: 10.1371/journal.pone.0118220] [Citation(s) in RCA: 104] [Impact Index Per Article: 10.4] [Received: 07/24/2014] [Accepted: 01/11/2015] [Indexed: 12/21/2022]

Abstract

It is difficult to detect pancreatic cancer or biliary-tract cancer at an early stage using current diagnostic technology. Utilizing microRNA (miRNA) markers that are stably present in peripheral blood, we aimed to identify pancreatic and biliary-tract cancers in patients. With "3D-Gene", a highly sensitive microarray, we examined comprehensive miRNA expression profiles in 571 serum samples obtained from healthy patients, patients with pancreatic, biliary-tract, or other digestive cancers, and patients with non-malignant abnormalities in the pancreas or biliary tract. The samples were randomly divided into training and test cohorts, and candidate miRNA markers were independently evaluated. We found 81 miRNAs for pancreatic cancer and 66 miRNAs for biliary-tract cancer that showed statistically different expression compared with healthy controls. Among those markers, 55 miRNAs were common in both the pancreatic and biliary-tract cancer samples. The previously reported miR-125a-3p was one of the common markers; however, it was also expressed in other types of digestive-tract cancers, suggesting that it is not specific to cancer types. In order to discriminate the pancreato-biliary cancers from all other clinical conditions including the healthy controls, non-malignant abnormalities, and other types of cancers, we developed a diagnostic index using expression profiles of the 10 most significant miRNAs. A combination of eight miRNAs (miR-6075, miR-4294, miR-6880-5p, miR-6799-5p, miR-125a-3p, miR-4530, miR-6836-3p, and miR-4476) achieved a sensitivity, specificity, accuracy and AUC of 80.3%, 97.6%, 91.6% and 0.953, respectively. In contrast, CA19-9 and CEA gave sensitivities of 65.6% and 40.0%, specificities of 92.9% and 88.6%, and accuracies of 82.1% and 71.8%, respectively, in the same test cohort. This diagnostic index identified 18/21 operable pancreatic cancers and 38/48 operable biliary-tract cancers in the entire cohort. Our results suggest that the assessment of these miRNA markers is clinically valuable to identify patients with pancreato-biliary cancers who could benefit from surgical intervention.
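
Sensitivity, specificity and accuracy, as reported for the miRNA index above, all derive from the same four confusion-matrix counts. The sketch below shows the standard definitions on a small made-up cohort; the function name `diagnostic_metrics` and the example labels are assumptions for illustration and do not reproduce the study's data.

```python
def diagnostic_metrics(predicted, actual):
    """Sensitivity, specificity and accuracy from binary calls (1 = cancer)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # true positives
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # missed cancers
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # true negatives
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical test cohort: 4 cancer cases followed by 6 controls.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = diagnostic_metrics(predicted, actual)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy pools both classes, it depends on the case mix of the cohort, which is why it is usually reported alongside sensitivity and specificity rather than alone.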

Lee G, Singanamalli A, Wang H, Feldman MD, Master SR, Shih NNC, Spangler E, Rebbeck T, Tomaszewski JE, Madabhushi A. Supervised multi-view canonical correlation analysis (sMVCCA): integrating histologic and proteomic features for predicting recurrent prostate cancer. IEEE Transactions on Medical Imaging 2015;34:284-297. [PMID: 25203987 DOI: 10.1109/tmi.2014.2355175] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 06/03/2023]

Abstract

In this work, we present a new methodology to facilitate prediction of recurrent prostate cancer (CaP) following radical prostatectomy (RP) via the integration of quantitative image features and protein expression in the excised prostate. Creating a fused predictor from high-dimensional data streams is challenging because the classifier must 1) account for the "curse of dimensionality" problem, which hinders classifier performance when the number of features exceeds the number of patient studies and 2) balance potential mismatches in the number of features across different channels to avoid classifier bias towards channels with more features. Our new data integration methodology, supervised Multi-view Canonical Correlation Analysis (sMVCCA), aims to integrate infinite views of high-dimensional data to provide more amenable data representations for disease classification. Additionally, we demonstrate sMVCCA using Spearman's rank correlation which, unlike Pearson's correlation, can account for nonlinear correlations and outliers. Forty CaP patients with pathological Gleason scores 6-8 were considered for this study. Twenty-one of these men revealed biochemical recurrence (BCR) following RP, while 19 did not. For each patient, 189 quantitative histomorphometric attributes and 650 protein expression levels were extracted from the primary tumor nodule. The fused histomorphometric/proteomic representation via sMVCCA combined with a random forest classifier predicted BCR with a mean AUC of 0.74 and a maximum AUC of 0.9286. We found sMVCCA to perform statistically significantly (p < 0.05) better than comparative state-of-the-art data fusion strategies for predicting BCR. Furthermore, Kaplan-Meier analysis demonstrated improved BCR-free survival prediction for the sMVCCA-fused classifier as compared to histology or proteomic features alone.
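
The contrast the abstract draws between Spearman's and Pearson's correlation can be illustrated with a small example: Spearman's coefficient is Pearson's coefficient computed on ranks, so it reaches 1 for any strictly increasing relationship, linear or not. The following sketch shows only that general property and is not part of sMVCCA itself; the helper names and the cubic toy data are assumptions.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monotone but strongly nonlinear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson = {r_pearson:.3f}, Spearman = {r_spearman:.3f}")
```

Working on ranks also limits the influence of a single extreme value, since an outlier can move at most to the top or bottom rank.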

Multiplicative distance: a method to alleviate distance instability for high-dimensional data. Knowl Inf Syst 2014. [DOI: 10.1007/s10115-014-0813-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Classification of Microarray Data Using Kernel Fuzzy Inference System. INTERNATIONAL SCHOLARLY RESEARCH NOTICES 2014;2014:769159. [PMID: 27433543 PMCID: PMC4897118 DOI: 10.1155/2014/769159] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/28/2014] [Revised: 05/28/2014] [Accepted: 06/12/2014] [Indexed: 12/02/2022]

Cantor-Rivera D, Khan AR, Goubran M, Mirsattari SM, Peters TM. Detection of temporal lobe epilepsy using support vector machines in multi-parametric quantitative MR imaging. Comput Med Imaging Graph 2014;41:14-28. [PMID: 25103878 DOI: 10.1016/j.compmedimag.2014.07.002] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2014] [Revised: 06/11/2014] [Accepted: 07/09/2014] [Indexed: 11/30/2022]

Abstract

The detection of MRI abnormalities that can be associated with seizures in the study of temporal lobe epilepsy (TLE) is a challenging task. In many cases, patients with a record of epileptic activity do not present any discernible MRI findings. In this domain, we propose a method that combines quantitative relaxometry and diffusion tensor imaging (DTI) with support vector machines (SVM) aiming to improve TLE detection. The main contribution of this work is two-fold: on the one hand, the feature selection process, principal component analysis (PCA) transformations of the feature space, and SVM parameterization are analyzed as factors constituting a classification model and influencing its quality. On the other hand, several of these classification models are studied to determine the optimal strategy for the identification of TLE patients using data collected from multi-parametric quantitative MRI. A total of 17 TLE patients and 19 control volunteers were analyzed. Four images were considered for each subject (T1 map, T2 map, fractional anisotropy, and mean diffusivity), generating 936 regions of interest per subject; 8 different classification models were then studied, each comprising a distinct set of factors. Subjects were correctly classified with an accuracy of 88.9%. Further analysis revealed that the heterogeneous nature of the disease impeded an optimal outcome. After dividing patients into cohesive groups (9 left-sided seizure onset, 8 right-sided seizure onset), perfect classification for the left group was achieved (100% accuracy), whereas the accuracy for the right group remained the same (88.9%). We conclude that a linear SVM combined with an ANOVA-based feature selection+PCA method is a good alternative in scenarios like ours where feature spaces are high-dimensional and the sample size is limited. The good accuracy results and the localization of the respective features in the temporal lobe suggest that a multi-parametric quantitative MRI, ROI-based, SVM classification could be used for the identification of TLE patients. This method has the potential to improve the diagnostic assessment, especially for patients who do not have any obvious lesions in standard radiological examinations.

Agner SC, Xu J, Madabhushi A. Spectral embedding based active contour (SEAC) for lesion segmentation on breast dynamic contrast enhanced magnetic resonance imaging. Med Phys 2013;40:032305. [PMID: 23464337 DOI: 10.1118/1.4790466] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open

Abstract

PURPOSE

Segmentation of breast lesions on dynamic contrast enhanced (DCE) magnetic resonance imaging (MRI) is the first step in lesion diagnosis in a computer-aided diagnosis framework. Because manual segmentation of such lesions is both time consuming and highly susceptible to human error and issues of reproducibility, an automated lesion segmentation method is highly desirable. Traditional automated image segmentation methods such as boundary-based active contour (AC) models require a strong gradient at the lesion boundary. Even when region-based terms are introduced to an AC model, grayscale image intensities often do not allow for clear definition of foreground and background region statistics. Thus, there is a need to find alternative image representations that might provide (1) strong gradients at the margin of the object of interest (OOI); and (2) larger separation between intensity distributions and region statistics for the foreground and background, which are necessary to halt evolution of the AC model upon reaching the border of the OOI.

METHODS

In this paper, the authors introduce a spectral embedding (SE) based AC (SEAC) for lesion segmentation on breast DCE-MRI. SE, a nonlinear dimensionality reduction scheme, is applied to the DCE time series in a voxelwise fashion to reduce several time point images to a single parametric image where every voxel is characterized by the three dominant eigenvectors. This parametric eigenvector image (PrEIm) representation allows for better capture of image region statistics and stronger gradients for use with a hybrid AC model, which is driven by both boundary and region information. They compare SEAC to ACs that employ fuzzy c-means (FCM) and principal component analysis (PCA) as alternative image representations. Segmentation performance was evaluated by boundary and region metrics as well as comparing lesion classification using morphological features from SEAC, PCA+AC, and FCM+AC.

RESULTS

On a cohort of 50 breast DCE-MRI studies, PrEIm yielded overall better region and boundary-based statistics compared to the original DCE-MR image, FCM, and PCA based image representations. Additionally, SEAC outperformed a hybrid AC applied to both PCA and FCM image representations. Mean Dice similarity coefficient (DSC) for SEAC was significantly better (DSC = 0.74 ± 0.21) than FCM+AC (DSC = 0.50 ± 0.32) and similar to PCA+AC (DSC = 0.73 ± 0.22). Boundary-based metrics of mean absolute difference and Hausdorff distance followed the same trends. Of the automated segmentation methods, breast lesion classification based on morphologic features derived from SEAC segmentation using a support vector machine classifier also performed better (AUC = 0.67 ± 0.05; p < 0.05) than FCM+AC (AUC = 0.50 ± 0.07) and PCA+AC (AUC = 0.49 ± 0.07).

CONCLUSIONS

In this work, we presented SEAC, an accurate, general purpose AC segmentation tool that could be applied to any imaging domain that employs time series data. SE allows for projection of time series data into a PrEIm representation so that every voxel is characterized by the dominant eigenvectors, capturing the global and local time-intensity curve similarities in the data. This PrEIm allows for the calculation of strong tensor gradients and better region statistics than the original image intensities or alternative image representations such as PCA and FCM. The PrEIm also allows for building a more accurate hybrid AC scheme.
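
The segmentation results above are reported as Dice similarity coefficients (DSC). As a minimal sketch of how that metric is commonly computed for two binary masks: the function name `dice`, the nested-list mask layout, and the toy masks are assumptions for illustration and are not taken from the paper.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are nested lists of 0/1 values with the same shape.
    DSC = 2 * |A intersect B| / (|A| + |B|), ranging from 0 (no overlap)
    to 1 (identical masks).
    """
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not a and not b:
        # Convention assumed here: two empty masks count as a perfect match.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Toy example (hypothetical): two 4-pixel regions sharing 2 pixels.
automated = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 0]]
manual = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]

score = dice(automated, manual)  # 2 * 2 / (4 + 4) = 0.5
```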

Transcriptional biomarkers--high throughput screening, quantitative verification, and bioinformatical validation methods. Methods 2012;59:3-9. [PMID: 22967906 DOI: 10.1016/j.ymeth.2012.08.012] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2012] [Revised: 08/21/2012] [Accepted: 08/25/2012] [Indexed: 02/08/2023] Open

He L, Long LR, Antani S, Thoma GR. Histology image analysis for carcinoma detection and grading. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2012;107:538-56. [PMID: 22436890 PMCID: PMC3587978 DOI: 10.1016/j.cmpb.2011.12.007] [Citation(s) in RCA: 161] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/23/2010] [Revised: 09/27/2011] [Accepted: 12/13/2011] [Indexed: 05/25/2023]

Xu R, Wunsch DC. Clustering algorithms in biomedical research: a review. IEEE Rev Biomed Eng 2012;3:120-54. [PMID: 22275205 DOI: 10.1109/rbme.2010.2083647] [Citation(s) in RCA: 121] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Golugula A, Lee G, Madabhushi A. Evaluating feature selection strategies for high dimensional, small sample size datasets. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2012;2011:949-52. [PMID: 22254468 DOI: 10.1109/iembs.2011.6090214] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Nanni L, Brahnam S, Lumini A. Combining multiple approaches for gene microarray classification. Bioinformatics 2012;28:1151-7. [DOI: 10.1093/bioinformatics/bts108] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Viswanath S, Madabhushi A. Consensus embedding: theory, algorithms and application to segmentation and classification of biomedical data. BMC Bioinformatics 2012;13:26. [PMID: 22316103 PMCID: PMC3395843 DOI: 10.1186/1471-2105-13-26] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2011] [Accepted: 02/08/2012] [Indexed: 11/21/2022] Open

Abstract

Background

Dimensionality reduction (DR) enables the construction of a lower dimensional space (embedding) from a higher dimensional feature space while preserving object-class discriminability. However, several popular DR approaches suffer from sensitivity to choice of parameters and/or presence of noise in the data. In this paper, we present a novel DR technique known as consensus embedding that aims to overcome these problems by generating and combining multiple low-dimensional embeddings, hence exploiting the variance among them in a manner similar to ensemble classifier schemes such as Bagging. We demonstrate theoretical properties of consensus embedding which show that it will result in a single stable embedding solution that preserves information more accurately as compared to any individual embedding (generated via DR schemes such as Principal Component Analysis, Graph Embedding, or Locally Linear Embedding). Intelligent sub-sampling (via mean-shift) and code parallelization are utilized to provide for an efficient implementation of the scheme.

Results

Applications of consensus embedding are shown in the context of classification and clustering as applied to: (1) image partitioning of white matter and gray matter on 10 different synthetic brain MRI images corrupted with 18 different combinations of noise and bias field inhomogeneity, (2) classification of 4 high-dimensional gene-expression datasets, (3) cancer detection (at a pixel-level) on 16 image slices obtained from 2 different high-resolution prostate MRI datasets. In over 200 different experiments concerning classification and segmentation of biomedical data, consensus embedding was found to consistently outperform both linear and non-linear DR methods within all applications considered.

Conclusions

We have presented a novel framework termed consensus embedding which leverages ensemble classification theory within dimensionality reduction, allowing for application to a wide range of high-dimensional biomedical data classification and segmentation problems. Our generalizable framework allows for improved representation and classification in the context of both imaging and non-imaging data. The algorithm offers a promising solution to problems that currently plague DR methods, and may allow for extension to other areas of biomedical data analysis.

Reutlinger M, Schneider G. Nonlinear dimensionality reduction and mapping of compound libraries for drug discovery. J Mol Graph Model 2012;34:108-17. [PMID: 22326864 DOI: 10.1016/j.jmgm.2011.12.006] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2011] [Revised: 12/13/2011] [Accepted: 12/14/2011] [Indexed: 01/29/2023]

Agner SC, Soman S, Libfeld E, McDonald M, Thomas K, Englander S, Rosen MA, Chin D, Nosher J, Madabhushi A. Textural kinetics: a novel dynamic contrast-enhanced (DCE)-MRI feature for breast lesion classification. J Digit Imaging 2011;24:446-63. [PMID: 20508965 DOI: 10.1007/s10278-010-9298-1] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open

Riedmaier I, Pfaffl MW, Meyer HHD. The analysis of the transcriptome as a new approach for biomarker development to trace the abuse of anabolic steroid hormones. Drug Test Anal 2011;3:676-81. [DOI: 10.1002/dta.304] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2011] [Revised: 05/02/2011] [Accepted: 05/04/2011] [Indexed: 01/20/2023]

Viswanath S, Bloch BN, Chappelow J, Patel P, Rofsky N, Lenkinski R, Genega E, Madabhushi A. Enhanced Multi-Protocol Analysis via Intelligent Supervised Embedding (EMPrAvISE): Detecting Prostate Cancer on Multi-Parametric MRI. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2011;7963:79630U. [PMID: 25301991 DOI: 10.1117/12.878312] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Abstract

Currently, there is significant interest in developing methods for quantitative integration of multi-parametric (structural, functional) imaging data with the objective of building automated meta-classifiers to improve disease detection, diagnosis, and prognosis. Such techniques are required to address the differences in dimensionalities and scales of individual protocols, while deriving an integrated multi-parametric data representation which best captures all disease-pertinent information available. In this paper, we present a scheme called Enhanced Multi-Protocol Analysis via Intelligent Supervised Embedding (EMPrAvISE); a powerful, generalizable framework applicable to a variety of domains for multi-parametric data representation and fusion. Our scheme utilizes an ensemble of embeddings (via dimensionality reduction, DR); thereby exploiting the variance amongst multiple uncorrelated embeddings in a manner similar to ensemble classifier schemes (e.g. Bagging, Boosting). We apply this framework to the problem of prostate cancer (CaP) detection on 12 3 Tesla pre-operative in vivo multi-parametric (T2-weighted, Dynamic Contrast Enhanced, and Diffusion-weighted) magnetic resonance imaging (MRI) studies, in turn comprising a total of 39 2D planar MR images. We first align the different imaging protocols via automated image registration, followed by quantification of image attributes from individual protocols. Multiple embeddings are generated from the resultant high-dimensional feature space which are then combined intelligently to yield a single stable solution. Our scheme is employed in conjunction with graph embedding (for DR) and probabilistic boosting trees (PBTs) to detect CaP on multi-parametric MRI. Finally, a probabilistic pairwise Markov Random Field algorithm is used to apply spatial constraints to the result of the PBT classifier, yielding a per-voxel classification of CaP presence. 
Per-voxel evaluation of detection results against ground truth for CaP extent on MRI (obtained by spatially registering pre-operative MRI with available whole-mount histological specimens) reveals that EMPrAvISE yields a statistically significant improvement (AUC=0.77) over classifiers constructed from individual protocols (AUC=0.62, 0.62, 0.65, for T2w, DCE, DWI respectively) as well as one trained using multi-parametric feature concatenation (AUC=0.67).

Madabhushi A, Agner S, Basavanhally A, Doyle S, Lee G. Computer-aided prognosis: predicting patient and disease outcome via quantitative fusion of multi-scale, multi-modal data. Comput Med Imaging Graph 2011;35:506-14. [PMID: 21333490 DOI: 10.1016/j.compmedimag.2011.01.008] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2010] [Revised: 12/16/2010] [Accepted: 01/10/2011] [Indexed: 12/31/2022]

Abstract

Computer-aided prognosis (CAP) is a new and exciting complement to the field of computer-aided diagnosis (CAD) and involves developing and applying computerized image analysis and multi-modal data fusion algorithms to digitized patient data (e.g. imaging, tissue, genomic) for helping physicians predict disease outcome and patient survival. While a number of data channels, ranging from the macro (e.g. MRI) to the nano-scales (proteins, genes) are now being routinely acquired for disease characterization, one of the challenges in predicting patient outcome and treatment response has been our inability to quantitatively fuse these disparate, heterogeneous data sources. At the Laboratory for Computational Imaging and Bioinformatics (LCIB) at Rutgers University, our team has been developing computerized algorithms for high dimensional data and image analysis for predicting disease outcome from multiple modalities including MRI, digital pathology, and protein expression. Additionally, we have been developing novel data fusion algorithms based on non-linear dimensionality reduction methods (such as Graph Embedding) to quantitatively integrate information from multiple data sources and modalities with the overarching goal of optimizing meta-classifiers for making prognostic predictions. In this paper, we briefly describe 4 representative and ongoing CAP projects at LCIB. 
These projects include (1) an Image-based Risk Score (IbRiS) algorithm for predicting outcome of Estrogen receptor positive breast cancer patients based on quantitative image analysis of digitized breast cancer biopsy specimens alone, (2) segmenting and determining extent of lymphocytic infiltration (identified as a possible prognostic marker for outcome in human epidermal growth factor amplified breast cancers) from digitized histopathology, (3) distinguishing patients with different Gleason grades of prostate cancer (grade being known to be correlated to outcome) from digitized needle biopsy specimens, and (4) integrating protein expression measurements obtained from mass spectrometry with quantitative image features derived from digitized histopathology for distinguishing between prostate cancer patients at low and high risk of disease recurrence following radical prostatectomy.

Zhang J, Zhang K, Feng J, Small M. Rhythmic dynamics and synchronization via dimensionality reduction: application to human gait. PLoS Comput Biol 2010;6:e1001033. [PMID: 21187907 PMCID: PMC3002994 DOI: 10.1371/journal.pcbi.1001033] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2010] [Accepted: 11/11/2010] [Indexed: 11/18/2022] Open

Zhang Y, Xu G, Wang J, Liang L. An automatic patient-specific seizure onset detection method in intracranial EEG based on incremental nonlinear dimensionality reduction. Comput Biol Med 2010;40:889-99. [PMID: 20951372 DOI: 10.1016/j.compbiomed.2010.09.010] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2010] [Revised: 09/14/2010] [Accepted: 09/28/2010] [Indexed: 11/17/2022]

Shi J, Luo Z. Nonlinear dimensionality reduction of gene expression data for visualization and clustering analysis of cancer tissue samples. Comput Biol Med 2010;40:723-32. [PMID: 20637456 DOI: 10.1016/j.compbiomed.2010.06.007] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2009] [Revised: 06/24/2010] [Accepted: 06/30/2010] [Indexed: 11/17/2022]

Spectral embedding based probabilistic boosting tree (ScEPTre): classifying high dimensional heterogeneous biomedical data. ACTA ACUST UNITED AC 2010. [PMID: 20426190 DOI: 10.1007/978-3-642-04271-3_102] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]

Cevallos-Cevallos JM, Reyes-De-Corcuera JI, Etxeberria E, Danyluk MD, Rodrick GE. Metabolomic analysis in food science: a review. Trends Food Sci Technol 2009. [DOI: 10.1016/j.tifs.2009.07.002] [Citation(s) in RCA: 379] [Impact Index Per Article: 23.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]

Tiwari P, Rosen M, Madabhushi A. A hierarchical spectral clustering and nonlinear dimensionality reduction scheme for detection of prostate cancer from magnetic resonance spectroscopy (MRS). Med Phys 2009;36:3927-39. [PMID: 19810465 DOI: 10.1118/1.3180955] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022] Open

Abstract

Magnetic resonance spectroscopy (MRS) has been shown to have great clinical potential as a supplement to magnetic resonance imaging in the detection of prostate cancer (CaP). MRS provides functional information in the form of changes in the relative concentration of specific metabolites including choline, creatine, and citrate which can be used to identify potential areas of CaP. With a view to assisting radiologists in interpretation and analysis of MRS data, some researchers have begun to develop computer-aided detection (CAD) schemes for CaP identification from spectroscopy. Most of these schemes have been centered on identifying and integrating the area under metabolite peaks which is then used to compute relative metabolite ratios. However, manual identification of metabolite peaks on the MR spectra, and especially via CAD, is a challenging problem due to low signal-to-noise ratio, baseline irregularity, peak overlap, and peak distortion. In this article the authors present a novel CAD scheme that integrates nonlinear dimensionality reduction (NLDR) with an unsupervised hierarchical clustering algorithm to automatically identify suspicious regions on the prostate using MRS and hence avoids the need to explicitly identify metabolite peaks. The methodology comprises two stages. In stage 1, a hierarchical spectral clustering algorithm is used to distinguish between extracapsular and prostatic spectra in order to localize the region of interest (ROI) corresponding to the prostate. Once the prostate ROI is localized, in stage 2, a NLDR scheme, in conjunction with a replicated clustering algorithm, is used to automatically discriminate between three classes of spectra (normal appearing, suspicious appearing, and indeterminate). The methodology was quantitatively and qualitatively evaluated on a total of 18 1.5 T in vivo prostate T2-weighted (w) and MRS studies obtained from the multisite, multi-institutional American College of Radiology (ACRIN) trial. 
In the absence of the precise ground truth for CaP extent on the MR imaging for most of the ACRIN studies, probabilistic quantitative metrics were defined based on partial knowledge on the quadrant location and size of the tumor. The scheme, when evaluated against this partial ground truth, was found to have a CaP detection sensitivity of 89.33% and specificity of 79.79%. The results obtained from randomized threefold and fivefold cross validation suggest that the NLDR based clustering scheme has a higher CaP detection accuracy compared to such commonly used MRS analysis schemes as z score and PCA. In addition, the scheme was found to be robust to changes in system parameters. For 6 of the 18 studies an expert radiologist laboriously labeled each of the individual spectra according to a five point scale, with 1/2 representing spectra that the expert considered normal and 3/4/5 being spectra the expert deemed suspicious. When evaluated on these expert annotated datasets, the CAD system yielded an average sensitivity (cluster corresponding to suspicious spectra being identified as the CaP class) and specificity of 81.39% and 64.71%, respectively.
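
The detection results above are summarized as sensitivity and specificity. A minimal sketch of those two quantities follows, assuming binary labels where 1 marks a suspicious spectrum and 0 a normal one. The function name `sensitivity_specificity` and the toy labels are hypothetical and do not reproduce the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive).

    sensitivity = TP / (TP + FN): fraction of true positives detected.
    specificity = TN / (TN + FP): fraction of true negatives correctly rejected.
    Assumes y_true contains at least one positive and one negative label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels (hypothetical): 4 suspicious and 6 normal spectra.
expert_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cad_labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(expert_labels, cad_labels)
# TP = 3, FN = 1, so sensitivity is 3/4; TN = 4, FP = 2, so specificity is 4/6.
```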

The use of omic technologies for biomarker development to trace functions of anabolic agents. J Chromatogr A 2009;1216:8192-9. [DOI: 10.1016/j.chroma.2009.01.094] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2009] [Revised: 01/27/2009] [Accepted: 01/30/2009] [Indexed: 12/25/2022]

Basavanhally AN, Ganesan S, Agner S, Monaco JP, Feldman MD, Tomaszewski JE, Bhanot G, Madabhushi A. Computerized image-based detection and grading of lymphocytic infiltration in HER2+ breast cancer histopathology. IEEE Trans Biomed Eng 2009;57:642-53. [PMID: 19884074 DOI: 10.1109/tbme.2009.2035305] [Citation(s) in RCA: 189] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Abstract

The identification of phenotypic changes in breast cancer (BC) histopathology on account of corresponding molecular changes is of significant clinical importance in predicting disease outcome. One such example is the presence of lymphocytic infiltration (LI) in histopathology, which has been correlated with nodal metastasis and distant recurrence in HER2+ BC patients. In this paper, we present a computer-aided diagnosis (CADx) scheme to automatically detect and grade the extent of LI in digitized HER2+ BC histopathology. Lymphocytes are first automatically detected by a combination of region growing and Markov random field algorithms. Using the centers of individual detected lymphocytes as vertices, three graphs (Voronoi diagram, Delaunay triangulation, and minimum spanning tree) are constructed and a total of 50 image-derived features describing the arrangement of the lymphocytes are extracted from each sample. A nonlinear dimensionality reduction scheme, graph embedding (GE), is then used to project the high-dimensional feature vector into a reduced 3-D embedding space. A support vector machine classifier is used to discriminate samples with high and low LI in the reduced dimensional embedding space. A total of 41 HER2+ hematoxylin-and-eosin-stained images obtained from 12 patients were considered in this study. For more than 100 three-fold cross-validation trials, the architectural feature set successfully distinguished samples of high and low LI levels with a classification accuracy greater than 90%. The popular unsupervised Varma-Zisserman texton-based classification scheme was used for comparison and yielded a classification accuracy of only 60%. Additionally, the projection of the 50 image-derived features for all 41 tissue samples into a reduced dimensional space via GE allowed for the visualization of a smooth manifold that revealed a continuum between low, intermediate, and high levels of LI. 
Since it is known that extent of LI in BC biopsy specimens is a prognostic indicator, our CADx scheme will potentially help clinicians determine disease outcome and allow them to make better therapy recommendations for patients with HER2+ BC.

Algorithm for the Analysis of Tryptophan Fluorescence Spectra and Their Correlation with Protein Structural Parameters. ALGORITHMS 2009. [DOI: 10.3390/a2031155] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]

Consensus-locally linear embedding (C-LLE): application to prostate cancer detection on magnetic resonance spectroscopy. ACTA ACUST UNITED AC 2008;11:330-8. [PMID: 18982622 DOI: 10.1007/978-3-540-85990-1_40] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/24/2023]