1. Cai L, Wang Z, Kulathinal R, Kumar S, Ji S. Deep Low-Shot Learning for Biological Image Classification and Visualization From Limited Training Samples. IEEE Trans Neural Netw Learn Syst 2023;34:2528-2538. [PMID: 34487501] [DOI: 10.1109/tnnls.2021.3106831]
Abstract
Predictive modeling is useful but very challenging in biological image analysis due to the high cost of obtaining and labeling training data. For example, in the study of gene interaction and regulation in Drosophila embryogenesis, the analysis is most biologically meaningful when in situ hybridization (ISH) gene expression pattern images from the same developmental stage are compared. However, labeling training data with precise stages is very time-consuming even for developmental biologists. Thus, a critical challenge is how to build accurate computational models for precise developmental stage classification from limited training samples. In addition, identification and visualization of developmental landmarks are required to enable biologists to interpret prediction results and calibrate models. To address these challenges, we propose a deep two-step low-shot learning framework to accurately classify ISH images using limited training images. Specifically, to enable accurate model training on limited training samples, we formulate the task as a deep low-shot learning problem and develop a novel two-step learning approach, including data-level learning and feature-level learning. We use a deep residual network as our base model and achieve improved performance in the precise stage prediction task of ISH images. Furthermore, the deep model can be interpreted by computing saliency maps, which consist of pixel-wise contributions of an image to its prediction result. In our task, saliency maps are used to assist the identification and visualization of developmental landmarks. Our experimental results show that the proposed model can not only make accurate predictions but also yield biologically meaningful interpretations. We anticipate that our methods will be easily generalizable to other biological image classification tasks with small training datasets. Our open-source code is available at https://github.com/divelab/lsl-fly.
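For readers who want a concrete picture of the saliency-map interpretation step described above, a minimal sketch follows. It is not the authors' released code (see their repository for that); it assumes an ImageNet-pretrained torchvision ResNet and a hypothetical preprocessed ISH image tensor, and computes pixel-wise contributions as input gradients of the top-class score.

```python
# Sketch only: input-gradient saliency for a CNN classifier.
import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1").eval()

def saliency_map(model, image):
    """Pixel-wise contribution of `image` to its predicted class (input gradients)."""
    x = image.unsqueeze(0).requires_grad_(True)     # (1, 3, H, W)
    scores = model(x)                               # (1, num_classes)
    top = scores.argmax(dim=1).item()
    scores[0, top].backward()                       # d(top-class score) / d(pixels)
    return x.grad.abs().max(dim=1)[0].squeeze(0)    # (H, W) saliency

image = torch.rand(3, 224, 224)                     # placeholder for a preprocessed ISH image
smap = saliency_map(model, image)
```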
2. Long W, Li T, Yang Y, Shen HB. FlyIT: Drosophila Embryogenesis Image Annotation Based on Image Tiling and Convolutional Neural Networks. IEEE/ACM Trans Comput Biol Bioinform 2021;18:194-204. [PMID: 31425122] [DOI: 10.1109/tcbb.2019.2935723]
Abstract
With the rise of image-based transcriptomics, spatial gene expression data has become increasingly important for understanding gene regulations from the tissue level down to the cell level. Especially, the gene expression images of Drosophila embryos provide a new data source in the study of Drosophila embryogenesis. It is imperative to develop automatic annotation tools since manual annotation is labor-intensive and requires professional knowledge. Although a lot of image annotation methods have been proposed in the computer vision field, they may not work well for gene expression images, due to the great difference between these two annotation tasks. Besides the apparent difference on images, the annotation is performed at the gene level rather than the image level, where the expression patterns of a gene are recorded in multiple images. Moreover, the annotation terms often correspond to local expression patterns of images, yet they are assigned collectively to groups of images and the relations between the terms and single images are unknown. In order to learn the spatial expression patterns comprehensively for genes, we propose a new method, called FlyIT (image annotation based on Image Tiling and convolutional neural networks for fruit Fly). We implement two versions of FlyIT, learning at image-level and gene-level, respectively. The gene-level version employs an image tiling strategy to get a combined image feature representation for each gene. FlyIT uses a pre-trained ResNet model to obtain feature representation and a new loss function to deal with the class imbalance problem. As the annotation of Drosophila images is a multi-label classification problem, the new loss function considers the difficulty levels for recognizing different labels of the same sample and adjusts the sample weights accordingly. The experimental results on the FlyExpress database show that both the image tiling strategy and the deep architecture lead to the great enhancement of the annotation performance. FlyIT outperforms the existing annotators by a large margin (over 9 percent on AUC and 12 percent on macro F1 for predicting the top 10 terms). It also shows advantages over other deep learning models, including both single-instance and multi-instance learning frameworks.
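FlyIT's exact loss is not reproduced here; the sketch below shows one common way to weight a multi-label loss by per-label difficulty, in the spirit of the description above (a focal-style weighting). The tensor shapes and the gamma parameter are illustrative assumptions.

```python
# Sketch only: difficulty-aware multi-label loss (focal-style weighting).
import torch
import torch.nn.functional as F

def difficulty_weighted_bce(logits, targets, gamma=2.0):
    """Multi-label BCE where each label's weight shrinks as the model finds it easy."""
    p = torch.sigmoid(logits)
    pt = torch.where(targets == 1, p, 1 - p)     # probability assigned to the true label value
    weight = (1 - pt) ** gamma                   # hard labels receive larger weight
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (weight * bce).mean()

logits = torch.randn(4, 10)                      # 4 samples, 10 annotation terms (placeholders)
targets = torch.randint(0, 2, (4, 10)).float()
loss = difficulty_weighted_bce(logits, targets)
```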
3. Zhang W, Li R, Zeng T, Sun Q, Kumar S, Ye J, Ji S. Deep Model Based Transfer and Multi-Task Learning for Biological Image Analysis. IEEE Trans Big Data 2020;6:322-333. [PMID: 36846743] [PMCID: PMC9957557] [DOI: 10.1109/tbdata.2016.2573280]
Abstract
A central theme in learning from image data is to develop appropriate representations for the specific task at hand. Thus, a practical challenge is to determine what features are appropriate for specific tasks. For example, in the study of gene expression patterns in Drosophila, texture features were particularly effective for determining the developmental stages from in situ hybridization images. Such an image representation is, however, not suitable for controlled vocabulary term annotation. Here, we developed feature extraction methods to generate hierarchical representations for ISH images. Our approach is based on deep convolutional neural networks that act on image pixels directly. To make the extracted features generic, the models were trained using a natural image set with millions of labeled examples. These models were transferred to the ISH image domain. To account for the differences between the source and target domains, we proposed a partial transfer learning scheme in which only part of the source model is transferred. We employed a multi-task learning method to fine-tune the pre-trained models with labeled ISH images. Results showed that feature representations computed by deep models based on transfer and multi-task learning significantly outperformed other methods for annotating gene expression patterns at different stage ranges.
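A hedged sketch of the general recipe described above (transfer part of an ImageNet-pretrained model, then fine-tune with task-specific heads) is shown below. It is not the authors' implementation: the choice of backbone, which layers are frozen, and the stage-range task names are all assumptions for illustration.

```python
# Sketch only: partial transfer of a pretrained CNN plus multi-task heads.
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights="IMAGENET1K_V1")
feature_dim = backbone.fc.in_features
backbone.fc = nn.Identity()                      # keep the convolutional trunk only

# Freeze the earliest, most generic layers; later layers adapt to ISH images.
for name, p in backbone.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        p.requires_grad = False

# Hypothetical stage-range tasks, each with its own set of vocabulary terms.
terms_per_task = {"stage4-6": 20, "stage7-8": 20, "stage9-10": 20}
heads = nn.ModuleDict({t: nn.Linear(feature_dim, k) for t, k in terms_per_task.items()})

def forward(images, task):
    """Shared trunk, task-specific head: the multi-task fine-tuning pattern."""
    return heads[task](backbone(images))
```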
Affiliation(s)
- Wenlu Zhang: Department of Computer Science, Old Dominion University, Norfolk, VA 23529
- Rongjian Li: Department of Computer Science, Old Dominion University, Norfolk, VA 23529
- Tao Zeng: School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163
- Qian Sun: Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287
- Sudhir Kumar: Institute for Genomics and Evolutionary Medicine and the Department of Biology, Temple University, Philadelphia, PA 19122
- Jieping Ye: Department of Electrical Engineering and Computer Science and the Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Shuiwang Ji: School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163

4. Zeng T, Li R, Mukkamala R, Ye J, Ji S. Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 2015;16:147. [PMID: 25948335] [PMCID: PMC4432953] [DOI: 10.1186/s12859-015-0553-9]
Abstract
BACKGROUND Profiling gene expression in brain structures at various spatial and temporal scales is essential to understanding how genes regulate the development of brain structures. The Allen Developing Mouse Brain Atlas provides high-resolution 3-D in situ hybridization (ISH) gene expression patterns in multiple developing stages of the mouse brain. Currently, the ISH images are annotated with anatomical terms manually. In this paper, we propose a computational approach to annotate gene expression pattern images in the mouse brain at various structural levels over the course of development. RESULTS We applied a deep convolutional neural network that was trained on a large set of natural images to extract features from the ISH images of the developing mouse brain. As a baseline representation, we applied invariant image feature descriptors to capture local statistics from ISH images and used the bag-of-words approach to build image-level representations. Both types of features from multiple ISH image sections of the entire brain were then combined to build 3-D, brain-wide gene expression representations. We employed regularized learning methods for discriminating gene expression patterns in different brain structures. Results show that our approach of using convolutional models as feature extractors achieved superior performance in annotating gene expression patterns at multiple levels of brain structures throughout four developing ages. Overall, we achieved an average AUC of 0.894 ± 0.014, as compared with 0.820 ± 0.046 yielded by the bag-of-words approach. CONCLUSIONS A deep convolutional neural network model trained on natural image sets and applied to gene expression pattern annotation tasks yielded superior performance, demonstrating that its transfer learning property is applicable to such biological image sets.
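The bag-of-words baseline mentioned above can be sketched as follows, assuming local invariant descriptors (e.g. SIFT) have already been extracted for each ISH section; the vocabulary size and classifier are illustrative choices, not the paper's exact settings.

```python
# Sketch only: bag-of-visual-words features via vector quantization.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def bag_of_words_features(descriptors_per_image, n_words=500, seed=0):
    """Quantize local descriptors into a visual vocabulary and build per-image histograms."""
    vocab = KMeans(n_clusters=n_words, random_state=seed, n_init=10)
    vocab.fit(np.vstack(descriptors_per_image))
    feats = []
    for desc in descriptors_per_image:
        words = vocab.predict(desc)
        hist, _ = np.histogram(words, bins=np.arange(n_words + 1))
        feats.append(hist / max(hist.sum(), 1))   # normalized word histogram
    return np.array(feats)

# Regularized learning on the image-level representations (labels are hypothetical):
# X = bag_of_words_features(descriptors_per_image)
# clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, structure_labels)
```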
Affiliation(s)
- Tao Zeng: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Rongjian Li: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Ravi Mukkamala: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
- Jieping Ye: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
- Shuiwang Ji: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA

5. Li R, Zhang W, Ji S. Automated identification of cell-type-specific genes in the mouse brain by image computing of expression patterns. BMC Bioinformatics 2014;15:209. [PMID: 24947138] [PMCID: PMC4078975] [DOI: 10.1186/1471-2105-15-209]
Abstract
Background Differential gene expression patterns in cells of the mammalian brain result in the morphological, connectional, and functional diversity of cells. A wide variety of studies have shown that certain genes are expressed only in specific cell-types. Analysis of cell-type-specific gene expression patterns can provide insights into the relationship between genes, connectivity, brain regions, and cell-types. However, automated methods for identifying cell-type-specific genes are lacking to date. Results Here, we describe a set of computational methods for identifying cell-type-specific genes in the mouse brain by automated image computing of in situ hybridization (ISH) expression patterns. We applied invariant image feature descriptors to capture local gene expression information from cellular-resolution ISH images. We then built image-level representations by applying vector quantization on the image descriptors. We employed regularized learning methods for classifying genes specifically expressed in different brain cell-types. These methods can also rank image features based on their discriminative power. We used a data set of 2,872 genes from the Allen Brain Atlas in the experiments. Results showed that our methods are predictive of cell-type-specificity of genes. Our classifiers achieved AUC values of approximately 87% when the enrichment level is set to 20. In addition, we showed that the highly-ranked image features captured the relationship between cell-types. Conclusions Overall, our results showed that automated image computing methods could potentially be used to identify cell-type-specific genes in the mouse brain.
Affiliation(s)
- Shuiwang Ji: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA

6. Jug F, Pietzsch T, Preibisch S, Tomancak P. Bioimage Informatics in the context of Drosophila research. Methods 2014;68:60-73. [PMID: 24732429] [DOI: 10.1016/j.ymeth.2014.04.004]
Abstract
Modern biological research relies heavily on microscopic imaging. The advanced genetic toolkit of Drosophila makes it possible to label molecular and cellular components with unprecedented level of specificity necessitating the application of the most sophisticated imaging technologies. Imaging in Drosophila spans all scales from single molecules to the entire populations of adult organisms, from electron microscopy to live imaging of developmental processes. As the imaging approaches become more complex and ambitious, there is an increasing need for quantitative, computer-mediated image processing and analysis to make sense of the imagery. Bioimage Informatics is an emerging research field that covers all aspects of biological image analysis from data handling, through processing, to quantitative measurements, analysis and data presentation. Some of the most advanced, large scale projects, combining cutting edge imaging with complex bioimage informatics pipelines, are realized in the Drosophila research community. In this review, we discuss the current research in biological image analysis specifically relevant to the type of systems level image datasets that are uniquely available for the Drosophila model system. We focus on how state-of-the-art computer vision algorithms are impacting the ability of Drosophila researchers to analyze biological systems in space and time. We pay particular attention to how these algorithmic advances from computer science are made usable to practicing biologists through open source platforms and how biologists can themselves participate in their further development.
Affiliation(s)
- Florian Jug: Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Tobias Pietzsch: Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Stephan Preibisch: Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA; Department of Anatomy and Structural Biology, Gruss Lipper Biophotonics Center, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Pavel Tomancak: Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany

7. Augmenting multi-instance multilabel learning with sparse Bayesian models for skin biopsy image analysis. Biomed Res Int 2014;2014:305629. [PMID: 24860817] [PMCID: PMC3997873] [DOI: 10.1155/2014/305629]
Abstract
Skin biopsy images can reveal the causes and severity of many skin diseases, a significant complement to skin surface inspection. Automatic annotation of skin biopsy images is an important problem for increasing efficiency and reducing subjectivity in diagnosis. However, it is particularly challenging when there is an indirect relationship between annotation terms and local regions of a biopsy image, as well as local structures with different textures. In this paper, a novel method based on a recently proposed machine learning model, named multi-instance multilabel (MIML), is proposed to model the potential knowledge and experience of doctors in skin biopsy image annotation. We first show that the problem of skin biopsy image annotation can naturally be expressed as an MIML problem and then propose an image representation method that can capture both region structure and texture features, and a sparse Bayesian MIML algorithm which can produce probabilities indicating the confidence of annotation. The proposed algorithm framework is evaluated on a real clinical dataset containing 12,700 skin biopsy images. The results show that it is effective and prominent.
8. Zhang W, Feng D, Li R, Chernikov A, Chrisochoides N, Osgood C, Konikoff C, Newfeld S, Kumar S, Ji S. A mesh generation and machine learning framework for Drosophila gene expression pattern image analysis. BMC Bioinformatics 2013;14:372. [PMID: 24373308] [PMCID: PMC3879658] [DOI: 10.1186/1471-2105-14-372]
Abstract
BACKGROUND Multicellular organisms consist of cells of many different types that are established during development. Each type of cell is characterized by the unique combination of expressed gene products as a result of spatiotemporal gene regulation. Currently, a fundamental challenge in regulatory biology is to elucidate the gene expression controls that generate the complex body plans during development. Recent advances in high-throughput biotechnologies have generated spatiotemporal expression patterns for thousands of genes in the model organism fruit fly Drosophila melanogaster. Existing qualitative methods enhanced by a quantitative analysis based on computational tools we present in this paper would provide promising ways for addressing key scientific questions. RESULTS We develop a set of computational methods and open source tools for identifying co-expressed embryonic domains and the associated genes simultaneously. To map the expression patterns of many genes into the same coordinate space and account for the embryonic shape variations, we develop a mesh generation method to deform a meshed generic ellipse to each individual embryo. We then develop a co-clustering formulation to cluster the genes and the mesh elements, thereby identifying co-expressed embryonic domains and the associated genes simultaneously. Experimental results indicate that the gene and mesh co-clusters can be correlated to key developmental events during the stages of embryogenesis we study. The open source software tool has been made available at http://compbio.cs.odu.edu/fly/. CONCLUSIONS Our mesh generation and machine learning methods and tools improve upon the flexibility, ease-of-use and accuracy of existing methods.
Affiliation(s)
- Shuiwang Ji: Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA

9. Pruteanu-Malinici I, Majoros WH, Ohler U. Automated annotation of gene expression image sequences via non-parametric factor analysis and conditional random fields. Bioinformatics 2013;29:i27-i35. [PMID: 23812993] [PMCID: PMC3694682] [DOI: 10.1093/bioinformatics/btt206]
Abstract
Motivation: Computational approaches for the annotation of phenotypes from image data have shown promising results across many applications, and provide rich and valuable information for studying gene function and interactions. While data are often available both at high spatial resolution and across multiple time points, phenotypes are frequently annotated independently, for individual time points only. In particular, for the analysis of developmental gene expression patterns, it is biologically sensible when images across multiple time points are jointly accounted for, such that spatial and temporal dependencies are captured simultaneously. Methods: We describe a discriminative undirected graphical model to label gene-expression time-series image data, with an efficient training and decoding method based on the junction tree algorithm. The approach is based on an effective feature selection technique, consisting of a non-parametric sparse Bayesian factor analysis model. The result is a flexible framework, which can handle large-scale data with noisy incomplete samples, i.e. it can tolerate data missing from individual time points. Results: Using the annotation of gene expression patterns across stages of Drosophila embryonic development as an example, we demonstrate that our method achieves superior accuracy, gained by jointly annotating phenotype sequences, when compared with previous models that annotate each stage in isolation. The experimental results on missing data indicate that our joint learning method successfully annotates genes for which no expression data are available for one or more stages. Contact: uwe.ohler@duke.edu
10. Sun Q, Muckatira S, Yuan L, Ji S, Newfeld S, Kumar S, Ye J. Image-level and group-level models for Drosophila gene expression pattern annotation. BMC Bioinformatics 2013;14:350. [PMID: 24299119] [PMCID: PMC3924186] [DOI: 10.1186/1471-2105-14-350]
Abstract
Background Drosophila melanogaster has been established as a model organism for investigating the developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into the gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison. Results We present a computational framework to perform anatomical keywords annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions including max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach. Conclusion In our experiment, the three pooling functions perform comparably well in feature dimension reduction. The undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and image-level scheme leads to consistent performance improvement in keywords annotation.
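The three pooling functions named above are simple to state; the following sketch shows them applied to a matrix of sparse codes for one image (the array sizes are placeholders, not the paper's settings).

```python
# Sketch only: max, average, and sqrt pooling over per-patch sparse codes.
import numpy as np

def pool(sparse_codes, method="max"):
    """Collapse a (n_patches, n_codewords) matrix into one image-level feature vector."""
    if method == "max":
        return sparse_codes.max(axis=0)
    if method == "average":
        return sparse_codes.mean(axis=0)
    if method == "sqrt":                          # square root of mean squared statistics
        return np.sqrt((sparse_codes ** 2).mean(axis=0))
    raise ValueError(f"unknown pooling method: {method}")

codes = np.abs(np.random.randn(200, 1024)) * (np.random.rand(200, 1024) < 0.05)
feature = pool(codes, "sqrt")
```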
Affiliation(s)
- Jieping Ye: Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA

11. Zhang G, Yin J, Li Z, Su X, Li G, Zhang H. Automated skin biopsy histopathological image annotation using multi-instance representation and learning. BMC Med Genomics 2013;6 Suppl 3:S10. [PMID: 24565115] [PMCID: PMC3980401] [DOI: 10.1186/1755-8794-6-s3-s10]
Abstract
With digitisation and the development of computer-aided diagnosis, histopathological image analysis has attracted considerable interest in recent years. In this article, we address the problem of the automated annotation of skin biopsy images, a special type of histopathological image analysis. In contrast to previous well-studied methods in histopathology, we propose a novel annotation method based on a multi-instance learning framework. The proposed framework first represents each skin biopsy image as a multi-instance sample using a graph cutting method, decomposing the image to a set of visually disjoint regions. Then, we construct two classification models using multi-instance learning algorithms, among which one provides determinate results and the other calculates a posterior probability. We evaluate the proposed annotation framework using a real dataset containing 6691 skin biopsy images, with 15 properties as target annotation terms. The results indicate that the proposed method is effective and medically acceptable.
Affiliation(s)
- Gang Zhang: School of Information Science and Technology, Sun Yat-sen University, Guangzhou, 510275, China; School of Automation, Guangdong University of Technology, Guangzhou, 510006, China
- Jian Yin: School of Information Science and Technology, Sun Yat-sen University, Guangzhou, 510275, China
- Ziping Li: The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, 510120, China
- Xiangyang Su: The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, 510630, China
- Guozheng Li: The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, 510120, China; Department of Control Science and Engineering, Tongji University, Shanghai, 201804, China
- Honglai Zhang: Guangzhou University of Chinese Medicine, Guangzhou, 510120, China
12. A two-layer framework for appearance based recognition using spatial and discriminant influences. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2013.03.015]
13. Cai X, Wang H, Huang H, Ding C. Joint stage recognition and anatomical annotation of Drosophila gene expression patterns. Bioinformatics 2012;28:i16-i24. [PMID: 22689756] [PMCID: PMC3371852] [DOI: 10.1093/bioinformatics/bts220]
Abstract
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo delivers the detailed spatio-temporal patterns of gene expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To provide text-based pattern searching that facilitates related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated manually by domain experts with a developmental stage term and anatomical ontology terms. Due to the rapid increase in the number of such images and the inevitable annotation bias of human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms. Results: In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We propose a novel Tri-Relational Graph (TG) model that comprises the data graph, the anatomical term graph and the developmental stage term graph, and connects them by two additional graphs induced from stage or annotation label assignments. On top of the TG model, we introduce a Preferential Random Walk (PRW) method to jointly recognize developmental stage and annotate anatomical terms by utilizing the interrelations between the two tasks. The experimental results on two refined BDGP datasets demonstrate that our joint learning method can achieve superior prediction results on both tasks compared with state-of-the-art methods. Availability: http://ranger.uta.edu/%7eheng/Drosophila/ Contact: heng@uta.edu
Affiliation(s)
- Xiao Cai: Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA

14. Mwangi B, Ebmeier KP, Matthews K, Steele JD. Multi-centre diagnostic classification of individual structural neuroimaging scans from patients with major depressive disorder. Brain 2012;135:1508-1521. [PMID: 22544901] [DOI: 10.1093/brain/aws084]
Abstract
Quantitative abnormalities of brain structure in patients with major depressive disorder have been reported at a group level for decades. However, these structural differences appear subtle in comparison with conventional radiologically defined abnormalities, with considerable inter-subject variability. Consequently, it has not been possible to readily identify scans from patients with major depressive disorder at an individual level. Recently, machine learning techniques such as relevance vector machines and support vector machines have been applied to predictive classification of individual scans with variable success. Here we describe a novel hybrid method, which combines machine learning with feature selection and characterization, with the latter aimed at maximizing the accuracy of machine learning prediction. The method was tested using a multi-centre dataset of T(1)-weighted 'structural' scans. A total of 62 patients with major depressive disorder and matched controls were recruited from referred secondary care clinical populations in Aberdeen and Edinburgh, UK. The generalization ability and predictive accuracy of the classifiers was tested using data left out of the training process. High prediction accuracy was achieved (~90%). While feature selection was important for maximizing high predictive accuracy with machine learning, feature characterization contributed only a modest improvement to relevance vector machine-based prediction (~5%). Notably, while the only information provided for training the classifiers was T(1)-weighted scans plus a categorical label (major depressive disorder versus controls), both relevance vector machine and support vector machine 'weighting factors' (used for making predictions) correlated strongly with subjective ratings of illness severity. These results indicate that machine learning techniques have the potential to inform clinical practice and research, as they can make accurate predictions about brain scan data from individual subjects. Furthermore, machine learning weighting factors may reflect an objective biomarker of major depressive disorder illness severity, based on abnormalities of brain structure.
Affiliation(s)
- Benson Mwangi: Division of Neuroscience, Medical Research Institute, University of Dundee, Ninewells Hospital and Medical School, Dundee, UK

15. Yuan L, Woodard A, Ji S, Jiang Y, Zhou ZH, Kumar S, Ye J. Learning sparse representations for fruit-fly gene expression pattern image annotation and retrieval. BMC Bioinformatics 2012;13:107. [PMID: 22621237] [PMCID: PMC3434040] [DOI: 10.1186/1471-2105-13-107]
Abstract
Background Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to conduct image retrieval based on body part keywords and images. Currently, the keyword annotation of spatiotemporal gene expression patterns is conducted manually. However, this manual practice does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on the expression patterns may be made more accurate using keywords. Results In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations integrating spatial information and sparse features, overcoming the limitations of prior schemes. Conclusions We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that the integration of spatial information and sparse features lead to consistent performance improvement in image annotation, while for the task of retrieval, sparse features alone yields better results.
Affiliation(s)
- Lei Yuan: Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA

16. Li YX, Ji S, Kumar S, Ye J, Zhou ZH. Drosophila gene expression pattern annotation through multi-instance multi-label learning. IEEE/ACM Trans Comput Biol Bioinform 2012;9:98-112. [PMID: 21519115] [DOI: 10.1109/tcbb.2011.73]
Abstract
In the studies of Drosophila embryogenesis, a large number of two-dimensional digital images of gene expression patterns have been produced to build an atlas of spatio-temporal gene expression dynamics across developmental time. Gene expressions captured in these images have been manually annotated with anatomical and developmental ontology terms using a controlled vocabulary (CV), which are useful in research aimed at understanding gene functions, interactions, and networks. With the rapid accumulation of images, the process of manual annotation has become increasingly cumbersome, and computational methods to automate this task are urgently needed. However, the automated annotation of embryo images is challenging. This is because the annotation terms spatially correspond to local expression patterns of images, yet they are assigned collectively to groups of images and it is unknown which term corresponds to which region of which image in the group. In this paper, we address this problem using a new machine learning framework, Multi-Instance Multi-Label (MIML) learning. We first show that the underlying nature of the annotation task is a typical MIML learning problem. Then, we propose two support vector machine algorithms under the MIML framework for the task. Experimental results on the FlyExpress database (a digital library of standardized Drosophila gene expression pattern images) reveal that the exploitation of MIML framework leads to significant performance improvement over state-of-the-art approaches.
17. Li Q, Kambhamettu C. Contour extraction of Drosophila embryos. IEEE/ACM Trans Comput Biol Bioinform 2011;8:1509-1521. [PMID: 21339537] [DOI: 10.1109/tcbb.2011.37]
Abstract
Contour extraction of Drosophila (fruit fly) embryos is an important step in building a computational system for matching expression patterns of embryonic images to assist the discovery of the nature of genes. Automatic contour extraction of embryos is challenging due to severe image variations, including 1) the size, orientation, shape, and appearance of an embryo of interest; 2) the neighboring context of an embryo of interest (such as nontouching and touching neighboring embryos); and 3) illumination conditions. In this paper, we propose an automatic framework for contour extraction of the embryo of interest in an embryonic image. The proposed framework contains three components. Its first component applies a mixture model of quadratic curves, with statistical features, to initialize the contour of the embryo of interest. An efficient method based on imbalanced image points is proposed to compute model parameters. The second component applies an active contour model to refine embryo contours. The third component applies eigen-shape modeling to smooth jaggy contours caused by blurred embryo boundaries. We test the proposed framework on a data set of 8,000 embryonic images and achieve promising accuracy (88 percent), which is substantially higher than state-of-the-art results.
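As an illustration of the contour-refinement step (the second component above), the sketch below uses scikit-image's active-contour implementation with a hand-placed elliptical initialization standing in for the quadratic-curve mixture model; it is not the authors' framework, and the sample image and ellipse parameters are placeholders.

```python
# Sketch only: snake-based contour refinement with scikit-image.
import numpy as np
from skimage import data, filters
from skimage.segmentation import active_contour

image = data.coins()[:200, :300].astype(float)   # placeholder for an embryo image
smooth = filters.gaussian(image, sigma=3)

# Elliptical initialization in (row, col) coordinates.
t = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 80 * np.sin(t), 150 + 120 * np.cos(t)])

contour = active_contour(smooth, init, alpha=0.015, beta=10, gamma=0.001)
```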
Affiliation(s)
- Qi Li: Department of Mathematics and Computer Science, Western Kentucky University, 1906 College Heights Blvd., Bowling Green, KY 42101, USA

18. Pruteanu-Malinici I, Mace DL, Ohler U. Automatic annotation of spatial expression patterns via sparse Bayesian factor models. PLoS Comput Biol 2011;7:e1002098. [PMID: 21814502] [PMCID: PMC3140966] [DOI: 10.1371/journal.pcbi.1002098]
Abstract
Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity of among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach on embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions. High throughput image acquisition is a quickly increasing new source of data for problems in computational biology, such as phenotypic screens. Given the very diverse nature of imaging technology, samples, and biological questions, approaches are oftentimes very tailored and ad hoc to a specific data set. In particular, the image-based genome scale profiling of gene expression patterns via approaches like in situ hybridization requires the development of accurate and automatic image analysis systems for understanding regulatory networks and development of multicellular organisms. Here, we present a computational method for automated annotation of Drosophila gene expression images. This framework allows us to extract, identify and compare spatial expression patterns, of essence for higher organisms. Based on a sparse feature extraction technique, we successfully cluster and annotate expression patterns with high reliability, and show that the model represents a “vocabulary” of basic patterns reflecting common function or regulation.
Affiliation(s)
- Iulian Pruteanu-Malinici: Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America

19. Frise E, Hammonds AS, Celniker SE. Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape. Mol Syst Biol 2010;6:345. [PMID: 20087342] [PMCID: PMC2824522] [DOI: 10.1038/msb.2009.102]
Abstract
Discovery of temporal and spatial patterns of gene expression is essential for understanding the regulatory networks and development in multicellular organisms. We analyzed the images from our large-scale spatial expression data set of early Drosophila embryonic development and present a comprehensive computational image analysis of the expression landscape. For this study, we created an innovative virtual representation of embryonic expression patterns using an elliptically shaped mesh grid that allows us to make quantitative comparisons of gene expression using a common frame of reference. Demonstrating the power of our approach, we used gene co-expression to identify distinct expression domains in the early embryo; the result is surprisingly similar to the fate map determined using laser ablation. We also used a clustering strategy to find genes with similar patterns and developed new analysis tools to detect variation within consensus patterns, adjacent non-overlapping patterns, and anti-correlated patterns. Of the 1800 genes investigated, only half had previously assigned functions. The known genes suggest developmental roles for the clusters, and identification of related patterns predicts requirements for co-occurring biological functions.
Affiliation(s)
- Erwin Frise: Department of Genome Dynamics, Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

20. Ji S, Yuan L, Li YX, Zhou ZH, Kumar S, Ye J. Drosophila Gene Expression Pattern Annotation Using Sparse Features and Term-Term Interactions. KDD: Proceedings. International Conference on Knowledge Discovery & Data Mining 2009;2009:407-415. [PMID: 21614142] [DOI: 10.1145/1557019.1557068]
Abstract
The Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression and they are valuable tools for explicating the gene functions, interaction, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with ontology terms manually by human curators. We present a systematic approach for automating this task, because the number of images needing text descriptions is now rapidly increasing. We consider both improved feature representation and novel learning formulation to boost the annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In the design of learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning outperforms the bag-of-words representation significantly. Results also show that incorporation of the term-term correlations improves the annotation performance consistently.
Affiliation(s)
- Shuiwang Ji: Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287