1
Grbic M, Kartelj A, Jankovic S, Matic D, Filipovic V. Variable Neighborhood Search for Partitioning Sparse Biological Networks into the Maximum Edge-Weighted k-Plexes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1822-1831. [PMID: 30736005 DOI: 10.1109/tcbb.2019.2898189]
Abstract
In a network, a k-plex is a subset of n vertices in which every vertex has degree at least n-k in the subnetwork induced by the subset. The maximum edge-weight k-plex partitioning problem (Max-EkPP) is to find a partitioning of an edge-weighted network into k-plexes such that the sum of edge weights is maximal. The Max-EkPP plays an important role in discovering new information in large biological networks. We propose a variable neighborhood search (VNS) algorithm for solving the Max-EkPP. The VNS implements a local search based on the 1-swap first-improvement strategy and an objective function that takes into account the degree of every vertex in each partition. The objective function favors feasible solutions and allows the function's value to increase gradually when moving from slightly infeasible to barely feasible solutions. Computational experiments are performed on real metabolic networks and other benchmark instances from the literature. Compared with the previously proposed integer linear programming (ILP) approach, the VNS finds all known optimal solutions. For all other instances, the VNS either reaches the previous best known solution or improves it. The proposed VNS is also tested on a large-scale dataset not previously considered.
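The following minimal Python sketch makes the two definitions concrete: a k-plex feasibility test and a partition objective. It assumes a hypothetical representation of the weighted network as a dictionary keyed by `frozenset` vertex pairs. It reads "sum of edge weights" as the total weight of edges whose endpoints fall in the same part. The function names are illustrative only. The sketch does not reproduce the authors' VNS or their graded objective for infeasible solutions.

```python
from itertools import combinations

# Hypothetical representation (not from the paper): an undirected weighted graph
# stored as a dict mapping frozenset({u, v}) -> edge weight.


def is_k_plex(part, edges, k):
    """Return True if every vertex of `part` has at least len(part) - k
    neighbours inside `part`, i.e. the induced subgraph is a k-plex."""
    n = len(part)
    for v in part:
        degree = sum(1 for u in part if u != v and frozenset((u, v)) in edges)
        if degree < n - k:
            return False
    return True


def is_feasible(partition, edges, k):
    """A partition is feasible for Max-EkPP when every part is a k-plex."""
    return all(is_k_plex(part, edges, k) for part in partition)


def partition_weight(partition, edges):
    """Objective (as read here): total weight of edges whose endpoints share a part."""
    total = 0.0
    for part in partition:
        for u, v in combinations(part, 2):
            total += edges.get(frozenset((u, v)), 0.0)
    return total


edges = {
    frozenset((1, 2)): 3.0,
    frozenset((2, 3)): 1.5,
    frozenset((1, 3)): 2.0,
    frozenset((3, 4)): 0.5,
}
print(is_feasible([{1, 2, 3}, {4}], edges, k=1))  # True
print(partition_weight([{1, 2, 3}, {4}], edges))  # 6.5
```

A local search such as the 1-swap strategy mentioned above would repeatedly move one vertex to another part. It would re-evaluate these two quantities after each move, or an incremental version of them.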
2
Abstract
The exploitation of the potential societal benefits of Earth observations is hampered by users having to engage in often tedious processes to discover data and extract information and knowledge. A concept is introduced for a transition from the current perception of data as passive objects (DPO) to a new perception of data as active subjects (DAS). This transition would greatly increase data usage and exploitation, and support the extraction of knowledge from data products. Enabling the data subjects to actively reach out to potential users would revolutionize data dissemination and sharing and facilitate collaboration in user communities. The three core elements of the transformative DAS concept are: (1) “intelligent semantic data agents” (ISDAs) that have the capability to communicate with their human and digital environment. Each ISDA gives a voice to the data product it represents. It has comprehensive knowledge of the represented product, including quality, uncertainties, access conditions, previous uses, user feedback, etc., and it can engage in transactions with users. (2) A knowledge base that constructs extensive graphs presenting a comprehensive picture of communities of people, applications, models, tools, and resources, and provides tools for the analysis of these graphs. (3) An interaction platform that links the ISDAs to the human environment and facilitates transactions, including discovery of products, access to products and derived knowledge, modification and use of products, and the exchange of feedback on their usage. This platform documents the transactions securely while maintaining full provenance.
3
Thomas J, Seo D, Sael L. Review on Graph Clustering and Subgraph Similarity Based Analysis of Neurological Disorders. Int J Mol Sci 2016; 17:ijms17060862. [PMID: 27258269 PMCID: PMC4926396 DOI: 10.3390/ijms17060862] [Received: 03/18/2016] [Revised: 05/10/2016] [Accepted: 05/24/2016]
Abstract
How can complex relationships among molecular or clinico-pathological entities of neurological disorders be represented and analyzed? Graphs seem to be the current answer to this question, regardless of the type of information: molecular data, brain images or neural signals. We review a wide spectrum of graph representation and graph analysis methods and their application in the study of neurological disorders at both the genomic and the phenotypic level. We find numerous research works that create, process and analyze graphs formed from one or a few data types to gain an understanding of specific aspects of neurological disorders. Furthermore, with the increasing amount of data of various types becoming available for neurological disorders, we find that integrative analysis approaches that combine several types of data are being recognized as a way to gain a global understanding of the diseases. Although integrative analyses of graphs are still few owing to their analytical complexity, multi-layer graph analysis is a promising framework that can incorporate various data types. We describe and discuss the benefits of the multi-layer graph framework for studies of neurological disease.
Affiliation(s)
- Jaya Thomas
  - Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
  - Department of Computer Science, State University New York Korea, Incheon 406-840, Korea.
- Dongmin Seo
  - Korea Institute of Science and Technology Information, 245 Daehak-ro, Yuseong-gu, Daejeon 34141, Korea.
- Lee Sael
  - Department of Computer Science, Stony Brook University, Stony Brook, NY 11794, USA.
  - Department of Computer Science, State University New York Korea, Incheon 406-840, Korea.
4
Ong P, Zainuddin Z. Calibrating wavelet neural networks by distance orientation similarity fuzzy C-means for approximation problems. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.01.042]
5
Abstract
BACKGROUND It is well understood that distinct communities of bacteria are present at different sites of the body, and that changes in the structure of these communities have strong implications for human health. Yet, challenges remain in understanding the complex interconnections between the bacterial taxa within these microbial communities and how they change during the progression of diseases. Many recent studies attempt to analyze the human microbiome using traditional ecological measures and by cataloging differences in bacterial community membership. In this paper, we show how to push metagenomic analyses beyond mundane questions related to the bacterial taxonomic profiles that differentiate one sample from another. METHODS We develop tools and techniques that help us to investigate the nature of social interactions in microbial communities, and demonstrate ways of compactly capturing extensive information about these networks and conveying it visually in an effective manner. We define the concept of bacterial "social clubs", which are groups of taxa that tend to appear together in many samples. More importantly, we define the concept of "rival clubs", entire groups that tend to avoid occurring together in many samples. We show how to efficiently compute social clubs and rival clubs and demonstrate their utility with examples including a smokers' dataset and a dataset from the Human Microbiome Project (HMP). RESULTS The tools developed provide a framework for analyzing relationships between bacterial taxa modeled as bacterial co-occurrence networks. The computational techniques also provide a framework for identifying clubs and rival clubs and for studying differences in the microbiomes (and their interactions) of two or more collections of samples. CONCLUSIONS Microbial relationships are similar to those found in social networks. In this work, we assume that strong (positive or negative) tendencies to co-occur or co-infect are likely to have biological, physiological, or ecological significance, possibly as a result of cooperation or competition. As a consequence of the analysis, a variety of biological interpretations are conjectured. In the human microbiome context, the pattern of strength of interactions between bacterial taxa is unique to each body site.
Affiliation(s)
- Mitch Fernandez
  - Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
  - Dept. of Computational Medicine and Bioinformatics, College of Medicine, University of Michigan, 48109 Ann Arbor, MI, USA
- Juan D Riveros
  - Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
- Michael Campos
  - Pulmonary & Critical Care Medicine, Miller School of Medicine, University of Miami, 33136 Miami, FL, USA
- Kalai Mathee
  - Human and Molecular Genetics, Herbert Wertheim College of Medicine, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
- Giri Narasimhan
  - Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, and Biomolecular Sciences Institute, Florida International University, 33199 Miami, FL, USA
6
Zainuddin Z, Pauline O. An effective fuzzy C-means algorithm based on symmetry similarity approach. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.021]
7
Wen J, Mohammed J, Bortolamiol-Becet D, Tsai H, Robine N, Westholm JO, Ladewig E, Dai Q, Okamura K, Flynt AS, Zhang D, Andrews J, Cherbas L, Kaufman TC, Cherbas P, Siepel A, Lai EC. Diversity of miRNAs, siRNAs, and piRNAs across 25 Drosophila cell lines. Genome Res 2015; 24:1236-50. [PMID: 24985917 PMCID: PMC4079977 DOI: 10.1101/gr.161554.113]
Abstract
We expanded the knowledge base for Drosophila cell line transcriptomes by deeply sequencing their small RNAs. In total, we analyzed more than 1 billion raw reads from 53 libraries across 25 cell lines. We verify the reproducibility of biological replicate data sets, determine common and distinct aspects of miRNA expression across cell lines, and infer the global impact of miRNAs on cell line transcriptomes. We next characterize the commonalities and differences in their endo-siRNA populations. Interestingly, most cell lines exhibit enhanced TE-siRNA production relative to tissues, suggesting that this is a common aspect of cell immortalization. We also broadly extend annotations of cis-NAT-siRNA loci, identifying ones with common expression across diverse cells and tissues, as well as cell-restricted loci. Finally, we characterize small RNAs in a set of ovary-derived cell lines, including somatic cells (OSS and OSC) and a mixed germline/somatic cell population (fGS/OSS) that exhibits ping-pong piRNA signatures. Collectively, the ovary data reveal new genic piRNA loci, including unusual configurations of piRNA-generating regions. Together with the companion analysis of mRNAs described in a previous study, these small RNA data provide comprehensive information on the transcriptional landscape of diverse Drosophila cell lines. These data should encourage broader usage of fly cell lines, beyond the few that are presently in common usage.
Affiliation(s)
- Jiayu Wen
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Jaaved Mohammed
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York 10065, USA
- Diane Bortolamiol-Becet
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Harrison Tsai
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Nicolas Robine
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
  - New York Genome Center, New York, New York 10022, USA
- Jakub O Westholm
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Erik Ladewig
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Qi Dai
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Katsutomo Okamura
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
  - Temasek Life Sciences, Temasek Lifesciences Laboratory, National University of Singapore, 117604 Singapore
- Alex S Flynt
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
- Dayu Zhang
  - Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
- Justen Andrews
  - Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
- Lucy Cherbas
  - Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
- Thomas C Kaufman
  - Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
- Peter Cherbas
  - Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
- Adam Siepel
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
- Eric C Lai
  - Department of Developmental Biology, Sloan-Kettering Institute, New York, New York 10065, USA
8
Hüffner F, Komusiewicz C, Liebtrau A, Niedermeier R. Partitioning Biological Networks into Highly Connected Clusters with Maximum Edge Coverage. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:455-467. [PMID: 26356014 DOI: 10.1109/tcbb.2013.177]
Abstract
A popular clustering algorithm for biological networks, proposed by Hartuv and Shamir, identifies nonoverlapping highly connected components. We extend the approach taken by this algorithm by introducing the combinatorial optimization problem Highly Connected Deletion, which asks for removing as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that Highly Connected Deletion is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 75 percent of the edges that are deleted in an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 6,000 vertices and 13,500 edges within five hours. Additionally, we present a new heuristic that finds more clusters than the method by Hartuv and Shamir.
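In Hartuv and Shamir's sense, a graph on n vertices is highly connected if its edge connectivity exceeds n/2. Chartrand's classical theorem says edge connectivity equals minimum degree whenever the minimum degree is at least floor(n/2). The test therefore reduces to checking that every vertex has more than n/2 neighbours inside its cluster. The sketch below checks a candidate clustering and counts the edges Highly Connected Deletion would remove to reach it. It is an illustration under stated assumptions, not the paper's fixed-parameter algorithm or ILP. It assumes a hypothetical adjacency-set representation, clusters given as Python sets that together cover all vertices, and (as an assumed convention) singleton clusters accepted as valid.

```python
def is_highly_connected(cluster, adj):
    """Hartuv-Shamir criterion: edge connectivity > n/2. By Chartrand's
    theorem this holds iff every vertex has more than n/2 neighbours
    inside the cluster, so only degrees need to be checked."""
    n = len(cluster)
    if n == 1:
        return True  # assumption: singleton clusters are accepted
    return all(2 * len(adj[v] & cluster) > n for v in cluster)


def deletion_cost(clusters, adj):
    """Number of edges that must be deleted to obtain `clusters` (the edges
    running between different clusters), or None if some cluster is not
    highly connected. `clusters` must partition the vertices of `adj`."""
    if not all(is_highly_connected(c, adj) for c in clusters):
        return None
    label = {v: i for i, c in enumerate(clusters) for v in c}
    # Every inter-cluster edge appears twice in the adjacency sets.
    crossing = sum(1 for u in adj for v in adj[u] if label[u] != label[v])
    return crossing // 2


# Hypothetical example graph: two triangles joined by the single edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(deletion_cost([{1, 2, 3}, {4, 5, 6}], adj))  # 1
print(deletion_cost([{1, 2, 3, 4, 5, 6}], adj))    # None
```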
9
Pizzuti C, Rombo SE. Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods. Bioinformatics 2014; 30:1343-52. [PMID: 24458952 DOI: 10.1093/bioinformatics/btu034]
Abstract
MOTIVATION Protein-protein interaction (PPI) networks are powerful models to represent the pairwise protein interactions of organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that together perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of yet uncharacterized proteins. RESULTS We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and then focus on one of them, i.e. population-based stochastic search. Based on validation measures common in the literature, we provide an experimental evaluation of techniques in this class, which are as yet less explored than the others. In particular, we study how the capability of Genetic Algorithms (GAs) to extract clusters in PPI networks varies when different topology-based fitness functions are used, and we compare GAs with the main techniques in the other categories. The experimental campaign shows that predictions returned by GAs are often more accurate than those produced by the competing methods. Interesting issues still remain open about possible generalizations of GAs allowing for overlapping clusters. AVAILABILITY AND IMPLEMENTATION We point out which methods and tools described here are publicly available. CONTACT simona.rombo@math.unipa.it SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Clara Pizzuti
  - Institute for High Performance Computing and Networking (ICAR), National Research Council of Italy (CNR), Via P. Bucci 41C, 87036 Rende (CS) and Department of Mathematics and Computer Science, University of Palermo, Via Archirafi 34, 90123 Palermo (PA), Italy
10
Design of wavelet neural networks based on symmetry fuzzy C-means for function approximation. Neural Comput Appl 2013. [DOI: 10.1007/s00521-013-1350-x]
11
Divina F, Pontes B, Giráldez R, Aguilar-Ruiz JS. An effective measure for assessing the quality of biclusters. Comput Biol Med 2011; 42:245-56. [PMID: 22196882 DOI: 10.1016/j.compbiomed.2011.11.015] [Received: 03/17/2011] [Revised: 09/26/2011] [Accepted: 11/26/2011]
Abstract
Biclustering is becoming a popular technique for the study of gene expression data. This is mainly due to the capability of biclustering to address the data using various dimensions simultaneously, as opposed to clustering, which can use only one dimension at a time. Different heuristics have been proposed in order to discover interesting biclusters in data. Such heuristics have one common characteristic: they are guided by a measure that determines the quality of biclusters. It follows that defining such a measure is probably the most important aspect. One popular quality measure is the mean squared residue (MSR). However, it has been proven that MSR fails to identify some kinds of patterns. This motivates us to introduce a novel measure, called virtual error (VE), that overcomes this limitation. Results obtained using VE confirm that it can identify interesting patterns that could not be found by MSR.
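The mean squared residue that this abstract takes as its baseline is the measure of Cheng and Church. For a bicluster with row set I and column set J it is H(I,J) = (1/(|I||J|)) * sum over i in I, j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2. Here a_iJ, a_Ij and a_IJ are the row, column and overall means of the bicluster. Below is a minimal pure-Python sketch of that formula; the function name is illustrative, and the code does not implement the virtual error (VE) measure proposed in the paper. As a simple illustration of how MSR depends on the pattern type, a purely additive pattern scores zero, while a purely multiplicative (scaling) pattern does not.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the bicluster formed by the row
    indices `rows` and column indices `cols` of `matrix` (list of lists)."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


additive = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]        # a_ij = r_i + c_j
multiplicative = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]  # a_ij = r_i * c_j
print(mean_squared_residue(additive, [0, 1, 2], [0, 1, 2]))        # 0.0
print(mean_squared_residue(multiplicative, [0, 1, 2], [0, 1, 2]))  # 4/9, about 0.444
```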
Affiliation(s)
- Federico Divina
  - School of Engineering, Pablo de Olavide University, Ctra. Utrera s/n, Seville, Spain.
12
Gusev A, Kenny EE, Lowe JK, Salit J, Saxena R, Kathiresan S, Altshuler DM, Friedman JM, Breslow JL, Pe'er I. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. Am J Hum Genet 2011; 88:706-717. [PMID: 21620352 DOI: 10.1016/j.ajhg.2011.04.023] [Received: 02/21/2011] [Revised: 04/13/2011] [Accepted: 04/26/2011]
Abstract
Rare variants affecting phenotype pose a unique challenge for human genetics. Although genome-wide association studies have successfully detected many common causal variants, they are underpowered in identifying disease variants that are too rare or population-specific to be imputed from a general reference panel and thus are poorly represented on commercial SNP arrays. We set out to overcome these challenges and detect association between disease and rare alleles using SNP arrays by relying on long stretches of genomic sharing that are identical by descent (IBD). We have developed an algorithm, DASH, which builds upon pairwise IBD shared segments to infer clusters of individuals likely to be sharing a single haplotype. DASH constructs a graph with nodes representing individuals and links based on such segments spanning a locus, and uses an iterative minimum cut algorithm to identify densely connected components. We have applied DASH to simulated data and diverse GWAS data sets by constructing haplotype clusters and testing them for association. In simulations we show this approach to be significantly more powerful than single-marker testing in an isolated population from Kosrae, Federated States of Micronesia, which has abundant IBD, and we provide orthogonal information for rare, recent variants in the outbred Wellcome Trust Case-Control Consortium (WTCCC) data. In both cohorts, we identified a number of haplotype associations (five such loci in the WTCCC data and ten in the isolated population) that were conditionally significant beyond any individual nearby markers. We have replicated one of these loci in an independent European cohort and identified putative structural changes in low-pass whole-genome sequences of the cluster carriers.
Affiliation(s)
- Alexander Gusev
  - Department of Computer Science, Columbia University, New York, NY 10027, USA
- Eimear E Kenny
  - Department of Computer Science, Columbia University, New York, NY 10027, USA
  - Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Jennifer K Lowe
  - Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Jaqueline Salit
  - Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Richa Saxena
  - Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Sekar Kathiresan
  - Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Cardiovascular Disease Prevention Center, Cardiology Division, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- David M Altshuler
  - Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Center for Human Genetic Research and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA
  - Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Jeffrey M Friedman
  - Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Jan L Breslow
  - Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Itsik Pe'er
  - Department of Computer Science, Columbia University, New York, NY 10027, USA.
13
Distance Functions, Clustering Algorithms and Microarray Data Analysis. Lecture Notes in Computer Science 2010. [DOI: 10.1007/978-3-642-13800-3_10]
14
Madi A, Hecht I, Bransburg-Zabary S, Merbl Y, Pick A, Zucker-Toledano M, Quintana FJ, Tauber AI, Cohen IR, Ben-Jacob E. Organization of the autoantibody repertoire in healthy newborns and adults revealed by system level informatics of antigen microarray data. Proc Natl Acad Sci U S A 2009; 106:14484-9. [PMID: 19667184 PMCID: PMC2732819 DOI: 10.1073/pnas.0901528106] [Received: 02/18/2009]
Abstract
The immune system is essential to body defense and maintenance. Specific antibodies to foreign invaders function in body defense, and it has been suggested that autoantibodies binding to self molecules are important in body maintenance. Recently, the autoantibody repertoires in the blood of healthy mothers and their newborns were studied using an antigen microarray containing hundreds of self molecules. It was found that the mothers expressed diverse repertoires for both IgG and IgM autoantibodies. Each newborn shares with its mother a similar repertoire of IgG antibodies, which cross the placenta, but its IgM repertoire is more similar to those of other newborns. Here, we took a system-level approach and analyzed the correlations between autoantibody reactivities of the previous data and extended the study to new data from newborns at birth and a week later, and from healthy young women. For the young women, we found modular organization of both IgG and IgM isotypes into antigen cliques: subgroups of highly correlated antigen reactivities. In contrast, the newborns were found to share a universal congenital IgM profile with no modular organization. Moreover, the IgG autoantibodies of the newborns manifested buds of the mothers' antigen cliques, but they were noticeably less structured. These findings suggest that the natural autoantibody repertoire of humans shows relatively little organization at birth, but, by young adulthood, it becomes sorted out into a modular organization of subgroups (cliques) of correlated antigens. These features revealed by antigen microarrays can be used to define personal states of autoantibody organizational motifs.
Affiliation(s)
- Inbal Hecht
  - The Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel
  - The Center for Theoretical Biological Physics, University of California San Diego, La Jolla, CA 92093
- Sharron Bransburg-Zabary
  - The Sackler School of Medicine
  - The Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel
- Yifat Merbl
  - Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel
- Adi Pick
  - The Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel
- Merav Zucker-Toledano
  - The Sackler School of Medicine
  - Pediatric Department, Dana Children's Hospital, Tel-Aviv Sourasky Medical Center, Tel-Aviv 64239, Israel
- Alfred I. Tauber
  - Department of Medicine, School of Medicine, Boston University, Boston, MA 02118
- Irun R. Cohen
  - Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel
- Eshel Ben-Jacob
  - The Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv 69978, Israel
  - The Center for Theoretical Biological Physics, University of California San Diego, La Jolla, CA 92093
15
Wang K, Zheng J, Zhang J, Dong J. Estimating the number of clusters via system evolution for cluster analysis of gene expression data. IEEE Trans Inf Technol Biomed 2009; 13:848-53. [PMID: 19527960 DOI: 10.1109/titb.2009.2025119]
Abstract
The estimation of the number of clusters (NC) is one of the crucial problems in the cluster analysis of gene expression data. Most available approaches give their answers without intuitive information about the degree of separation between clusters. However, this information is useful for understanding cluster structures. To provide this information, we propose the system evolution (SE) method to estimate NC based on the partitioning around medoids (PAM) clustering algorithm. SE analyzes the cluster structure of a dataset from the viewpoint of a pseudothermodynamic system. Through its partitioning and merging processes, the system reaches its stable equilibrium state, at which the optimal NC is found. The experimental results on simulated and real gene expression data demonstrate that SE works well on data with well-separated clusters and on data with slightly overlapping clusters.
Affiliation(s)
- Kaijun Wang
  - School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350007, China.
16
Giancarlo R, Scaturro D, Utro F. Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer. BMC Bioinformatics 2008; 9:462. [PMID: 18959783 PMCID: PMC2657801 DOI: 10.1186/1471-2105-9-462] [Received: 09/23/2008] [Accepted: 10/29/2008]
Abstract
Background Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have recently been proposed, some of them specifically for microarray data. Results We consider five such measures: Clest, Consensus (Consensus Clustering), FOM (Figure of Merit), Gap (Gap Statistics) and ME (Model Explorer), in addition to the classic WCSS (Within Cluster Sum-of-Squares) and KL (Krzanowski and Lai index). We perform extensive experiments on six benchmark microarray datasets, using both Hierarchical and K-means clustering algorithms, and we provide an analysis assessing both the intrinsic ability of a measure to predict the correct number of clusters in a dataset and its merit relative to the other measures. We pay particular attention to both precision and speed. Moreover, we also provide various fast approximation algorithms for the computation of Gap, FOM and WCSS. The main result is a hierarchy of those measures in terms of precision and speed, highlighting some of their merits and limitations not reported before in the literature. Conclusion Based on our analysis, we draw several conclusions for the use of those internal measures on microarray data. We report the main ones. Consensus is by far the best performer in terms of predictive power and is remarkably algorithm-independent. Unfortunately, on large datasets, it may be of no use because of its non-trivial computer time demand (weeks on a state-of-the-art PC). FOM is the second best performer although, quite surprisingly, it may not be competitive in this scenario: it has essentially the same predictive power as WCSS but is from 6 to 100 times slower, depending on the dataset. The approximation algorithms for the computation of FOM, Gap and WCSS perform very well, i.e., they are faster while still providing a very close approximation of FOM and WCSS. The approximation algorithm for the computation of Gap deserves to be singled out since it has a predictive power far better than Gap, it is competitive with the other measures, and it is at least two orders of magnitude faster than Gap. Another important novel conclusion that can be drawn from our analysis is that all the measures we have considered show severe limitations on large datasets, either due to computational demand (Consensus, as already mentioned, Clest and Gap) or to lack of precision (all of the other measures, including their approximations). The software and datasets are available under the GNU GPL on the supplementary material web page.
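WCSS, the classic baseline in this comparison, is the sum of squared Euclidean distances between each point and the centroid of its cluster. It is typically computed for clusterings with k = 1, 2, 3, ... clusters, and the number of clusters is commonly estimated at the "elbow" where the decrease levels off. The minimal sketch below computes the quantity itself. It assumes points given as coordinate tuples and cluster labels produced by some clustering algorithm, for example K-means, which is not included here. The name `wcss` is illustrative, and this is not the authors' fast approximation algorithm.

```python
def wcss(points, labels):
    """Within-cluster sum of squares: for every cluster, add the squared
    Euclidean distances of its members to the cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2
                     for m in members for d in range(dim))
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(wcss(points, [0, 0, 1, 1]))  # 4.0   (two tight clusters)
print(wcss(points, [0, 0, 0, 0]))  # 104.0 (everything in one cluster)
```

The sharp drop from 104.0 at k = 1 to 4.0 at k = 2 is the kind of elbow the heuristic looks for.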
Affiliation(s)
- Raffaele Giancarlo
- Dipartimento di Matematica ed Applicazioni, Universitá di Palermo, Palermo, Italy.
17
Discovery and expansion of gene modules by seeking isolated groups in a random graph process. PLoS One 2008; 3:e3358. [PMID: 18843375 PMCID: PMC2559867 DOI: 10.1371/journal.pone.0003358] [Received: 07/21/2008] [Accepted: 09/08/2008]
Abstract
Background A central problem in systems biology research is the identification and extension of biological modules: groups of genes or proteins participating in a common cellular process or physical complex. As a result, there is a persistent need for practical, principled methods to infer the modular organization of genes from genome-scale data. Results We introduce a novel approach for the identification of modules based on the persistence of isolated gene groups within an evolving graph process. First, the underlying genomic data are summarized in the form of ranked gene–gene relationships, thereby accommodating studies that quantify the relevant biological relationship directly or indirectly. Then, the observed gene–gene relationship ranks are viewed as the outcome of a random graph process, and candidate modules are given by the identifiable subgraphs that arise during this process. An isolation index is computed for each module, which quantifies the statistical significance of its survival time. Conclusions The Miso (module isolation) method predicts gene modules from genomic data, and the associated isolation index provides a module-specific measure of confidence. Improving on existing alternatives, such as graph clustering and the global pruning of dendrograms, this index offers two intuitively appealing features: (1) the score is module-specific; and (2) different choices of threshold correlate logically with the resulting performance, i.e. a stringent cutoff yields high-quality predictions but low sensitivity. Through the analysis of yeast phenotype data, the Miso method is shown to outperform existing alternatives in terms of the specificity and sensitivity of its predictions.
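The abstract describes modules as subgraphs that appear and persist while ranked gene–gene relationships are added one at a time. As a much simplified illustration, the sketch below treats every connected component of two or more genes as a candidate group, adds edges in rank order with a union-find structure, and records when each group forms and when it is absorbed into a larger one. Using connected components as the "isolated groups" is our assumption, the function name is hypothetical, and the sketch does not compute Miso's isolation index, which additionally assesses the statistical significance of each survival time.

```python
def component_lifetimes(genes, ranked_edges, min_size=2):
    """Add gene-gene edges in rank order (strongest first) and record, for every
    connected group of at least `min_size` genes, the step at which it formed
    (birth) and the step at which it merged into a larger group (death).
    death is None for groups still present after the last edge."""
    parent = {g: g for g in genes}
    members = {g: frozenset([g]) for g in genes}  # keyed by current root only
    birth = {g: 0 for g in genes}
    records = []

    def find(x):
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for step, (u, v) in enumerate(ranked_edges, start=1):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge lies inside an existing group; nothing new appears
        for r in (ru, rv):
            if len(members[r]) >= min_size:
                records.append((members[r], birth[r], step))
        parent[rv] = ru
        members[ru] = members[ru] | members[rv]
        birth[ru] = step
        del members[rv], birth[rv]

    for r, group in members.items():
        if len(group) >= min_size:
            records.append((group, birth[r], None))
    return records
```

For each record, `death - birth` is the number of ranked edges the group survived; long-lived groups are the natural candidates to score further.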
18
Yang CS, Chuang LY, Ke CH, Yang CH. A Combination of Shuffled Frog-Leaping Algorithm and Genetic Algorithm for Gene Selection. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2008. [DOI: 10.20965/jaciii.2008.p0218]
Abstract
Microarray data on gene expression profiles provide valuable answers to a variety of problems and contribute to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. The classification of microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed genes potentially relevant for distinguishing sample classes, and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that achieves higher classification accuracy. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA), and chooses genes (features) related to classification. The K-nearest neighbor (KNN) classifier with leave-one-out cross-validation (LOOCV) is used to evaluate classification accuracy. We apply this novel hybrid approach, based on SFLA-GA and KNN classification, to 11 classification problems from the literature. Experimental results show that the classification accuracy obtained using the selected features was higher than that obtained without feature selection.
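The evaluation step described above, KNN accuracy under leave-one-out cross-validation, is what a wrapper search such as SFLA-GA would use to score each candidate gene subset. A minimal sketch follows; it assumes squared Euclidean distance, k = 1 by default and majority voting, and it omits the SFLA and GA search itself. The function name is hypothetical.

```python
from collections import Counter


def loocv_knn_accuracy(samples, labels, features, k=1):
    """Leave-one-out accuracy of a k-nearest-neighbor classifier that only
    looks at the gene indices listed in `features` (a candidate gene subset)."""
    n = len(samples)
    correct = 0
    for i in range(n):
        # squared Euclidean distance from the held-out sample to every other one
        dists = sorted(
            (sum((samples[i][f] - samples[j][f]) ** 2 for f in features), j)
            for j in range(n) if j != i
        )
        # majority vote among the k nearest remaining samples
        votes = Counter(labels[j] for _, j in dists[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / n
```

In a feature-selection loop, each frog or chromosome encodes a list of gene indices and its fitness is `loocv_knn_accuracy(samples, labels, subset)`.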
19
Kerr G, Ruskin H, Crane M, Doolan P. Techniques for clustering gene expression data. Comput Biol Med 2008; 38:283-93. [DOI: 10.1016/j.compbiomed.2007.11.001] [Received: 07/11/2006] [Revised: 10/26/2007] [Accepted: 11/05/2007]
20
Lalonde S, Ehrhardt DW, Loqué D, Chen J, Rhee SY, Frommer WB. Molecular and cellular approaches for the detection of protein-protein interactions: latest techniques and current limitations. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2008; 53:610-635. [PMID: 18269572 DOI: 10.1111/j.1365-313x.2007.03332.x]
Abstract
Homotypic and heterotypic protein interactions are crucial for all levels of cellular function, including architecture, regulation, metabolism, and signaling. Therefore, protein interaction maps represent essential components of post-genomic toolkits needed for understanding biological processes at a systems level. Over the past decade, a wide variety of methods have been developed to detect, analyze, and quantify protein interactions, including surface plasmon resonance spectroscopy, NMR, yeast two-hybrid screens, peptide tagging combined with mass spectrometry and fluorescence-based technologies. Fluorescence techniques range from co-localization of tags, which may be limited by the optical resolution of the microscope, to fluorescence resonance energy transfer-based methods that have molecular resolution and can also report on the dynamics and localization of the interactions within a cell. Proteins interact via highly evolved complementary surfaces with affinities that can vary over many orders of magnitude. Some of the techniques described in this review, such as surface plasmon resonance, provide detailed information on physical properties of these interactions, while others, such as two-hybrid techniques and mass spectrometry, are amenable to high-throughput analysis using robotics. In addition to providing an overview of these methods, this review emphasizes techniques that can be applied to determine interactions involving membrane proteins, including the split ubiquitin system and fluorescence-based technologies for characterizing hits obtained with high-throughput approaches. Mass spectrometry-based methods are covered in a review by Miernyk and Thelen (2008; this issue, pp. 597-609). In addition, we discuss the use of interaction data to construct interaction networks and as the basis for the exciting possibility of using such data to predict interaction surfaces.
Affiliation(s)
- Sylvie Lalonde
- Carnegie Institution, 260 Panama Street, Stanford, CA 94305, USA.
21
22
FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinformatics 2007; 8:3. [PMID: 17204155 PMCID: PMC1774579 DOI: 10.1186/1471-2105-8-3] [Received: 06/12/2006] [Accepted: 01/04/2007]
Abstract
Background Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this aim, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process. Results The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Distinctive elements of FLAME are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features named Cluster Supporting Objects, around which to construct the clusters; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, by an iterative converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that data partitions generated by FLAME are not superimposable on those of other methods and that, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance. FLAME is implemented, together with all the above-mentioned algorithms, in a C++ software package with a graphical interface for Linux and Windows, named Gene Expression Data Analysis Studio (GEDAS), which is capable of handling very large datasets and is freely available under the GNU General Public License. 
Conclusion The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more distinct from one another, and provide better partitioning of biological functions. The clustering algorithm can be easily extended to applications other than gene expression analysis.
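Steps (i) and (ii) of FLAME, as summarized in the abstract, can be illustrated with a short pure-Python sketch. It assumes Euclidean distance, a fixed neighborhood size k, local density defined as the inverse of the mean distance to the k nearest neighbors, and inverse-distance neighbor weights; it omits outlier detection and assigns each object to its highest-membership cluster. These choices and the function name are assumptions for illustration, not the GEDAS implementation.

```python
import math


def flame_sketch(points, k=3, iters=100):
    """Simplified FLAME-style fuzzy clustering (no outlier handling)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # (i) neighborhood: the k nearest other objects
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    # local density: inverse of the mean distance to the neighbors
    density = [1.0 / (1e-12 + sum(dist[i][j] for j in neigh[i]) / len(neigh[i]))
               for i in range(n)]
    # Cluster Supporting Objects: denser than every one of their neighbors
    csos = [i for i in range(n) if all(density[i] > density[j] for j in neigh[i])]
    if not csos:
        raise ValueError("no cluster supporting objects found")
    m = len(csos)

    # (ii) fuzzy memberships: CSOs are fixed to their own cluster, every other
    # object starts uniform and repeatedly takes a weighted average of its neighbors
    member = [[1.0 / m] * m for _ in range(n)]
    for c, i in enumerate(csos):
        member[i] = [1.0 if c2 == c else 0.0 for c2 in range(m)]
    weights = []
    for i in range(n):
        w = [1.0 / (1e-12 + dist[i][j]) for j in neigh[i]]  # closer neighbors weigh more
        total = sum(w)
        weights.append([x / total for x in w])

    fixed = set(csos)
    for _ in range(iters):
        new = [row[:] for row in member]
        for i in range(n):
            if i not in fixed:
                new[i] = [sum(w * member[j][c] for w, j in zip(weights[i], neigh[i]))
                          for c in range(m)]
        member = new

    # crisp labels: the cluster with the highest membership
    labels = [max(range(m), key=lambda c: member[i][c]) for i in range(n)]
    return csos, member, labels
```

Because the neighbor weights sum to one, every non-CSO membership vector stays a probability-like vector throughout the iteration, which is what makes the final memberships interpretable as fuzzy assignments.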
23
Kim KJ, Cho SB. Ensemble classifiers based on correlation analysis for DNA microarray classification. Neurocomputing 2006. [DOI: 10.1016/j.neucom.2006.03.002]
24
Cancer classification using ensemble of neural networks with multiple significant gene subsets. APPL INTELL 2006. [DOI: 10.1007/s10489-006-0020-4]
25
Shah S, Kusiak A. Cancer gene search with data-mining and genetic algorithms. Comput Biol Med 2006; 37:251-61. [PMID: 16616736 DOI: 10.1016/j.compbiomed.2006.01.007] [Received: 01/07/2005] [Revised: 11/20/2005] [Accepted: 01/24/2006]
Abstract
Cancer accounts for approximately 25% of all deaths, making it the second leading cause of death in the United States. Early and accurate detection of cancer is critical to the well-being of patients. Analysis of gene expression data supports cancer identification and classification, which will facilitate proper treatment selection and drug development. Gene expression data sets for ovarian, prostate, and lung cancer were analyzed in this research. An integrated gene-search algorithm for genetic expression data analysis was proposed. This integrated algorithm combines a genetic algorithm and correlation-based heuristics for data preprocessing (on partitioned data sets) with data mining (decision tree and support vector machine algorithms) for making predictions. Knowledge derived by the proposed algorithm yields high classification accuracy and identifies the most significant genes. Bagging and stacking algorithms were applied to further enhance the classification accuracy. The results were compared with those reported in the literature. Mapping genotype information to phenotype parameters will ultimately reduce the cost and complexity of cancer detection and classification.
Affiliation(s)
- Shital Shah
- Intelligent Systems Laboratory, MIE, 2139 Seamans Center, The University of Iowa, Iowa City, IA 52242-1527, USA
26
Di Gesú V, Giancarlo R, Lo Bosco G, Raimondi A, Scaturro D. GenClust: a genetic algorithm for clustering gene expression data. BMC Bioinformatics 2005; 6:289. [PMID: 16336639 PMCID: PMC1343581 DOI: 10.1186/1471-2105-6-289] [Received: 05/18/2005] [Accepted: 12/07/2005]
Abstract
BACKGROUND Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. RESULTS GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. CONCLUSION Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.
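The FOM methodology mentioned above scores a clustering by leaving out one experimental condition at a time: genes are clustered on the remaining conditions, and the root mean squared deviation of the left-out condition from its cluster means is summed over all conditions (lower is better). The sketch below follows that commonly used definition; `cluster_fn` is a hypothetical callback standing in for any clustering algorithm (GenClust, K-means, Average Link and so on), and the adjusted variant of FOM is not shown.

```python
import math


def figure_of_merit(data, k, cluster_fn):
    """Aggregate FOM: for each condition e, cluster the genes on all other
    conditions with cluster_fn(rows, k) -> labels, then take the root mean squared
    deviation of condition e from its cluster means; return the sum over e."""
    n_genes = len(data)
    n_cond = len(data[0])
    total = 0.0
    for e in range(n_cond):
        reduced = [[row[c] for c in range(n_cond) if c != e] for row in data]
        labels = cluster_fn(reduced, k)
        # collect the left-out condition's values per cluster
        groups = {}
        for row, lab in zip(data, labels):
            groups.setdefault(lab, []).append(row[e])
        sq = 0.0
        for values in groups.values():
            mean = sum(values) / len(values)
            sq += sum((v - mean) ** 2 for v in values)
        total += math.sqrt(sq / n_genes)
    return total
```

Evaluating `figure_of_merit(data, k, cluster_fn)` for a range of k and looking for the point where it stops improving is how FOM is typically used to suggest a number of clusters.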
Affiliation(s)
- Vito Di Gesú
- Dipartimento di Matematica ed Applicazioni, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
- Raffaele Giancarlo
- Dipartimento di Matematica ed Applicazioni, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
- Giosué Lo Bosco
- Dipartimento di Matematica ed Applicazioni, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
- Alessandra Raimondi
- Dipartimento di Matematica ed Applicazioni, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
- Davide Scaturro
- Dipartimento di Matematica ed Applicazioni, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
27
Figueroa A, Borneman J, Jiang T. Clustering binary fingerprint vectors with missing values for DNA array data analysis. J Comput Biol 2005; 11:887-901. [PMID: 15700408 DOI: 10.1089/cmb.2004.11.887]
Abstract
Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
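The clustering problem described above can be pictured as a graph whose vertices are fingerprints and whose edges join fingerprints that agree at every position where both are resolved; a clique then forms a cluster whose members can fill in each other's missing values. The greedy sketch below is only an illustration under that assumed compatibility rule, with missing values encoded as `None` and hypothetical function names. It does not reproduce the authors' algorithm, which exploits special properties of these graphs to find maximum cliques efficiently.

```python
def compatible(a, b):
    """Fingerprints (entries 0, 1 or None for unresolved) are compatible when they
    agree at every position where both values are resolved."""
    return all(x == y for x, y in zip(a, b) if x is not None and y is not None)


def greedy_clique_partition(fps):
    """Partition fingerprint indices into cliques of mutually compatible vectors."""
    unassigned = list(range(len(fps)))
    clusters = []
    while unassigned:
        # seed each clique with the fingerprint compatible with most of the rest
        seed = max(unassigned,
                   key=lambda i: sum(compatible(fps[i], fps[j])
                                     for j in unassigned if j != i))
        clique = [seed]
        for j in unassigned:
            if j != seed and all(compatible(fps[j], fps[c]) for c in clique):
                clique.append(j)
        clusters.append(clique)
        unassigned = [i for i in unassigned if i not in clique]
    return clusters


def consensus(fps, clique):
    """Resolve missing values inside a clique from the values its members fix."""
    resolved = []
    for pos in range(len(fps[clique[0]])):
        known = {fps[i][pos] for i in clique if fps[i][pos] is not None}
        resolved.append(known.pop() if known else None)  # at most one value in a clique
    return resolved
```

Since members of a clique are pairwise compatible, each position has at most one resolved value among them, so `consensus` never has to break a conflict.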
Affiliation(s)
- Andres Figueroa
- Department of Computer Science, University of California, Riverside 92521, USA.
28
Pancoska P, Moravek Z, Moll UM. Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs. Nucleic Acids Res 2004; 32:4630-45. [PMID: 15333695 PMCID: PMC516071 DOI: 10.1093/nar/gkh802]
Abstract
Nucleic acids are molecules of choice for both established and emerging nanoscale technologies. These technologies benefit from large functional densities of 'DNA processing elements' that can be readily manufactured. To achieve the desired functionality, polynucleotide sequences are currently designed by a process that involves laborious filtering of potential candidates against a series of requirements and parameters. Here, we present a completely novel methodology for the rapid rational design of large sets of DNA sequences. This method allows for the direct implementation of very complex and detailed requirements for the generated sequences, thus avoiding 'brute force' filtering. At the same time, these sequences have narrow distributions of melting temperatures. The molecular part of the design process can be done without computer assistance, using an efficient 'human engineering' approach by drawing a single blueprint graph that represents all generated sequences. Moreover, the method eliminates the need for extensive thermodynamic calculations: melting temperature can be calculated only once (or not at all). In addition, the isostability of the sequences is independent of the selection of a particular set of thermodynamic parameters. Applications are presented for DNA sequence designs for microarrays, universal microarray zip sequences and electron transfer experiments.
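The abstract does not detail the construction, but the general idea that a walk covering every edge of a blueprint graph spells out sequences with controlled content can be illustrated with a textbook example: in a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers, an Eulerian circuit spells a cyclic sequence containing every k-mer exactly once. The sketch below uses Hierholzer's algorithm; it is a generic illustration with a hypothetical function name, not the authors' design method, and it ignores melting-temperature constraints.

```python
from itertools import product


def de_bruijn_sequence(alphabet, k):
    """Cyclic sequence over `alphabet` containing every length-k word exactly once,
    spelled by an Eulerian circuit of the de Bruijn graph (Hierholzer's algorithm)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # nodes are (k-1)-mers; each node has one outgoing edge per symbol
    unused = {"".join(p): list(alphabet) for p in product(alphabet, repeat=k - 1)}
    start = alphabet[0] * (k - 1)
    stack = [(start, "")]  # (node, label of the edge used to reach it)
    circuit = []
    while stack:
        node, _ = stack[-1]
        if unused[node]:
            symbol = unused[node].pop()
            stack.append(((node + symbol)[1:], symbol))  # follow edge node -> next node
        else:
            circuit.append(stack.pop()[1])  # dead end: emit the edge label
    # labels were emitted in reverse order; the start entry contributes ""
    return "".join(reversed(circuit))
```

For example, `de_bruijn_sequence("ACGT", 2)` returns a 16-letter cyclic sequence in which each of the 16 dinucleotides occurs exactly once.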
Affiliation(s)
- Petr Pancoska
- Department of Pathology, Stony Brook University, New York, NY 11794, USA.
29
Shah SC, Kusiak A. Data mining and genetic algorithm based gene/SNP selection. Artif Intell Med 2004; 31:183-96. [PMID: 15302085 DOI: 10.1016/j.artmed.2004.04.002] [Received: 08/31/2003] [Revised: 02/07/2004] [Accepted: 04/03/2004]
Abstract
OBJECTIVE Genomic studies provide large volumes of data with the number of single nucleotide polymorphisms (SNPs) ranging into thousands. The analysis of SNPs permits determining relationships between genotypic and phenotypic information as well as the identification of SNPs related to a disease. The growing wealth of information and advances in biology call for the development of approaches for discovery of new knowledge. One such area is the identification of gene/SNP patterns affecting cure/drug development for various diseases. METHODS A new approach for predicting drug effectiveness is presented. The approach is based on data mining and genetic algorithms. A global search mechanism, weighted decision tree, decision-tree-based wrapper, a correlation-based heuristic, and the identification of intersecting feature sets are employed for selecting significant genes. RESULTS The feature selection approach resulted in an 85% reduction in the number of features. The relative increase in cross-validation accuracy and specificity for the significant gene/SNP set was 10% and 3.2%, respectively. CONCLUSION The feature selection approach was successfully applied to data sets for drug and placebo subjects. The number of features has been significantly reduced while the quality of knowledge was enhanced. The feature set intersection approach provided the most significant genes/SNPs. The results reported in the paper discuss associations among SNPs resulting in patient-specific treatment protocols.
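The feature set intersection step can be sketched as follows: each selection method (for example the weighted decision tree, the decision-tree-based wrapper and the correlation-based heuristic) produces its own set of candidate genes/SNPs, and only those chosen by every method, or by at least a chosen number of them, are kept. The function name and the `min_support` relaxation are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def intersect_feature_sets(feature_sets, min_support=None):
    """Return, sorted, the features selected by at least `min_support` of the
    given selectors (default: all of them, i.e. the strict intersection)."""
    if min_support is None:
        min_support = len(feature_sets)
    # count in how many selectors' outputs each feature appears
    counts = Counter(f for s in feature_sets for f in set(s))
    return sorted(f for f, c in counts.items() if c >= min_support)
```

With the default, only features every selector agrees on survive; lowering `min_support` trades that stringency for a larger candidate set.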
Affiliation(s)
- Shital C Shah
- Intelligent Systems Laboratory, MIE, 2139 Seamans Center, The University of Iowa, Iowa City, IA 52242-1527, USA
30
Kim YH, Lee SY, Moon BR. A Genetic Approach for Gene Selection on Microarray Expression Data. GENETIC AND EVOLUTIONARY COMPUTATION – GECCO 2004 2004. [DOI: 10.1007/978-3-540-24854-5_36]
31
Abstract
The development of functional genomic resources is essential to understand and utilize information generated from genome sequencing projects. Central to the development of this technology is the creation of high-quality cDNA resources and improved technologies for analyzing coding and noncoding mRNA sequences. The isolation and mapping of cDNAs is an entrée to characterizing the information that is of significant biological relevance in the genome of an organism. However, a bottleneck is often encountered when attempting to bring to full-length (or at least full-coding) a number of incomplete cDNAs in parallel, since this involves the nonsystematic, time consuming, and labor-intensive iterative screening of a number of cDNA libraries of variable quality and/or directed strategies to process individual clones (e.g., 5' rapid amplification of cDNA ends). Here, we review the current state of the art in cDNA library generation, as well as present an analysis of the different steps involved in cDNA library generation.
Affiliation(s)
- M Das
- Department of Biochemistry, McGill Cancer Center, McGill University, Montreal, Quebec, Canada H3G 1Y6
32