1
de Vega WC, Erdman L, Vernon SD, Goldenberg A, McGowan PO. Integration of DNA methylation & health scores identifies subtypes in myalgic encephalomyelitis/chronic fatigue syndrome. Epigenomics 2018; 10:539-557. [PMID: 29692205 DOI: 10.2217/epi-2017-0150] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/20/2022]
Abstract
AIM: To identify subtypes in myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) based on DNA methylation profiles and health scores. METHODS: DNA methylome profiles in immune cells were integrated with symptomatology from 70 women with ME/CFS using similarity network fusion to identify subtypes. RESULTS: We discovered four ME/CFS subtypes associated with DNA methylation modifications in 1939 CpG sites, three RAND-36 categories and five DePaul Symptom Questionnaire measures. Methylation patterns of immune response genes and differences in physical functioning and postexertional malaise differentiated the subtypes. CONCLUSION: ME/CFS subtypes are associated with specific DNA methylation differences and health symptomatology and provide additional evidence of the potential relevance of metabolic and immune differences in ME/CFS with respect to specific symptoms.
Affiliation(s)
- Wilfred C de Vega
- Department of Biological Sciences, University of Toronto, Scarborough, Toronto, Ontario, Canada; Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Lauren Erdman
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Genetics & Genome Biology, SickKids Research Institute, Toronto, Ontario, Canada
- Suzanne D Vernon
- The Bateman Horne Center of Excellence, Salt Lake City, UT 84102, USA
- Anna Goldenberg
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Genetics & Genome Biology, SickKids Research Institute, Toronto, Ontario, Canada
- Patrick O McGowan
- Department of Biological Sciences, University of Toronto, Scarborough, Toronto, Ontario, Canada; Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada; Department of Psychology, University of Toronto, Toronto, Ontario, Canada; Department of Physiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
2
Forsati R, Barjasteh I, Ross D, Esfahanian AH, Radha H. Network completion by leveraging similarity of nodes. Social Network Analysis and Mining 2016. [DOI: 10.1007/s13278-016-0405-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022]
3
Schneider RF, Meyer A. How plasticity, genetic assimilation and cryptic genetic variation may contribute to adaptive radiations. Mol Ecol 2016; 26:330-350. [PMID: 27747962 DOI: 10.1111/mec.13880] [Citation(s) in RCA: 104] [Impact Index Per Article: 13.0] [Received: 06/15/2016] [Revised: 09/30/2016] [Accepted: 10/07/2016] [Indexed: 12/13/2022]
Abstract
There is increasing evidence that phenotypic plasticity can promote population divergence by facilitating phenotypic diversification and, eventually, genetic divergence. When a 'plastic' population colonizes a new habitat, it has the possibility to occupy multiple niches by expressing several distinct phenotypes. These initially reflect the population's plastic range but may later become genetically fixed by selection via the process of 'genetic assimilation' (GA). Through this process multiple specialized sister lineages can arise that share a common plastic ancestor - the 'flexible stem'. Here, we review possible molecular mechanisms through which natural selection could fix an initially plastic trait during GA. These mechanisms could also explain how GA may contribute to cryptic genetic variation that can subsequently be coopted into other phenotypes or traits, but also lead to nonadaptive responses. We outline the predicted patterns of genetic and transcriptional divergence accompanying flexible stem radiations. The analysis of such patterns of (retained) adaptive and nonadaptive plastic responses within and across radiating lineages can inform on the state of ongoing GA. We conclude that, depending on the stability of the environment, the molecular architecture underlying plastic traits can facilitate diversification, followed by fixation and consolidation of an adaptive phenotype and degeneration of nonadaptive ones. Additionally, the process of GA may increase the cryptic genetic variation of populations, which on one hand may serve as substrate for evolution, but on another may be responsible for nonadaptive responses that consolidate local allopatry and thus reproductive isolation.
Affiliation(s)
- Ralf F Schneider
- Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Universitaetstrasse 10, 78457, Konstanz, Germany
- Axel Meyer
- Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Universitaetstrasse 10, 78457, Konstanz, Germany
4
Santoni D, Swiercz A, Zmieńko A, Kasprzak M, Blazewicz M, Bertolazzi P, Felici G. An integrated approach (CLuster Analysis Integration Method) to combine expression data and protein-protein interaction networks in agrigenomics: application on Arabidopsis thaliana. OMICS: A Journal of Integrative Biology 2014; 18:155-65. [PMID: 24404838 DOI: 10.1089/omi.2013.0050] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/29/2022]
Abstract
Experimental co-expression data and protein-protein interaction networks are frequently used to analyze the interactions among genes or proteins. Recent studies have investigated methods to integrate these two sources of information. We propose a new method to integrate co-expression data obtained through DNA microarray analysis (MA) and protein-protein interaction (PPI) network data, and apply it to Arabidopsis thaliana. The proposed method identifies small subsets of highly interacting proteins. Based on the analysis of the basis of co-localization and mRNA developmental expression, we show that these groups provide important biological insights; additionally, these subsets are significantly enriched with respect to KEGG Pathways and can be used to predict successfully whether proteins belong to known pathways. Thus, the method is able to provide relevant biological information and support the functional identification of complex genetic traits of economic value in plant agrigenomics research. The method has been implemented in a prototype software tool named CLAIM (CLuster Analysis Integration Method) and can be downloaded from http://bio.cs.put.poznan.pl/research_fields . CLAIM is based on the separate clustering of MA and PPI data; the clusters are merged in a special graph; cliques of this graph are subsets of strongly connected proteins. The proposed method was successfully compared with existing methods. CLAIM appears to be a useful semi-automated tool for protein functional analysis and warrants further evaluation in agrigenomics research.
Affiliation(s)
- Daniele Santoni
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Rome, Italy
5
Gu J, Feng W, Zeng J, Mamitsuka H, Zhu S. Efficient Semisupervised MEDLINE Document Clustering With MeSH-Semantic and Global-Content Constraints. IEEE Transactions on Cybernetics 2013; 43:1265-1276. [PMID: 26502435 DOI: 10.1109/tsmcb.2012.2227998] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 06/05/2023]
Abstract
For clustering biomedical documents, we can consider three different types of information: the local-content (LC) information from documents, the global-content (GC) information from the whole MEDLINE collections, and the medical subject heading (MeSH)-semantic (MS) information. Previous methods for clustering biomedical documents are not necessarily effective for integrating different types of information, by which only one or two types of information have been used. Recently, the performance of MEDLINE document clustering has been enhanced by linearly combining both the LC and MS information. However, the simple linear combination could be ineffective because of the limitation of the representation space for combining different types of information (similarities) with different reliability. To overcome the limitation, we propose a new semisupervised spectral clustering method, i.e., SSNCut, for clustering over the LC similarities, with two types of constraints: must-link (ML) constraints on document pairs with high MS (or GC) similarities and cannot-link (CL) constraints on those with low similarities. We empirically demonstrate the performance of SSNCut on MEDLINE document clustering, by using 100 data sets of MEDLINE records. Experimental results show that SSNCut outperformed a linear combination method and several well-known semisupervised clustering methods, being statistically significant. Furthermore, the performance of SSNCut with constraints from both MS and GC similarities outperformed that from only one type of similarities. Another interesting finding was that ML constraints more effectively worked than CL constraints, since CL constraints include around 10% incorrect ones, whereas this number was only 1% for ML constraints.
6
Rau CD, Wisniewski N, Orozco LD, Bennett B, Weiss J, Lusis AJ. Maximal information component analysis: a novel non-linear network analysis method. Front Genet 2013; 4:28. [PMID: 23487572 PMCID: PMC3594742 DOI: 10.3389/fgene.2013.00028] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/01/2012] [Accepted: 02/21/2013] [Indexed: 11/26/2022]
Abstract
Background: Network construction and analysis algorithms provide scientists with the ability to sift through high-throughput biological outputs, such as transcription microarrays, for small groups of genes (modules) that are relevant for further research. Most of these algorithms ignore the important role of non-linear interactions in the data, and the ability for genes to operate in multiple functional groups at once, despite clear evidence for both of these phenomena in observed biological systems. Results: We have created a novel co-expression network analysis algorithm that incorporates both of these principles by combining the information-theoretic association measure of the maximal information coefficient (MIC) with an Interaction Component Model. We evaluate the performance of this approach on two datasets collected from a large panel of mice, one from macrophages and the other from liver by comparing the two measures based on a measure of module entropy, Gene Ontology (GO) enrichment, and scale-free topology (SFT) fit. Our algorithm outperforms a widely used co-expression analysis method, weighted gene co-expression network analysis (WGCNA), in the macrophage data, while returning comparable results in the liver dataset when using these criteria. We demonstrate that the macrophage data has more non-linear interactions than the liver dataset, which may explain the increased performance of our method, termed Maximal Information Component Analysis (MICA) in that case. Conclusions: In making our network algorithm more accurately reflect known biological principles, we are able to generate modules with improved relevance, particularly in networks with confounding factors such as gene by environment interactions.
Affiliation(s)
- Christoph D Rau
- Division of Cardiology, Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, CA, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, CA, USA
7
Cho JH, Wang K, Galas DJ. An integrative approach to inferring biologically meaningful gene modules. BMC Systems Biology 2011; 5:117. [PMID: 21791051 PMCID: PMC3156758 DOI: 10.1186/1752-0509-5-117] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 04/07/2011] [Accepted: 07/26/2011] [Indexed: 12/21/2022]
Abstract
Background: The ability to construct biologically meaningful gene networks and modules is critical for contemporary systems biology. Though recent studies have demonstrated the power of using gene modules to shed light on the functioning of complex biological systems, most modules in these networks have shown little association with meaningful biological function. We have devised a method which directly incorporates gene ontology (GO) annotation in construction of gene modules in order to gain better functional association. Results: We have devised a method, Semantic Similarity-Integrated approach for Modularization (SSIM) that integrates various gene-gene pairwise similarity values, including information obtained from gene expression, protein-protein interactions and GO annotations, in the construction of modules using affinity propagation clustering. We demonstrated the performance of the proposed method using data from two complex biological responses: 1. the osmotic shock response in Saccharomyces cerevisiae, and 2. the prion-induced pathogenic mouse model. In comparison with two previously reported algorithms, modules identified by SSIM showed significantly stronger association with biological functions. Conclusions: The incorporation of semantic similarity based on GO annotation with gene expression and protein-protein interaction data can greatly enhance the functional relevance of inferred gene modules. In addition, the SSIM approach can also reveal the hierarchical structure of gene modules to gain a broader functional view of the biological system. Hence, the proposed method can facilitate comprehensive and in-depth analysis of high throughput experimental data at the gene network level.
Affiliation(s)
- Ji-Hoon Cho
- Institute for Systems Biology, Seattle, WA 98109, USA
8
Warsow G, Greber B, Falk SSI, Harder C, Siatkowski M, Schordan S, Som A, Endlich N, Schöler H, Repsilber D, Endlich K, Fuellen G. ExprEssence--revealing the essence of differential experimental data in the context of an interaction/regulation net-work. BMC Systems Biology 2010; 4:164. [PMID: 21118483 PMCID: PMC3012047 DOI: 10.1186/1752-0509-4-164] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 07/05/2010] [Accepted: 11/30/2010] [Indexed: 12/15/2022]
Abstract
Background: Experimentalists are overwhelmed by high-throughput data and there is an urgent need to condense information into simple hypotheses. For example, large amounts of microarray and deep sequencing data are becoming available, describing a variety of experimental conditions such as gene knockout and knockdown, the effect of interventions, and the differences between tissues and cell lines. Results: To address this challenge, we developed a method, implemented as a Cytoscape plugin called ExprEssence. As input we take a network of interaction, stimulation and/or inhibition links between genes/proteins, and differential data, such as gene expression data, tracking an intervention or development in time. We condense the network, highlighting those links across which the largest changes can be observed. Highlighting is based on a simple formula inspired by the law of mass action. We can interactively modify the threshold for highlighting and instantaneously visualize results. We applied ExprEssence to three scenarios describing kidney podocyte biology, pluripotency and ageing: 1) We identify putative processes involved in podocyte (de-)differentiation and validate one prediction experimentally. 2) We predict and validate the expression level of a transcription factor involved in pluripotency. 3) Finally, we generate plausible hypotheses on the role of apoptosis, cell cycle deregulation and DNA repair in ageing data obtained from the hippocampus. Conclusion: Reducing the size of gene/protein networks to the few links affected by large changes allows to screen for putative mechanistic relationships among the genes/proteins that are involved in adaptation to different experimental conditions, yielding important hypotheses, insights and suggestions for new experiments. We note that we do not focus on the identification of 'active subnetworks'. Instead we focus on the identification of single links (which may or may not form subnetworks), and these single links are much easier to validate experimentally than submodules. ExprEssence is available at http://sourceforge.net/projects/expressence/.
Affiliation(s)
- Gregor Warsow
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Ernst-Heydemann-Strasse 8, Rostock, Germany
9
Lahti L, Knuuttila JEA, Kaski S. Global modeling of transcriptional responses in interaction networks. Bioinformatics 2010; 26:2713-20. [PMID: 20813878 DOI: 10.1093/bioinformatics/btq500] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Cell-biological processes are regulated through a complex network of interactions between genes and their products. The processes, their activating conditions and the associated transcriptional responses are often unknown. Organism-wide modeling of network activation can reveal unique and shared mechanisms between tissues, and potentially as yet unknown processes. The same method can also be applied to cell-biological conditions in one or more tissues. RESULTS: We introduce a novel approach for organism-wide discovery and analysis of transcriptional responses in interaction networks. The method searches for local, connected regions in a network that exhibit coordinated transcriptional response in a subset of tissues. Known interactions between genes are used to limit the search space and to guide the analysis. Validation on a human pathway network reveals physiologically coherent responses, functional relatedness between tissues and coordinated, context-specific regulation of the genes. AVAILABILITY: Implementation is freely available in R and Matlab at http://www.cis.hut.fi/projects/mi/software/NetResponse
Affiliation(s)
- Leo Lahti
- Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University School of Science and Technology, Aalto, Finland.
10
Zeng J, Zhu S, Liew AWC, Yan H. Multiconstrained gene clustering based on generalized projections. BMC Bioinformatics 2010; 11:164. [PMID: 20356386 PMCID: PMC3098054 DOI: 10.1186/1471-2105-11-164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/08/2009] [Accepted: 03/31/2010] [Indexed: 11/10/2022]
Abstract
Background: Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple pieces of constraints for an optimal clustering solution still remains an unsolved problem. Results: We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints from different nature without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. Conclusions: The POCS-based MGC method can successfully combine multiple constraints from different nature for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions.
Affiliation(s)
- Jia Zeng
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China.
11
Parkkinen JA, Kaski S. Searching for functional gene modules with interaction component models. BMC Systems Biology 2010; 4:4. [PMID: 20100324 PMCID: PMC2823677 DOI: 10.1186/1752-0509-4-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 02/05/2009] [Accepted: 01/25/2010] [Indexed: 12/04/2022]
Abstract
BACKGROUND: Functional gene modules and protein complexes are being sought from combinations of gene expression and protein-protein interaction data with various clustering-type methods. Central features missing from most of these methods are handling of uncertainty in both protein interaction and gene expression measurements, and in particular capability of modeling overlapping clusters. It would make sense to assume that proteins may play different roles in different functional modules, and the roles are evidenced in their interactions. RESULTS: We formulate a generative probabilistic model for protein-protein interaction links and introduce two ways for including gene expression data into the model. The model finds interaction components, which can be interpreted as overlapping clusters or functional modules. We demonstrate the performance on two data sets of yeast Saccharomyces cerevisiae. Our methods outperform a representative set of earlier models in the task of finding biologically relevant modules having enriched functional classes. CONCLUSIONS: Combining protein interaction and gene expression data with a probabilistic generative model improves discovery of modules compared to approaches based on either data source alone. With a fairly simple model we can find biologically relevant modules better than with alternative methods, and in addition the modules may be inherently overlapping in the sense that different interactions may belong to different modules.
Affiliation(s)
- Juuso A Parkkinen
- Helsinki Institute for Information Technology HIIT and Adaptive Informatics Research Centre, Department of Information and Computer Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
- Department of Computer Science, P.O. Box 68, FI-00014, University of Helsinki, Finland
- Samuel Kaski
- Helsinki Institute for Information Technology HIIT and Adaptive Informatics Research Centre, Department of Information and Computer Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
12
Aubin-Horth N, Renn SCP. Genomic reaction norms: using integrative biology to understand molecular mechanisms of phenotypic plasticity. Mol Ecol 2009; 18:3763-80. [PMID: 19732339 DOI: 10.1111/j.1365-294x.2009.04313.x] [Citation(s) in RCA: 256] [Impact Index Per Article: 17.1] [Indexed: 11/28/2022]
Abstract
Phenotypic plasticity is the development of different phenotypes from a single genotype, depending on the environment. Such plasticity is a pervasive feature of life, is observed for various traits and is often argued to be the result of natural selection. A thorough study of phenotypic plasticity should thus include an ecological and an evolutionary perspective. Recent advances in large-scale gene expression technology make it possible to also study plasticity from a molecular perspective, and the addition of these data will help answer long-standing questions about this widespread phenomenon. In this review, we present examples of integrative studies that illustrate the molecular and cellular mechanisms underlying plastic traits, and show how new techniques will grow in importance in the study of these plastic molecular processes. These techniques include: (i) heterologous hybridization to DNA microarrays; (ii) next generation sequencing technologies applied to transcriptomics; (iii) techniques for studying the function of noncoding small RNAs; and (iv) proteomic tools. We also present recent studies on genetic model systems that uncover how environmental cues triggering different plastic responses are sensed and integrated by the organism. Finally, we describe recent work on changes in gene expression in response to an environmental cue that persist after the cue is removed. Such long-term responses are made possible by epigenetic molecular mechanisms, including DNA methylation. The results of these current studies help us outline future avenues for the study of plasticity.
Affiliation(s)
- Nadia Aubin-Horth
- Département de Sciences biologiques, Université de Montréal, Québec, Canada.
13
Dierssen M, Herault Y, Estivill X. Aneuploidy: from a physiological mechanism of variance to Down syndrome. Physiol Rev 2009; 89:887-920. [PMID: 19584316 DOI: 10.1152/physrev.00032.2007] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.5] [Indexed: 12/20/2022]
Abstract
Quantitative differences in gene expression emerge as a significant source of variation in natural populations, representing an important substrate for evolution and accounting for a considerable fraction of phenotypic diversity. However, perturbation of gene expression is also the main factor in determining the molecular pathogenesis of numerous aneuploid disorders. In this review, we focus on Down syndrome (DS) as the prototype of "genomic disorder" induced by copy number change. The understanding of the pathogenicity of the extra genomic material in trisomy 21 has accelerated in the last years due to the recent advances in genome sequencing, comparative genome analysis, functional genome exploration, and the use of model organisms. We present recent data on the role of genome-altering processes in the generation of diversity in DS neural phenotypes focusing on the impact of trisomy on brain structure and mental retardation and on biological pathways and cell types in target brain regions (including prefrontal cortex, hippocampus, cerebellum, and basal ganglia). We also review the potential that genetically engineered mouse models of DS bring into the understanding of the molecular biology of human learning disorders.
Affiliation(s)
- Mara Dierssen
- Genes and Disease Program, Genomic Regulation Center-CRG, Pompeu Fabra University, Barcelona Biomedical Research Park, Dr Aiguader 88, PRBB building E, Barcelona 08003, Catalonia, Spain.
14
Guyer RA, Hellman MD, Emami K, Kadlecek S, Cadman RV, Yu J, Vadhat V, Ishii M, Woodburn JM, Law M, Rizi RR. A robust method for estimating regional pulmonary parameters in presence of noise. Acad Radiol 2008; 15:740-52. [PMID: 18486010 DOI: 10.1016/j.acra.2008.03.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/14/2007] [Revised: 03/11/2008] [Accepted: 03/17/2008] [Indexed: 10/22/2022]
Abstract
RATIONALE AND OBJECTIVES: Estimation of regional lung function parameters from hyperpolarized gas magnetic resonance images can be very sensitive to presence of noise. Clustering pixels and averaging over the resulting groups is an effective method for reducing the effects of noise in these images, commonly performed by grouping proximal pixels together, thus creating large groups called "bins." This method has several drawbacks, primarily that it can group dissimilar pixels together, and it degrades spatial resolution. This study presents an improved approach to simplifying data via principal component analysis (PCA) when noise level prohibits a pixel-by-pixel treatment of data, by clustering them based on similarity to one another rather than spatial proximity. The application to this technique is demonstrated in measurements of regional lung oxygen tension using hyperpolarized ³He magnetic resonance imaging (MRI). MATERIALS AND METHODS: A synthetic dataset was generated from an experimental set of oxygen tension measurements by treating the experimentally derived parameters as "true" values, and then solving backwards to generate "noiseless" images. Artificial noise was added to the synthetic data, and both traditional binning and PCA-based clustering were performed. For both methods, the root-mean-square (RMS) error between each pixel's "estimated" and "true" parameters was computed and the resulting effects were compared. RESULTS: At high signal-to-noise ratios (SNRs), clustering did not enhance accuracy. Clustering did, however, improve parameter estimations for moderate SNR values (below 100). For SNR values between 100 and 20, the PCA-based K-means clustering analysis yielded greater accuracy than Cartesian binning. In extreme cases (SNR<5), Cartesian binning can be more accurate. CONCLUSIONS: The reliability of parameters estimation in imaging-based regional functional measurements can be improved in the presence of noise by utilizing principal component analysis-based clustering without sacrificing spatial resolution compared to Cartesian binning. Results suggest that this approach has a great potential for robust grouping of pixels in hyperpolarized ³He MRI maps of lung oxygen tension.
15
Zhao XM, Chen L, Aihara K. Protein function prediction with high-throughput data. Amino Acids 2008; 35:517-30. [PMID: 18427717 DOI: 10.1007/s00726-008-0077-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 01/31/2008] [Accepted: 03/13/2008] [Indexed: 12/12/2022]
Abstract
Protein function prediction is one of the main challenges in post-genomic era. The availability of large amounts of high-throughput data provides an alternative approach to handling this problem from the computational viewpoint. In this review, we provide a comprehensive description of the computational methods that are currently applicable to protein function prediction, especially from the perspective of machine learning. Machine learning techniques can generally be classified as supervised learning, semi-supervised learning and unsupervised learning. By classifying the existing computational methods for protein annotation into these three groups, we are able to present a comprehensive framework on protein annotation based on machine learning techniques. In addition to describing recently developed theoretical methodologies, we also cover representative databases and software tools that are widely utilized in the prediction of protein function.
Affiliation(s)
- Xing-Ming Zhao
- ERATO Aihara Complexity Modelling Project, JST, Tokyo, 151-0064, Japan