5351
Abstract
The bioinformatics requirements within the clinical environment are very specific, and analytic techniques need to be fit for purpose, robust, and predictable. At the same time, the bewildering amount of information produced during these analyses needs to be carefully managed, used and interpreted correctly. The challenge for clinical laboratories now is to implement production analytical processes that are capable of handling different experimental approaches on current equipment, as well as to incorporate ways for these systems to evolve to take account of developments likely to make impacts in the near future. This is complicated by the many options available at each of the critical processing steps, and a clear method needs to be developed to assemble appropriate pipelines. Here, I discuss the issues relevant to the development of an informatics pipeline that meets these criteria, which should allow individual laboratories to assess their proposed strategies.
Affiliation(s)
- Richard James Nigel Allcock
- School of Pathology and Laboratory Medicine, University of Western Australia, M574 Stirling Highway, Nedlands, WA, 6009, Australia
5352
Christenson SA, Brandsma CA, Campbell JD, Knight DA, Pechkovsky DV, Hogg JC, Timens W, Postma DS, Lenburg M, Spira A. miR-638 regulates gene expression networks associated with emphysematous lung destruction. Genome Med 2013; 5:114. [PMID: 24380442 PMCID: PMC3971345 DOI: 10.1186/gm519] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Received: 10/18/2013] [Accepted: 12/20/2013] [Indexed: 12/21/2022] Open
Abstract
Background Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease characterized by varying degrees of emphysematous lung destruction and small airway disease, each with distinct effects on clinical outcomes. There is little known about how microRNAs contribute specifically to the emphysema phenotype. We examined how genome-wide microRNA expression is altered with regional emphysema severity and how these microRNAs regulate disease-associated gene expression networks. Methods We profiled microRNAs in different regions of the lung with varying degrees of emphysema from 6 smokers with COPD and 2 controls (8 regions × 8 lungs = 64 samples). Regional emphysema severity was quantified by mean linear intercept. Whole genome microRNA and gene expression data were integrated in the same samples to build co-expression networks. Candidate microRNAs were perturbed in human lung fibroblasts in order to validate these networks. Results The expression levels of 63 microRNAs (P < 0.05) were altered with regional emphysema. A subset, including miR-638, miR-30c, and miR-181d, had expression levels that were associated with those of their predicted mRNA targets. Genes correlated with these microRNAs were enriched in pathways associated with emphysema pathophysiology (for example, oxidative stress and accelerated aging). Inhibition of miR-638 expression in lung fibroblasts led to modulation of these same emphysema-related pathways. Gene targets of miR-638 in these pathways were amongst those negatively correlated with miR-638 expression in emphysema. Conclusions Our findings demonstrate that microRNAs are altered with regional emphysema severity and modulate disease-associated gene expression networks. Furthermore, miR-638 may regulate gene expression pathways related to the oxidative stress response and aging in emphysematous lung tissue and lung fibroblasts.
Affiliation(s)
- Stephanie A Christenson
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, 72 East Concord Street Boston, MA 02118, USA ; Department of Pulmonary and Critical Care Medicine, University of California, San Francisco, 513 Parnassus Ave, San Francisco, CA 94143, USA
- Corry-Anke Brandsma
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713 Groningen, Netherlands ; University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD (GRIAC), Hanzeplein 1, 9713 Groningen, Netherlands
- Joshua D Campbell
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, 72 East Concord Street Boston, MA 02118, USA ; Bioinformatics Program, Boston University, 44 Cummington Street Boston, MA 02215, USA
- Darryl A Knight
- UBC James Hogg Research Centre, Institute for Heart and Lung Health, St Paul's Hospital and Department of Pathology and Laboratory Medicine, University of British Columbia, 1081 Burrard St Vancouver, BC V6Z 1Y6, Canada ; School of Biomedical Sciences and Pharmacy, University of Newcastle, University Drive Callaghan, New South Wales 2308, Australia
- Dmitri V Pechkovsky
- UBC James Hogg Research Centre, Institute for Heart and Lung Health, St Paul's Hospital and Department of Pathology and Laboratory Medicine, University of British Columbia, 1081 Burrard St Vancouver, BC V6Z 1Y6, Canada ; Respiratory Division, Department of Medicine, University of British Columbia, The Jack Bell Research Center, 2660 Oak Street Vancouver, BC V6H 3Z6, Canada
- James C Hogg
- UBC James Hogg Research Centre, Institute for Heart and Lung Health, St Paul's Hospital and Department of Pathology and Laboratory Medicine, University of British Columbia, 1081 Burrard St Vancouver, BC V6Z 1Y6, Canada
- Wim Timens
- Department of Pathology and Medical Biology, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713 Groningen, Netherlands ; University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD (GRIAC), Hanzeplein 1, 9713 Groningen, Netherlands
- Dirkje S Postma
- University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD (GRIAC), Hanzeplein 1, 9713 Groningen, Netherlands ; Department of Pulmonary Diseases, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9713 Groningen, Netherlands
- Marc Lenburg
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, 72 East Concord Street Boston, MA 02118, USA ; Bioinformatics Program, Boston University, 44 Cummington Street Boston, MA 02215, USA
- Avrum Spira
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, 72 East Concord Street Boston, MA 02118, USA ; Bioinformatics Program, Boston University, 44 Cummington Street Boston, MA 02215, USA
5353
Kim D, Li R, Dudek SM, Ritchie MD. ATHENA: Identifying interactions between different levels of genomic data associated with cancer clinical outcomes using grammatical evolution neural network. BioData Min 2013; 6:23. [PMID: 24359638 PMCID: PMC3912499 DOI: 10.1186/1756-0381-6-23] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 11/05/2013] [Accepted: 11/27/2013] [Indexed: 12/19/2022] Open
Abstract
Background Gene expression profiles have been broadly used in cancer research as a diagnostic or prognostic signature for the clinical outcome prediction such as stage, grade, metastatic status, recurrence, and patient survival, as well as to potentially improve patient management. However, emerging evidence shows that gene expression-based prediction varies between independent data sets. One possible explanation of this effect is that previous studies were focused on identifying genes with large main effects associated with clinical outcomes. Thus, non-linear interactions without large individual main effects would be missed. The other possible explanation is that gene expression as a single level of genomic data is insufficient to explain the clinical outcomes of interest since cancer can be dysregulated by multiple alterations through genome, epigenome, transcriptome, and proteome levels. In order to overcome the variability of diagnostic or prognostic predictors from gene expression alone and to increase its predictive power, we need to integrate multi-levels of genomic data and identify interactions between them associated with clinical outcomes. Results Here, we proposed an integrative framework for identifying interactions within/between multi-levels of genomic data associated with cancer clinical outcomes using the Grammatical Evolution Neural Networks (GENN). In order to demonstrate the validity of the proposed framework, ovarian cancer data from TCGA was used as a pilot task. We found not only interactions within a single genomic level but also interactions between multi-levels of genomic data associated with survival in ovarian cancer. Notably, the integration model from different levels of genomic data achieved 72.89% balanced accuracy and outperformed the top models with any single level of genomic data. 
Conclusions Understanding the underlying tumorigenesis and progression in ovarian cancer through the global view of interactions within/between different levels of genomic data is expected to provide guidance for improved prognostic biomarkers and individualized therapies.
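The 72.89% figure above is a balanced accuracy. The abstract does not say how it was computed, so the following is only a minimal sketch of the standard definition (the mean of sensitivity and specificity for a binary outcome). The labels are hypothetical toy values, not TCGA data, and the function name is illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = event, 0 = no event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of true events that were predicted
    specificity = tn / (tn + fp)  # fraction of true non-events that were predicted
    return (sensitivity + specificity) / 2


# Hypothetical, imbalanced toy labels (assumption for illustration only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("plain accuracy:   ", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

On imbalanced outcomes such as survival classes, plain accuracy can look good while one class is mostly missed; averaging the per-class rates is one common way to guard against that.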
Affiliation(s)
- Marylyn D Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park PA, USA.
5354
Jakupciak JP, Wells JM, Karalus RJ, Pawlowski DR, Lin JS, Feldman AB. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis. J Nucleic Acids 2013; 2013:801505. [PMID: 24455204 PMCID: PMC3877622 DOI: 10.1155/2013/801505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2013] [Revised: 10/01/2013] [Accepted: 10/02/2013] [Indexed: 11/18/2022] Open
Abstract
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Affiliation(s)
- Jeffrey S. Lin
- The Johns Hopkins University, Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723, USA
- Andrew B. Feldman
- The Johns Hopkins University, Applied Physics Laboratory, 11100 Johns Hopkins Road, Laurel, MD 20723, USA
5355
Manning T, Sleator RD, Walsh P. Biologically inspired intelligent decision making: a commentary on the use of artificial neural networks in bioinformatics. Bioengineered 2013; 5:80-95. [PMID: 24335433 PMCID: PMC4049912 DOI: 10.4161/bioe.26997] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023] Open
Abstract
Artificial neural networks (ANNs) are a class of powerful machine learning models for classification and function approximation which have analogs in nature. An ANN learns to map stimuli to responses through repeated evaluation of exemplars of the mapping. This learning approach results in networks which are recognized for their noise tolerance and ability to generalize meaningful responses for novel stimuli. It is these properties of ANNs which make them appealing for applications to bioinformatics problems where interpretation of data may not always be obvious, and where the domain knowledge required for deductive techniques is incomplete or can cause a combinatorial explosion of rules. In this paper, we provide an introduction to artificial neural network theory and review some interesting recent applications to bioinformatics problems.
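To make "learning a mapping through repeated evaluation of exemplars" concrete, here is a minimal sketch of the simplest possible network, a single perceptron, trained on the logical AND function. This is a textbook illustration, not something taken from the paper; the function names, the learning rate, and the epoch count are assumptions.

```python
def train_perceptron(exemplars, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge the weights whenever an exemplar is misclassified."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in exemplars:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when correct, +1 or -1 when wrong
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the learned threshold unit to a stimulus."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Exemplars of the mapping to learn: (stimulus, response) pairs for logical AND.
AND_EXEMPLARS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_EXEMPLARS)
for inputs, target in AND_EXEMPLARS:
    print(inputs, "->", predict(weights, bias, inputs), "(target", target, ")")
```

A single unit like this can only learn linearly separable mappings; the multi-layer networks discussed in reviews of this kind stack many such units with differentiable activations.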
Affiliation(s)
- Timmy Manning
- Department of Computer Science; Cork Institute of Technology; Cork, Ireland
- Roy D Sleator
- Department of Biological Sciences; Cork Institute of Technology; Cork, Ireland
- Paul Walsh
- NSilico Ltd; Rubicon Innovation Centre; Cork, Ireland
5356
Kim J, Kim JW, Kim Y, Lee KA. Differential association of RANTES-403 and IL-1B-1464 polymorphisms on histological subtypes in male Korean patients with gastric cancer. Tumour Biol 2013; 35:3765-70. [PMID: 24323564 DOI: 10.1007/s13277-013-1498-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/09/2013] [Accepted: 11/29/2013] [Indexed: 12/21/2022] Open
Abstract
The aims of this study were to elucidate the association between RANTES-403 and an increased risk of gastric cancer in Korean males and to investigate the gene-gene interaction between IL-1B and RANTES. In total, 218 male patients with gastric cancer (114 diffuse types, 97 intestinal types, and 7 mixed types) and 377 male controls were included. RANTES-403 was genotyped, and age-adjusted odds ratios (ORs) with 95% confidence intervals (CIs) were estimated by logistic regression. A multifactor dimensionality reduction (MDR) test with three-way split interval validation confirmed by likelihood ratio and permutation analysis was carried out. A significant increase in the risk of gastric cancer for the intestinal-type group was observed for IL-1B-1464G carriers (OR = 2.535; 95% CI = 1.121-5.732; P = 0.02) as well as for those with IL-1B-1464 CG (OR = 2.342; 95% CI = 0.998-5.500; P = 0.05) or IL-1B-1464 GG (OR = 2.819; 95% CI = 1.170-6.793; P = 0.02). For the RANTES-403 genotype, there was no significant difference in the risk of gastric cancer between the overall gastric cancer and the control groups. When further stratified according to histological types, RANTES-403A carriers (OR = 1.743; 95% CI = 1.086-2.798; P = 0.021) or heterozygotes (OR = 1.791; 95% CI = 1.092-2.935; P = 0.021) showed increased risk for developing diffuse-type gastric cancer. MDR revealed a three-way locus-locus interaction between RANTES-403AA, IL-1B-1464GG, and IL-1B-511CT for diffuse-type gastric cancer in Korean males. We demonstrated that RANTES-403 was significantly associated with the risk of developing diffuse-type gastric cancer in men and found a possible gene-gene interaction between RANTES and IL-1B polymorphisms in gastric cancer carcinogenesis.
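The odds ratios above are age-adjusted estimates from logistic regression, which cannot be reproduced from the abstract alone. As a simpler illustration of how an OR and its 95% CI relate, the sketch below computes the unadjusted odds ratio and a Wald interval from a 2x2 carrier-by-status table. The counts are hypothetical and the function name is an assumption.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z = 1.96 gives ~95%)."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / exposed_cases + 1 / unexposed_cases + 1 / exposed_controls + 1 / unexposed_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT from the study): allele carriers vs non-carriers.
or_value, ci_low, ci_high = odds_ratio_ci(
    exposed_cases=30, unexposed_cases=70, exposed_controls=20, unexposed_controls=80
)
print(f"OR = {or_value:.3f}, 95% CI = {ci_low:.3f}-{ci_high:.3f}")
```

An interval whose lower bound sits at or below 1, as with the IL-1B-1464 CG result reported above (0.998-5.500), is the usual sign of a borderline association.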
Affiliation(s)
- Juwon Kim
- Department of Laboratory Wonju Severance Christian Hospital, Yonsei University Wonju College of Medicine, Wonju, Korea
5357
McKinney BA, White BC, Grill DE, Li PW, Kennedy RB, Poland GA, Oberg AL. ReliefSeq: a gene-wise adaptive-K nearest-neighbor feature selection tool for finding gene-gene interactions and main effects in mRNA-Seq gene expression data. PLoS One 2013; 8:e81527. [PMID: 24339943 PMCID: PMC3858248 DOI: 10.1371/journal.pone.0081527] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 05/17/2013] [Accepted: 10/14/2013] [Indexed: 11/29/2022] Open
Abstract
Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. 
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php.
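The nearest-neighbor idea behind Relief-F can be sketched in a few lines. This is a simplified two-class version written for illustration, not the ReliefSeq implementation: each feature gains weight when it differs between a sample and its nearest other-class neighbors (misses) and loses weight when it differs from its nearest same-class neighbors (hits). The adaptive step shown here, which keeps for each feature the k giving the highest score, is one plausible reading of "gene-wise adaptive-k" and is an assumption; all names and data are hypothetical.

```python
def relieff_scores(X, y, k=2):
    """Simplified two-class Relief-F importance scores using k nearest hits and misses."""
    n, p = len(X), len(X[0])
    # Normalise each feature by its range so that differences are comparable.
    ranges = [(max(row[f] for row in X) - min(row[f] for row in X)) or 1.0 for f in range(p)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / ranges[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(p))

    scores = [0.0] * p
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for f in range(p):
            hit_term = sum(diff(f, X[i], X[j]) for j in hits) / len(hits)
            miss_term = sum(diff(f, X[i], X[j]) for j in misses) / len(misses)
            scores[f] += (miss_term - hit_term) / n
    return scores


def adaptive_k_scores(X, y, ks=(1, 2)):
    """Assumed adaptive step: per feature, keep the (score, k) pair with the highest score."""
    per_k = {k: relieff_scores(X, y, k) for k in ks}
    best = []
    for f in range(len(X[0])):
        best_k = max(ks, key=lambda k: per_k[k][f])
        best.append((per_k[best_k][f], best_k))
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.8], [3.1, 1.2], [2.9, 3.1]]
y = [0, 0, 0, 1, 1, 1]

print("fixed k=2 scores:", relieff_scores(X, y, k=2))
print("adaptive (score, k):", adaptive_k_scores(X, y))
```

Because scores come from local neighborhoods rather than per-feature marginal tests, a feature can score well through its interaction with other features, which is the property the abstract highlights for detecting gene-gene interactions.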
Affiliation(s)
- Brett A. McKinney
- Tandy School of Computer Science, Department of Mathematics, University of Tulsa, Tulsa, Oklahoma, United States of America
- Laureate Institute for Brain Research, Tulsa, Oklahoma, United States of America
- Bill C. White
- Tandy School of Computer Science, Department of Mathematics, University of Tulsa, Tulsa, Oklahoma, United States of America
- Diane E. Grill
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Peter W. Li
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Richard B. Kennedy
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Department of Medicine, Mayo Clinic, Rochester, Minnesota, United States of America
- Program in Translational Immunovirology and Biodefense, Mayo Clinic, Rochester, Minnesota, United States of America
- Gregory A. Poland
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
- Department of Medicine, Mayo Clinic, Rochester, Minnesota, United States of America
- Program in Translational Immunovirology and Biodefense, Mayo Clinic, Rochester, Minnesota, United States of America
- Ann L. Oberg
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Mayo Clinic Vaccine Research Group, Mayo Clinic, Rochester, Minnesota, United States of America
5358
Li W, Chen L, Li W, Qu X, He W, He Y, Feng C, Jia X, Zhou Y, Lv J, Liang B, Chen B, Jiang J. Unraveling the characteristics of microRNA regulation in the developmental and aging process of the human brain. BMC Med Genomics 2013; 6:55. [PMID: 24321625 PMCID: PMC3878884 DOI: 10.1186/1755-8794-6-55] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 04/15/2013] [Accepted: 12/03/2013] [Indexed: 01/06/2023] Open
Abstract
Background Structure and function of the human brain are subject to dramatic changes during its development and aging. Studies have demonstrated that microRNAs (miRNAs) play an important role in the regulation of brain development and have a significant impact on brain aging and neurodegeneration. However, the underlying molecular mechanisms are not well understood. In general, development and aging are conventionally studied separately, which may not completely address the physiological mechanism over the entire lifespan. Thus, we study the regulatory effect between miRNAs and mRNAs in the developmental and aging process of the human brain by integrating miRNA and mRNA expression profiles throughout the lifetime. Methods In this study, we integrated miRNA and mRNA expression profiles in the human brain across lifespan from the network perspective. First, we chose the age-related miRNAs by polynomial regression models. Second, we constructed the bipartite miRNA-mRNA regulatory network by pair-wise correlation coefficient analysis between miRNA and mRNA expression profiles. Finally, we constructed the miRNA-miRNA synergistic network from the miRNA-mRNA network, considering not only the enrichment of target genes but also GO function enrichment of co-regulated target genes. Results We found that the average degree of age-related miRNAs was significantly higher than that of non-age-related miRNAs in the miRNA-mRNA regulatory network. The topological features between age-related and non-age-related miRNAs were significantly different, and 34 reliable age-related miRNA synergistic modules were identified using Cfinder in the miRNA-miRNA synergistic network. The synergistic regulations of module genes were verified by reviewing miRNA target databases and previous studies. Conclusions Age-related miRNAs play a more important role than non-age-related miRNAs in the developmental and aging process of the human brain.
The age-related miRNAs act synergistically and tend to work together as small modules. These results may provide a new insight into the regulation of miRNAs in the developmental and aging process of the human brain.
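The second Methods step, building a bipartite miRNA-mRNA network from pair-wise correlations and then comparing node degrees, can be sketched as follows. The abstract does not give the correlation type, sign convention, or cut-off, so the use of Pearson correlation, the absolute-value threshold of 0.8, the expression values, and all names below are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def build_bipartite_network(mirna_profiles, mrna_profiles, threshold=0.8):
    """Connect a miRNA to an mRNA when |r| reaches the (assumed) threshold."""
    edges = []
    for mirna, m_profile in mirna_profiles.items():
        for gene, g_profile in mrna_profiles.items():
            if abs(pearson(m_profile, g_profile)) >= threshold:
                edges.append((mirna, gene))
    return edges


def mirna_degrees(edges, mirna_profiles):
    """Number of mRNA partners per miRNA, including miRNAs with no edges."""
    degrees = {mirna: 0 for mirna in mirna_profiles}
    for mirna, _gene in edges:
        degrees[mirna] += 1
    return degrees


# Hypothetical expression profiles across five age points (not real data).
mirna_profiles = {"miR-A": [1, 2, 3, 4, 5], "miR-B": [2, 1, 2, 1, 2]}
mrna_profiles = {
    "gene1": [5, 4, 3, 2, 1],
    "gene2": [1.1, 2.0, 2.9, 4.2, 5.0],
    "gene3": [3.0, 3.0, 3.5, 2.5, 3.0],
}

edges = build_bipartite_network(mirna_profiles, mrna_profiles)
print("edges:", edges)
print("degrees:", mirna_degrees(edges, mirna_profiles))
```

Comparing the average of these degrees between age-related and other miRNAs is the kind of topological contrast the Results paragraph reports.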
Affiliation(s)
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China.
5359
Hopp L, Lembcke K, Binder H, Wirth H. Portraying the Expression Landscapes of B-Cell Lymphoma-Intuitive Detection of Outlier Samples and of Molecular Subtypes. Biology (Basel) 2013; 2:1411-37. [PMID: 24833231 PMCID: PMC4009791 DOI: 10.3390/biology2041411] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 08/01/2013] [Revised: 10/01/2013] [Accepted: 11/05/2013] [Indexed: 01/03/2023]
Abstract
We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques and enables investigation of the similarity relations between the samples. The method also allows outliers caused by contamination to be detected and corrected. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes, which are characterized by differential functional and clinical characteristics.
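As a rough illustration of how a Self-Organizing Map groups similar profiles, the sketch below trains a tiny one-dimensional SOM on two-dimensional toy points. It is a generic textbook SOM, not the authors' framework; the map size, learning-rate and radius schedules, the deterministic initialisation, and the data are all assumptions.

```python
import math


def best_matching_unit(units, sample):
    """Index of the unit whose weight vector is closest to the sample."""
    return min(range(len(units)), key=lambda u: math.dist(units[u], sample))


def train_som(samples, n_units=4, epochs=30, lr0=0.5, radius0=1.5):
    """Train a 1-D SOM; units start evenly spaced between the first and last sample."""
    dim = len(samples[0])
    first, last = samples[0], samples[-1]
    units = [
        [first[d] + (last[d] - first[d]) * u / (n_units - 1) for d in range(dim)]
        for u in range(n_units)
    ]
    for epoch in range(epochs):
        frac = 1 - epoch / epochs            # decays linearly towards zero
        lr = lr0 * frac                      # learning rate shrinks over time
        radius = max(radius0 * frac, 0.1)    # neighbourhood shrinks over time
        for sample in samples:
            bmu = best_matching_unit(units, sample)
            for u in range(n_units):
                # Gaussian neighbourhood: units near the BMU on the map move more.
                influence = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    units[u][d] += lr * influence * (sample[d] - units[u][d])
    return units


# Hypothetical toy "expression profiles": two well-separated groups of samples.
SAMPLES = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]

som_units = train_som(SAMPLES)
for sample in SAMPLES:
    print(sample, "-> unit", best_matching_unit(som_units, sample))
```

Samples that land on the same or neighbouring units have similar profiles, which is the property that makes a map of this kind usable for spotting subtypes and outlying samples.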
Affiliation(s)
- Lydia Hopp
- Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16-18, Leipzig 04107, Germany.
- Kathrin Lembcke
- Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16-18, Leipzig 04107, Germany.
- Hans Binder
- Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16-18, Leipzig 04107, Germany.
- Henry Wirth
- Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstr. 16-18, Leipzig 04107, Germany.
5360
Maguire E, Rocca-Serra P, Sansone SA, Davies J, Chen M. Visual compression of workflow visualizations with automated detection of macro motifs. IEEE Trans Vis Comput Graph 2013; 19:2576-2585. [PMID: 24051824 DOI: 10.1109/tvcg.2013.225] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/02/2023]
Abstract
This paper is concerned with the creation of 'macros' in workflow visualization as a support tool to increase the efficiency of data curation tasks. We propose computation of candidate macros based on their usage in large collections of workflows in data repositories. We describe an efficient algorithm for extracting macro motifs from workflow graphs. We discovered that the state transition information, used to identify macro candidates, characterizes the structural pattern of the macro and can be harnessed as part of the visual design of the corresponding macro glyph. This facilitates partial automation and consistency in glyph design applicable to a large set of macro glyphs. We tested this approach against a repository of biological data holding some 9,670 workflows and found that the algorithmically generated candidate macros are in keeping with domain expert expectations.
5361
Babaei A, Siwiec RM, Kern M, Ward BD, Li SJ, Shaker R. Intrinsic functional connectivity of the brain swallowing network during subliminal esophageal acid stimulation. Neurogastroenterol Motil 2013; 25:992-e779. [PMID: 24251873 PMCID: PMC3864683 DOI: 10.1111/nmo.12238] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/10/2013] [Accepted: 08/26/2013] [Indexed: 01/11/2023]
Abstract
BACKGROUND Intrinsic synchronous fluctuations of the functional magnetic resonance imaging signal are indicative of the underlying 'functional connectivity' (FC) and serve as a technique to study dynamics of the neuronal networks of the human brain. Earlier studies have characterized the functional connectivity of a distributed network of brain regions involved in swallowing, called brain swallowing network (BSN). The potential modulatory effect of esophageal afferent signals on the BSN, however, has not been systematically studied. METHODS Fourteen healthy volunteers underwent steady state functional magnetic resonance imaging across three conditions: (i) transnasal catheter placed in the esophagus without infusion; (ii) buffer solution infused at 1 mL/min; and (iii) acidic solution infused at 1 mL/min. Data were preprocessed according to the standard FC analysis pipeline. We determined the correlation coefficient values of pairs of brain regions involved in swallowing and calculated average group FC matrices across conditions. Effects of subliminal esophageal acidification and nasopharyngeal intubation were determined. KEY RESULTS Subliminal esophageal acid stimulation augmented the overall FC of the right anterior insula and specifically the FC to the left inferior parietal lobule. Conscious stimulation by nasopharyngeal intubation reduced the overall FC of the right posterior insula, particularly the FC to the right prefrontal operculum. CONCLUSIONS & INFERENCES The FC of BSN is amenable to modulation by sensory input. The modulatory effect of sensory pharyngoesophageal stimulation on BSN is mainly mediated through changes in the FC of the insula. The alteration induced by subliminal visceral esophageal acid stimulation is in different insular connections compared with that of conscious somatic pharyngeal stimulation.
Affiliation(s)
- Arash Babaei
- Gastroenterology and Hepatology, Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, USA
- Robert M. Siwiec
- Gastroenterology and Hepatology, Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, USA
- Mark Kern
- Gastroenterology and Hepatology, Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, USA
- B. Douglas Ward
- Department of Biophysics, Medical College of Wisconsin, Milwaukee, WI, USA
- Shi-Jiang Li
- Department of Biophysics, Medical College of Wisconsin, Milwaukee, WI, USA
- Reza Shaker
- Gastroenterology and Hepatology, Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, USA
5362
Flores JL, Inza I, Larrañaga P, Calvo B. A new measure for gene expression biclustering based on non-parametric correlation. Comput Methods Programs Biomed 2013; 112:367-397. [PMID: 24079964 DOI: 10.1016/j.cmpb.2013.07.025] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 09/16/2012] [Revised: 06/14/2013] [Accepted: 07/26/2013] [Indexed: 06/02/2023]
Abstract
BACKGROUND Biclustering, one of the emerging techniques for analyzing DNA microarray data, is the search for subsets of genes and conditions that are coherently expressed. These subgroups provide clues about the main biological processes. Until now, different approaches to this problem have been proposed. Most of them use the mean squared residue as the quality measure, but relevant and interesting patterns, such as shifting or scaling patterns, cannot be detected. Furthermore, recent papers show that there exist new coherence patterns involved in different kinds of cancer and tumors, such as inverse relationships between genes, which cannot be captured. RESULTS The proposed measure is called Spearman's biclustering measure (SBM), which estimates the quality of a bicluster based on the non-linear correlation among genes and conditions simultaneously. The search for biclusters is performed using an evolutionary technique called estimation of distribution algorithms, which uses the SBM measure as the fitness function. This approach has been examined from different points of view by using artificial and real microarrays. The assessment process has involved the use of quality indexes, a set of reference bicluster patterns including new patterns, and a set of statistical tests. Performance has also been examined using real microarrays and compared with different algorithmic approaches such as Bimax, CC, OPSM, Plaid and xMotifs. CONCLUSIONS SBM shows several advantages, such as the ability to recognize more complex coherence patterns such as shifting, scaling and inversion, and the capability to selectively marginalize genes and conditions depending on the statistical significance.
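The reason a rank-based correlation can recognise shifting, scaling and inversion patterns is easy to show on a toy example. The sketch below computes Spearman's rank correlation between a base expression profile and transformed copies of it. It is not the SBM formula, which the abstract does not state; the helper `mean_abs_row_correlation` is a hypothetical stand-in for a bicluster-level score, and all names and values are assumptions.

```python
import math
from itertools import combinations


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


def mean_abs_row_correlation(rows):
    """Hypothetical bicluster score (NOT the SBM): mean |rho| over all gene pairs."""
    pairs = list(combinations(rows, 2))
    return sum(abs(spearman(a, b)) for a, b in pairs) / len(pairs)


# Hypothetical gene profiles across four conditions.
base = [2.0, 5.0, 3.0, 8.0]
shifted = [v + 10.0 for v in base]   # shifting pattern
scaled = [v * 3.0 for v in base]     # scaling pattern
inverted = [-v for v in base]        # inverse relationship

print("shifted: ", spearman(base, shifted))
print("scaled:  ", spearman(base, scaled))
print("inverted:", spearman(base, inverted))
print("toy bicluster score:", mean_abs_row_correlation([base, shifted, scaled, inverted]))
```

Because only the ordering of values across conditions matters, shifted and scaled copies correlate perfectly and an inverted copy correlates perfectly negatively, so taking the absolute value treats all three as coherent.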
Affiliation(s)
- Jose L Flores
- Intelligent Systems Group, Department of Computer Sciences and Artificial Intelligence, University of the Basque Country, P.O. Box 649, 20080 Donostia - San Sebastian, Spain.
|
5363
|
Fierro-Monti I, Echeverria P, Racle J, Hernandez C, Picard D, Quadroni M. Dynamic impacts of the inhibition of the molecular chaperone Hsp90 on the T-cell proteome have implications for anti-cancer therapy. PLoS One 2013; 8:e80425. [PMID: 24312219 PMCID: PMC3842317 DOI: 10.1371/journal.pone.0080425] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2013] [Accepted: 10/02/2013] [Indexed: 11/19/2022] Open
Abstract
The molecular chaperone Hsp90-dependent proteome represents a complex protein network of critical biological and medical relevance. Known to associate with proteins with a broad variety of functions termed clients, Hsp90 maintains key essential and oncogenic signalling pathways. Consequently, Hsp90 inhibitors are being tested as anti-cancer drugs. Using an integrated systematic approach to analyse the effects of Hsp90 inhibition in T-cells, we quantified differential changes in the Hsp90-dependent proteome, Hsp90 interactome, and a selection of the transcriptome. Kinetic behaviours in the Hsp90-dependent proteome were assessed using a novel pulse-chase strategy (Fierro-Monti et al., accompanying article), detecting effects on both protein stability and synthesis. Global and specific dynamic impacts, including proteostatic responses, are due to direct inhibition of Hsp90 as well as indirect effects. As a result, a decrease was detected in most proteins that changed their levels, including known Hsp90 clients. Most likely, consequences of the role of Hsp90 in gene expression determined a global reduction in net de novo protein synthesis. This decrease appeared to be greater in magnitude than a concomitantly observed global increase in protein decay rates. Several novel putative Hsp90 clients were validated, and interestingly, protein families with critical functions, particularly the Hsp90 family and cofactors themselves as well as protein kinases, displayed strongly increased decay rates due to Hsp90 inhibitor treatment. Remarkably, an upsurge in survival pathways, involving molecular chaperones and several oncoproteins, and decreased levels of some tumour suppressors, have implications for anti-cancer therapy with Hsp90 inhibitors. The diversity of global effects may represent a paradigm of mechanisms that are operating to shield cells from proteotoxic stress, by promoting pro-survival and anti-proliferative functions. 
Data are available via ProteomeXchange with identifier PXD000537.
Affiliation(s)
- Ivo Fierro-Monti
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Pablo Echeverria
- Département de Biologie Cellulaire, Université de Genève, Genève, Switzerland
- Julien Racle
- Laboratory of Computational Systems Biotechnology, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Vital-IT Group, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Celine Hernandez
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Vital-IT Group, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Didier Picard
- Département de Biologie Cellulaire, Université de Genève, Genève, Switzerland
- Manfredo Quadroni
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
|
5364
|
Imboden M, Probst-Hensch NM. Biobanking across the phenome - at the center of chronic disease research. BMC Public Health 2013; 13:1094. [PMID: 24274136 PMCID: PMC4222669 DOI: 10.1186/1471-2458-13-1094] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2012] [Accepted: 09/25/2013] [Indexed: 11/10/2022] Open
Abstract
Background Recognized public health relevant risk factors such as obesity, physical inactivity, smoking or air pollution are common to many non-communicable diseases (NCDs). NCDs cluster and co-morbidities increase in parallel with age. Pleiotropic genes and genetic variants have been identified by genome-wide association studies (GWAS) linking NCD entities hitherto thought to be distant in etiology. These different lines of evidence suggest that NCD disease mechanisms are in part shared. Discussion Identification of common exogenous and endogenous risk patterns may promote efficient prevention, an urgent need in the light of the global NCD epidemic. The prerequisite for investigating causal risk patterns, including biologic, genetic and environmental factors across different NCDs, is the availability of well-characterized cohorts with associated biobanks. Prospectively collected data and biospecimens from subjects of various age, sociodemographic, and cultural groups, both healthy and affected by one or more NCD, are essential for exploring biologic mechanisms and susceptibilities interlinking different environmental and lifestyle exposures, co-morbidities, as well as cellular senescence and aging. A paradigm shift in research activities can currently be observed, moving from focused investigations on the effect of a single risk factor on an isolated health outcome to a more comprehensive assessment of risk patterns and a broader phenome approach. Though important methodological and analytical challenges need to be resolved, the ongoing international efforts to establish large-scale population-based biobank cohorts are a critical basis for moving NCD disease etiology forward. Summary Future epidemiologic and public health research should aim at sustaining a comprehensive systems view on health and disease.
The political and public discussions about the utilitarian aspect of investing in and contributing to cohort and biobank research are essential and are indirectly linked to the achievement of public health programs effectively addressing the global NCD epidemic.
Affiliation(s)
- Medea Imboden
- Swiss Tropical and Public Health Institute, Basel, Switzerland.
|
5365
|
Goodloe R, Brown-Gentry K, Gillani NB, Jin H, Mayo P, Allen M, McClellan B, Boston J, Sutcliffe C, Schnetz-Boutaud N, Dilks HH, Crawford DC. Lipid trait-associated genetic variation is associated with gallstone disease in the diverse Third National Health and Nutrition Examination Survey (NHANES III). BMC MEDICAL GENETICS 2013; 14:120. [PMID: 24256507 PMCID: PMC3870971 DOI: 10.1186/1471-2350-14-120] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/23/2013] [Accepted: 10/22/2013] [Indexed: 01/10/2023]
Abstract
BACKGROUND Gallstone disease is one of the most common digestive disorders, affecting more than 30 million Americans. Previous twin studies suggest a heritability of 25% for gallstone formation. To date, one genome-wide association study (GWAS) has been performed in a population of European descent. Several candidate gene studies have been performed in various populations, but most have been inconclusive. Given that gallstones consist of up to 80% cholesterol, we hypothesized that common genetic variants associated with high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglycerides (TG) would also be associated with gallstone risk. METHODS To test this hypothesis, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study as part of the Population Architecture using Genomics and Epidemiology (PAGE) study performed tests of association between 49 GWAS-identified lipid trait SNPs and gallstone disease in non-Hispanic whites (446 cases and 1,962 controls), non-Hispanic blacks (179 cases and 1,540 controls), and Mexican Americans (227 cases and 1,478 controls) ascertained for the population-based Third National Health and Nutrition Examination Survey (NHANES III). RESULTS At a liberal significance threshold of 0.05, five, four, and four SNPs were associated with disease risk in non-Hispanic whites, non-Hispanic blacks, and Mexican Americans, respectively. No single SNP was associated with gallstone disease risk in all three racial/ethnic groups. The most significant association was observed for ABCG5 rs6756629 in non-Hispanic whites [odds ratio (OR) = 1.89; 95% confidence interval (CI) = 1.44-2.49; p = 0.0001]. ABCG5 rs6756629 is in strong linkage disequilibrium with rs11887534 (D19H), a variant previously associated with gallstone disease risk in populations of European descent. CONCLUSIONS We replicated a previously associated variant for gallstone disease risk in non-Hispanic whites.
Further discovery and fine-mapping efforts in diverse populations are needed to fully describe the genetic architecture of gallstone disease risk in humans.
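For readers unfamiliar with how an odds ratio and its 95% confidence interval are read, the following minimal sketch computes both from a 2x2 table using the standard Wald interval on the log scale. The counts are made up for illustration and are not the study's data; the study's own estimates presumably came from regression models, which the abstract does not describe. The Bonferroni comparison at the end is likewise an added illustration: the abstract reports only the liberal 0.05 threshold and the 49 SNPs tested.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carrier cases, b: non-carrier cases,
    c: carrier controls, d: non-carrier controls (all counts must be > 0).
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
odds_ratio, lower, upper = odds_ratio_wald_ci(a=20, b=80, c=10, d=90)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")

# Illustrative multiple-testing check (an assumption, not stated in the abstract):
# a Bonferroni threshold for 49 tests at a family-wise level of 0.05.
bonferroni_threshold = 0.05 / 49
reported_p = 0.0001  # p-value quoted in the abstract for rs6756629
print(reported_p < bonferroni_threshold)
```

An interval that excludes 1, as the reported 1.44-2.49 does, indicates an association at the corresponding significance level; the toy interval here includes 1 despite a larger point estimate because the hypothetical sample is small.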
Affiliation(s)
- Robert Goodloe
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Kristin Brown-Gentry
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Niloufar B Gillani
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Hailing Jin
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Ping Mayo
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Melissa Allen
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Bob McClellan
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Jonathan Boston
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Cara Sutcliffe
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Nathalie Schnetz-Boutaud
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Holli H Dilks
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, USA
- Dana C Crawford
- Center for Human Genetics Research, Vanderbilt University, 2215 Garland Avenue, 519 Light Hall, Nashville, Tennessee 37232, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, USA
|
5366
|
Hamilton JJ, Reed JL. Software platforms to facilitate reconstructing genome-scale metabolic networks. Environ Microbiol 2013; 16:49-59. [PMID: 24148076 DOI: 10.1111/1462-2920.12312] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2013] [Accepted: 10/12/2013] [Indexed: 12/24/2022]
Abstract
System-level analyses of microbial metabolism are facilitated by genome-scale reconstructions of microbial biochemical networks. A reconstruction provides a structured representation of the biochemical transformations occurring within an organism, as well as the genes necessary to carry out these transformations, as determined by the annotated genome sequence and experimental data. Network reconstructions also serve as platforms for constraint-based computational techniques, which facilitate biological studies in a variety of applications, including evaluation of network properties, metabolic engineering and drug discovery. Bottom-up metabolic network reconstructions have been developed for dozens of organisms, but until recently, the pace of reconstruction has failed to keep up with advances in genome sequencing. To address this problem, a number of software platforms have been developed to automate parts of the reconstruction process, thereby alleviating much of the manual effort previously required. Here, we review four such platforms in the context of established guidelines for network reconstruction. While many steps of the reconstruction process have been successfully automated, some manual evaluation of the results is still required to ensure a high-quality reconstruction. Widespread adoption of these platforms by the scientific community is underway and will be further enabled by exchangeable formats across platforms.
Affiliation(s)
- Joshua J Hamilton
- Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA
|
5367
|
Blohm P, Frishman G, Smialowski P, Goebels F, Wachinger B, Ruepp A, Frishman D. Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis. Nucleic Acids Res 2013; 42:D396-400. [PMID: 24214996 PMCID: PMC3965096 DOI: 10.1093/nar/gkt1079] [Citation(s) in RCA: 99] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Knowledge about non-interacting proteins (NIPs) is important for training algorithms that predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the utilization of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text mining results with the highest confidence values correspond to NIP pairs. Compared to the first version, the contents of the database have grown by over 300%.
Affiliation(s)
- Philipp Blohm
- Institute for Bioinformatics and Systems Biology/MIPS, HMGU - German Research Center for Environmental Health, Ingolstaedter Landstrasse 1, 85764 Neuherberg, Germany, Clueda AG, Elsenheimerstraße 59, 80687 Munich, Germany and Department of Genome Oriented Bioinformatics, Technische Universitaet Muenchen Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
5368
|
Bysani M, Wallerman O, Bornelöv S, Zatloukal K, Komorowski J, Wadelius C. ChIP-seq in steatohepatitis and normal liver tissue identifies candidate disease mechanisms related to progression to cancer. BMC Med Genomics 2013; 6:50. [PMID: 24206787 PMCID: PMC3831757 DOI: 10.1186/1755-8794-6-50] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2013] [Accepted: 10/31/2013] [Indexed: 02/06/2023] Open
Abstract
Background Steatohepatitis occurs in alcoholic liver disease and may progress to liver cirrhosis and hepatocellular carcinoma. Its molecular pathogenesis is to a large degree unknown. Histone modifications play a key role in transcriptional regulation as marks for silencing and activation of gene expression and as marks for functional elements. Many transcription factors (TFs) are crucial for the control of the genes involved in metabolism, and abnormality in their function may lead to disease. Methods We performed ChIP-seq of the histone modifications H3K4me1, H3K4me3 and H3K27ac and a candidate transcription factor (USF1) in liver tissue from patients with steatohepatitis and normal livers and correlated results to mRNA expression and genotypes. Results We found several regions that are differentially enriched for histone modifications between disease and normal tissue, and qRT-PCR results indicated that the expression of the tested genes strongly correlated with differential enrichment of histone modifications but is independent of USF1 enrichment. By gene ontology analysis of differentially modified genes we found many disease-associated genes, some of which had previously been implicated in the etiology of steatohepatitis. Importantly, the genes associated with the strongest histone peaks in the patient were over-represented in cancer-specific pathways, suggesting that the tissue was on a path to developing cancer, a common complication of the disease. We also found several novel SNPs and GWAS catalogue SNPs that are candidates to be functional and therefore need further study. Conclusion In summary, we find that analysis of chromatin features in tissue samples provides insight into disease mechanisms.
Affiliation(s)
- Claes Wadelius
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, BMC, Uppsala University, PO BOX 815, Uppsala, SE 751 08, Sweden.
|
5369
|
Marzukhi S, Browne WN, Zhang M. Adaptive artificial datasets through learning classifier systems for classification tasks. EVOLUTIONARY INTELLIGENCE 2013. [DOI: 10.1007/s12065-013-0094-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
5370
|
Tanos T, Sflomos G, Echeverria PC, Ayyanan A, Gutierrez M, Delaloye JF, Raffoul W, Fiche M, Dougall W, Schneider P, Yalcin-Ozuysal O, Brisken C. Progesterone/RANKL is a major regulatory axis in the human breast. Sci Transl Med 2013; 5:182ra55. [PMID: 23616122 DOI: 10.1126/scitranslmed.3005654] [Citation(s) in RCA: 148] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
Estrogens and progesterones are major drivers of breast development but also promote carcinogenesis in this organ. Yet, their respective roles and the mechanisms underlying their action in the human breast are unclear. Receptor activator of nuclear factor κB ligand (RANKL) has been identified as a pivotal paracrine mediator of progesterone function in mouse mammary gland development and mammary carcinogenesis. Whether the factor has the same role in humans is of clinical interest because an inhibitor for RANKL, denosumab, is already used for the treatment of bone disease and might benefit breast cancer patients. We show that progesterone receptor (PR) signaling failed to induce RANKL in PR(+) breast cancer cell lines and in dissociated, cultured breast epithelial cells. In clinical specimens from healthy donors and intact breast tissue microstructures, hormone response was maintained and RANKL expression was under progesterone control, which increased RNA stability. RANKL was sufficient to trigger cell proliferation and was required for progesterone-induced proliferation. The findings were validated in vivo where RANKL protein expression in the breast epithelium correlated with serum progesterone levels and the protein was expressed in a subset of luminal cells that express PR. Thus, important hormonal control mechanisms are conserved across species, making RANKL a potential target in breast cancer treatment and prevention.
Affiliation(s)
- Tamara Tanos
- Swiss Institute for Experimental Cancer Research, National Center of Competence in Research Molecular Oncology, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland
|
5371
|
Yu W, Yan Y, Liu Q, Wang J, Jiang Z. Predicting drug–target interaction networks of human diseases based on multiple feature information. Pharmacogenomics 2013; 14:1701-7. [DOI: 10.2217/pgs.13.162] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
Aim: Drug–target interaction is crucial in the drug design process. Predicting the drug–target interaction networks of important human diseases can provide valuable clues for the characterization of the mechanism of action of diseases. Materials & methods: A new graph-based semisupervised learning (GBSSL) method is proposed to predict the drug–target interaction networks involved in 13 types of diseases. According to the method, each drug–target pair is initially described with different biological features including sequence, structure, function and network topology information. Then, the optimal feature selection procedures based on the relief and minimum redundancy maximum relevance are executed, respectively. Finally, unknown drug–target interactions can be predicted by the GBSSL method effectively. Results: The proposed method can effectively predict drug–target interactions (with a receiver operating characteristic score of 94.8% and a precision-recall score of 76.5%). Conclusion: Compared with the existing methods, the GBSSL method provides an efficient means of generating optimal features obtained from the combination of multiple sources of feature information. Original submitted 22 April 2013; Revision submitted 14 August 2013.
Affiliation(s)
- Weiming Yu
- Department of Computer Science & Technology, East China Normal University, 200241, Shanghai, China
- Yan Yan
- Department of Computer Science & Technology, East China Normal University, 200241, Shanghai, China
- Qing Liu
- Department of Computer Science & Technology, East China Normal University, 200241, Shanghai, China
- Junxiang Wang
- Department of Computer Science & Technology, East China Normal University, 200241, Shanghai, China
- Zhenran Jiang
- Department of Computer Science & Technology, East China Normal University, 200241, Shanghai, China
|
5372
|
Bolotin E, Armendariz A, Kim K, Heo SJ, Boffelli D, Tantisira K, Rotter JI, Krauss RM, Medina MW. Statin-induced changes in gene expression in EBV-transformed and native B-cells. Hum Mol Genet 2013; 23:1202-10. [PMID: 24179175 DOI: 10.1093/hmg/ddt512] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
Human lymphoblastoid cell lines (LCLs), generated through Epstein-Barr Virus (EBV) transformation of B-lymphocytes (B-cells), are a commonly used model system for identifying genetic influences on human diseases and on drug responses. We have previously used LCLs to examine the cellular effects of genetic variants that modulate the efficacy of statins, the most prescribed class of cholesterol-lowering drugs used for the prevention and treatment of cardiovascular disease. However, statin-induced gene expression differences observed in LCLs may be influenced by their transformation, and thus differ from those observed in native B-cells. To assess this possibility, we prepared LCLs and purified B-cells from the same donors, and compared mRNA profiles after 24 h incubation with simvastatin (2 µM) or sham buffer. Genes involved in cholesterol metabolism were similarly regulated between the two cell types under both the statin and sham-treated conditions, and the statin-induced changes were significantly correlated. Genes whose expression differed between the native and transformed cells were primarily implicated in cell cycle, apoptosis and alternative splicing. We found that ChIP-seq signals for MYC and EBNA2 (an EBV transcriptional co-activator) were significantly enriched in the promoters of genes up-regulated in the LCLs compared with the B-cells, and could be involved in the regulation of cell cycle and alternative splicing. Taken together, the results support the use of LCLs for the study of statin effects on cholesterol metabolism, but suggest that drug effects on cell cycle, apoptosis and alternative splicing may be affected by EBV transformation. The dataset is available in GEO under accession number GSE51444.
Affiliation(s)
- Eugene Bolotin
- Children's Hospital Oakland Research Institute, 5700 Martin Luther King Jr. Way, Oakland, CA 94609, USA
|
5373
|
Cheng XR, Cui XL, Zheng Y, Zhang GR, Li P, Huang H, Zhao YY, Bo XC, Wang SQ, Zhou WX, Zhang YX. Nodes and biological processes identified on the basis of network analysis in the brain of the senescence accelerated mice as an Alzheimer's disease animal model. Front Aging Neurosci 2013; 5:65. [PMID: 24194717 PMCID: PMC3810591 DOI: 10.3389/fnagi.2013.00065] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2013] [Accepted: 10/10/2013] [Indexed: 12/11/2022] Open
Abstract
Harboring the behavioral and histopathological signatures of Alzheimer's disease (AD), senescence accelerated mouse-prone 8 (SAMP8) mice are currently considered a robust model for studying AD. However, the underlying mechanisms, prioritized pathways and genes in SAMP8 mice linked to AD remain unclear. In this study, we provide a biological interpretation of the molecular underpinnings of SAMP8 mice. Our results were derived from differentially expressed genes in the hippocampus and cerebral cortex of SAMP8 mice compared to age-matched SAMR1 mice at 2, 6, and 12 months of age using cDNA microarray analysis. On the basis of PPI, MetaCore and the co-expression network, we constructed a distinct genetic sub-network in the brains of SAMP8 mice. Next, we determined that the regulation of synaptic transmission and apoptosis were disrupted in the brains of SAMP8 mice. We found abnormal gene expression of RAF1, MAPT, PTGS2, CDKN2A, CAMK2A, NTRK2, AGER, ADRBK1, MCM3AP, and STUB1, which may have initiated the dysfunction of biological processes in the brains of SAMP8 mice. Specifically, we found microRNAs, including miR-20a, miR-17, miR-34a, miR-155, miR-18a, miR-22, miR-26a, miR-101, miR-106b, and miR-125b, that might regulate the expression of nodes in the sub-network. Taken together, these results provide new insights into the biological and genetic mechanisms of SAMP8 mice and add an important dimension to our understanding of the neuro-pathogenesis in SAMP8 mice from a systems perspective.
Affiliation(s)
- Xiao-Rui Cheng
- Department of Neuroimmunopharmacology, Beijing Institute of Pharmacology and Toxicology, Beijing, China
|
5374
|
A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology. BIOMED RESEARCH INTERNATIONAL 2013; 2013:432375. [PMID: 24228248 PMCID: PMC3818807 DOI: 10.1155/2013/432375] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/17/2013] [Revised: 08/26/2013] [Accepted: 08/27/2013] [Indexed: 01/04/2023]
Abstract
Recently, the greatest statistical and computational challenge in genetic epidemiology has been to identify and characterize genes that interact with other genes and with environmental factors to affect complex multifactorial diseases. These gene-gene interactions are also referred to as epistasis, a phenomenon that traditional statistical methods cannot resolve because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Several machine learning methods have therefore been applied to identify susceptibility genes in common multifactorial diseases, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs). This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Lastly, this paper discusses the strengths and weaknesses of each machine learning method in detecting gene-gene interactions in complex human disease.
|
5375
|
Rudd J, Moore JH, Urbanowicz RJ. A Multi-Core Parallelization Strategy for Statistical Significance Testing in Learning Classifier Systems. EVOLUTIONARY INTELLIGENCE 2013; 6. [PMID: 24358057 DOI: 10.1007/s12065-013-0092-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Permutation-based statistics for evaluating the significance of class prediction, predictive attributes, and patterns of association have only appeared within the learning classifier system (LCS) literature since 2012. While still not widely utilized by the LCS research community, formal evaluations of test statistic confidence are imperative to large and complex real-world applications such as genetic epidemiology, where it is standard practice to quantify the likelihood that a seemingly meaningful statistic could have been obtained purely by chance. LCS algorithms are relatively computationally expensive on their own. The compounding requirements for generating permutation-based statistics may be a limiting factor for some researchers interested in applying LCS algorithms to real-world problems. Technology has made LCS parallelization strategies more accessible and thus more popular in recent years. In the present study we examine the benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross validation becomes more feasible to complete on a single multi-core workstation. We test our Python implementation of this strategy in the context of a simulated complex genetic epidemiological data mining problem. Our evaluations indicate that as long as the number of concurrent processes does not exceed the number of CPU cores, the speedup achieved is approximately linear.
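The idea of running independent permutations as separate processes can be sketched as follows. This is not the authors' implementation: a cheap difference-in-means statistic stands in for a full LCS run, and every name here (`mean_difference`, `permuted_statistic`, `permutation_p_value`) is a hypothetical helper introduced for illustration. Each permutation gets its own seed so that results do not depend on how jobs are distributed across workers.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def mean_difference(labels, values):
    """Stand-in test statistic: mean of class 1 minus mean of class 0."""
    ones = [v for l, v in zip(labels, values) if l == 1]
    zeros = [v for l, v in zip(labels, values) if l == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


def permuted_statistic(job):
    """One independent permutation: shuffle the class labels, recompute the statistic."""
    seed, labels, values = job
    rng = random.Random(seed)  # per-job seed keeps runs reproducible
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return mean_difference(shuffled, values)


def permutation_p_value(labels, values, n_perm=200, workers=1):
    """Two-sided permutation p-value; workers > 1 spreads permutations over processes."""
    observed = mean_difference(labels, values)
    jobs = [(seed, labels, values) for seed in range(n_perm)]
    if workers == 1:
        null = [permuted_statistic(job) for job in jobs]
    else:
        # Permutations are independent, so they can be mapped across processes.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            null = list(pool.map(permuted_statistic, jobs))
    extreme = sum(1 for s in null if abs(s) >= abs(observed))
    return (extreme + 1) / (n_perm + 1)


# Toy data (hypothetical): two well-separated classes.
labels = [0] * 10 + [1] * 10
values = [float(i) for i in range(10)] + [100.0 + i for i in range(10)]
p_serial = permutation_p_value(labels, values, n_perm=200, workers=1)
print(p_serial)
```

To use several cores, call `permutation_p_value(..., workers=os.cpu_count())` from a script protected by an `if __name__ == "__main__":` guard. Keeping `workers` at or below the core count matches the regime in which the abstract reports approximately linear speedup.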
Affiliation(s)
- James Rudd
- Dartmouth College, 1 Medical Center Dr., Lebanon, NH 03755, USA
- Jason H Moore
- Dartmouth College, 1 Medical Center Dr., Lebanon, NH 03755, USA
|
5376
|
Winham SJ, Biernacka JM. Gene-environment interactions in genome-wide association studies: current approaches and new directions. J Child Psychol Psychiatry 2013; 54:1120-34. [PMID: 23808649 PMCID: PMC3829379 DOI: 10.1111/jcpp.12114] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 05/03/2013] [Indexed: 01/20/2023]
Abstract
BACKGROUND Complex psychiatric traits have long been thought to be the result of a combination of genetic and environmental factors, and gene-environment interactions are thought to play a crucial role in behavioral phenotypes and the susceptibility and progression of psychiatric disorders. Candidate gene studies to investigate hypothesized gene-environment interactions are now fairly common in human genetic research, and with the shift toward genome-wide association studies, genome-wide gene-environment interaction studies are beginning to emerge. METHODS We summarize the basic ideas behind gene-environment interaction, and provide an overview of possible study designs and traditional analysis methods in the context of genome-wide analysis. We then discuss novel approaches beyond the traditional strategy of analyzing the interaction between the environmental factor and each polymorphism individually. RESULTS Two-step filtering approaches that reduce the number of polymorphisms tested for interactions can substantially increase the power of genome-wide gene-environment studies. New analytical methods including data-mining approaches, and gene-level and pathway-level analyses, also have the capacity to improve our understanding of how complex genetic and environmental factors interact to influence psychologic and psychiatric traits. Such methods, however, have not yet been utilized much in behavioral and mental health research. CONCLUSIONS Although methods to investigate gene-environment interactions are available, there is a need for further development and extension of these methods to identify gene-environment interactions in the context of genome-wide association studies. These novel approaches need to be applied in studies of psychology and psychiatry.
Affiliation(s)
- Stacey J Winham
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester MN 55905
| | - Joanna M. Biernacka
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester MN 55905; Department of Psychiatry and Psychology, Mayo Clinic, Rochester MN 55905
|
5377
|
Roy S, Bhattacharyya DK, Kalita JK. CoBi: Pattern Based Co-Regulated Biclustering of Gene Expression Data. Pattern Recognit Lett 2013. [DOI: 10.1016/j.patrec.2013.03.018] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
|
5378
|
Mazandu GK, Mulder NJ. DaGO-Fun: tool for Gene Ontology-based functional analysis using term information content measures. BMC Bioinformatics 2013; 14:284. [PMID: 24067102 PMCID: PMC3849277 DOI: 10.1186/1471-2105-14-284] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 05/30/2013] [Accepted: 09/17/2013] [Indexed: 11/30/2022] Open
Abstract
Background The use of Gene Ontology (GO) data in protein analyses has largely contributed to the improved outcomes of these analyses. Several GO semantic similarity measures have been proposed in recent years and provide tools that allow the integration of biological knowledge embedded in the GO structure into different biological analyses. There is a need for a unified tool that provides the scientific community with the opportunity to explore these different GO similarity measure approaches and their biological applications. Results We have developed DaGO-Fun, an online tool available at http://web.cbio.uct.ac.za/ITGOM, which incorporates many different GO similarity measures for exploring, analyzing and comparing GO terms and proteins within the context of GO. It uses GO data and UniProt proteins with their GO annotations as provided by the Gene Ontology Annotation (GOA) project to precompute GO term information content (IC), enabling rapid response to user queries. Conclusions The DaGO-Fun online tool presents the advantage of integrating all the relevant IC-based GO similarity measures, including topology- and annotation-based approaches to facilitate effective exploration of these measures, thus enabling users to choose the most relevant approach for their application. Furthermore, this tool includes several biological applications related to GO semantic similarity scores, including the retrieval of genes based on their GO annotations, the clustering of functionally related genes within a set, and term enrichment analysis.
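The idea of precomputing annotation-based term information content can be illustrated with a minimal sketch. The toy ontology, the protein annotations, and the function names below are all hypothetical, and Resnik similarity is used only as one well-known IC-based measure; nothing here is taken from the DaGO-Fun implementation.

```python
import math

# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
}

# Hypothetical direct annotations: protein -> set of terms.
ANNOTATIONS = {
    "P1": {"dna_binding"},
    "P2": {"rna_binding"},
    "P3": {"catalysis"},
    "P4": {"dna_binding", "catalysis"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Annotation-based IC: -log of the fraction of proteins carrying a term."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        # Propagate each annotation up to all ancestor terms.
        propagated = set().union(*(ancestors(t) for t in terms))
        for term in propagated:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common)


ic = information_content()
print(resnik("dna_binding", "rna_binding", ic))
```

Because the IC table depends only on the ontology and the annotation set, it can be computed once and reused for every query, which is the design point the abstract makes about rapid responses.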
Affiliation(s)
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Medical School, Observatory, Cape Town, 7925, South Africa.
|
5379
|
Gu P, Chen H. Modern bioinformatics meets traditional Chinese medicine. Brief Bioinform 2013; 15:984-1003. [DOI: 10.1093/bib/bbt063] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Indexed: 12/17/2022] Open
|
5380
|
CoCiter: an efficient tool to infer gene function by assessing the significance of literature co-citation. PLoS One 2013; 8:e74074. [PMID: 24086311 PMCID: PMC3781068 DOI: 10.1371/journal.pone.0074074] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 12/18/2012] [Accepted: 07/30/2013] [Indexed: 01/17/2023] Open
Abstract
A routine approach to inferring functions for a gene set is by using function enrichment analysis based on GO, KEGG or other curated terms and pathways. However, such analysis requires the existence of overlapping genes between the query gene set and those annotated by GO/KEGG. Furthermore, GO/KEGG databases only maintain a very restricted vocabulary. Here, we have developed a tool called "CoCiter" based on literature co-citations to address the limitations in conventional function enrichment analysis. Co-citation analysis is widely used in ranking articles and predicting protein-protein interactions (PPIs). Our algorithm can further assess the co-citation significance of a gene set with any other user-defined gene sets, or with free terms. We show that compared with the traditional approaches, CoCiter is a more accurate and flexible function enrichment analysis method. CoCiter is freely available at www.picb.ac.cn/hanlab/cociter/.
|
5381
|
Yu X, Sun S. Comparing a few SNP calling algorithms using low-coverage sequencing data. BMC Bioinformatics 2013; 14:274. [PMID: 24044377 PMCID: PMC3848615 DOI: 10.1186/1471-2105-14-274] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Received: 05/10/2013] [Accepted: 09/12/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many Single Nucleotide Polymorphism (SNP) calling programs have been developed to identify Single Nucleotide Variations (SNVs) in next-generation sequencing (NGS) data. However, low sequencing coverage presents challenges to accurate SNV identification, especially in single-sample data. Moreover, commonly used SNP calling programs usually include several metrics in their output files for each potential SNP. These metrics are highly correlated in complex patterns, making it extremely difficult to select SNPs for further experimental validation. RESULTS To explore solutions to the above challenges, we compare the performance of four SNP calling algorithms, SOAPsnp, Atlas-SNP2, SAMtools, and GATK, in a low-coverage single-sample sequencing dataset. Without any post-output filtering, SOAPsnp calls more SNVs than the other programs since it has fewer internal filtering criteria. Atlas-SNP2 has stringent internal filtering criteria; thus it reports the fewest SNVs. The numbers of SNVs called by GATK and SAMtools fall between those of SOAPsnp and Atlas-SNP2. Moreover, we explore the values of key metrics related to SNV quality in each algorithm and use them as post-output filtering criteria to filter out low-quality SNVs. Under different coverage cutoff values, we compare the four algorithms and calculate the empirical positive calling rate and sensitivity. Our results show that: 1) the overall agreement of the four calling algorithms is low, especially in non-dbSNPs; 2) the agreement of the four algorithms is similar when using different coverage cutoffs, except that the non-dbSNPs agreement level tends to increase slightly with increasing coverage; 3) SOAPsnp, SAMtools, and GATK have a higher empirical calling rate for dbSNPs compared to non-dbSNPs; and 4) overall, GATK and Atlas-SNP2 have a relatively higher positive calling rate and sensitivity, but GATK calls more SNVs.
CONCLUSIONS Our results show that the agreement between different calling algorithms is relatively low. Thus, more caution should be used in choosing algorithms, setting filtering parameters, and designing validation studies. For reliable SNV calling results, we recommend that users employ more than one algorithm and use metrics related to calling quality and coverage as filtering criteria.
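The recommendation to combine more than one caller can be sketched as a set comparison. The call sets, caller names, and the choice of the Jaccard index as the agreement measure are illustrative assumptions; the abstract does not state how agreement was quantified.

```python
from itertools import combinations

# Hypothetical call sets: each variant is (chromosome, position, alt allele).
calls = {
    "caller_a": {("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A")},
    "caller_b": {("chr1", 100, "T"), ("chr2", 40, "A")},
    "caller_c": {("chr1", 100, "T"), ("chr1", 900, "C")},
}


def jaccard(a, b):
    """Shared calls divided by all calls made by either caller."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(call_sets):
    """Jaccard agreement for every pair of callers."""
    return {
        (x, y): jaccard(call_sets[x], call_sets[y])
        for x, y in combinations(sorted(call_sets), 2)
    }


def consensus(call_sets, min_callers=2):
    """Variants reported by at least min_callers callers."""
    support = {}
    for variants in call_sets.values():
        for variant in variants:
            support[variant] = support.get(variant, 0) + 1
    return {v for v, n in support.items() if n >= min_callers}


print(pairwise_agreement(calls))
print(sorted(consensus(calls)))
```

In practice the consensus set would then be filtered further on caller-specific quality and coverage metrics, as the conclusions suggest.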
Affiliation(s)
- Xiaoqing Yu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44106, USA.
|
5382
|
Wilhelm-Benartzi CS, Koestler DC, Karagas MR, Flanagan JM, Christensen BC, Kelsey KT, Marsit CJ, Houseman EA, Brown R. Review of processing and analysis methods for DNA methylation array data. Br J Cancer 2013; 109:1394-402. [PMID: 23982603 PMCID: PMC3777004 DOI: 10.1038/bjc.2013.496] [Citation(s) in RCA: 129] [Impact Index Per Article: 11.7] [Received: 04/24/2013] [Revised: 07/23/2013] [Accepted: 07/30/2013] [Indexed: 12/21/2022] Open
Abstract
The promise of epigenome-wide association studies and cancer-specific somatic DNA methylation changes in improving our understanding of cancer, coupled with the decreasing cost and increasing coverage of DNA methylation microarrays, has brought about a surge in the use of these technologies. Here, we aim to provide both a review of issues encountered in the processing and analysis of array-based DNA methylation data and a summary of the advantages of recent approaches proposed for handling those issues, focusing on approaches publicly available in open-source environments such as R and Bioconductor. We hope that the processing tools and analysis flowchart described herein will help researchers to use these powerful DNA methylation array-based platforms effectively, thereby advancing our understanding of human health and disease.
Affiliation(s)
- C S Wilhelm-Benartzi
- Epigenetics Unit, Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Ovarian Cancer Action Research Centre, Imperial College London, 4th floor IRDB, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
| | - D C Koestler
- Section of Biostatistics and Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH 03755, USA
| | - M R Karagas
- Section of Biostatistics and Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH 03755, USA
| | - J M Flanagan
- Epigenetics Unit, Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Ovarian Cancer Action Research Centre, Imperial College London, 4th floor IRDB, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
| | - B C Christensen
- Section of Biostatistics and Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH 03755, USA
- Department of Pharmacology and Toxicology, Geisel School of Medicine at Dartmouth College, Hanover, NH 03755, USA
| | - K T Kelsey
- Department of Pathology and Laboratory Medicine, Brown University, Providence, RI, USA
- Department of Epidemiology, Brown University, Providence, RI, USA
| | - C J Marsit
- Section of Biostatistics and Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, NH 03755, USA
- Department of Pharmacology and Toxicology, Geisel School of Medicine at Dartmouth College, Hanover, NH 03755, USA
| | - E A Houseman
- Department of Public Health, Oregon State University, Corvallis, OR, USA
| | - R Brown
- Epigenetics Unit, Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Ovarian Cancer Action Research Centre, Imperial College London, 4th floor IRDB, Hammersmith Campus, Du Cane Road, London W12 0NN, UK
- Section of Molecular Pathology, Institute for Cancer Research, Sutton, UK
|
5383
|
Winterbach W, Mieghem PV, Reinders M, Wang H, Ridder DD. Topology of molecular interaction networks. BMC SYSTEMS BIOLOGY 2013; 7:90. [PMID: 24041013 PMCID: PMC4231395 DOI: 10.1186/1752-0509-7-90] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Received: 02/19/2013] [Accepted: 08/01/2013] [Indexed: 12/23/2022]
Abstract
Molecular interactions are often represented as network models which have become the common language of many areas of biology. Graphs serve as convenient mathematical representations of network models and have themselves become objects of study. Their topology has been intensively researched over the last decade after evidence was found that they share underlying design principles with many other types of networks. Initial studies suggested that molecular interaction network topology is related to biological function and evolution. However, further whole-network analyses did not lead to a unified view on what this relation may look like, with conclusions highly dependent on the type of molecular interactions considered and the metrics used to study them. It is unclear whether global network topology drives function, as suggested by some researchers, or whether it is simply a byproduct of evolution or even an artefact of representing complex molecular interaction networks as graphs. Nevertheless, network biology has progressed significantly over the last years. We review the literature, focusing on two major developments. First, realizing that molecular interaction networks can be naturally decomposed into subsystems (such as modules and pathways), topology is increasingly studied locally rather than globally. Second, there is a move from a descriptive approach to a predictive one: rather than correlating biological network topology to generic properties such as robustness, it is used to predict specific functions or phenotypes. Taken together, this change in focus from globally descriptive to locally predictive points to new avenues of research. In particular, multi-scale approaches are developments promising to drive the study of molecular interaction networks further.
Affiliation(s)
- Wynand Winterbach
- Network Architectures and Services, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
| | - Piet Van Mieghem
- Network Architectures and Services, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
| | - Marcel Reinders
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
- Netherlands Bioinformatics Center, 6500 HB Nijmegen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, 2600 GA Delft, The Netherlands
| | - Huijuan Wang
- Network Architectures and Services, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
| | - Dick de Ridder
- Delft Bioinformatics Lab, Department of Intelligent Systems, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
- Netherlands Bioinformatics Center, 6500 HB Nijmegen, The Netherlands
- Kluyver Centre for Genomics of Industrial Fermentation, 2600 GA Delft, The Netherlands
|
5384
|
Abstract
MOTIVATION Data quality is a critical issue in the analyses of DNA copy number alterations obtained from microarrays. It is commonly assumed that copy number alteration data can be modeled as piecewise constant and the measurement errors of different probes are independent. However, these assumptions do not always hold in practice. In some published datasets, we find that measurement errors are highly correlated between probes that interrogate nearby genomic loci, and the piecewise-constant model does not fit the data well. The correlated errors cause problems in downstream analysis, leading to a large number of DNA segments falsely identified as having copy number gains and losses. METHOD We developed a simple tool, called autocorrelation scanning profile, to assess the dependence of measurement error between neighboring probes. RESULTS Autocorrelation scanning profile can be used to check data quality and refine the analysis of DNA copy number data, which we demonstrate in some typical datasets. CONTACT lzhangli@mdanderson.org. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
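One generic way to check for dependence between neighboring probes is a sliding-window lag-1 autocorrelation. The sketch below is only that generic statistic; the abstract does not describe how the autocorrelation scanning profile is actually computed, so the window scheme and function names are assumptions.

```python
def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation of a sequence of probe-level values."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant window: no variation to correlate
    num = sum(centered[i] * centered[i + 1] for i in range(n - 1))
    return num / denom


def scan(values, window):
    """Autocorrelation in each window of consecutive probes along the genome."""
    return [
        lag1_autocorrelation(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Hypothetical log-ratio values ordered by genomic position.
log_ratios = [0.1, 0.2, 0.3, 0.2, 0.1, -0.1, -0.2, -0.3, -0.2, -0.1]
print(scan(log_ratios, window=6))
```

Under independent errors around a constant segment mean, the windowed values should scatter around zero; persistently positive values would point to the correlated noise the abstract warns about.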
Affiliation(s)
- Liangcai Zhang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77230, USA and Department of Biophysics, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
|
5385
|
Surgucheva I, Gunewardena S, Rao HS, Surguchov A. Cell-specific post-transcriptional regulation of γ-synuclein gene by micro-RNAs. PLoS One 2013; 8:e73786. [PMID: 24040069 PMCID: PMC3770685 DOI: 10.1371/journal.pone.0073786] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/18/2013] [Accepted: 07/28/2013] [Indexed: 11/18/2022] Open
Abstract
γ-Synuclein is a member of the synuclein family of small proteins, which consists of three members: α-, β- and γ-synuclein. γ-Synuclein is abnormally expressed in a high percentage of advanced and metastatic tumors, but not in normal or benign tissues. Furthermore, γ-synuclein expression is strongly correlated with disease progression, and can stimulate proliferation and induce invasion and metastasis of cancer cells. γ-Synuclein transcription is regulated mainly through the binding of AP-1 to specific sequences in intron 1. Here we show that γ-synuclein expression may also be regulated by micro RNAs (miRs) at the post-transcriptional level. According to prediction by several methods, the 3′-untranslated region (UTR) of the γ-synuclein gene contains targets for miRs. Insertion of the γ-synuclein 3′-UTR downstream of the reporter luciferase (LUC) gene causes a 51% reduction of LUC activity after transfection into SKBR3 and Y79 cells, confirming the presence of efficient targets for miRs in this fragment. Expression of miR-4437 and miR-4674, for which putative targets in the 3′-UTR were predicted, caused a 61.2% and 60.1% reduction of endogenous γ-synuclein expression, confirming their role in gene expression regulation. On the other hand, in cells overexpressing γ-synuclein no significant effect of miRs on γ-synuclein expression was found, suggesting that miRs exert their regulatory effect only at low or moderate, but not at high, levels of γ-synuclein expression. An elevated level of γ-synuclein differentially changes the expression of several miRs, upregulating some and downregulating others. Three miRs upregulated as a result of γ-synuclein overexpression, i.e., miR-885-3p, miR-138 and miR-497, have putative targets in the 3′-UTR of the γ-synuclein gene. Some of the miRs differentially regulated by γ-synuclein may modulate signaling pathways and cancer-related gene expression. This study demonstrates that miRs might provide cell-specific regulation of γ-synuclein expression and sets the stage for further evaluation of their role in pathophysiological processes.
Affiliation(s)
- Irina Surgucheva
- Retinal Biology Research Laboratory, Veterans Administration Medical Center, Kansas City, Missouri, United States of America
- Department of Neurology, Kansas University Medical Center, Kansas City, Kansas, United States of America
| | - Sumedha Gunewardena
- Department of Molecular and Integrative Physiology, Kansas University Medical Center, Kansas City, Kansas, United States of America
| | - H. Shanker Rao
- Department of Molecular and Integrative Physiology, Kansas University Medical Center, Kansas City, Kansas, United States of America
| | - Andrei Surguchov
- Retinal Biology Research Laboratory, Veterans Administration Medical Center, Kansas City, Missouri, United States of America
- Department of Neurology, Kansas University Medical Center, Kansas City, Kansas, United States of America
|
5386
|
Horvát EÁ, Zhang JD, Uhlmann S, Sahin Ö, Zweig KA. A network-based method to assess the statistical significance of mild co-regulation effects. PLoS One 2013; 8:e73413. [PMID: 24039936 PMCID: PMC3767771 DOI: 10.1371/journal.pone.0073413] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 02/24/2013] [Accepted: 07/19/2013] [Indexed: 01/17/2023] Open
Abstract
Recent development of high-throughput, multiplexing technology has initiated projects that systematically investigate interactions between two types of components in biological networks, for instance transcription factors and promoter sequences, or microRNAs (miRNAs) and mRNAs. In terms of network biology, such screening approaches primarily attempt to elucidate relations between biological components of two distinct types, which can be represented as edges between nodes in a bipartite graph. However, it is often desirable not only to determine regulatory relationships between nodes of different types, but also to understand the connection patterns of nodes of the same type. Especially interesting is the co-occurrence of two nodes of the same type, i.e., the number of their common neighbours, which current high-throughput screening analysis fails to address. The co-occurrence gives the number of circumstances under which both of the biological components are influenced in the same way. Here we present SICORE, a novel network-based method to detect pairs of nodes with a statistically significant co-occurrence. We first show the stability of the proposed method on artificial data sets: when randomly adding and deleting observations we obtain reliable results even with noise exceeding the expected level in large-scale experiments. Subsequently, we illustrate the viability of the method based on the analysis of a proteomic screening data set to reveal regulatory patterns of human microRNAs targeting proteins in the EGFR-driven cell cycle signalling system. Since statistically significant co-occurrence may indicate functional synergy and the mechanisms underlying canalization, and thus hold promise in drug target identification and therapeutic development, we provide a platform-independent implementation of SICORE with a graphical user interface as a novel tool in the arsenal of high-throughput screening analysis.
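The co-occurrence quantity defined above, the number of common neighbours of two same-type nodes in a bipartite graph, can be computed directly from an adjacency mapping. The miRNA and protein identifiers below are hypothetical, and the sketch covers only the raw count; the statistical significance test that SICORE applies to these counts is not reproduced here.

```python
from itertools import combinations

# Hypothetical bipartite edges: miRNA -> set of proteins it affects.
edges = {
    "miR-a": {"P1", "P2", "P3"},
    "miR-b": {"P2", "P3", "P4"},
    "miR-c": {"P5"},
}


def cooccurrence(adjacency):
    """Number of common neighbours for every pair of same-type nodes."""
    return {
        (u, v): len(adjacency[u] & adjacency[v])
        for u, v in combinations(sorted(adjacency), 2)
    }


print(cooccurrence(edges))
```

A raw count is not informative by itself, because nodes with many neighbours share neighbours by chance; this is why the count has to be compared against a suitable null model.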
Affiliation(s)
- Emőke-Ágnes Horvát
- Interdisciplinary Center for Scientific Computing, University of Heidelberg, Heidelberg, Germany ; Network Analysis and Graph Theory, Technical University of Kaiserslautern, Kaiserslautern, Germany
|
5387
|
Schauwecker PE. Microarray-assisted fine mapping of quantitative trait loci on chromosome 15 for susceptibility to seizure-induced cell death in mice. Eur J Neurosci 2013; 38:3679-90. [PMID: 24001120 DOI: 10.1111/ejn.12351] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/06/2013] [Accepted: 08/08/2013] [Indexed: 11/30/2022]
Abstract
Prior studies with crosses of the FVB/NJ (FVB; seizure-induced cell death-susceptible) mouse and the C57BL/6J (B6; seizure-induced cell death-resistant) mouse revealed the presence of a quantitative trait locus (QTL) on chromosome 15 that influenced susceptibility to kainic acid-induced cell death (Sicd2). In an earlier study, we confirmed that the Sicd2 interval harbors gene(s) conferring strong protection against seizure-induced cell death through the creation of the FVB.B6-Sicd2 congenic strain, and created three interval-specific congenic lines (ISCLs) that encompass Sicd2 on chromosome 15 to fine-map this locus. To further localise this Sicd2 QTL, an additional congenic line carrying overlapping intervals of the B6 segment was created (ISCL-4), compared with the previously created ISCL-1 to ISCL-3, and assessed for seizure-induced cell death phenotype. Whereas all of the ISCLs showed reduced cell death associated with the B6 phenotype, ISCL-4 showed the most extensive reduction in seizure-induced cell death throughout all hippocampal subfields. In order to characterise the susceptibility loci on Sicd2 by use of this ISCL and identify compelling candidate genes, we undertook an integrative genomic strategy of comparing exon transcript abundance in the hippocampus of this newly developed chromosome 15 subcongenic line (ISCL-4) and FVB-like littermates. We identified 10 putative candidate genes that are alternatively spliced between the strains and may govern strain-dependent differences in susceptibility to seizure-induced excitotoxic cell death. These results illustrate the importance of identifying transcriptomic variants in expression studies, and implicate novel candidate genes conferring susceptibility to seizure-induced cell death.
Affiliation(s)
- P E Schauwecker
- Department of Cell and Neurobiology, USC Keck School of Medicine, 1333 San Pablo Street, BMT 403, Los Angeles, CA, 90033, USA
|
5388
|
Muqbil I, Bao B, Abou-Samra AB, Mohammad RM, Azmi AS. Nuclear export mediated regulation of microRNAs: potential target for drug intervention. Curr Drug Targets 2013; 14:1094-100. [PMID: 23834155 PMCID: PMC4167361 DOI: 10.2174/1389450111314100002] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 06/28/2013] [Accepted: 07/01/2013] [Indexed: 11/22/2022]
Abstract
MicroRNAs (miRNAs) are short non-coding RNAs that have been recognized to regulate the expression of a very large number of genes. Their aberrant expression has been found to be linked to the pathology of many diseases including cancer. There is a drive to develop miRNA-targeted therapeutics for different diseases, especially cancer. Nevertheless, reining in these short non-coding RNAs is not as straightforward as originally thought. This is in view of the recent discoveries that miRNAs are under epigenetic regulation at multiple levels. Exportin 5 protein (XPO5) nuclear export mediated regulation of miRNAs is one such important epigenetic mechanism. XPO5 is responsible for exporting precursor miRNAs through the nuclear membrane to the cytoplasm, and this export is thus a critical step in miRNA biogenesis. A number of studies have shown that variations in components of the miRNA biogenesis pathways, particularly the aberrant expression of XPO5, increase the risk of developing cancer. In addition to XPO5, the Exportin 1 protein (XPO1), also known as chromosome region maintenance 1 (CRM1), can also carry out miRNA export. These findings are supported by pathway analyses that reveal certain miRNAs as direct interaction partners of CRM1. An in-depth understanding of miRNA export mediated regulatory mechanisms is important for the successful design of clinically viable therapeutics. In this review, we describe the current knowledge on the mechanisms of miRNA nuclear transport mediated regulation and propose strategies to selectively block this important mechanism in cancer.
Affiliation(s)
- Irfana Muqbil
- Department of Biochemistry Faculty of Life Sciences, AMU Aligarh, 202002 UP, INDIA
| | - Bin Bao
- Department of Pathology, Wayne State University, Detroit, MI USA
| | | | - Ramzi M. Mohammad
- Hamad Medical Corporation, Doha Qatar
- Department of Oncology, Wayne State University, Detroit MI, USA
| | - Asfar S. Azmi
- Department of Pathology, Wayne State University, Detroit, MI USA
|
5389
|
Lewis JP, Stephens SH, Horenstein RB, O'Connell JR, Ryan K, Peer CJ, Figg WD, Spencer SD, Pacanowski MA, Mitchell BD, Shuldiner AR. The CYP2C19*17 variant is not independently associated with clopidogrel response. J Thromb Haemost 2013; 11:1640-6. [PMID: 23809542 PMCID: PMC3773276 DOI: 10.1111/jth.12342] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Received: 03/06/2013] [Accepted: 06/04/2013] [Indexed: 11/30/2022]
Abstract
BACKGROUND Cytochrome P450 2C19 (CYP2C19) is the principal enzyme responsible for converting clopidogrel into its active metabolite, and common genetic variants have been identified, most notably CYP2C19*2 and CYP2C19*17, that are believed to alter its activity and expression, respectively. OBJECTIVE We evaluated whether the consequences of the CYP2C19*2 and CYP2C19*17 variants on clopidogrel response were independent of each other or genetically linked through linkage disequilibrium (LD). PATIENTS/METHODS We genotyped the CYP2C19*2 and CYP2C19*17 variants in 621 members of the Pharmacogenomics of Anti-Platelet Intervention (PAPI) Study and evaluated the effects of these polymorphisms singly and then jointly, taking into account LD, on clopidogrel prodrug level, clopidogrel active metabolite level, and adenosine 5'-diphosphate (ADP)-stimulated platelet aggregation before and after clopidogrel exposure. RESULTS The CYP2C19*2 and CYP2C19*17 variants were in LD (|D'| = 1.0; r² = 0.07). In association analyses that did and did not account for the effects of CYP2C19*17, CYP2C19*2 was strongly associated with levels of clopidogrel active metabolite (β = -5.24, P = 3.0 × 10⁻⁹ and β = -5.36, P = 3.3 × 10⁻¹⁴, respectively) and posttreatment ADP-stimulated platelet aggregation (β = 7.55, P = 2.9 × 10⁻¹⁶ and β = 7.51, P = 7.0 × 10⁻¹⁵, respectively). In contrast, CYP2C19*17 was marginally associated with clopidogrel active metabolite levels and ADP-stimulated platelet aggregation before (β = 1.57, P = 0.04 and β = -1.98, P = 0.01, respectively) but not after (β = 0.40, P = 0.59 and β = -0.13, P = 0.69, respectively) adjustment for the CYP2C19*2 variant. Stratified analyses of CYP2C19*2/CYP2C19*17 genotype combinations revealed that CYP2C19*2, and not CYP2C19*17, was the primary determinant in altering clopidogrel response.
CONCLUSIONS Our results suggest that CYP2C19*17 has a small (if any) effect on clopidogrel-related traits and that the observed effect of this variant is due to LD with the CYP2C19*2 loss-of-function variant.
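A complete |D'| together with a low r², as reported above, is what the standard two-locus LD formulas give when one of the four possible haplotypes is absent, for example when two minor alleles never sit on the same chromosome. The sketch below applies those formulas; the allele frequencies are hypothetical illustrations and are not the PAPI study estimates.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) from the AB haplotype frequency and allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: the two minor alleles never occur together (p_ab = 0).
d, d_prime, r2 = ld_stats(p_ab=0.0, p_a=0.15, p_b=0.20)
print(f"D = {d:.3f}, |D'| = {abs(d_prime):.1f}, r^2 = {r2:.3f}")
```

With these assumed frequencies |D'| is 1 while r² stays below 0.05, so one variant predicts the other poorly even though no recombinant haplotype is observed. This is why a marginal association at one site can disappear after adjustment for the other.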
Affiliation(s)
- J P Lewis
- Division of Endocrinology, Diabetes, and Nutrition and Program in Personalized and Genomic Medicine, School of Medicine, University of Maryland, Baltimore, MD, USA
|
5390
|
Sun S, Noviski A, Yu X. MethyQA: a pipeline for bisulfite-treated methylation sequencing quality assessment. BMC Bioinformatics 2013; 14:259. [PMID: 23968174 PMCID: PMC3765750 DOI: 10.1186/1471-2105-14-259] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 01/17/2013] [Accepted: 08/09/2013] [Indexed: 11/10/2022] Open
Abstract
Background DNA methylation is an epigenetic event that adds a methyl group to the 5-carbon of cytosine. This epigenetic modification can significantly affect gene expression in both normal and diseased cells. Hence, it is important to study methylation signals at the single cytosine site level, which is now possible using the bisulfite conversion technique (i.e., converting unmethylated Cs to Us and then to Ts after PCR amplification) and next-generation sequencing (NGS) technologies. Despite the advances of NGS technologies, certain quality issues remain. Some of the more prevalent quality issues involve low per-base sequencing quality at the 3’ end, PCR amplification bias, and bisulfite conversion rates. Therefore, it is important to conduct quality assessment before downstream analysis. To the best of our knowledge, no existing software packages can generally assess the quality of methylation sequencing data generated by different bisulfite-treatment protocols. Results To conduct the quality assessment of bisulfite methylation sequencing data, we have developed a pipeline named MethyQA. MethyQA combines currently available open-source software packages with our own custom programs written in Perl and R. The pipeline can provide quality assessment results for tens of millions of reads in under an hour. The novelty of our pipeline lies in its examination of bisulfite conversion rates and of the DNA sequence structure of regions that have different conversion rates or coverage. Conclusions MethyQA is a new software package that provides users with a unique insight into the methylation sequencing data they are researching. It allows the users to determine the quality of their data and better prepares them to address the research questions that lie ahead. Due to the speed and efficiency at which MethyQA operates, it will become an important tool for studies dealing with bisulfite methylation sequencing data.
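A common way to estimate a bisulfite conversion rate is to look at cytosines outside CpG context, which are assumed here to be unmethylated, and count how many are read as T. The sketch below follows that assumption for a single forward-strand read aligned without gaps; the function name and sequences are hypothetical, and this is not MethyQA's own Perl and R code.

```python
def conversion_rate(reference, read):
    """Fraction of non-CpG reference cytosines that appear as T in the read.

    Assumes the read is aligned to the forward strand with no indels and
    that non-CpG cytosines are unmethylated.
    """
    converted = 0
    unconverted = 0
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        # Skip CpG sites, where methylation may legitimately protect the C.
        # A C at the last position has no known context and is treated as non-CpG.
        if reference[i + 1:i + 2] == "G":
            continue
        if read_base == "T":
            converted += 1
        elif read_base == "C":
            unconverted += 1
    total = converted + unconverted
    return converted / total if total else None


# Hypothetical reference segment and aligned bisulfite read.
print(conversion_rate("ACGTCACCTA", "ACGTTATCTA"))
```

A low rate by this measure suggests incomplete conversion, which would inflate apparent methylation levels in downstream analysis.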
Affiliation(s)
- Shuying Sun
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland 44106, Ohio, USA.
|
5391
|
Identification of novel autoantibodies for detection of malignant mesothelioma. PLoS One 2013; 8:e72458. [PMID: 23977302 PMCID: PMC3747111 DOI: 10.1371/journal.pone.0072458] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 02/15/2013] [Accepted: 07/11/2013] [Indexed: 12/16/2022] Open
Abstract
Background Survival in malignant mesothelioma (MM) has been hampered by the lack of efficient and accurate early detection methods. The immune system may detect the early changes of tumor progression by responding with tumor-associated autoantibody production. Hence, in this study, we translated the humoral immune response to cancer proteins into a potential blood test for MM. Methodology/Principal Findings A T7 phage MM cDNA library was constructed using MM tumor tissues and biopanned for tumor-associated antigens (TAAs) using pooled MM patient and normal serum samples. About 1008 individual phage TAA clones from the biopanned library were subjected to protein microarray construction and tested with 53 MM and 52 control serum samples as a training group. Nine candidate autoantibody markers were selected from the training group using the Tclass system and logistic regression analysis, which achieved 94.3% sensitivity and 90.4% specificity with an AUC value of 0.89 in receiver operating characteristic analysis. The classifier was further evaluated with 50 patient and 50 normal serum samples as an independent blind validation, yielding a sensitivity of 86.0% and a specificity of 86.0% with an AUC of 0.82. Sequencing and BLASTN analysis of the classifier revealed that five of these nine candidate markers had strong homology to cancer-related proteins (PDIA6, MEG3, SDCCAG3, IGHG3, IGHG1). Conclusions/Significance Our results indicate that a panel of 9 autoantibody markers offers promising accuracy for MM detection. Although the results need further validation in high-risk groups, they show potential for developing a serum-based assay for MM diagnosis.
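The sensitivity and specificity figures above follow from simple ratios. The sketch below shows the arithmetic. The underlying counts (50 of 53 patients and 47 of 52 controls in training, 43 of 50 in each validation group) are back-calculated from the reported percentages and group sizes; they are an assumption, not values stated in the abstract.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of diseased samples correctly called positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of control samples correctly called negative."""
    return true_neg / (true_neg + false_pos)


# Assumed counts, inferred from the reported percentages and group sizes.
train_sens = sensitivity(true_pos=50, false_neg=3)   # 53 MM sera
train_spec = specificity(true_neg=47, false_pos=5)   # 52 control sera
valid_sens = sensitivity(true_pos=43, false_neg=7)   # 50 MM sera
valid_spec = specificity(true_neg=43, false_pos=7)   # 50 control sera
```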
|
5392
|
Wu Y, Zhu X, Chen J, Zhang X. EINVis: a visualization tool for analyzing and exploring genetic interactions in large-scale association studies. Genet Epidemiol 2013; 37:675-85. [PMID: 23934759 DOI: 10.1002/gepi.21754] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/28/2013] [Revised: 06/24/2013] [Accepted: 07/11/2013] [Indexed: 01/12/2023]
Abstract
Epistasis (gene-gene interaction) detection in large-scale genetic association studies has recently drawn extensive research interest, as many complex traits are likely caused by the joint effect of multiple genetic factors. The large number of possible interactions poses both statistical and computational challenges. A variety of approaches have been developed to address the analytical challenges in epistatic interaction detection. These methods usually output the identified genetic interactions and store them in flat file formats. It is highly desirable to develop an effective visualization tool to further investigate the detected interactions and unravel hidden interaction patterns. We have developed EINVis, a novel visualization tool that is specifically designed to analyze and explore genetic interactions. EINVis displays interactions among genetic markers as a network. It utilizes a circular layout (specifically, a tree ring view) to simultaneously visualize the hierarchical interactions between single nucleotide polymorphisms (SNPs), genes, and chromosomes, and the network structure formed by these interactions. Using EINVis, the user can distinguish marginal effects from interactions, track interactions involving more than two markers, visualize interactions at different levels, and detect proxy SNPs based on linkage disequilibrium. EINVis is an effective and user-friendly free visualization tool for analyzing and exploring genetic interactions. It is publicly available with detailed documentation and an online tutorial at http://filer.case.edu/yxw407/einvis/.
Affiliation(s)
- Yubao Wu
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio
|
5393
|
Early Trichinella spiralis and Trichinella nativa infections induce similar gene expression profiles in rat jejunal mucosa. Exp Parasitol 2013; 135:363-9. [PMID: 23932900 DOI: 10.1016/j.exppara.2013.07.024] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/21/2013] [Revised: 06/17/2013] [Accepted: 07/22/2013] [Indexed: 12/13/2022]
Abstract
Trichinella spiralis causes a significantly higher parasite burden in rat muscle than Trichinella nativa. To assess whether the difference in infectivity is due to the early intestinal response, we analyzed gene expression changes in the rat jejunum during Trichinella infection with a whole-genome microarray. The rats were euthanized on day five of infection, and their jejunal mucosa was sampled for microarray analysis. In addition, intestinal histology and hematology were examined. Against our expectations, the gene expression changes were similar in the T. nativa- and T. spiralis-infected groups. The two groups were hence pooled, and in the combined Trichinella-infected group, 551 genes were overexpressed and 427 underexpressed when compared to controls (false discovery rate ≤ 0.001 and fold change at least 2 in either direction). Pathway analysis identified seven pathways significantly associated with Trichinella infection (p < 0.05). The microarray data suggested nonspecific damage and an inflammatory response in the jejunal mucosa. Histological findings, including hyperemia, hemorrhage and a marked infiltration of inflammatory cells, supported the microarray data. Trichinella infection caused complex gene expression changes that indicate a host response to tissue damage in the mucosa of the jejunum, but the changes were not notably dependent on the studied species of Trichinella.
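The selection rule quoted above (false discovery rate ≤ 0.001 and at least a 2-fold change in either direction) can be sketched as a simple filter. This is an illustrative sketch only: it assumes fold change is an infected/control ratio on a linear scale and that FDR values have already been computed. The names `GeneResult` and `classify` are hypothetical, and the example genes are made up.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    gene: str
    fold_change: float  # infected / control ratio, linear scale (assumed)
    fdr: float          # precomputed false discovery rate (assumed)


def classify(
    results: Iterable[GeneResult],
    fdr_cutoff: float = 0.001,
    min_fold: float = 2.0,
) -> Tuple[List[str], List[str]]:
    """Split genes into over- and underexpressed lists by the two thresholds."""
    over: List[str] = []
    under: List[str] = []
    for r in results:
        if r.fdr > fdr_cutoff:
            continue  # not significant at the chosen FDR
        if r.fold_change >= min_fold:
            over.append(r.gene)
        elif r.fold_change <= 1.0 / min_fold:
            under.append(r.gene)
    return over, under


# Made-up example values, for illustration only.
example = [
    GeneResult("geneA", 3.1, 0.0004),   # up, significant
    GeneResult("geneB", 0.4, 0.0009),   # down, significant
    GeneResult("geneC", 5.0, 0.02),     # large change, fails FDR
    GeneResult("geneD", 1.5, 0.0001),   # significant, change too small
]
over, under = classify(example)
```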
|
5394
|
Hijazi H, Chan C. A classification framework applied to cancer gene expression profiles. Journal of Healthcare Engineering 2013; 4:255-83. [PMID: 23778014 DOI: 10.1260/2040-2295.4.2.255] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/03/2022]
Abstract
Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or normal versus cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or normal versus cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) on 5 cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally improve the classification performance over other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy as compared to using gene expression alone.
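To make the k-nearest neighbor classifier mentioned above concrete, here is a minimal sketch using only the Python standard library. It is not the authors' framework: Euclidean distance, majority voting, and the toy two-gene expression vectors are all assumptions chosen for illustration, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (expression vector, class label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two "genes" per sample, made up for illustration.
train = [
    ((0.0, 0.1), "normal"), ((0.2, 0.0), "normal"), ((0.1, 0.2), "normal"),
    ((1.0, 1.1), "tumor"), ((0.9, 1.0), "tumor"), ((1.1, 0.9), "tumor"),
]
```

In practice the feature vectors would first be reduced to a selected gene set, as the abstract describes, since distance measures degrade with thousands of uninformative genes.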
Affiliation(s)
- Hussein Hijazi
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.
|
5395
|
Inference of global HIV-1 sequence patterns and preliminary feature analysis. Virol Sin 2013; 28:228-38. [PMID: 23913180 DOI: 10.1007/s12250-013-3348-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/04/2013] [Accepted: 07/26/2013] [Indexed: 12/12/2022] Open
Abstract
The epidemiology of HIV-1 varies in different areas of the world, and it is possible that this complexity may leave unique footprints in the viral genome. Thus, we attempted to find significant patterns in global HIV-1 genome sequences. By applying the rule inference algorithm RIPPER (Repeated Incremental Pruning to Produce Error Reduction) to multiple sequence alignments of Env sequences from four classes of compiled datasets, we generated four sets of signature patterns. We found that these patterns were able to distinguish southeastern Asian from non-southeastern Asian sequences with 97.5% accuracy, Chinese from non-Chinese sequences with 98.3% accuracy, African from non-African sequences with 88.4% accuracy, and southern African from non-southern African sequences with 91.2% accuracy. These patterns showed different associations with subtypes and with amino acid positions. In addition, some signature patterns were characteristic of the geographic area from which the sample was taken. Amino acid features corresponding to the phylogenetic clustering of HIV-1 sequences were consistent with some of the deduced patterns. Using a combination of patterns inferred from subtypes B, C, and all subtypes chimeric with CRF01_AE worldwide, we found that signature patterns of subtype C were extremely common in some sampled countries (for example, Zambia in southern Africa), which may hint at the origin of this HIV-1 subtype and the need to pay special attention to this area of Africa. Signature patterns of subtype B sequences were associated with different countries. Moreover, position 21 alone shows distinct patterns: glycine, leucine, and isoleucine correspond to subtype C, subtype B, and all possible recombinant forms chimeric with CRF01_AE, respectively, which also indicates distinct geographic features. Our method widens the scope of signature pattern inference from geographic, genetic, and genomic viewpoints. These findings may provide a valuable reference for epidemiological research or vaccine design.
|
5396
|
Börnigen D, Pers TH, Thorrez L, Huttenhower C, Moreau Y, Brunak S. Concordance of gene expression in human protein complexes reveals tissue specificity and pathology. Nucleic Acids Res 2013; 41:e171. [PMID: 23921638 PMCID: PMC3794609 DOI: 10.1093/nar/gkt661] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/27/2022] Open
Abstract
Disease-causing variants in human genes usually lead to phenotypes specific to only a few tissues. Here, we present a method for predicting tissue specificity based on quantitative deregulation of protein complexes. The underlying assumption is that the degree of coordinated expression among proteins in a complex within a given tissue may pinpoint tissues that will be affected by a mutation in the complex and coordinated expression may reveal the complex to be active in the tissue. We identified known disease genes and their protein complex partners in a high-quality human interactome. Each susceptibility gene's tissue involvement was ranked based on coordinated expression with its interaction partners in a non-disease global map of human tissue-specific expression. The approach demonstrated high overall area under the curve (0.78) and was very successfully benchmarked against a random model and an approach not using protein complexes. This was illustrated by correct tissue predictions for three case studies on leptin, insulin-like-growth-factor 2 and the inhibitor of NF-κB kinase subunit gamma that show high concordant expression in biologically relevant tissues. Our method identifies novel gene-phenotype associations in human diseases and predicts the tissues where associated phenotypic effects may arise.
Affiliation(s)
- Daniela Börnigen
- Department of Electrical Engineering, ESAT-SCD, IBBT-KU Leuven Future Health Department, KU Leuven, 3001 Leuven, Belgium, Biostatistics Department, Harvard School of Public Health, Harvard University, Boston, 02115 MA, USA, Broad Institute of MIT and Harvard, Cambridge, 02142 MA, USA, Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark, Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Boston, 02142 MA, USA, Division of Endocrinology and Center for Basic and Translational Obesity Research, Children's Hospital Boston, Boston, 02115 MA, USA, Department of Development and Regeneration @ Kulak, KU Leuven, E. Sabbelaan 53, 8500 Kortrijk, Belgium, and NNF Center for Protein Research, Health Sciences Faculty, University of Copenhagen, DK-2200 Copenhagen, Denmark
|
5397
|
Li W, Chen L, He W, Li W, Qu X, Liang B, Gao Q, Feng C, Jia X, Lv Y, Zhang S, Li X. Prioritizing disease candidate proteins in cardiomyopathy-specific protein-protein interaction networks based on "guilt by association" analysis. PLoS One 2013; 8:e71191. [PMID: 23940716 PMCID: PMC3733802 DOI: 10.1371/journal.pone.0071191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/28/2013] [Accepted: 06/28/2013] [Indexed: 01/12/2023] Open
Abstract
The cardiomyopathies are a group of heart muscle diseases that can be inherited (familial). Identifying potential disease-related proteins is important for understanding the mechanisms of cardiomyopathies. Experimental identification of cardiomyopathy-related proteins is costly and labour-intensive, whereas bioinformatics approaches offer a competitive advantage over experimental methods. Based on “guilt by association” analysis, we prioritized candidate proteins involved in human cardiomyopathies. We first built weighted human cardiomyopathy-specific protein-protein interaction networks for three subtypes of cardiomyopathies using the known disease proteins from Online Mendelian Inheritance in Man as seeds. We then developed a method to rank candidate proteins in the network based on “guilt by association” analysis. Most candidate proteins with high scores shared disease-related pathways with disease seed proteins. These top-ranked candidate proteins were related to the corresponding disease subtypes and were potential disease-related proteins. Cross-validation and comparison with other methods indicated that our approach could be used to identify potentially novel disease proteins, which may provide insights into cardiomyopathy-related mechanisms in a more comprehensive and integrated way.
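The "guilt by association" idea above can be illustrated with a generic neighbor-weighting score. This sketch is not the authors' scoring method, which the abstract does not specify. It assumes an undirected weighted network and scores each non-seed protein by the share of its total edge weight that connects it to seed proteins. The function name `gba_scores` and the protein labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def gba_scores(edges: Dict[Edge, float], seeds: Set[str]) -> List[Tuple[str, float]]:
    """Rank non-seed proteins by the fraction of edge weight linking them to seeds.

    `edges` maps an undirected pair (u, v) to a positive interaction weight.
    """
    seed_weight: Dict[str, float] = {}
    total_weight: Dict[str, float] = {}
    for (u, v), w in edges.items():
        # Count each undirected edge once from each endpoint's perspective.
        for node, neighbor in ((u, v), (v, u)):
            total_weight[node] = total_weight.get(node, 0.0) + w
            if neighbor in seeds:
                seed_weight[node] = seed_weight.get(node, 0.0) + w
    scores = {
        p: seed_weight.get(p, 0.0) / total_weight[p]
        for p in total_weight
        if p not in seeds
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy network: S1 and S2 are known disease (seed) proteins.
toy_edges = {("S1", "A"): 1.0, ("S2", "A"): 1.0, ("A", "B"): 2.0, ("B", "C"): 1.0}
ranked = gba_scores(toy_edges, seeds={"S1", "S2"})
```

Here protein A ranks first because half of its interaction weight goes to seeds, while B and C have no direct seed neighbors.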
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Heilongjiang Province, China
- Weiguo Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaoli Qu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Binhua Liang
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Qianping Gao
- Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenchen Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xu Jia
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yana Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Siya Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
|
5398
|
Bromberg Y. Building a genome analysis pipeline to predict disease risk and prevent disease. J Mol Biol 2013; 425:3993-4005. [PMID: 23928561 DOI: 10.1016/j.jmb.2013.07.038] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 06/03/2013] [Revised: 07/26/2013] [Accepted: 07/28/2013] [Indexed: 12/24/2022]
Abstract
Reduced costs and increased speed and accuracy of sequencing can bring the genome-based evaluation of individual disease risk to the bedside. While past efforts have identified a number of actionable mutations, the bulk of genetic risk remains hidden in sequence data. The biggest challenge facing genomic medicine today is the development of new techniques to predict the specifics of a given human phenome (set of all expressed phenotypes) encoded by each individual variome (full set of genome variants) in the context of the given environment. Numerous tools exist for the computational identification of the functional effects of a single variant. However, the pipelines taking advantage of full genomic, exomic, transcriptomic (and other) sequences have only recently become a reality. This review looks at the building of methodologies for predicting "variome"-defined disease risk. It also discusses some of the challenges for incorporating such a pipeline into everyday medical practice.
Affiliation(s)
- Y Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08873, USA.
|
5399
|
Criscuolo A, Brisse S. AlienTrimmer: a tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing reads. Genomics 2013; 102:500-6. [PMID: 23912058 DOI: 10.1016/j.ygeno.2013.07.011] [Citation(s) in RCA: 130] [Impact Index Per Article: 11.8] [Received: 12/26/2012] [Revised: 07/20/2013] [Accepted: 07/25/2013] [Indexed: 02/04/2023]
Abstract
Contaminant oligonucleotide sequences such as primers and adapters can occur at both ends of high-throughput sequencing (HTS) reads. AlienTrimmer was developed in order to detect and remove such contaminants. Based on the decomposition of specified alien nucleotide sequences into k-mers, AlienTrimmer is able to determine whether such alien k-mers occur at one or both read ends by using a simple polynomial algorithm. As a result, AlienTrimmer can process typical HTS single- or paired-end files with millions of reads in several minutes with very low computer resources. Based on the analysis of both simulated and real-case Illumina®, 454™ and Ion Torrent™ read data, we show that AlienTrimmer performs with excellent accuracy and speed in comparison with other trimming tools. The program is freely available at ftp://ftp.pasteur.fr/pub/gensoft/projects/AlienTrimmer/.
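The k-mer decomposition idea described above can be sketched in a few lines. This is a deliberately simplified illustration and not AlienTrimmer's actual algorithm: it handles only the 3' end, requires exact k-mer matches, and cuts at the first matching k-mer, so a chance internal match would over-trim. The function names and the example adapter and read are assumptions.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def trim_3prime(read: str, alien_kmers: Set[str], k: int) -> str:
    """Cut the read at the first position where an alien k-mer starts.

    Simplification: exact matching only, 3' end only, no quality handling.
    """
    for i in range(len(read) - k + 1):
        if read[i:i + k] in alien_kmers:
            return read[:i]
    return read


# Illustrative values only.
K = 5
adapter_kmers = kmers("AGATCGGAAGAGC", K)
trimmed = trim_3prime("ACGTTGCATTGCA" + "AGATCGGAAG", adapter_kmers, K)
```

Set membership makes each k-mer lookup constant time on average, which is one reason a k-mer approach can scale to millions of reads.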
Affiliation(s)
- Alexis Criscuolo
- Institut Pasteur, Genotyping of Pathogens and Public Health Platform (PF8), 28 rue du Dr Roux, 75724 Paris Cedex, France; Institut Pasteur, Microbial Evolutionary Genomics Unit, 28 rue du Dr Roux, 75724 Paris Cedex, France; CNRS, UMR3525, 75015 Paris, France.
- Sylvain Brisse
- Institut Pasteur, Genotyping of Pathogens and Public Health Platform (PF8), 28 rue du Dr Roux, 75724 Paris Cedex, France; Institut Pasteur, Microbial Evolutionary Genomics Unit, 28 rue du Dr Roux, 75724 Paris Cedex, France; CNRS, UMR3525, 75015 Paris, France
|
5400
|
Kumar P, Dezso Z, MacKenzie C, Oestreicher J, Agoulnik S, Byrne M, Bernier F, Yanagimachi M, Aoshima K, Oda Y. Circulating miRNA biomarkers for Alzheimer's disease. PLoS One 2013; 8:e69807. [PMID: 23922807 PMCID: PMC3726785 DOI: 10.1371/journal.pone.0069807] [Citation(s) in RCA: 265] [Impact Index Per Article: 24.1] [Received: 02/14/2013] [Accepted: 06/12/2013] [Indexed: 12/13/2022] Open
Abstract
A minimally invasive diagnostic assay for early detection of Alzheimer's disease (AD) is required to select optimal patient groups in clinical trials, to monitor disease progression and response to treatment, and to better plan patient clinical care. Blood is an attractive source for biomarkers because sampling causes minimal discomfort to the patient, encouraging greater compliance in clinical trials and frequent testing. MiRNAs belong to the class of non-coding regulatory RNA molecules of ∼22 nt length and are now recognized to regulate ∼60% of all known genes through post-transcriptional gene silencing (RNAi). They have potential as useful biomarkers for clinical use because of their stability and ease of detection in many tissues, especially blood. Circulating profiles of miRNAs have been shown to discriminate different tumor types, to indicate staging and progression of the disease, and to be useful as prognostic markers. Recently their role in neurodegenerative diseases, both as diagnostic biomarkers and in explaining basic disease etiology, has come into focus. Here we report the discovery and validation of a unique circulating 7-miRNA signature (hsa-let-7d-5p, hsa-let-7g-5p, hsa-miR-15b-5p, hsa-miR-142-3p, hsa-miR-191-5p, hsa-miR-301a-3p and hsa-miR-545-3p) in plasma, which could distinguish AD patients from normal controls (NC) with >95% accuracy (AUC of 0.953). There was a >2-fold difference for all signature miRNAs between the AD and NC samples, with p-values < 0.05. Pathway analysis, taking into account enriched target mRNAs for these signature miRNAs, was also carried out, suggesting that the disturbance of multiple enzymatic pathways, including lipid metabolism, could play a role in AD etiology.
Affiliation(s)
- Pavan Kumar
- Eisai Inc, Biomarkers and Personalized Medicine Core Function Unit, Eisai Product Creation Systems, Andover, Massachusetts, United States of America.
|