5401
|
Characterizing genetic interactions in human disease association studies using statistical epistasis networks. BMC Bioinformatics 2011; 12:364. [PMID: 21910885 PMCID: PMC3215301 DOI: 10.1186/1471-2105-12-364] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Received: 04/25/2011] [Accepted: 09/12/2011] [Indexed: 01/07/2023] Open
Abstract
BACKGROUND Epistasis is recognized as ubiquitous in the genetic architecture of complex traits such as disease susceptibility. Experimental studies in model organisms have revealed extensive evidence of biological interactions among genes. Meanwhile, statistical and computational studies in human populations have suggested non-additive effects of genetic variation on complex traits. Although these studies form a baseline for understanding the genetic architecture of complex traits, to date they have only considered interactions among a small number of genetic variants. Our goal here is to use network science to determine the extent to which non-additive interactions exist beyond small subsets of genetic variants. We infer statistical epistasis networks to characterize the global space of pairwise interactions among approximately 1500 single nucleotide polymorphisms (SNPs) spanning nearly 500 cancer susceptibility genes in a large population-based study of bladder cancer. RESULTS The statistical epistasis network was built by linking pairs of SNPs if their pairwise interactions were stronger than a systematically derived threshold. Its topology clearly differentiated this real-data network from networks obtained from permutations of the same data under the null hypothesis that no association exists between genotype and phenotype. The network had a significantly higher number of hub SNPs and, interestingly, these hub SNPs did not necessarily have high main effects. The network had a largest connected component of 39 SNPs that was absent from all permuted-data networks. In addition, the vertex degrees of this network distinctively followed an approximate power-law distribution, and its topology appeared scale-free. CONCLUSIONS In contrast to many existing techniques focusing on high main-effect SNPs or models of several interacting SNPs, our network approach characterized a global picture of gene-gene interactions in a population-based genetic data set.
The network was built using pairwise interactions, and its distinctive network topology and large connected components indicated joint effects in a large set of SNPs. Our observations suggested that this particular statistical epistasis network captured important features of the genetic architecture of bladder cancer that have not been described previously.
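The network construction described in this abstract can be illustrated with a minimal sketch. It assumes that a pairwise interaction score has already been computed for each SNP pair; the scores, the SNP names, and the threshold of 0.5 below are hypothetical and are not data from the study, and the permutation-based comparison against null networks is not shown.

```python
from collections import defaultdict


def build_network(pair_scores, threshold):
    """Link two SNPs when their pairwise interaction score exceeds the threshold."""
    adjacency = defaultdict(set)
    for (snp_a, snp_b), score in pair_scores.items():
        if score > threshold:
            adjacency[snp_a].add(snp_b)
            adjacency[snp_b].add(snp_a)
    return adjacency


def largest_component(adjacency):
    """Return the largest connected component as a set of SNP names."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited SNP.
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def degree_counts(adjacency):
    """Map each vertex degree to the number of SNPs with that degree."""
    counts = defaultdict(int)
    for neighbours in adjacency.values():
        counts[len(neighbours)] += 1
    return dict(counts)


# Hypothetical interaction scores, for illustration only.
scores = {
    ("rs1", "rs2"): 0.9, ("rs2", "rs3"): 0.8, ("rs3", "rs4"): 0.2,
    ("rs5", "rs6"): 0.7, ("rs1", "rs3"): 0.1,
}
net = build_network(scores, threshold=0.5)
print(sorted(largest_component(net)))
print(degree_counts(net))
```

The degree counts are the raw material for checking whether the degree distribution looks like a power law; repeating the same construction on phenotype-permuted data would give the null networks the abstract compares against.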
|
5402
|
Disjunctive shared information between ontology concepts: application to Gene Ontology. J Biomed Semantics 2011; 2:5. [PMID: 21884591 PMCID: PMC3200982 DOI: 10.1186/2041-1480-2-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 04/26/2011] [Accepted: 08/31/2011] [Indexed: 01/12/2023] Open
Abstract
Background The large-scale effort in developing, maintaining and making biomedical ontologies available motivates the application of similarity measures to compare ontology concepts or, by extension, the entities described therein. A common approach, known as semantic similarity, compares ontology concepts through the information content they share in the ontology. However, different disjunctive ancestors in the ontology are frequently neglected, or not properly explored, by semantic similarity measures. Results This paper proposes a novel method, dubbed DiShIn, that effectively exploits the multiple inheritance relationships present in many biomedical ontologies. DiShIn calculates the shared information content of two ontology concepts, based on the information content of the disjunctive common ancestors of the concepts being compared. DiShIn identifies these disjunctive ancestors through the number of distinct paths from the concepts to their common ancestors. Conclusions DiShIn was applied to Gene Ontology and its performance was evaluated against state-of-the-art measures using CESSM, a publicly available evaluation platform of protein similarity measures. By modifying the way traditional semantic similarity measures calculate the shared information content, DiShIn was able to obtain a statistically significant higher correlation between semantic and sequence similarity. Moreover, the incorporation of DiShIn in existing applications that exploit multiple inheritance would reduce their execution time.
|
5403
|
Jagannathan V, Robinson-Rechavi M. Meta-analysis of estrogen response in MCF-7 distinguishes early target genes involved in signaling and cell proliferation from later target genes involved in cell cycle and DNA repair. BMC Systems Biology 2011; 5:138. [PMID: 21878096 PMCID: PMC3225231 DOI: 10.1186/1752-0509-5-138] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 12/21/2010] [Accepted: 08/30/2011] [Indexed: 02/08/2023]
Abstract
Background Many studies have been published outlining the global effects of 17β-estradiol (E2) on gene expression in human epithelial breast cancer derived MCF-7 cells. These studies show large variation in results, reporting between ~100 and ~1500 genes regulated by E2, with poor overlap. Results We performed a meta-analysis of these expression studies, using the Rank product method to obtain a more accurate and stable list of the differentially expressed genes, and of pathways regulated by E2. We analyzed 9 time-series data sets, concentrating on response at 3-4 hrs (early) and at 24 hrs (late). We found >1000 statistically significant probe sets after correction for multiple testing at 3-4 hrs, and >2000 significant probe sets at 24 hrs. Differentially expressed genes were examined by pathway analysis. This revealed 15 early response pathways, mostly related to cell signaling and proliferation, and 20 late response pathways, mostly related to breast cancer, cell division, DNA repair and recombination. Conclusions Our results confirm that meta-analysis identified more differentially expressed genes than the individual studies, and that these genes act together in networks. These results provide new insight into E2 regulated mechanisms, especially in the context of breast cancer.
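The rank product idea used in this meta-analysis can be sketched as follows: each gene is ranked within every study, and the geometric mean of its ranks summarizes how consistently it sits near the top. This is a simplified illustration under stated assumptions: the study names, gene names, and fold changes are hypothetical, only up-regulation is ranked, and the permutation step normally used to assess significance is omitted.

```python
import math


def rank_products(fold_changes):
    """Geometric mean of per-study ranks for each gene (rank 1 = largest value).

    fold_changes maps study name -> {gene: log fold change}; every study is
    assumed to report the same set of genes.
    """
    genes = sorted(next(iter(fold_changes.values())))
    log_sum = {gene: 0.0 for gene in genes}
    for values in fold_changes.values():
        ordered = sorted(genes, key=lambda g: values[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            log_sum[gene] += math.log(rank)
    n_studies = len(fold_changes)
    return {gene: math.exp(log_sum[gene] / n_studies) for gene in genes}


# Hypothetical log fold changes from three studies, for illustration only.
studies = {
    "study_A": {"g1": 2.0, "g2": 0.5, "g3": -1.0},
    "study_B": {"g1": 1.5, "g2": 1.8, "g3": 0.1},
    "study_C": {"g1": 3.0, "g2": 0.2, "g3": 0.4},
}
rp = rank_products(studies)
for gene in sorted(rp, key=rp.get):
    print(gene, round(rp[gene], 3))
```

A small rank product means a gene ranks highly in most studies, which is why the approach tolerates differences in scale between platforms and laboratories.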
Affiliation(s)
- Vidhya Jagannathan
- Department of Ecology and Evolution, University of Lausanne, Switzerland
|
5404
|
Matzke MM, Waters KM, Metz TO, Jacobs JM, Sims AC, Baric RS, Pounds JG, Webb-Robertson BJM. Improved quality control processing of peptide-centric LC-MS proteomics data. Bioinformatics 2011; 27:2866-72. [PMID: 21852304 PMCID: PMC3187650 DOI: 10.1093/bioinformatics/btr479] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Indexed: 01/12/2023]
Abstract
Motivation: In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor quality peptide abundance data can bias downstream statistical analyses and hence the biological interpretation for an otherwise high-quality dataset. Although considerable effort has been placed on assuring the quality of the peptide identification with respect to spectral processing, to date quality assessment of the subsequent peptide abundance data matrix has been limited to a subjective visual inspection of run-by-run correlation or individual peptide components. Identifying statistical outliers is a critical step in the processing of proteomics data as many of the downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely upon accurate estimates of sample variance, and their results are influenced by extreme values. Results: We describe a novel multivariate statistical strategy for the identification of LC-MS runs with extreme peptide abundance distributions. Comparison with the current method (run-by-run correlation) demonstrates a significantly better rate of identification of outlier runs by the multivariate strategy. Simulation studies also suggest that this strategy significantly outperforms correlation alone in the identification of statistically extreme liquid chromatography-mass spectrometry (LC-MS) runs. Availability: https://www.biopilot.org/docs/Software/RMD.php Contact: bj@pnl.gov Supplementary information: Supplementary material is available at Bioinformatics online.
|
5405
|
Lingappa JR, Dumitrescu L, Zimmer SM, Lynfield R, McNicholl JM, Messonnier NE, Whitney CG, Crawford DC. Identifying host genetic risk factors in the context of public health surveillance for invasive pneumococcal disease. PLoS One 2011; 6:e23413. [PMID: 21858107 PMCID: PMC3156135 DOI: 10.1371/journal.pone.0023413] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 01/20/2011] [Accepted: 07/16/2011] [Indexed: 11/18/2022] Open
Abstract
Host genetic factors that modify risk of pneumococcal disease may help target future public health interventions to individuals at highest risk of disease. We linked data from population-based surveillance for invasive pneumococcal disease (IPD) with state-based newborn dried bloodspot repositories to identify biological samples from individuals who developed invasive pneumococcal disease. Genomic DNA was extracted from 366 case and 732 anonymous control samples. TagSNPs were selected in 34 candidate genes thought to be associated with host response to invasive pneumococcal disease, and a total of 326 variants were successfully genotyped. Among 543 European Americans (EA) (182 cases and 361 controls), and 166 African Americans (AA) (53 cases and 113 controls), common variants in surfactant protein D (SFTPD) are consistently underrepresented in IPD. SFTPD variants with the strongest association for IPD are intronic rs17886286 (allelic OR 0.45, 95% confidence interval (CI) [0.25, 0.82], with p = 0.007) in EA and 5' flanking rs12219080 (allelic OR 0.32, 95%CI [0.13, 0.78], with p = 0.009) in AA. Variants in CD46 and IL1R1 are also associated with IPD in both EA and AA, but with effects in different directions; FAS, IL1B, IL4, IL10, IL12B, SFTPA1, SFTPB, and PTAFR variants are associated (p≤0.05) with IPD in EA or AA. We conclude that variants in SFTPD may protect against IPD in EA and AA and genetic variation in other host response pathways may also contribute to risk of IPD. While our associations are not corrected for multiple comparisons and therefore must be replicated in additional cohorts, this pilot study underscores the feasibility of integrating public health surveillance with existing, prospectively collected, newborn dried blood spot repositories to identify host genetic factors associated with infectious diseases.
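For readers unfamiliar with the allelic odds ratios quoted above, the following sketch shows how such a figure and its 95% confidence interval can be derived from a 2x2 table of allele counts. The abstract does not state how its intervals were computed; the Woolf (log-scale normal approximation) interval used here is an assumption, and the allele counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Odds ratio for the minor allele with a Woolf-type confidence interval.

    All four arguments are allele counts (two alleles per individual).
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) under the normal approximation.
    se_log = math.sqrt(
        1 / case_minor + 1 / case_major + 1 / ctrl_minor + 1 / ctrl_major
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts, for illustration only.
estimate, ci_low, ci_high = allelic_odds_ratio(10, 90, 20, 80)
print(f"OR = {estimate:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

An odds ratio below 1 with an interval that excludes 1 corresponds to the "underrepresented in IPD" pattern the abstract reports for the SFTPD variants.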
Affiliation(s)
- Jairam R Lingappa
- Department of Global Health, University of Washington, Seattle, Washington, United States of America.
|
5406
|
Sen K, Podder S, Ghosh TC. On the quest for selective constraints shaping the expressivity of the genes casting retropseudogenes in human. BMC Genomics 2011; 12:401. [PMID: 21824418 PMCID: PMC3162935 DOI: 10.1186/1471-2164-12-401] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/31/2010] [Accepted: 08/08/2011] [Indexed: 02/04/2023] Open
Abstract
Background Pseudogenes, the nonfunctional homologues of functional genes, are now coming to light as important resources for the study of human protein evolution. Processed pseudogenes, which arise by reverse transcription and reinsertion, can provide a molecular record of the dynamics and evolution of genomes. Research on the progenitors of human processed pseudogenes has revealed that they are highly expressed and evolutionarily conserved. They are reported to be short and GC-poor, indicating their high efficiency for retrotransposition. In this article we focus on their high expressivity and explore the factors contributing to it and their relevance to protein sequence evolution. Results Here we relate the high expressivity of the genes giving rise to processed pseudogenes (retropseudogenes) to their immense connectivity in the protein-protein interaction network, an inclination towards alternative splicing, a lower rate of mRNA decay and a slower evolutionary rate. The unusual trend of elevated disorder alongside high expressivity in the proteins encoded by processed pseudogene ancestors is explained by a predominance of hub-protein encoding genes, a high propensity of repeat-sequence-containing genes, elevated protein stability and the functional constraint of performing transcription regulatory roles. Linear regression analysis identifies mRNA decay rate and protein intrinsic disorder as the influential factors controlling the expressivity of these retropseudogene ancestors, with the latter having the most significant regulatory power. Conclusions Our findings imply that the abundance of disordered regions, which increases network connectivity and involvement in important cellular functions, and stability at the transcriptional level act as the prevailing forces behind the high expressivity of the human genes giving rise to processed pseudogenes.
Affiliation(s)
- Kamalika Sen
- Bioinformatics Centre, Bose Institute, P 1/12, C,I,T, Scheme VII M, Kolkata- 700 054, India
|
5407
|
Misawa K. A codon substitution model that incorporates the effect of the GC contents, the gene density and the density of CpG islands of human chromosomes. BMC Genomics 2011; 12:397. [PMID: 21819607 PMCID: PMC3169530 DOI: 10.1186/1471-2164-12-397] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/31/2011] [Accepted: 08/06/2011] [Indexed: 11/16/2022] Open
Abstract
Background Developing a model for codon substitutions is essential for the analyses of protein sequences. Recent studies on the mutation rates in the non-coding regions have shown that CpG mutation rates in the human genome are negatively correlated with the local GC content and with the densities of functional elements. This study aimed at understanding the effect of genomic features, namely, GC content, gene density, and frequency of CpG islands, on the rates of codon substitution in human chromosomes. Results Codon substitution rates of CpG to TpG mutations, TpG to CpG mutations, and non-CpG transitions and transversions in humans were estimated by comparing the coding regions of thousands of human and chimpanzee genes and inferring their ancestral sequences by using macaque genes as the outgroup. Because the genomic features depend on one another, partial regression coefficients of these features were obtained. Conclusion The substitution rates of codons depend on gene densities of the chromosomes. Transcription-associated mutation is one such pressure. On the basis of these results, a model of codon substitutions that incorporates the effect of genomic features on codon substitution in human chromosomes was developed.
Affiliation(s)
- Kazuharu Misawa
- Research Program for Computational Science, Research and Development Group for Next-Generation Integrated Living Matter Simulation, Fusion of Data and Analysis Research and Development Team, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan.
|
5408
|
Jalali-Heravi M, Parastar H. Recent trends in application of multivariate curve resolution approaches for improving gas chromatography–mass spectrometry analysis of essential oils. Talanta 2011; 85:835-49. [DOI: 10.1016/j.talanta.2011.05.045] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 03/06/2011] [Revised: 05/15/2011] [Accepted: 05/18/2011] [Indexed: 11/16/2022]
|
5409
|
Abstract
Metabolomics can map the large metabolic diversity in species, organs, or cell types. In addition to gains in enzyme specificity, many enzymes have retained substrate and reaction promiscuity. Enzyme promiscuity and the large number of enzymes with unknown enzyme function may explain the presence of a plethora of unidentified compounds in metabolomic studies. Cataloguing the identity and differential abundance of all detectable metabolites in metabolomic repositories may detail which compounds and pathways contribute to vital biological functions. The current status in metabolic databases is reviewed concomitant with tools to map and visualize the metabolome.
Affiliation(s)
- Oliver Fiehn
- University of California Davis Genome Center, Davis, California 95616, USA.
|
5410
|
Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics 2011; 98:1-8. [PMID: 21565265 PMCID: PMC3852939 DOI: 10.1016/j.ygeno.2011.04.006] [Citation(s) in RCA: 164] [Impact Index Per Article: 12.6] [Received: 12/06/2010] [Revised: 03/02/2011] [Accepted: 04/15/2011] [Indexed: 12/25/2022]
Abstract
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
Affiliation(s)
- Lily Wang
- Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA
- Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Xi Chen
- Division of Cancer Biostatistics, Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
|
5411
|
Gilbert-Diamond D, Moore JH. Analysis of gene-gene interactions. Current Protocols in Human Genetics 2011; Chapter 1:Unit1.14. [PMID: 21735376 PMCID: PMC4086055 DOI: 10.1002/0471142905.hg0114s70] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 11/06/2022]
Abstract
The goal of this unit is to introduce gene-gene interactions (epistasis) as a significant complicating factor in the search for disease susceptibility genes. This unit begins with an overview of gene-gene interactions and why they are likely to be common. Then, it reviews several statistical and computational methods for detecting and characterizing genes with effects that are dependent on other genes. The focus of this unit is genetic association studies of discrete and quantitative traits because most of the methods for detecting gene-gene interactions have been developed specifically for these study designs.
Affiliation(s)
- Diane Gilbert-Diamond
- Computational Genetics Laboratory, Departments of Genetics and Community and Family Medicine, Dartmouth Medical School, Lebanon, New Hampshire, USA
|
5412
|
Zhang J, Mamlouk AM, Martinetz T, Chang S, Wang J, Hilgenfeld R. PhyloMap: an algorithm for visualizing relationships of large sequence data sets and its application to the influenza A virus genome. BMC Bioinformatics 2011; 12:248. [PMID: 21689434 PMCID: PMC3142226 DOI: 10.1186/1471-2105-12-248] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 01/19/2011] [Accepted: 06/20/2011] [Indexed: 11/10/2022] Open
Abstract
Background Results of phylogenetic analysis are often visualized as phylogenetic trees. Such a tree can typically only include up to a few hundred sequences. When more than a few thousand sequences are to be included, analyzing the phylogenetic relationships among them becomes a challenging task. The recent frequent outbreaks of influenza A viruses have resulted in the rapid accumulation of corresponding genome sequences. Currently, there are more than 7500 influenza A virus genomes in the database. There are no efficient ways of representing this huge data set as a whole, thus preventing a further understanding of the diversity of the influenza A virus genome. Results Here we present a new algorithm, "PhyloMap", which combines ordination, vector quantization, and phylogenetic tree construction to give an elegant representation of a large sequence data set. The use of PhyloMap on influenza A virus genome sequences reveals the phylogenetic relationships of the internal genes that cannot be seen when only a subset of sequences are analyzed. Conclusions The application of PhyloMap to influenza A virus genome data shows that it is a robust algorithm for analyzing large sequence data sets. It utilizes the entire data set, minimizes bias, and provides intuitive visualization. PhyloMap is implemented in JAVA, and the source code is freely available at http://www.biochem.uni-luebeck.de/public/software/phylomap.html
Affiliation(s)
- Jiajie Zhang
- Institute of Biochemistry, Center for Structural and Cell Biology in Medicine, University of Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
|
5413
|
Kriston-Vizi J, Thong NW, Poh CL, Yee KC, Ling JSP, Kraut R, Wasser M. Gebiss: an ImageJ plugin for the specification of ground truth and the performance evaluation of 3D segmentation algorithms. BMC Bioinformatics 2011; 12:232. [PMID: 21668958 PMCID: PMC3225128 DOI: 10.1186/1471-2105-12-232] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/12/2010] [Accepted: 06/13/2011] [Indexed: 02/06/2023] Open
Abstract
Background Image segmentation is a crucial step in quantitative microscopy that helps to define regions of tissues, cells or subcellular compartments. Depending on the degree of user interaction, segmentation methods can be divided into manual, automated or semi-automated approaches. 3D image stacks usually require automated methods due to their large number of optical sections. However, certain applications benefit from manual or semi-automated approaches. Scenarios include the quantification of 3D images with poor signal-to-noise ratios or the generation of so-called ground truth segmentations that are used to evaluate the accuracy of automated segmentation methods. Results We have developed Gebiss, an ImageJ plugin for the interactive segmentation, visualisation and quantification of 3D microscopic image stacks. We integrated a variety of existing plugins for threshold-based segmentation and volume visualisation. Conclusions We demonstrate the application of Gebiss to the segmentation of nuclei in live Drosophila embryos and the quantification of neurodegeneration in Drosophila larval brains. Gebiss was developed as a cross-platform ImageJ plugin and is freely available on the web at http://imaging.bii.a-star.edu.sg/projects/gebiss/.
Affiliation(s)
- Janos Kriston-Vizi
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 138671, Singapore.
|
5414
|
Braun R, Buetow K. Pathways of distinction analysis: a new technique for multi-SNP analysis of GWAS data. PLoS Genet 2011; 7:e1002101. [PMID: 21695280 PMCID: PMC3111473 DOI: 10.1371/journal.pgen.1002101] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 11/09/2010] [Accepted: 03/28/2011] [Indexed: 02/07/2023] Open
Abstract
Genome-wide association studies (GWAS) have become increasingly common due to advances in technology and have permitted the identification of differences in single nucleotide polymorphism (SNP) alleles that are associated with diseases. However, while typical GWAS analysis techniques treat markers individually, complex diseases (cancers, diabetes, and Alzheimer's, amongst others) are unlikely to have a single causative gene. Thus, there is a pressing need for multi–SNP analysis methods that can reveal system-level differences in cases and controls. Here, we present a novel multi–SNP GWAS analysis method called Pathways of Distinction Analysis (PoDA). The method uses GWAS data and known pathway–gene and gene–SNP associations to identify pathways that permit, ideally, the distinction of cases from controls. The technique is based upon the hypothesis that, if a pathway is related to disease risk, cases will appear more similar to other cases than to controls (or vice versa) for the SNPs associated with that pathway. By systematically applying the method to all pathways of potential interest, we can identify those for which the hypothesis holds true, i.e., pathways containing SNPs for which the samples exhibit greater within-class similarity than across classes. Importantly, PoDA improves on existing single–SNP and SNP–set enrichment analyses, in that it does not require the SNPs in a pathway to exhibit independent main effects. This permits PoDA to reveal pathways in which epistatic interactions drive risk. In this paper, we detail the PoDA method and apply it to two GWAS: one of breast cancer and the other of liver cancer. The results obtained strongly suggest that there exist pathway-wide genomic differences that contribute to disease susceptibility. PoDA thus provides an analytical tool that is complementary to existing techniques and has the power to enrich our understanding of disease genomics at the systems level.
We present a novel method for multi–SNP analysis of genome-wide association studies. The method is motivated by the intuition that, if a set of SNPs is associated with disease, cases and controls will exhibit more within-group similarity than across-group similarity for the SNPs in the set of interest. Our method, Pathways of Distinction Analysis (PoDA), uses GWAS data and known pathway–gene and gene–SNP associations to identify pathways that permit the distinction of cases from controls. By systematically applying the method to all pathways of potential interest, we can identify pathways containing SNPs for which the cases and controls are distinguished and infer those pathways' role in disease. We detail the PoDA method and describe its results in breast and liver cancer GWAS data, demonstrating its utility as a method for systems-level analysis of GWAS data.
Affiliation(s)
- Rosemary Braun
- Laboratory of Population Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Kenneth Buetow
- Laboratory of Population Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
|
5415
|
PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol Syst Biol 2011; 7:496. [PMID: 21654673 PMCID: PMC3159979 DOI: 10.1038/msb.2011.26] [Citation(s) in RCA: 457] [Impact Index Per Article: 35.2] [Received: 01/12/2011] [Accepted: 04/12/2011] [Indexed: 12/24/2022] Open
Abstract
The authors present a new method, PREDICT, for the large-scale prediction of drug indications, and demonstrate its use on both approved drugs and novel molecules. They also provide a proof-of-concept for its potential utility in predicting patient-specific medications. We present a novel method for the large-scale prediction of drug indications that can handle both approved drugs and novel molecules. Our method utilizes multiple drug–drug and disease–disease similarity measures for the prediction task, obtaining high specificity and sensitivity rates (AUC=0.9). Our drug repositioning predictions cover 27% of the indications currently tested in clinical trials (P<2 × 10^−220). We show comparable performance using a gene expression signature-based disease–disease similarity, laying the computational foundation for predicting patient-specific indications.
Predicting indications for new molecules or finding alternative indications for approved drugs is a laborious and costly process (DiMasi et al, 2003), calling for computational solutions that would minimize production time and development costs (Terstappen and Reggiani, 2001). Here, we present a novel method for predicting drug indications, PREDICT, capable of handling both approved drugs and novel molecules. Our method is based on the assumption that similar drugs are indicated for similar diseases. To score a possible drug–disease association, we compute its similarity to known associations by combining drug–drug and disease–disease similarity computations. This strategy achieves high specificity and sensitivity rates in a cross-validation setting, where part of the known associations are hidden and the method is assessed based on how well it can retrieve them based on the rest of the associations. Assessing its predictions of novel indications for existing drugs, we find that it covers a significant portion (27%, P<2 × 10^−220) of drug indications currently tested in clinical trials. Examples of such predictions include: (i) Cabergoline, indicated for Hyperprolactinemia, which is predicted to treat Migraine, a prediction supported by two separate studies (Verhelst et al, 1999; Cavestro et al, 2006) and (ii) Progesterone, which is predicted to treat renal cell cancer, non-papillary (npRCC), supported by the study of Izumi et al (2007). In addition, we provide indication predictions for novel molecules. For example, Cycloleucine is predicted for the treatment of Alzheimer's disease (AD); indeed, Cycloleucine was found to be a potent and selective antagonist of NMDA receptor-mediated responses (Hershkowitz and Rogawski, 1989), a new promising class of chemicals for the treatment of AD (Farlow, 2004). As another example, Hyperforin, St John's wort extract, is predicted to treat hyperthermia.
Interestingly, St John's wort extract was found to have anxiolytic effects on stress-induced hyperthermia in mice (Grundmann et al, 2006). We further introduce a disease–disease similarity measure based on disease-specific gene signatures and show that such a measure can be used by our method to accurately predict drug indications. Importantly, this suggests the potential utility of our approach also in a personalized medicine setting, whereby future gene expression signatures from individual patients would replace these disease-specific signatures. Inferring potential drug indications, for either novel or approved drugs, is a key step in drug development. Previous computational methods in this domain have focused on either drug repositioning or matching drug and disease gene expression profiles. Here, we present a novel method for the large-scale prediction of drug indications (PREDICT) that can handle both approved drugs and novel molecules. Our method is based on the observation that similar drugs are indicated for similar diseases, and utilizes multiple drug–drug and disease–disease similarity measures for the prediction task. On cross-validation, it obtains high specificity and sensitivity (AUC=0.9) in predicting drug indications, surpassing existing methods. We validate our predictions by their overlap with drug indications that are currently under clinical trials, and by their agreement with tissue-specific expression information on the drug targets. We further show that disease-specific genetic signatures can be used to accurately predict drug indications for new diseases (AUC=0.92). This lays the computational foundation for future personalized drug treatments, where gene expression signatures from individual patients would replace the disease-specific signatures.
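The scoring idea described above (rating a candidate drug–disease pair by its similarity to known associations) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the geometric-mean combination, the maximum over known pairs, the function names, and the toy similarity values are all hypothetical choices made for the example.

```python
from math import sqrt


def make_similarity(table):
    """Build a symmetric similarity lookup from a dict of pair scores (toy helper)."""
    def similarity(a, b):
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return similarity


def association_score(drug, disease, known_pairs, drug_sim, disease_sim):
    """Score a candidate pair by its closest known association.

    Assumption: the two similarities are combined with a geometric mean
    and the best match over all known pairs is kept.
    """
    best = 0.0
    for known_drug, known_disease in known_pairs:
        if (known_drug, known_disease) == (drug, disease):
            continue  # do not let a pair vouch for itself
        combined = sqrt(drug_sim(drug, known_drug) * disease_sim(disease, known_disease))
        best = max(best, combined)
    return best


# Hypothetical data: one known association and made-up similarity values.
known_pairs = [("drug_A", "disease_X")]
drug_sim = make_similarity({("drug_A", "drug_B"): 0.64})
disease_sim = make_similarity({("disease_X", "disease_Y"): 0.25})

novel_score = association_score("drug_B", "disease_Y", known_pairs, drug_sim, disease_sim)
repositioning_score = association_score("drug_B", "disease_X", known_pairs, drug_sim, disease_sim)
print(round(novel_score, 3), round(repositioning_score, 3))
```

Skipping the candidate pair itself mirrors the cross-validation setting in the abstract, where hidden associations must be recovered from the remaining ones.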
|
5416
|
Reimand J, Arak T, Vilo J. g:Profiler--a web server for functional interpretation of gene lists (2011 update). Nucleic Acids Res 2011; 39:W307-15. [PMID: 21646343 PMCID: PMC3125778 DOI: 10.1093/nar/gkr378] [Citation(s) in RCA: 386] [Impact Index Per Article: 29.7] [Indexed: 12/27/2022] Open
Abstract
Functional interpretation of candidate gene lists is an essential task in modern biomedical research. Here, we present the 2011 update of g:Profiler (http://biit.cs.ut.ee/gprofiler/), a popular collection of web tools for functional analysis. g:GOSt and g:Cocoa combine comprehensive methods for interpreting gene lists, ordered lists and list collections in the context of biomedical ontologies, pathways, transcription factor and microRNA regulatory motifs and protein–protein interactions. Additional tools, namely the biomolecule ID mapping service (g:Convert), gene expression similarity searcher (g:Sorter) and gene homology searcher (g:Orth) provide numerous ways for further analysis and interpretation. In this update, we have implemented several features of interest to the community: (i) functional analysis of single nucleotide polymorphisms and other DNA polymorphisms is supported by chromosomal queries; (ii) network analysis identifies enriched protein–protein interaction modules in gene lists; (iii) functional analysis covers human disease genes; and (iv) improved statistics and filtering provide more concise results. g:Profiler is a regularly updated resource that is available for a wide range of species, including mammals, plants, fungi and insects.
Affiliation(s)
- Jüri Reimand
- University of Tartu, Institute of Computer Science, Tartu, Estonia.
|
5417
|
Dumitrescu L, Carty CL, Taylor K, Schumacher FR, Hindorff LA, Ambite JL, Anderson G, Best LG, Brown-Gentry K, Bůžková P, Carlson CS, Cochran B, Cole SA, Devereux RB, Duggan D, Eaton CB, Fornage M, Franceschini N, Haessler J, Howard BV, Johnson KC, Laston S, Kolonel LN, Lee ET, MacCluer JW, Manolio TA, Pendergrass SA, Quibrera M, Shohet RV, Wilkens LR, Haiman CA, Le Marchand L, Buyske S, Kooperberg C, North KE, Crawford DC. Genetic determinants of lipid traits in diverse populations from the population architecture using genomics and epidemiology (PAGE) study. PLoS Genet 2011; 7:e1002138. [PMID: 21738485 PMCID: PMC3128106 DOI: 10.1371/journal.pgen.1002138] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Received: 12/30/2010] [Accepted: 04/30/2011] [Indexed: 01/06/2023] Open
Abstract
For the past five years, genome-wide association studies (GWAS) have identified hundreds of common variants associated with human diseases and traits, including high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglyceride (TG) levels. Approximately 95 loci associated with lipid levels have been identified primarily among populations of European ancestry. The Population Architecture using Genomics and Epidemiology (PAGE) study was established in 2008 to characterize GWAS-identified variants in diverse population-based studies. We genotyped 49 GWAS-identified SNPs associated with one or more lipid traits in at least two PAGE studies and across six racial/ethnic groups. We performed a meta-analysis testing for SNP associations with fasting HDL-C, LDL-C, and ln(TG) levels in self-identified European American (~20,000), African American (~9,000), American Indian (~6,000), Mexican American/Hispanic (~2,500), Japanese/East Asian (~690), and Pacific Islander/Native Hawaiian (~175) adults, regardless of lipid-lowering medication use. We replicated 55 of 60 (92%) SNP associations tested in European Americans at p<0.05. Despite sufficient power, we were unable to replicate ABCA1 rs4149268 and rs1883025, CETP rs1864163, and TTC39B rs471364 previously associated with HDL-C and MAFB rs6102059 previously associated with LDL-C. Based on significance (p<0.05) and consistent direction of effect, a majority of replicated genotype-phenotype associations for HDL-C, LDL-C, and ln(TG) in European Americans generalized to African Americans (48%, 61%, and 57%), American Indians (45%, 64%, and 77%), and Mexican Americans/Hispanics (57%, 56%, and 86%). Overall, 16 associations generalized across all three populations. For the associations that did not generalize, differences in effect sizes, allele frequencies, and linkage disequilibrium offer clues to the next generation of association studies for these traits.
Affiliation(s)
- Logan Dumitrescu
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Cara L. Carty
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Kira Taylor
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Fredrick R. Schumacher
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - Lucia A. Hindorff
- Office of Population Genomics, National Human Genome Research Institute, Bethesda, Maryland, United States of America
| | - José L. Ambite
- Information Sciences Institute, University of Southern California, Los Angeles, California, United States of America
| | - Garnet Anderson
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Lyle G. Best
- Missouri Breaks Industries Research, Timber Lake, South Dakota, United States of America
| | - Kristin Brown-Gentry
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Petra Bůžková
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
| | - Christopher S. Carlson
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Barbara Cochran
- Sponsored Programs, Baylor College of Medicine, Houston, Texas, United States of America
| | - Shelley A. Cole
- Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas, United States of America
| | - Richard B. Devereux
- Department of Medicine, Weill Cornell Medical College, New York, New York, United States of America
| | - Dave Duggan
- The Translational Genomics Research Institute, Phoenix, Arizona, United States of America
| | - Charles B. Eaton
- Department of Family Medicine and Community Health, Alpert Medical School of Brown University School of Medicine, Providence, Rhode Island, United States of America
| | - Myriam Fornage
- Institute of Molecular Medicine, University of Texas Health Sciences Center at Houston, Texas, United States of America
- Division of Epidemiology, School of Public Health, University of Texas Health Sciences Center, Houston, Texas, United States of America
| | - Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Jeff Haessler
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Barbara V. Howard
- Medstar Research Institute, Washington, D.C., United States of America
| | - Karen C. Johnson
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Sandra Laston
- Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas, United States of America
| | - Laurence N. Kolonel
- Epidemiology Program, University of Hawaii Cancer Center, Department of Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, Hawaii, United States of America
| | - Elisa T. Lee
- University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
| | - Jean W. MacCluer
- Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio, Texas, United States of America
| | - Teri A. Manolio
- Office of Population Genomics, National Human Genome Research Institute, Bethesda, Maryland, United States of America
| | - Sarah A. Pendergrass
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Miguel Quibrera
- School of Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Ralph V. Shohet
- Center of Cardiovascular Research, Department of Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, Hawaii, United States of America
| | - Lynne R. Wilkens
- Epidemiology Program, University of Hawaii Cancer Center, Department of Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, Hawaii, United States of America
| | - Christopher A. Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
| | - Loïc Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Department of Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, Hawaii, United States of America
| | - Steven Buyske
- Department of Statistics and Biostatistics, Rutgers University, Piscataway, New Jersey, United States of America
| | - Charles Kooperberg
- Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Kari E. North
- Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America
| | - Dana C. Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
|
5418
|
Anno S, Ohshima K, Abe T. Approaches to understanding adaptations of skin color variation by detecting gene-environment interactions. Expert Rev Mol Diagn 2011; 10:987-91. [PMID: 21080816 DOI: 10.1586/erm.10.90] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
Genetic and environmental factors are both part of an elaborate feedback mechanism whereby the human adaptive form reacts to environmental stimuli via internal adjustments. Human survival may ultimately depend on understanding two important components of future environmental adaptation. First, we must elucidate the dynamics of the human genome underpinning the complex human phenotype. Second, we must understand how the environment pressures and affects the genome, helping to determine human traits. This article reviews current approaches to detecting the natural selection of skin color variation in human populations. We include statistical methods for clarifying gene-environment interactions applicable to the interactions with UV radiation levels. We recommend spatial data mining as an efficient approach that applies environmental association rules, extending our knowledge of adaptation to the environment.
Affiliation(s)
- Sumiko Anno
- Shibaura Institute of Technology, 3-7-5 Toyosu, Koto-ku, Tokyo 135-8548, Japan.
|
5419
|
Bridges M, Heron EA, O'Dushlaine C, Segurado R, Morris D, Corvin A, Gill M, Pinto C. Genetic classification of populations using supervised learning. PLoS One 2011; 6:e14802. [PMID: 21589856 PMCID: PMC3093382 DOI: 10.1371/journal.pone.0014802] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 04/13/2010] [Accepted: 12/01/2010] [Indexed: 11/18/2022] Open
Abstract
There are many instances in genetics in which we wish to determine whether two
candidate populations are distinguishable on the basis of their genetic
structure. Examples include populations which are geographically separated,
case–control studies and quality control (when participants in a study
have been genotyped at different laboratories). This latter application is of
particular importance in the era of large scale genome wide association studies,
when collections of individuals genotyped at different locations are being
merged to provide increased power. The traditional method for detecting
structure within a population is some form of exploratory technique such as
principal components analysis. Such methods, which do not utilise our prior
knowledge of the membership of the candidate populations, are termed
unsupervised. Supervised methods, on the other hand, are
able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are
a more appropriate tool for detecting genetic differences between populations.
We apply two such methods (neural networks and support vector machines) to the
classification of three populations (two from Scotland and one from Bulgaria).
The sensitivity exhibited by both these methods is considerably higher than that
attained by principal components analysis and in fact comfortably exceeds a
recently conjectured theoretical limit on the sensitivity of unsupervised
methods. In particular, our methods can distinguish between the two Scottish
populations, where principal components analysis cannot. We suggest, on the
basis of our results, that a supervised learning approach should be the method of
choice when classifying individuals into pre-defined populations, particularly
in quality control for large scale genome wide association studies.
Affiliation(s)
- Michael Bridges
- Astrophysics Group, Cavendish Laboratory, Cambridge, United
Kingdom
| | - Elizabeth A. Heron
- Neuropsychiatric Genetics Research Group, Department of Psychiatry,
Trinity College, Dublin, Ireland
| | - Colm O'Dushlaine
- Neuropsychiatric Genetics Research Group, Department of Psychiatry,
Trinity College, Dublin, Ireland
| | - Ricardo Segurado
- Neuropsychiatric Genetics Research Group, Department of Psychiatry,
Trinity College, Dublin, Ireland
| | | | - Derek Morris
- Neuropsychiatric Genetics Research Group, Department of Psychiatry,
Trinity College, Dublin, Ireland
| | - Aiden Corvin
- Neuropsychiatric Genetics Research Group, Department of Psychiatry,
Trinity College, Dublin, Ireland
| | - Michael Gill
- Neuropsychiatric Genetics Research Group, Department of Psychiatry,
Trinity College, Dublin, Ireland
| | - Carlos Pinto
- Neuropsychiatric Genetics Research Group, Department of Psychiatry,
Trinity College, Dublin, Ireland
|
5420
|
Curtis RE, Yuen A, Song L, Goyal A, Xing EP. TVNViewer: an interactive visualization tool for exploring networks that change over time or space. Bioinformatics 2011; 27:1880-1. [PMID: 21551142 DOI: 10.1093/bioinformatics/btr273] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
UNLABELLED The relationship between genes and proteins is a dynamic relationship that changes across time and differs in different cells. The study of these differences can reveal various insights into biological processes and disease progression, especially with the aid of proper tools for network visualization. Toward this purpose, we have developed TVNViewer, a novel visualization tool, which is specifically designed to aid in the exploration and analysis of dynamic networks. AVAILABILITY TVNViewer is freely available with documentation and tutorials on the web at http://sailing.cs.cmu.edu/tvnviewer. CONTACT epxing@cs.cmu.edu.
Affiliation(s)
- Ross E Curtis
- Joint CMU-Pitt PhD Program in Computational Biology, Lane Center for Computational Biology, School of Computer Science, Language Technologies Institute and Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA 15213, USA
|
5421
|
Aldini G, Regazzoni L, Pedretti A, Carini M, Cho SM, Park KM, Yeum KJ. An integrated high resolution mass spectrometric and informatics approach for the rapid identification of phenolics in plant extract. J Chromatogr A 2011; 1218:2856-64. [DOI: 10.1016/j.chroma.2011.02.065] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 09/27/2010] [Revised: 01/25/2011] [Accepted: 02/23/2011] [Indexed: 11/27/2022]
|
5422
|
Abstract
AIMS To examine the literature on the associations between alcohol use disorders (AUD) and major depression (MD), and to evaluate the evidence for the existence of a causal relationship between the disorders. METHODS The PsycInfo, PubMed, Embase, Scopus and ISI Web of Science databases were searched for studies pertaining to AUD and MD from 1980 to the present. Random-effects models were used to derive estimates of the pooled adjusted odds ratios (AOR) for the links between AUD and MD among studies reporting an AOR. RESULTS The analysis revealed that the presence of either disorder doubled the risks of the second disorder, with pooled AORs ranging from 2.00 to 2.09. Epidemiological data suggest that the linkages between the disorders cannot be accounted for fully by common factors that influence both AUD and MD, and that the disorders appear to be linked in a causal manner. Further evidence suggests that the most plausible causal association between AUD and MD is one in which AUD increases the risk of MD, rather than vice versa. Potential mechanisms underlying these causal linkages include neurophysiological and metabolic changes resulting from exposure to alcohol. The need for further research examining mechanisms of linkage, gender differences in associations between AUD and MD and classification issues was identified. CONCLUSIONS The current state of the literature suggests a causal linkage between alcohol use disorders and major depression, such that increasing involvement with alcohol increases risk of depression. Further research is needed in order to clarify the nature of this causal link, in order to develop effective intervention and treatment approaches.
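As an illustration of how pooled odds ratios can be derived with a random-effects model, the following Python sketch pools per-study log odds ratios. The abstract does not say which estimator was used; the DerSimonian-Laird between-study variance below is an assumption, and the function name and study values are hypothetical.

```python
from math import exp, log, sqrt


def pool_random_effects(log_odds_ratios, std_errors):
    """Pool log odds ratios with a DerSimonian-Laird random-effects model (assumed)."""
    # Fixed-effect (inverse-variance) weights and estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_odds_ratios)) / total

    # Cochran's Q and the moment estimate of between-study variance tau^2.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_odds_ratios))
    df = len(log_odds_ratios) - 1
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, log_odds_ratios)) / sum(re_weights)
    pooled_se = sqrt(1.0 / sum(re_weights))

    # Back-transform to the odds-ratio scale with a 95% confidence interval.
    return {
        "odds_ratio": exp(pooled),
        "ci_low": exp(pooled - 1.96 * pooled_se),
        "ci_high": exp(pooled + 1.96 * pooled_se),
        "tau2": tau2,
    }


# Hypothetical studies: adjusted odds ratios with standard errors on the log scale.
result = pool_random_effects([log(1.8), log(2.1), log(2.4)], [0.20, 0.15, 0.25])
print(result)
```

Working on the log scale keeps the estimate symmetric around no effect; when the studies agree closely, the between-study variance falls to zero and the result matches a fixed-effect pooling.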
Affiliation(s)
- Joseph M Boden
- Christchurch Health and Development Study, University of Otago, Christchurch School of Medicine and Health Sciences, Christchurch, New Zealand
|
5423
|
Armañanzas R, Saeys Y, Inza I, García-Torres M, Bielza C, van de Peer Y, Larrañaga P. Peakbin selection in mass spectrometry data using a consensus approach with estimation of distribution algorithms. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:760-774. [PMID: 21393653 DOI: 10.1109/tcbb.2010.18] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
Progress is continuously being made in the quest for stable biomarkers linked to complex diseases. Mass spectrometers are one of the devices for tackling this problem. The data profiles they produce are noisy and unstable. In these profiles, biomarkers are detected as signal regions (peaks), where control and disease samples behave differently. Mass spectrometry (MS) data generally contain a limited number of samples described by a high number of features. In this work, we present a novel class of evolutionary algorithms, estimation of distribution algorithms (EDA), as an efficient peak selector in this MS domain. There is a trade-off between the reliability of the detected biomarkers and the low number of samples for analysis. For this reason, we introduce a consensus approach, built upon the classical EDA scheme, that improves stability and robustness of the final set of relevant peaks. An entire data workflow is designed to yield unbiased results. Four publicly available MS data sets (two MALDI-TOF and another two SELDI-TOF) are analyzed. The results are compared to the original works, and a new plot (peak frequential plot) for graphically inspecting the relevant peaks is introduced. A complete online supplementary page, which can be found at http://www.sc.ehu.es/ccwbayes/members/ruben/ms, includes extended info and results, in addition to Matlab scripts and references.
Affiliation(s)
- Rubén Armañanzas
- Computational Intelligence Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Campus de Montegancedo, 28.660 Boadilla del Monte, Spain.
|
5424
|
Pavlopoulos GA, Secrier M, Moschopoulos CN, Soldatos TG, Kossida S, Aerts J, Schneider R, Bagos PG. Using graph theory to analyze biological networks. BioData Min 2011; 4:10. [PMID: 21527005 PMCID: PMC3101653 DOI: 10.1186/1756-0381-4-10] [Citation(s) in RCA: 306] [Impact Index Per Article: 23.5] [Received: 11/02/2010] [Accepted: 04/28/2011] [Indexed: 11/10/2022] Open
Abstract
Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need to investigate a system, not only as individual components but as a whole, emerges. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks and they are mainly represented as graphs where thousands of nodes are connected with thousands of edges. In this article we demonstrate approaches, models and methods from the graph theory universe and we discuss ways in which they can be used to reveal hidden properties and features of a network. This network profiling combined with knowledge extraction will help us to better understand the biological significance of the system.
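To make the graph representation concrete, the sketch below stores a small undirected network as an adjacency map and derives two basic properties: node degree and connected components. It is an illustrative example only; the node labels and helper names are hypothetical and do not come from the article.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn a list of undirected edges into a node -> neighbours map."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def connected_components(adjacency):
    """Group nodes into connected components with breadth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical interaction network with two separate modules.
edges = [("A", "B"), ("B", "C"), ("D", "E")]
adjacency = build_adjacency(edges)
degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
components = connected_components(adjacency)
print(degrees)
print(components)
```

Degree highlights hub nodes, while the component structure shows which parts of the network are reachable from one another.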
Affiliation(s)
- Georgios A Pavlopoulos
- Department of Computer Science and Biomedical Informatics, University of Central Greece, Lamia, 35100, Greece
- Faculty of Engineering - ESAT/SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001, Leuven-Heverlee, Belgium
| | - Maria Secrier
- Structural and Computational Biology Unit, EMBL, Meyerhofstrasse 1, 69117, Heidelberg, Germany
| | - Charalampos N Moschopoulos
- Department of Computer Engineering & Informatics, University of Patras, Rio, 6500, Patras, Greece
- Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, 11527, Athens, Greece
| | | | - Sophia Kossida
- Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, 11527, Athens, Greece
| | - Jan Aerts
- Faculty of Engineering - ESAT/SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001, Leuven-Heverlee, Belgium
| | - Reinhard Schneider
- Structural and Computational Biology Unit, EMBL, Meyerhofstrasse 1, 69117, Heidelberg, Germany
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Limpertsberg, 162 A, avenue de la Faïencerie, L-1511 Luxembourg
| | - Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Central Greece, Lamia, 35100, Greece
|
5425
|
Jeff JM, Brown-Gentry K, Buxbaum SG, Sarpong DF, Taylor HA, George AL, Roden DM, Crawford DC. SCN5A variation is associated with electrocardiographic traits in the Jackson Heart Study. CIRCULATION. CARDIOVASCULAR GENETICS 2011; 4:139-44. [PMID: 21325150 PMCID: PMC3080430 DOI: 10.1161/circgenetics.110.958124] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 12/19/2022]
Abstract
BACKGROUND Understanding variation in the normal electric activity of the heart, assessed by the ECG, may provide a starting point for studies of susceptibility to serious arrhythmias such as sudden cardiac death during myocardial infarction or drug therapy. Recent genetic association studies of one ECG trait, the QT interval, have identified common variation in European-descent populations, but little is known about the genetic determinants of ECG traits in populations of African descent. METHODS AND RESULTS To identify genetic risk factors, we have undertaken a candidate gene study of ECG traits in collaboration with the Jackson Heart Study, a longitudinal study of 5301 blacks ascertained from the Jackson, Mississippi, area. Nine quantitative ECG traits were evaluated: P, PR, QRS, QT, and QTc durations, heart rate, and P, QRS, and T axes. We genotyped 72 variations in the predominant sodium channel gene expressed in heart, SCN5A, encoding the Nav1.5 voltage-gated sodium channel in 4558 subjects. Both rare and common variants in this gene have previously been associated with inherited arrhythmia syndromes and variable conduction. Adjusting for age, sex, and European ancestry, we performed tests of association in 3054 unrelated participants and identified 14 significant associations (P<1.0×10^−4), of which 13 are independent, based on linkage disequilibrium. These variants explain up to 2% of the variation in ECG traits in the Jackson Heart Study. CONCLUSIONS These results suggest that SCN5A variation contributes to ECG trait distributions in blacks, and these same variations may be risk or protective factors associated with susceptibility to arrhythmias.
Affiliation(s)
- Janina M. Jeff
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232
| | | | - Sarah G. Buxbaum
- Jackson Heart Study, Jackson State University, Jackson, MS 39213
| | | | - Herman A. Taylor
- Jackson Heart Study, Jackson State University, Jackson, MS 39213
| | - Alfred L. George
- Department of Medicine, Vanderbilt University, Nashville, TN 37232
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232
- Institute for Integrative Genomics, Vanderbilt University, Nashville, TN 37232
| | - Dan M. Roden
- Department of Medicine, Vanderbilt University, Nashville, TN 37232
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232
| | - Dana C. Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232
- Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, TN 37232
|
5426
|
Abstract
Over the last few years, main effect genetic association analysis has proven to be a successful tool to unravel genetic risk components to a variety of complex diseases. In the quest for disease susceptibility factors and the search for the 'missing heritability', supplementary and complementary efforts have been undertaken. These include the inclusion of several genetic inheritance assumptions in model development, the consideration of different sources of information, and the acknowledgement of disease underlying pathways of networks. The search for epistasis or gene-gene interaction effects on traits of interest is marked by an exponential growth, not only in terms of methodological development, but also in terms of practical applications, translation of statistical epistasis to biological epistasis and integration of omics information sources. The current popularity of the field, as well as its attraction to interdisciplinary teams, each making valuable contributions with sometimes rather unique viewpoints, renders it impossible to give an exhaustive review of to-date available approaches for epistasis screening. The purpose of this work is to give a perspective view on a selection of currently active analysis strategies and concerns in the context of epistasis detection, and to provide an eye to the future of gene-gene interaction analysis.
Affiliation(s)
- Kristel Van Steen
- Department of Electrical Engineering and Computer Science (Montefiore Institute), Grande Traverse, Bioinformatique 4000 Liège 1, Belgium.
|
5427
|
Shin M, Lee H. Prioritizing candidate genes by weighted network structure for the identification of disease marker genes. BIOCHIP JOURNAL 2011. [DOI: 10.1007/s13206-011-5105-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
|
5428
|
Grady BJ, Ritchie MD. Statistical Optimization of Pharmacogenomics Association Studies: Key Considerations from Study Design to Analysis. CURRENT PHARMACOGENOMICS AND PERSONALIZED MEDICINE 2011; 9:41-66. [PMID: 21887206 PMCID: PMC3163263 DOI: 10.2174/187569211794728805] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 01/11/2023]
Abstract
Research in human genetics and genetic epidemiology has grown significantly over the previous decade, particularly in the field of pharmacogenomics. Pharmacogenomics presents an opportunity for rapid translation of associated genetic polymorphisms into diagnostic measures or tests to guide therapy as part of a move towards personalized medicine. Expansion in genotyping technology has cleared the way for widespread use of whole-genome genotyping in the effort to identify novel biology and new genetic markers associated with pharmacokinetic and pharmacodynamic endpoints. With new technology and methodology regularly becoming available for use in genetic studies, a discussion on the application of such tools becomes necessary. In particular, quality control criteria have evolved with the use of GWAS as we have come to understand potential systematic errors which can be introduced into the data during genotyping. There have been several replicated pharmacogenomic associations, some of which have moved to the clinic to enact change in treatment decisions. These examples of translation illustrate the strength of evidence necessary to successfully and effectively translate a genetic discovery. In this review, the design of pharmacogenomic association studies is examined with the goal of optimizing the impact and utility of this research. Issues of ascertainment, genotyping, quality control, analysis and interpretation are considered.
Affiliation(s)
- Benjamin J. Grady
- Department of Molecular Physiology & Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
| | - Marylyn D. Ritchie
- Department of Molecular Physiology & Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA
|
5429
|
Baye TM, Butsch Kovacic M, Biagini Myers JM, Martin LJ, Lindsey M, Patterson TL, He H, Ericksen MB, Gupta J, Tsoras AM, Lindsley A, Rothenberg ME, Wills-Karp M, Eissa NT, Borish L, Khurana Hershey GK. Differences in candidate gene association between European ancestry and African American asthmatic children. PLoS One 2011; 6:e16522. [PMID: 21387019 PMCID: PMC3046166 DOI: 10.1371/journal.pone.0016522] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2010] [Accepted: 01/02/2011] [Indexed: 12/31/2022] Open
Abstract
Background Candidate gene case-control studies have identified several single nucleotide polymorphisms (SNPs) that are associated with asthma susceptibility. Most of these studies have been restricted to evaluations of specific SNPs within a single gene and within populations of European ancestry. Recently, there has been increasing interest in understanding racial differences in genetic risk associated with childhood asthma. Our aim was to compare association patterns of asthma candidate genes between children of European and African ancestry. Methodology/Principal Findings Using a custom-designed Illumina SNP array, we genotyped 1,485 children within the Greater Cincinnati Pediatric Clinic Repository and Cincinnati Genomic Control Cohort for 259 SNPs in 28 genes and evaluated their associations with asthma. We identified 14 SNPs located in 6 genes that were significantly associated (p-values <0.05) with childhood asthma in African Americans. Among Caucasians, 13 SNPs in 5 genes were associated with childhood asthma. Two SNPs in IL4 were associated with asthma in both races (p-values <0.05). Gene-gene interaction studies identified race-specific sets of genes that best discriminate between asthmatic children and non-allergic controls. Conclusions/Significance We identified IL4 as having a role in asthma susceptibility in both African American and Caucasian children. However, while IL4 SNPs were associated with asthma in children of both European and African ancestry, the relative contributions of the most replicated asthma-associated SNPs varied by ancestry. These data provide valuable insights into the pathways that may predispose to asthma in individuals with European vs. African ancestry.
Affiliation(s)
- Tesfaye M. Baye
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Melinda Butsch Kovacic
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Jocelyn M. Biagini Myers
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Lisa J. Martin
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Mark Lindsey
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Tia L. Patterson
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Hua He
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Mark B. Ericksen
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Jayanta Gupta
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Anna M. Tsoras
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Andrew Lindsley
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Marc E. Rothenberg
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Marsha Wills-Karp
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - N. Tony Eissa
- Department of Medicine, Baylor College of Medicine, Houston, Texas, United States of America
| | - Larry Borish
- Department of Medicine, University of Virginia, Charlottesville, Virginia, United States of America
| | - Gurjit K. Khurana Hershey
- Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, Ohio, United States of America
|
5430
|
|
5431
|
Yang P, Ho JW, Yang YH, Zhou BB. Gene-gene interaction filtering with ensemble of filters. BMC Bioinformatics 2011; 12 Suppl 1:S10. [PMID: 21342539 PMCID: PMC3044264 DOI: 10.1186/1471-2105-12-s1-s10] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open
Abstract
Background Complex diseases are commonly caused by multiple genes and their interactions with each other. Genome-wide association (GWA) studies provide us with the opportunity to capture those disease-associated genes and gene-gene interactions through panels of SNP markers. However, a proper filtering procedure is critical to reduce the search space prior to the computationally intensive gene-gene interaction identification step. In this study, we show that two commonly used SNP-SNP interaction filtering algorithms, ReliefF and tuned ReliefF (TuRF), are sensitive to the order of the samples in the dataset, giving rise to unstable and suboptimal results. However, we observe that the ‘unstable’ results from multiple runs of these algorithms can provide valuable information about the dataset. We therefore hypothesize that aggregating results from multiple runs of the algorithm may improve the filtering performance. Results We propose a simple and effective ensemble approach in which the results from multiple runs of an unstable filter are aggregated based on the general theory of ensemble learning. The ensemble versions of the ReliefF and TuRF algorithms, referred to as ReliefF-E and TuRF-E, are robust to sample order dependency and enable a more informative investigation of data characteristics. Using simulated and real datasets, we demonstrate that both the ensemble of ReliefF and the ensemble of TuRF can generate a much more stable SNP ranking than the original algorithms. Furthermore, the ensemble of TuRF achieved the highest success rate in comparison to many state-of-the-art algorithms as well as the traditional χ²-test and odds ratio methods in terms of retaining gene-gene interactions.
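The ensemble idea in this abstract can be illustrated with a minimal sketch. It assumes mean-rank aggregation over shuffled sample orders, which is one common ensemble rule and not necessarily the one the authors used. `toy_filter` is a hypothetical, deliberately order-sensitive stand-in for ReliefF/TuRF (it is not an implementation of either), and `ensemble_mean_ranks` is an illustrative name.

```python
import random
from statistics import mean


def toy_filter(samples, labels, m=10):
    """Hypothetical order-sensitive filter (NOT ReliefF): scores each feature
    by the case/control mean difference over the first m samples only."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        case = [s[j] for s, y in zip(samples[:m], labels[:m]) if y == 1]
        ctrl = [s[j] for s, y in zip(samples[:m], labels[:m]) if y == 0]
        if not case or not ctrl:
            scores.append(0.0)  # one class missing in the window: no signal
        else:
            scores.append(abs(mean(case) - mean(ctrl)))
    return scores


def ensemble_mean_ranks(filter_fn, samples, labels, n_runs=50, seed=0):
    """Run filter_fn on n_runs random sample orders and average the ranks.

    Returns one mean rank per feature; rank 0 is best, so lower is better.
    Mean-rank aggregation is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    rank_sums = [0.0] * n_features
    idx = list(range(len(samples)))
    for _ in range(n_runs):
        rng.shuffle(idx)  # new sample order for this run
        shuffled_samples = [samples[i] for i in idx]
        shuffled_labels = [labels[i] for i in idx]
        scores = filter_fn(shuffled_samples, shuffled_labels)
        order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    return [total / n_runs for total in rank_sums]
```

A single call to `toy_filter` changes with the sample order, whereas the averaged ranks depend only on the seed and the data, which is the stability property the abstract describes.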
Affiliation(s)
- Pengyi Yang
- School of Information Technologies, University of Sydney, NSW 2006, Australia.
|
5432
|
Yang C, Wan X, He Z, Yang Q, Xue H, Yu W. The choice of null distributions for detecting gene-gene interactions in genome-wide association studies. BMC Bioinformatics 2011; 12 Suppl 1:S26. [PMID: 21342556 PMCID: PMC3044281 DOI: 10.1186/1471-2105-12-s1-s26] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND In genome-wide association studies (GWAS), the number of single-nucleotide polymorphisms (SNPs) typically ranges between 500,000 and 1,000,000. Accordingly, detecting gene-gene interactions in GWAS is computationally challenging because it involves hundreds of billions of SNP pairs. Stage-wise strategies are often used to overcome the computational difficulty. In the first stage, fast screening methods (e.g., Tuning ReliefF) are applied to reduce the whole SNP set to a small subset. In the second stage, sophisticated modeling methods (e.g., multifactor-dimensionality reduction (MDR)) are applied to the subset of SNPs to identify interesting interaction models and the corresponding interaction patterns. In the third stage, the significance of the identified interaction patterns is evaluated by hypothesis testing. RESULTS In this paper, we show that this stage-wise strategy could be problematic in controlling the false positive rate if the null distribution is not appropriately chosen. This is because screening and modeling may change the null distribution used in hypothesis testing. In our simulation study, we use some popular screening methods and the popular modeling method MDR as examples to show the effect of an inappropriate choice of null distribution. To choose appropriate null distributions, we suggest using the permutation test or testing on an independent data set. We demonstrate their performance using synthetic data and a real genome-wide data set from an Age-related Macular Degeneration (AMD) study. CONCLUSIONS The permutation test or testing on an independent data set can help in choosing appropriate null distributions in hypothesis testing, which provides more reliable results in practice.
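The point about null distributions can be made concrete with a small sketch, under the assumption that the whole stage-wise pipeline is re-run on every label permutation so that selection is reflected in the null. `best_feature_score` is a hypothetical toy stand-in for "screen, then model" (it is not MDR or ReliefF), and all names here are illustrative.

```python
import random
from statistics import mean


def best_feature_score(samples, labels):
    """Toy pipeline: pick the feature with the largest case/control mean
    difference and return that difference. Both classes must be present."""
    n_features = len(samples[0])
    best = 0.0
    for j in range(n_features):
        case = [s[j] for s, y in zip(samples, labels) if y == 1]
        ctrl = [s[j] for s, y in zip(samples, labels) if y == 0]
        best = max(best, abs(mean(case) - mean(ctrl)))
    return best


def permutation_p_value(samples, labels, pipeline=best_feature_score,
                        n_perm=200, seed=0):
    """Empirical p-value for the pipeline's best score.

    The full pipeline (including the selection step) is repeated on each
    permuted label vector, so the null distribution accounts for selection.
    """
    rng = random.Random(seed)
    observed = pipeline(samples, labels)
    permuted = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)  # breaks any genotype-phenotype association
        if pipeline(samples, permuted) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)
```

Comparing the selected feature's score against a null built for a single pre-specified feature would ignore the selection step, which is the kind of mismatch the abstract warns about.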
Affiliation(s)
- Can Yang
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong
| | - Xiang Wan
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong
| | - Zengyou He
- School of Software, Dalian University of Technology, China
| | - Qiang Yang
- Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong
| | - Hong Xue
- Department of Biochemistry, Hong Kong University of Science and Technology, Hong Kong
| | - Weichuan Yu
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong
|
5433
|
Ritchie MD. Using biological knowledge to uncover the mystery in the search for epistasis in genome-wide association studies. Ann Hum Genet 2011; 75:172-82. [PMID: 21158748 DOI: 10.1111/j.1469-1809.2010.00630.x] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]
Abstract
The search for the missing heritability in genome-wide association studies (GWAS) has become an important focus for the human genetics community. One suspected location of these genetic effects is in gene-gene interactions, or epistasis. The computational burden of exploring gene-gene interactions in the wealth of data generated in GWAS, along with small to moderate sample sizes, has led to epistasis being an afterthought, rather than a primary focus of GWAS analyses. In this review, I discuss some potential approaches to filter a GWAS dataset to a smaller, more manageable dataset where searching for epistasis is considerably more feasible. I describe a number of alternative approaches, but primarily focus on the use of prior biological knowledge from databases in the public domain to guide the search for epistasis. Prior knowledge can be incorporated into a GWA study in many ways, and these data can be extracted from a variety of database sources. I discuss a number of these approaches and propose that a comprehensive approach will likely be most fruitful for searching for epistasis in large-scale genomic studies, both at the current state of the art and into the future.
Affiliation(s)
- Marylyn D Ritchie
- Department of Molecular Physiology, Center for Human Genetics Research, Vanderbilt University, Nashville, TN 37232-0700, USA.
|
5434
|
Edwards DRV, Romero R, Kusanovic JP, Hassan SS, Mazaki-Tovi S, Vaisbuch E, Kim CJ, Erez O, Chaiworapongsa T, Pearce BD, Bartlett J, Friel LA, Salisbury BA, Anant MK, Vovis GF, Lee MS, Gomez R, Behnke E, Oyarzun E, Tromp G, Menon R, Williams SM. Polymorphisms in maternal and fetal genes encoding for proteins involved in extracellular matrix metabolism alter the risk for small-for-gestational-age. J Matern Fetal Neonatal Med 2011; 24:362-80. [PMID: 20617897 PMCID: PMC3104673 DOI: 10.3109/14767058.2010.497572] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
Abstract
OBJECTIVE To examine the association between maternal and fetal genetic variants and small-for-gestational-age (SGA). METHODS A case-control study was conducted in patients with SGA neonates (530 maternal and 436 fetal) and controls (599 maternal and 628 fetal); 190 candidate genes and 775 SNPs were studied. Single-locus, multi-locus and haplotype association analyses were performed on maternal and fetal data with logistic regression, multifactor dimensionality reduction (MDR) analysis, and haplotype-based association with 2 and 3 marker sliding windows, respectively. Ingenuity pathway analysis (IPA) software was used to assess pathways that associate with SGA. RESULTS The most significant single-locus association in maternal data was with a SNP in tissue inhibitor of metalloproteinase 2 (TIMP2) (rs2277698, OR = 1.71, 95% CI [1.26-2.32], p = 0.0006) while in the fetus it was with a SNP in fibronectin 1 isoform 3 preproprotein (FN1) (rs3796123, OR = 1.46, 95% CI [1.20-1.78], p = 0.0001). Both associations were adjusted for potential confounders (maternal body mass index and fetal sex). Haplotype analyses resulted in associations in α 1 type I collagen preproprotein (COL1A1, rs1007086-rs2141279-rs17639446, global p = 0.006) in mothers and FN1 (rs2304573-rs1250204-rs1250215, global p = 0.045) in fetuses. Multi-locus analyses with MDR identified a two-SNP model with maternal variants in collagen type V α 2 (COL5A2) and plasminogen activator urokinase (PLAU) predicting SGA outcome correctly 59% of the time (p = 0.035). CONCLUSIONS Genetic variants in extracellular matrix-related genes showed significant single-locus association with SGA. These data are consistent with other studies that have observed elevated circulating fibronectin concentrations in association with increased risk of SGA. The present study supports the hypothesis that DNA variants can partially explain the risk of SGA in a cohort of Hispanic women.
Affiliation(s)
- Digna R. Velez Edwards
- Vanderbilt Epidemiology Center, Institute of Medicine and Public Health, Department of Obstetrics and Gynecology, Vanderbilt University, Nashville, Tennessee, USA
| | - Roberto Romero
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | - Juan Pedro Kusanovic
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | - Sonia S. Hassan
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | - Shali Mazaki-Tovi
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | - Edi Vaisbuch
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | - Chong Jai Kim
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Pathology, Wayne State University, Detroit, Michigan, USA
| | - Offer Erez
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | - Tinnakorn Chaiworapongsa
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | - Brad D. Pearce
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
| | - Jacquelaine Bartlett
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, USA
| | - Lara A. Friel
- Perinatology Research Branch, NICHD/NIH/DHHS, Bethesda, Maryland, and Detroit, Michigan, USA
- Department of Obstetrics and Gynecology, Wayne State University, Detroit, Michigan, USA
| | | | | | | | | | - Ricardo Gomez
- CEDIP (Center for Perinatal Diagnosis and Research), Department of Obstetrics and Gynecology, Sotero del Rio Hospital, Santiago, Chile
- Department of Obstetrics and Gynecology, Pontificia Universidad Catolica de Chile, Santiago, Chile
| | - Ernesto Behnke
- CEDIP (Center for Perinatal Diagnosis and Research), Department of Obstetrics and Gynecology, Sotero del Rio Hospital, Santiago, Chile
| | - Enrique Oyarzun
- Department of Obstetrics and Gynecology, Pontificia Universidad Catolica de Chile, Santiago, Chile
| | - Gerard Tromp
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, USA
| | - Ramkumar Menon
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
| | - Scott M. Williams
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, USA
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, USA
|
5435
|
Perlman L, Gottlieb A, Atias N, Ruppin E, Sharan R. Combining Drug and Gene Similarity Measures for Drug-Target Elucidation. J Comput Biol 2011; 18:133-45. [DOI: 10.1089/cmb.2010.0213] [Citation(s) in RCA: 125] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Affiliation(s)
| | - Assaf Gottlieb
- The Blavatnik School of Computer Science, Tel Aviv University, Tel-Aviv, Israel
| | - Nir Atias
- The Blavatnik School of Computer Science, Tel Aviv University, Tel-Aviv, Israel
| | - Eytan Ruppin
- The Blavatnik School of Computer Science, Tel Aviv University, Tel-Aviv, Israel
- School of Medicine, Tel Aviv University, Tel-Aviv, Israel
| | - Roded Sharan
- The Blavatnik School of Computer Science, Tel Aviv University, Tel-Aviv, Israel
|
5436
|
Ahn J, Yoon Y, Park S. Noise-robust algorithm for identifying functionally associated biclusters from gene expression data. Inf Sci (N Y) 2011. [DOI: 10.1016/j.ins.2010.10.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
5437
|
Dumitrescu L, Glenn K, Brown-Gentry K, Shephard C, Wong M, Rieder MJ, Smith JD, Nickerson DA, Crawford DC. Variation in LPA is associated with Lp(a) levels in three populations from the Third National Health and Nutrition Examination Survey. PLoS One 2011; 6:e16604. [PMID: 21305047 PMCID: PMC3030597 DOI: 10.1371/journal.pone.0016604] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2010] [Accepted: 12/22/2010] [Indexed: 02/06/2023] Open
Abstract
The distribution of lipoprotein(a) [Lp(a)] levels can differ dramatically across diverse racial/ethnic populations. The extent to which genetic variation in LPA can explain these differences is not fully understood. To explore this, 19 LPA tagSNPs were genotyped in 7,159 participants from the Third National Health and Nutrition Examination Survey (NHANES III). NHANES III is a diverse population-based survey with DNA samples linked to hundreds of quantitative traits, including serum Lp(a). Tests of association between LPA variants and transformed Lp(a) levels were performed across the three different NHANES subpopulations (non-Hispanic whites, non-Hispanic blacks, and Mexican Americans). At a significance threshold of p<0.0001, 15 of the 19 SNPs tested were strongly associated with Lp(a) levels in at least one subpopulation, six in at least two subpopulations, and none in all three subpopulations. In non-Hispanic whites, three variants were associated with Lp(a) levels, including previously known rs6919246 (p = 1.18 × 10⁻³⁰). Additionally, 12 and 6 variants had significant associations in non-Hispanic blacks and Mexican Americans, respectively. The additive effects of these associated alleles explained up to 11% of the variance observed for Lp(a) levels in the different racial/ethnic populations. The findings reported here replicate previous candidate gene and genome-wide association studies for Lp(a) levels in European-descent populations and extend these findings to other populations. While we demonstrate that LPA is an important contributor to Lp(a) levels regardless of race/ethnicity, the lack of generalization of associations across all subpopulations suggests that specific LPA variants may be contributing to the observed Lp(a) between-population variance.
Affiliation(s)
- Logan Dumitrescu
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Kimberly Glenn
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Kristin Brown-Gentry
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Cynthia Shephard
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Michelle Wong
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Mark J. Rieder
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Joshua D. Smith
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Deborah A. Nickerson
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Dana C. Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
|
5438
|
Nepomuceno JA, Troncoso A, Aguilar-Ruiz JS. Biclustering of gene expression data by correlation-based scatter search. BioData Min 2011; 4:3. [PMID: 21261986 PMCID: PMC3037342 DOI: 10.1186/1756-0381-4-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2010] [Accepted: 01/24/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The analysis of data generated by microarray technology is very useful for understanding how genetic information becomes functional gene products. Biclustering algorithms can determine a group of genes which are co-expressed under a set of experimental conditions. Recently, new biclustering methods based on metaheuristics have been proposed. Most of them use the Mean Squared Residue as the merit function, but patterns that are interesting and relevant from a biological point of view, such as shifting and scaling patterns, may not be detected using this measure. However, it is important to discover this type of pattern, since genes commonly present similar behavior although their expression levels vary in different ranges or magnitudes. METHODS Scatter Search is an evolutionary technique that is based on the evolution of a small set of solutions which are chosen according to quality and diversity criteria. This paper presents a Scatter Search with the aim of finding biclusters from gene expression data. In this algorithm, the proposed fitness function is based on the linear correlation among genes to detect shifting and scaling patterns, and an improvement method is included in order to select only positively correlated genes. RESULTS The proposed algorithm has been tested with three real data sets (the Yeast Cell Cycle dataset, the human B-cells lymphoma dataset and the Yeast Stress dataset), finding a remarkable number of biclusters with shifting and scaling patterns. In addition, the performance of the proposed method and fitness function is compared to that of CC, OPSM, ISA, BiMax, xMotifs and Samba using the Gene Ontology Database.
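The contrast between the Mean Squared Residue and a correlation-based merit can be seen on a tiny example. The sketch below is illustrative only: it uses the standard mean squared residue definition and the average pairwise Pearson correlation between gene rows as a simple correlation-based score, which is an assumption and not the paper's exact fitness function. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length rows (0.0 if a row is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(rows):
    """Average Pearson correlation over all pairs of gene rows in a bicluster."""
    pairs = list(combinations(rows, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_squared_residue(rows):
    """Mean squared residue: average of
    (a_ij - row_mean_i - col_mean_j + overall_mean)^2 over the bicluster."""
    n_rows, n_cols = len(rows), len(rows[0])
    row_means = [sum(r) / n_cols for r in rows]
    col_means = [sum(r[j] for r in rows) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum(
        (rows[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Rows of a bicluster are genes, columns are conditions (made-up values).
shifted = [[1, 2, 3, 4], [11, 12, 13, 14]]  # second gene = first + 10
scaled = [[1, 2, 3, 4], [2, 4, 6, 8]]       # second gene = first * 2
```

On these toy rows the correlation score treats the shifted and the scaled pair alike, while the residue is zero only for the purely shifted pair, which is why a residue-based merit can miss scaling patterns.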
Affiliation(s)
- Juan A Nepomuceno
- Dpt. Lenguajes y Sistemas Informáticos, ETSII, University of Seville, Avd. Reina Mercedes s/n, 41012, Seville, Spain
| | - Alicia Troncoso
- Department of Computer Science, School of Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013, Seville, Spain
| | - Jesús S Aguilar-Ruiz
- Department of Computer Science, School of Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013, Seville, Spain
|
5439
|
Martin CW, Tauchen A, Becker A, Nattkemper TW. A Normalized Tree Index for identification of correlated clinical parameters in microarray experiments. BioData Min 2011; 4:2. [PMID: 21247420 PMCID: PMC3035591 DOI: 10.1186/1756-0381-4-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2010] [Accepted: 01/19/2011] [Indexed: 11/10/2022] Open
Abstract
Background Measurements at the gene level are widely used to gain new insights into complex diseases, e.g. cancer. A promising approach to understanding basic biological mechanisms is to combine gene expression profiles and classical clinical parameters. However, the computation of a correlation coefficient between high-dimensional data and such parameters is not covered by traditional statistical methods. Methods We propose a novel index, the Normalized Tree Index (NTI), to compute a correlation coefficient between the clustering result of high-dimensional microarray data and nominal clinical parameters. The NTI detects correlations between hierarchically clustered microarray data and nominal clinical parameters (labels) and gives a measurement of significance in terms of an empirical p-value of the identified correlations. To this end, the microarray data are first clustered by hierarchical agglomerative clustering using standard settings. In a second step, the computed cluster tree is evaluated. For each label, an NTI is computed measuring the correlation between that label and the clustered microarray data. Results The NTI successfully identifies correlated clinical parameters at different levels of significance when applied to two real-world microarray breast cancer data sets. Some of the identified highly correlated labels confirm the current state of knowledge whereas others help to identify new risk factors and provide a good basis for formulating new hypotheses. Conclusions The NTI is a valuable tool in the domain of biomedical data analysis. It allows the identification of correlations between high-dimensional data and nominal labels, while at the same time a p-value measures the level of significance of the detected correlations.
Affiliation(s)
- Christian W Martin
- University of Bielefeld, Faculty of Technology, Biodata Mining & Applied Neuroinformatics Group, P.O. Box 100131, D-33501 Bielefeld, Germany.
|
5440
|
Yang X, Ye Y, Wang G, Huang H, Yu D, Liang S. VeryGene: linking tissue-specific genes to diseases, drugs, and beyond for knowledge discovery. Physiol Genomics 2011; 43:457-60. [PMID: 21245417 DOI: 10.1152/physiolgenomics.00178.2010] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
In addition to many other genes, tissue-specific genes (TSGs) represent a set of genes of great importance for human physiology. However, the links among TSGs, diseases, and potential therapeutic agents are often missing, hidden, or too scattered to find. There is a need to establish a knowledgebase for researchers to share this and additional information in order to speed up discovery and clinical practice. As an initiative toward systems biology, the VeryGene web server was developed to fill this gap. A significant effort has been made to integrate TSGs from two large-scale data analyses with respective information on subcellular localization, Gene Ontology, Reactome, KEGG pathway, Mouse Genome Informatics (MGI) Mammalian Phenotype, disease association, and targeting drugs. The current release carefully selected 3,960 annotated TSGs derived from 127 normal human tissues and cell types, including 5,672 gene-disease and 2,171 drug-target relationships. In addition to being a specialized source for TSGs, VeryGene can be used as a discovery tool by generating novel inferences. Some inherently useful but hidden relations among genes, diseases, drugs, and other important aspects can be inferred to form testable hypotheses. VeryGene is available online at http://www.verygene.com.
Affiliation(s)
- Xiaoqin Yang
- Institute of Genetic Engineering, Southern Medical University, Guangzhou, Guangdong Province, China
|
5441
|
Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet 2011; 12:56-68. [PMID: 21164525 DOI: 10.1038/nrg2918] [Citation(s) in RCA: 2801] [Impact Index Per Article: 215.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene, but reflects the perturbations of the complex intracellular and intercellular network that links tissue and organ systems. The emerging tools of network medicine offer a platform to explore systematically not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships among apparently distinct (patho)phenotypes. Advances in this direction are essential for identifying new disease genes, for uncovering the biological significance of disease-associated mutations identified by genome-wide association studies and full-genome sequencing, and for identifying drug targets and biomarkers for complex diseases.
Affiliation(s)
- Albert-László Barabási
- Center for Complex Networks Research and Department of Physics, Northeastern University, 110 Forsyth Street, 111 Dana Research Center, Boston, Massachusetts 02115, USA.
5442
Goff SA, Vaughn M, McKay S, Lyons E, Stapleton AE, Gessler D, Matasci N, Wang L, Hanlon M, Lenards A, Muir A, Merchant N, Lowry S, Mock S, Helmke M, Kubach A, Narro M, Hopkins N, Micklos D, Hilgert U, Gonzales M, Jordan C, Skidmore E, Dooley R, Cazes J, McLay R, Lu Z, Pasternak S, Koesterke L, Piel WH, Grene R, Noutsos C, Gendler K, Feng X, Tang C, Lent M, Kim SJ, Kvilekval K, Manjunath BS, Tannen V, Stamatakis A, Sanderson M, Welch SM, Cranston KA, Soltis P, Soltis D, O'Meara B, Ane C, Brutnell T, Kleibenstein DJ, White JW, Leebens-Mack J, Donoghue MJ, Spalding EP, Vision TJ, Myers CR, Lowenthal D, Enquist BJ, Boyle B, Akoglu A, Andrews G, Ram S, Ware D, Stein L, Stanzione D. The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Frontiers in Plant Science 2011; 2:34. [PMID: 22645531 PMCID: PMC3355756 DOI: 10.3389/fpls.2011.00034] [Citation(s) in RCA: 255] [Impact Index Per Article: 19.6] [Received: 05/21/2011] [Accepted: 07/11/2011] [Indexed: 05/17/2023]
Abstract
The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting humanity's projected demands for agricultural and forest products and the expectation that natural ecosystems be managed sustainably will require synergies from the application of information technologies. The iPlant cyberinfrastructure design is based on an unprecedented period of research community input, and leverages developments in high-performance computing, data storage, and cyberinfrastructure for the physical sciences. iPlant is an open-source project with application programming interfaces that allow the community to extend the infrastructure to meet its needs. iPlant is sponsoring community-driven workshops addressing specific scientific questions via analysis tool integration and hypothesis testing. These workshops teach researchers how to add bioinformatics tools and/or datasets into the iPlant cyberinfrastructure enabling plant scientists to perform complex analyses on large datasets without the need to master the command-line or high-performance computational services.
Affiliation(s)
- Stephen A. Goff
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- *Correspondence: Stephen A. Goff, iPlant Collaborative, BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA. e-mail:
- Matthew Vaughn
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Sheldon McKay
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Eric Lyons
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Ann E. Stapleton
- Department of Biology, University of North Carolina, Wilmington, NC, USA
- Department of Marine Sciences, University of North Carolina, Wilmington, NC, USA
- Naim Matasci
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Liya Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Matthew Hanlon
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Andy Muir
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Sonya Lowry
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Stephen Mock
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Adam Kubach
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Martha Narro
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- David Micklos
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Uwe Hilgert
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Michael Gonzales
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Chris Jordan
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Rion Dooley
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- John Cazes
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Robert McLay
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Zhenyuan Lu
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Lars Koesterke
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Ruth Grene
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech University, Blacksburg, VA, USA
- Karla Gendler
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
- Xin Feng
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ontario Center for Cancer Research, Toronto, ON, Canada
- Chunlao Tang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Monica Lent
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Seung-Jin Kim
- BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Kristian Kvilekval
- Center for Bio-image Informatics, University of California, Santa Barbara, CA, USA
- B. S. Manjunath
- Center for Bio-image Informatics, University of California, Santa Barbara, CA, USA
- Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA
- Val Tannen
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Alexandros Stamatakis
- Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Michael Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Stephen M. Welch
- Department of Agronomy, Kansas State University, Manhattan, KS, USA
- Pamela Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Doug Soltis
- Department of Biology, University of Florida, Gainesville, FL, USA
- Brian O'Meara
- Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN, USA
- Cecile Ane
- Department of Statistics, University of Wisconsin, Madison, WI, USA
- Department of Botany, University of Wisconsin, Madison, WI, USA
- Tom Brutnell
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY, USA
- Jeffery W. White
- Arid-Land Agricultural Research Center, United States Department of Agriculture-Agricultural Research Service, Maricopa, AZ, USA
- Michael J. Donoghue
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Todd J. Vision
- Department of Biology, University of North Carolina, Chapel Hill, NC, USA
- David Lowenthal
- Department of Computer Science, University of Arizona, Tucson, AZ, USA
- Brian J. Enquist
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Brad Boyle
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Ali Akoglu
- Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA
- Greg Andrews
- Department of Computer Science, University of Arizona, Tucson, AZ, USA
- Sudha Ram
- Eller School of Business, University of Arizona, Tucson, AZ, USA
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Lincoln Stein
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Ontario Center for Cancer Research, Toronto, ON, Canada
- Dan Stanzione
- Texas Advanced Computer Center, University of Texas, Austin, TX, USA
5443
Bielow C, Gröpl C, Kohlbacher O, Reinert K. Bioinformatics for qualitative and quantitative proteomics. Methods Mol Biol 2011; 719:331-349. [PMID: 21370091 DOI: 10.1007/978-1-61779-027-0_15] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Mass spectrometry is today a key analytical technique to elucidate the amount and content of proteins expressed in a certain cellular context. The degree of automation in proteomics has yet to reach that of genomic techniques, but even current technologies make a manual inspection of the data infeasible. This article addresses the key algorithmic problems bioinformaticians face when handling modern proteomic samples and shows common solutions to them. We provide examples on how algorithms can be combined to build relatively complex analysis pipelines, point out certain pitfalls and aspects worth considering and give a list of current state-of-the-art tools.
Affiliation(s)
- Chris Bielow
- AG Algorithmische Bioinformatik, Institut für Informatik, Freie Universität Berlin, Berlin, Germany.
5444
Cannon EKS, Birkett SM, Braun BL, Kodavali S, Jennewein DM, Yilmaz A, Antonescu V, Antonescu C, Harper LC, Gardiner JM, Schaeffer ML, Campbell DA, Andorf CM, Andorf D, Lisch D, Koch KE, McCarty DR, Quackenbush J, Grotewold E, Lushbough CM, Sen TZ, Lawrence CJ. POPcorn: An Online Resource Providing Access to Distributed and Diverse Maize Project Data. International Journal of Plant Genomics 2011; 2011:923035. [PMID: 22253616 PMCID: PMC3255282 DOI: 10.1155/2011/923035] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/16/2011] [Accepted: 11/29/2011] [Indexed: 05/21/2023]
Abstract
The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over time-sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web Services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein.
Affiliation(s)
- Ethalinda K. S. Cannon
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Scott M. Birkett
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Bremen L. Braun
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Sateesh Kodavali
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Douglas M. Jennewein
- Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA
- Alper Yilmaz
- Plant Biotechnology Center and Department of Molecular Genetics, The Ohio State University, Columbus, OH 43210, USA
- Valentin Antonescu
- Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Sm822, Boston, MA 02215, USA
- Corina Antonescu
- Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Sm822, Boston, MA 02215, USA
- Lisa C. Harper
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- USDA-ARS Plant Gene Expression Center, Albany, CA 94710, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Jack M. Gardiner
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Mary L. Schaeffer
- USDA-ARS Plant Genetics Research Unit, University of Missouri, Columbia, MO 65211, USA
- Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA
- Darwin A. Campbell
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Carson M. Andorf
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Destri Andorf
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Damon Lisch
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Karen E. Koch
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Donald R. McCarty
- Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- John Quackenbush
- Department of Biostatistics and Computational Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Sm822, Boston, MA 02215, USA
- Erich Grotewold
- Plant Biotechnology Center and Department of Molecular Genetics, The Ohio State University, Columbus, OH 43210, USA
- Carol M. Lushbough
- Department of Computer Science, University of South Dakota, Vermillion, SD 57069, USA
- Taner Z. Sen
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- Carolyn J. Lawrence
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA
- *Carolyn J. Lawrence:
5445
Tohge T, Mettler T, Arrivault S, Carroll AJ, Stitt M, Fernie AR. From models to crop species: caveats and solutions for translational metabolomics. Frontiers in Plant Science 2011; 2:61. [PMID: 22639601 PMCID: PMC3355600 DOI: 10.3389/fpls.2011.00061] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 05/04/2011] [Accepted: 09/13/2011] [Indexed: 05/04/2023]
Abstract
Although plant metabolomics is largely carried out on Arabidopsis it is essentially genome-independent, and thus potentially applicable to a wide range of species. However, transfer between species, or even between different tissues of the same species, is not facile. This is because the reliability of protocols for harvesting, handling and analysis depends on the biological features and chemical composition of the plant tissue. In parallel with the diversification of model species it is important to establish good handling and analytic practice, in order to augment computational comparisons between tissues and species. Liquid chromatography-mass spectrometry (LC-MS)-based metabolomics is one of the powerful approaches for metabolite profiling. By using a combination of different extraction methods, separation columns, and ion detection, a very wide range of metabolites can be analyzed. However, its application requires careful attention to exclude potential pitfalls, including artifactual changes in metabolite levels during sample preparation under variations of light or temperature and analytic errors due to ion suppression. Here we provide case studies with two different LC-MS-based metabolomics platforms and four species (Arabidopsis thaliana, Chlamydomonas reinhardtii, Solanum lycopersicum, and Oryza sativa) that illustrate how such dangers can be detected and circumvented.
Affiliation(s)
- Takayuki Tohge
- Max-Planck-Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
- *Correspondence: Takayuki Tohge, Max-Planck-Institute for Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam-Golm, Germany. e-mail:
- Tabea Mettler
- Max-Planck-Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
- Adam James Carroll
- Australian Research Council Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, ACT, Australia
- Mark Stitt
- Max-Planck-Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
- Alisdair R. Fernie
- Max-Planck-Institute for Molecular Plant Physiology, Potsdam-Golm, Germany
5446
Ozcaglar C, Shabbeer A, Kurepina N, Yener B, Bennett KP. Data-driven insights into deletions of Mycobacterium tuberculosis complex chromosomal DR region using spoligoforests. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2011:75-82. [PMID: 22343484 PMCID: PMC3279189 DOI: 10.1109/bibm.2011.64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Biomarkers of Mycobacterium tuberculosis complex (MTBC) mutate over time. Among the biomarkers of MTBC, spacer oligonucleotide type (spoligotype) and Mycobacterium Interspersed Repetitive Unit (MIRU) patterns are commonly used to genotype clinical MTBC strains. In this study, we present an evolution model of spoligotype rearrangements using MIRU patterns to disambiguate the ancestors of spoligotypes, in a large patient dataset from the United States Centers for Disease Control and Prevention (CDC). Based on the contiguous deletion assumption and rare observation of convergent evolution, we first generate the most parsimonious forest of spoligotypes, called a spoligoforest, using three genetic distance measures. An analysis of topological attributes of the spoligoforest and number of variations at the direct repeat (DR) locus of each strain reveals interesting properties of deletions in the DR region. First, we compare our mutation model to existing mutation models of spoligotypes and find that our mutation model produces as many within-lineage mutation events as other models, with slightly higher segregation accuracy. Second, based on our mutation model, the number of descendant spoligotypes follows a power law distribution. Third, contrary to prior studies, the power law distribution does not plausibly fit to the mutation length frequency. Finally, the total number of mutation events at consecutive DR loci follows a bimodal distribution, which results in accumulation of shorter deletions in the DR region. The two modes are spacers 13 and 40, which are hotspots for chromosomal rearrangements. The change point in the bimodal distribution is spacer 34, which is absent in most MTBC strains. This bimodal separation results in accumulation of shorter deletions, which explains why a power law distribution is not a plausible fit to the mutation length frequency.
Affiliation(s)
- Cagri Ozcaglar
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY
- Amina Shabbeer
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY
- Bülent Yener
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY
- Kristin P. Bennett
- Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY
- Mathematical Sciences Department, Rensselaer Polytechnic Institute, Troy, NY
5447
Costa PR, Acencio ML, Lemke N. A machine learning approach for genome-wide prediction of morbid and druggable human genes based on systems-level data. BMC Genomics 2010; 11 Suppl 5:S9. [PMID: 21210975 PMCID: PMC3045802 DOI: 10.1186/1471-2164-11-s5-s9] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products. RESULTS In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors to morbidity and druggability, respectively. CONCLUSIONS We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
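The recovery rates and precisions quoted in this abstract correspond to the standard recall and precision measures. A minimal sketch of how the two are computed follows; the counts are hypothetical and chosen only so the output resembles the reported morbid-gene figures, since the abstract gives percentages rather than the underlying counts.

```python
def recall_precision(true_pos: int, false_neg: int, false_pos: int) -> tuple[float, float]:
    """Return (recall, precision) from confusion-matrix counts."""
    # Recall: fraction of known positives that the classifier recovered.
    recall = true_pos / (true_pos + false_neg)
    # Precision: fraction of predicted positives that are truly positive.
    precision = true_pos / (true_pos + false_pos)
    return recall, precision


# Hypothetical counts for illustration only (not taken from the paper).
recall, precision = recall_precision(true_pos=65, false_neg=35, false_pos=33)
print(f"recall={recall:.2f}, precision={precision:.2f}")
```

With these assumed counts the two measures land close to the 65% and 66% reported for morbid genes; different counts can produce the same percentages.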
Affiliation(s)
- Pedro R Costa
- Departamento de Física e Biofísica, Instituto de Biociências de Botucatu, UNESP - Univ Estadual Paulista, Distrito de Rubião Jr. s/n. Botucatu, São Paulo, 18618-970, Brazil
5448
Curtin K, Wolff RK, Herrick JS, Abo R, Slattery ML. Exploring multilocus associations of inflammation genes and colorectal cancer risk using hapConstructor. BMC Medical Genetics 2010; 11:170. [PMID: 21129206 PMCID: PMC3006374 DOI: 10.1186/1471-2350-11-170] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 11/13/2009] [Accepted: 12/03/2010] [Indexed: 02/05/2023]
Abstract
BACKGROUND In candidate-gene association studies of single nucleotide polymorphisms (SNPs), multilocus analyses are frequently of high dimensionality when considering haplotypes or haplotype pairs (diplotypes) and differing modes of expression. Often, while candidate genes are selected based on their biological involvement in a given pathway, little is known about the functionality of SNPs to guide association studies. Investigators face the challenge of exploring multiple SNP models to elucidate which variants, independently or in combination, might be associated with a disease of interest. A data mining module, hapConstructor (freely-available in Genie software) performs systematic construction and association testing of multilocus genotype data in a Monte Carlo framework. Our objective was to assess its utility to guide statistical analyses of haplotypes within a candidate region (or combined genotypes across candidate genes) beyond that offered by a standard logistic regression approach. METHODS We applied the hapConstructor method to a multilocus investigation of candidate genes involved in pro-inflammatory cytokine IL6 production, IKBKB, IL6, and NFKB1 (16 SNPs total) hypothesized to operate together to alter colorectal cancer risk. Data come from two U.S. multicenter studies, one of colon cancer (1,556 cases and 1,956 matched controls) and one of rectal cancer (754 cases and 959 matched controls). RESULTS hapConstructor enabled us to identify important associations that were further analyzed in logistic regression models to simultaneously adjust for confounders. The most significant finding (nominal P = 0.0004; false discovery rate q = 0.037) was a combined genotype association across IKBKB SNP rs5029748 (1 or 2 variant alleles), IL6 rs1800797 (1 or 2 variant alleles), and NFKB1 rs4648110 (2 variant alleles) which conferred an ~80% decreased risk of colon cancer. 
CONCLUSIONS Strengths of hapConstructor were: systematic identification of multiple loci within and across genes important in CRC risk; false discovery rate assessment; and efficient guidance of subsequent logistic regression analyses.
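The abstract describes association testing in a Monte Carlo framework without detailing the procedure. As a generic illustration only (this is not hapConstructor's algorithm), the sketch below estimates an empirical p-value by permuting case/control labels; the function name, the test statistic, and the toy data are all assumptions.

```python
import random


def permutation_p_value(carrier, is_case, n_perm=2000, seed=0):
    """Empirical two-sided p-value for a carrier-frequency difference."""

    def freq_diff(labels):
        # Absolute difference in carrier frequency between cases and controls.
        cases = [c for c, y in zip(carrier, labels) if y]
        controls = [c for c, y in zip(carrier, labels) if not y]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = freq_diff(is_case)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if freq_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 20 cases (5 carriers) followed by 20 controls (15 carriers).
carrier = [1] * 5 + [0] * 15 + [1] * 15 + [0] * 5
is_case = [True] * 20 + [False] * 20
p = permutation_p_value(carrier, is_case)
print(f"empirical p-value: {p:.4f}")
```

Permuting labels keeps the numbers of cases and controls fixed, so the null distribution reflects only chance pairing of genotype and phenotype.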
Affiliation(s)
- Karen Curtin
- Epidemiology, Department of Internal Medicine, University of Utah Health Sciences Center, Salt Lake City, Utah, USA.
5449
Nagaraj SH, Ingham A, Reverter A. The interplay between evolution, regulation and tissue specificity in the Human Hereditary Diseasome. BMC Genomics 2010; 11 Suppl 4:S23. [PMID: 21143807 PMCID: PMC3005915 DOI: 10.1186/1471-2164-11-s4-s23] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022] Open
Abstract
Background Human disease genes can be distinguished from essential (embryonically lethal) and non-disease genes using gene attributes. Such attributes include gene age, tissue specificity of expression, regulatory capacity, sequence length, rate of sequence variation and capacity for interaction. The resulting information has been used to inform data mining approaches seeking to identify novel disease genes. Given the dynamic nature of this field and the rapid rise in relevant information, we have chosen to perform a single integrated mining approach to explore relationships among gene attributes and thereby characterise evolutionary trends associated with disease genes. Results All against all cross comparison of 2,522 disease gene attributes revealed significant relationships existed between the age, disease-association and expression pattern of genes and the tissues within which they are expressed. We found that the over-representation of disease genes among old genes holds for tissue-specific genes, but the correlation between age and disease association vanished when conditioning on tissue-specificity. Of the 32 tissues studied, the genes expressed in pancreas are on average older than the genes expressed in any other tissue, while the testis expressed the lowest proportion of old genes. Following a focussed analysis on the impact of regulatory apparatus on evolution of disease genes, we show that regulators, comprising transcription factors and post-translation modified proteins, are over-represented among ancient disease genes. In addition, we show that the proportion of regulator genes is affected by gene age among disease genes and by tissue-specificity among non-disease genes. Finally, using 55,606 true positive gene interaction data, we find that old disease genes interact with other old disease genes and interacting new genes interact with genes originating from higher phylostrata.
Conclusion This study supports the non-random nature of the human diseasome. We have identified a variety of distinct features and correlations to other molecular attributes that can be used to distinguish the set of disease causing genes. This was achieved by harnessing the power of mining large scale datasets from OMIM and other databases. Ultimately such knowledge may contribute to the identification of novel human disease genes and an enhanced understanding of human biology.
Affiliation(s)
- Shivashankar H Nagaraj
- CSIRO Livestock Industries, Queensland Bioscience Precinct, St. Lucia, Queensland, Australia.
5450
Taniguchi M, Penner GB, Beauchemin KA, Oba M, Guan LL. Comparative analysis of gene expression profiles in ruminal tissue from Holstein dairy cows fed high or low concentrate diets. Comparative Biochemistry and Physiology D-Genomics & Proteomics 2010; 5:274-9. [DOI: 10.1016/j.cbd.2010.07.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 06/16/2010] [Revised: 07/23/2010] [Accepted: 07/25/2010] [Indexed: 12/14/2022]