1
Doga H, Raubenolt B, Cumbo F, Joshi J, DiFilippo FP, Qin J, Blankenberg D, Shehab O. A Perspective on Protein Structure Prediction Using Quantum Computers. J Chem Theory Comput 2024. [PMID: 38703105 DOI: 10.1021/acs.jctc.4c00067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/06/2024]
Abstract
Despite the recent advancements by deep learning methods such as AlphaFold2, in silico protein structure prediction remains a challenging problem in biomedical research. With the rapid evolution of quantum computing, it is natural to ask whether quantum computers can offer meaningful benefits for approaching this problem. Yet, identifying specific problem instances amenable to quantum advantage and estimating the quantum resources required are equally challenging tasks. Here, we share our perspective on how to create a framework for systematically selecting protein structure prediction problems that are amenable to quantum advantage, and estimate quantum resources for such problems on a utility-scale quantum computer. As a proof-of-concept, we validate our problem selection framework by accurately predicting the structure of a catalytic loop of the Zika Virus NS3 Helicase on quantum hardware.
Affiliation(s)
- Hakan Doga
  - IBM Quantum, Almaden Research Center, San Jose, California 95120, United States
- Bryan Raubenolt
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Fabio Cumbo
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jayadev Joshi
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Frank P DiFilippo
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Jun Qin
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Daniel Blankenberg
  - Center for Computational Life Sciences, Lerner Research Institute, The Cleveland Clinic, Cleveland, Ohio 44106, United States
- Omar Shehab
  - IBM Quantum, IBM Thomas J Watson Research Center, Yorktown Heights, New York 10598, United States
2
Abimannan T, Parthibane V, Le SH, Vijaykrishna N, Fox SD, Karim B, Kunduri G, Blankenberg D, Andresson T, Bamba T, Acharya U, Acharya JK. Sphingolipid biosynthesis is essential for metabolic rewiring during TH17 cell differentiation. Sci Adv 2024; 10:eadk1045. [PMID: 38657065 PMCID: PMC11042737 DOI: 10.1126/sciadv.adk1045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 03/22/2024] [Indexed: 04/26/2024]
Abstract
T helper 17 (TH17) cells are implicated in autoimmune diseases, and several metabolic processes are shown to be important for their development and function. In this study, we report an essential role for sphingolipids synthesized through the de novo pathway in TH17 cell development. Deficiency of SPTLC1, a major subunit of the serine palmitoyl transferase enzyme complex that catalyzes the first and rate-limiting step of de novo sphingolipid synthesis, impaired glycolysis in differentiating TH17 cells by increasing intracellular reactive oxygen species (ROS) through enhancement of nicotinamide adenine dinucleotide phosphate oxidase 2 activity. Increased ROS leads to impaired activation of mammalian target of rapamycin C1 and reduced expression of hypoxia-inducible factor 1-alpha and c-Myc-induced glycolytic genes. SPTLC1 deficiency protected mice from developing experimental autoimmune encephalomyelitis and experimental T cell transfer colitis. Our results thus show a critical role for the de novo sphingolipid biosynthetic pathway in shaping adaptive immune responses, with implications in autoimmune diseases.
Affiliation(s)
- Velayoudame Parthibane
  - Cancer and Developmental Biology Laboratory, National Cancer Institute, Frederick, MD, USA
- Si-Hung Le
  - Division of Metabolomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Nagampalli Vijaykrishna
  - Genomic Medicine Institute and Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Stephen D. Fox
  - Mass Spectrometry Group, National Cancer Institute, Frederick, MD, USA
- Baktiar Karim
  - Molecular Histopathology Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Govind Kunduri
  - Cancer and Developmental Biology Laboratory, National Cancer Institute, Frederick, MD, USA
- Daniel Blankenberg
  - Genomic Medicine Institute and Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Takeshi Bamba
  - Division of Metabolomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Usha Acharya
  - Cancer and Developmental Biology Laboratory, National Cancer Institute, Frederick, MD, USA
- Jairaj K. Acharya
  - Cancer and Developmental Biology Laboratory, National Cancer Institute, Frederick, MD, USA
3
Kaur H, Minchella P, Alvarez-Carbonell D, Purandare N, Nagampalli VK, Blankenberg D, Hulgan T, Gerschenson M, Karn J, Aras S, Kallianpur AR. Contemporary Antiretroviral Therapy Dysregulates Iron Transport and Augments Mitochondrial Dysfunction in HIV-Infected Human Microglia and Neural-Lineage Cells. Int J Mol Sci 2023; 24:12242. [PMID: 37569616 PMCID: PMC10419149 DOI: 10.3390/ijms241512242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 07/19/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] [Open Access]
Abstract
HIV-associated cognitive dysfunction during combination antiretroviral therapy (cART) involves mitochondrial dysfunction, but the impact of contemporary cART on chronic metabolic changes in the brain and in latent HIV infection is unclear. We interrogated mitochondrial function in a human microglia (hμglia) cell line harboring inducible HIV provirus and in SH-SY5Y cells after exposure to individual antiretroviral drugs or cART, using the MitoStress assay. cART-induced changes in protein expression, reactive oxygen species (ROS) production, mitochondrial DNA copy number, and cellular iron were also explored. Finally, we evaluated the ability of ROS scavengers or plasmid-mediated overexpression of the antioxidant iron-binding protein, Fth1, to reverse mitochondrial defects. Contemporary antiretroviral drugs, particularly bictegravir, depressed multiple facets of mitochondrial function by 20-30%, with the most pronounced effects in latently infected HIV+ hμglia and SH-SY5Y cells. Latently HIV-infected hμglia exhibited upregulated glycolysis. Increases in total and/or mitochondrial ROS, mitochondrial DNA copy number, and cellular iron accompanied mitochondrial defects in hμglia and SH-SY5Y cells. In SH-SY5Y cells, cART reduced mitochondrial iron-sulfur-cluster-containing supercomplex and subunit expression and increased Nox2 expression. Fth1 overexpression or pre-treatment with N-acetylcysteine prevented cART-induced mitochondrial dysfunction. Contemporary cART impairs mitochondrial bioenergetics in hμglia and SH-SY5Y cells, partly through cellular iron accumulation; some effects differ by HIV latency.
Affiliation(s)
- Harpreet Kaur
  - Department of Genomic Medicine, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Paige Minchella
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48202, USA
- David Alvarez-Carbonell
  - Department of Microbiology and Molecular Biology, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Neeraja Purandare
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48202, USA
- Vijay K. Nagampalli
  - Department of Genomic Medicine, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Daniel Blankenberg
  - Department of Genomic Medicine, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Todd Hulgan
  - Department of Medicine, Division of Infectious Diseases, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Mariana Gerschenson
  - Department of Cell and Molecular Biology, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96844, USA
- Jonathan Karn
  - Department of Microbiology and Molecular Biology, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Siddhesh Aras
  - Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48202, USA
- Asha R. Kallianpur
  - Department of Genomic Medicine, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH 44195, USA
4
Sharma D, Singh M, Joshi J, Garg M, Chaudhary V, Blankenberg D, Chandna S, Kumar V, Rani R. Design and Synthesis of Thiazole Scaffold-Based Small Molecules as Anticancer Agents Targeting the Human Lactate Dehydrogenase A Enzyme. ACS Omega 2023; 8:17552-17562. [PMID: 37251149 PMCID: PMC10210175 DOI: 10.1021/acsomega.2c07569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 04/24/2023] [Indexed: 05/31/2023]
Abstract
A new series of thiazole central scaffold-based small-molecule hLDHA inhibitors was designed using an in silico approach. Molecular docking analysis of the designed molecules with hLDHA (PDB ID: 1I10) demonstrates that Ala 29, Val 30, Arg 98, Gln 99, Gly 96, and Thr 94 interact strongly with the compounds. Compounds 8a, 8b, and 8d showed good binding affinity (-8.1 to -8.8 kcal/mol), whereas an additional hydrogen-bonding interaction of NO2 at the ortho position in compound 8c with Gln 99 enhanced the affinity to -9.8 kcal/mol. Selected high-scoring compounds were synthesized and screened for hLDHA inhibitory activity and in vitro anticancer activity in six cancer cell lines. Biochemical enzyme inhibition assays showed the highest hLDHA inhibitory activity with compounds 8b, 8c, and 8l. Compounds 8b, 8c, 8j, 8l, and 8m displayed significant anticancer activity, exhibiting IC50 values in the range of 1.65-8.60 μM in HeLa and SiHa cervical cancer cell lines. Compounds 8j and 8m exhibited notable anticancer activity with IC50 values of 7.90 and 5.15 μM, respectively, in liver cancer cells (HepG2). Interestingly, compounds 8j and 8m did not induce noticeable toxicity in human embryonic kidney cells (HEK293). In silico absorption, distribution, metabolism, and excretion profiling demonstrates that the compounds possess drug-likeness, and the results may pave the way for the development of novel thiazole-based biologically active small molecules for therapeutics.
Affiliation(s)
- Dolly Sharma
  - Amity Institute of Biotechnology, Amity University, Noida 201303, Uttar Pradesh, India
  - Amity Institute of Molecular Medicine and Stem Cell Research, Amity University, Noida 201303, Uttar Pradesh, India
- Mamta Singh
  - Amity Institute of Molecular Medicine and Stem Cell Research, Amity University, Noida 201303, Uttar Pradesh, India
- Jayadev Joshi
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44195, United States
- Manoj Garg
  - Amity Institute of Molecular Medicine and Stem Cell Research, Amity University, Noida 201303, Uttar Pradesh, India
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio 44195, United States
- Sudhir Chandna
  - Institute of Nuclear Medicine & Allied Science, Defense Research Development Organization, Delhi 110054, India
- Vinit Kumar
  - Amity Institute of Molecular Medicine and Stem Cell Research, Amity University, Noida 201303, Uttar Pradesh, India
- Reshma Rani
  - Drug Discovery, Jubilant Biosys, Knowledge Park-2, Greater Noida 201306, India
5
Macnee M, Pérez-Palma E, Brünger T, Klöckner C, Platzer K, Stefanski A, Montanucci L, Bayat A, Radtke M, Collins RL, Talkowski M, Blankenberg D, Møller RS, Lemke JR, Nothnagel M, May P, Lal D. CNV-ClinViewer: enhancing the clinical interpretation of large copy-number variants online. Bioinformatics 2023; 39:btad290. [PMID: 37104749 PMCID: PMC10174702 DOI: 10.1093/bioinformatics/btad290] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/14/2022] [Revised: 01/17/2023] [Accepted: 04/26/2023] [Indexed: 04/29/2023] [Open Access]
Abstract
MOTIVATION: Pathogenic copy-number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require the integration and analysis of information from multiple scattered sources by experts. RESULTS: Here, we introduce the CNV-ClinViewer, an open-source web application for clinical evaluation and visual exploration of CNVs. The application enables real-time interactive exploration of large CNV datasets in a user-friendly interface and facilitates semi-automated clinical CNV interpretation following the ACMG guidelines by integrating the ClassifCNV tool. In combination with clinical judgment, the application enables clinicians and researchers to formulate novel hypotheses and guide their decision-making process. Thus, the CNV-ClinViewer enhances patient care for clinical investigators and translational genomic research for basic scientists. AVAILABILITY AND IMPLEMENTATION: The web application is freely available at https://cnv-ClinViewer.broadinstitute.org and the open-source code can be found at https://github.com/LalResearchGroup/CNV-clinviewer.
Affiliation(s)
- Marie Macnee
  - Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
- Eduardo Pérez-Palma
  - Universidad del Desarrollo, Centro de Genética y Genómica, Facultad de Medicina Clínica Alemana, Santiago, Chile
- Tobias Brünger
  - Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
- Chiara Klöckner
  - Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Germany
- Konrad Platzer
  - Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Germany
- Arthur Stefanski
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA
- Ludovica Montanucci
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Allan Bayat
  - Department of Epilepsy Genetics and Personalized Medicine, Member of ERN Epicare, Danish Epilepsy Centre, Dianalund, Denmark
  - Department of Regional Health Research, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
- Maximilian Radtke
  - Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Germany
- Ryan L Collins
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Michael Talkowski
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Rikke S Møller
  - Department of Epilepsy Genetics and Personalized Medicine, Member of ERN Epicare, Danish Epilepsy Centre, Dianalund, Denmark
  - Department of Regional Health Research, Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
- Johannes R Lemke
  - Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Germany
- Michael Nothnagel
  - Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
  - University Hospital Cologne, Cologne, Germany
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, University Luxembourg, Esch-sur-Alzette, Luxembourg
- Dennis Lal
  - Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA
6
Virmani R, Pradhan P, Joshi J, Wang AL, Joshi HC, Sajid A, Singh A, Sharma V, Kundu B, Blankenberg D, Molle V, Singh Y, Arora G. Phosphorylation-mediated regulation of the Bacillus anthracis phosphoglycerate mutase by the Ser/Thr protein kinase PrkC. Biochem Biophys Res Commun 2023; 665:88-97. [PMID: 37149987 DOI: 10.1016/j.bbrc.2023.04.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Accepted: 04/15/2023] [Indexed: 05/09/2023]
Abstract
Bacillus anthracis Ser/Thr protein kinase PrkC is necessary for phenotypic memory and spore germination, and the loss of PrkC-dependent phosphorylation events affects spore development. During sporulation, Bacillus sp. can store 3-phosphoglycerate (3-PGA), which is required at the onset of germination, when ATP becomes necessary. Phosphoglycerate mutase (Pgm) catalyzes the isomerization of 2-PGA and 3-PGA and is important for spore germination as a key metabolic enzyme that maintains the 3-PGA pool at later events. Therefore, regulation of Pgm is important for an efficient spore germination process and metabolic switching. While increased expression of Pgm in B. anthracis decreases spore germination efficiency, it remains unexplored whether PrkC could directly influence Pgm activity. Here, we report the phosphorylation and regulation of Pgm by PrkC and its impact on Pgm stability and catalytic activity. Mass spectrometry revealed Pgm phosphorylation on seven threonine residues. In silico mutational analysis highlighted the role of the Thr459 residue in metal and substrate binding. Altogether, we demonstrated that PrkC-mediated Pgm phosphorylation negatively regulates its activity, which is essential to maintain Pgm in its apo-like isoform before germination. This study advances the role of Pgm regulation, which represents an important switch for B. anthracis resumption of metabolism and spore germination.
Affiliation(s)
- Richa Virmani
  - Department of Zoology, University of Delhi, Delhi, 110007, India
- Prashant Pradhan
  - Kusuma School of Biological Sciences, IIT Delhi, Hauz Khas, New Delhi, 110016, India
- Jayadev Joshi
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
- Avril Luyang Wang
  - Department of Molecular Genetics and Microbiology, University of Toronto, Toronto, M5S1A8, Canada
- Andaleeb Sajid
  - Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, 06520, USA
- Anoop Singh
  - Department of Zoology, University of Delhi, Delhi, 110007, India
- Vishal Sharma
  - Department of Zoology, University of Delhi, Delhi, 110007, India
- Bishwajit Kundu
  - Kusuma School of Biological Sciences, IIT Delhi, Hauz Khas, New Delhi, 110016, India
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
- Virginie Molle
  - Laboratory of Pathogen Host Interactions, Université de Montpellier, CNRS, UMR, 5235, Montpellier, France
- Yogendra Singh
  - Department of Zoology, University of Delhi, Delhi, 110007, India
- Gunjan Arora
  - Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, 06520, USA
7
Reich M, Tabor T, Liefeld J, Joshi J, Kim F, Thorvaldsdottir H, Blankenberg D, Mesirov JP. Genomics to Notebook (g2nb): extending the electronic notebook to address the challenges of bioinformatics analysis. bioRxiv 2023:2023.04.04.535621. [PMID: 37066251 PMCID: PMC10104038 DOI: 10.1101/2023.04.04.535621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
We present Genomics to Notebook (g2nb), an environment that combines the JupyterLab notebook system with widely-used bioinformatics platforms. Galaxy, GenePattern, and the JavaScript versions of IGV and Cytoscape are currently available within g2nb. The analyses and visualizations within those platforms are presented as cells in a notebook, making thousands of genomics methods available within the notebook metaphor and allowing notebooks to contain workflows utilizing multiple software packages on remote servers, all without the need for programming. The g2nb environment is, to our knowledge, the only notebook-based system that incorporates multiple bioinformatics analysis platforms into a notebook interface.
Affiliation(s)
- Michael Reich
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Thorin Tabor
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- John Liefeld
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Jayadev Joshi
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Forrest Kim
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Jill P Mesirov
  - Department of Medicine, School of Medicine, University of California, San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
8
Reich MM, Tabor T, Liefeld J, Joshi J, Kim F, Thorvaldsdottir H, Blankenberg D, Mesirov JP. Abstract 2073: Genomics to Notebook (g2nb): Extending the electronic notebook to address the needs of cancer bioinformatics. Cancer Res 2023. [DOI: 10.1158/1538-7445.am2023-2073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2023]
Abstract
As the availability of genomic data and analysis tools from large-scale cancer initiatives continues to increase, with single-cell studies adding new dimensions to the potential scientific insights, the need has become more urgent for a software environment that supports the rapid pace of cancer data science. The Jupyter Notebook environment has become the de facto medium for this purpose due to its ease in combining scientific exposition with executable code to form a single reproducible “research narrative” document. However, analyses are often compute-intensive, requiring more resources than are frequently available within a notebook environment running on a desktop or laptop computer. Additionally, thousands of tools, modules, and plugins are readily available on integrative software platforms such as Galaxy, GenePattern, and Cytoscape, outside the notebook paradigm. Finally, many biomedical investigators lack the programming expertise required to fully realize the benefits and utility of the notebook metaphor. To address these issues, we have released Genomics to Notebook (g2nb), which builds on JupyterLab to add access to bioinformatics platforms and other functionality for the non-programmer through the components described below, while retaining all programmatic features of JupyterLab.
The g2nb environment incorporates bioinformatics software platforms within the notebook interface, allowing a single notebook to contain a workflow spanning multiple tools and servers. When run, the entire analysis appears to execute seamlessly within the notebook. To achieve this, we developed a new analysis cell type that provides an interface within the notebook to tools that are hosted on a remote Galaxy or GenePattern server. Analysis cells present a web form-like interface, similar to that of the original platforms, requiring an investigator to provide only the input parameters and data.
The popular visualization tools Cytoscape and Integrative Genomics Viewer (IGV) are also supported in their web-based formats as notebook cells, with additional platforms and visualizers added regularly. The g2nb environment is freely available at the g2nb workspace, http://g2nb.org, where scientists can use all of the g2nb functionality with only a web browser. Those who wish to use g2nb locally can use the provided Docker container or install the packages via the conda or pip package managers. The online workspace also includes a library of featured genomic analysis notebooks, including templates for common analysis tasks as well as cancer-specific research scenarios and compute-intensive methods. Scientists can easily copy these notebooks, use them as is, or adapt them for their research purposes.
Citation Format: Michael M. Reich, Thorin Tabor, John Liefeld, Jayadev Joshi, Forrest Kim, Helga Thorvaldsdottir, Daniel Blankenberg, Jill P. Mesirov. Genomics to Notebook (g2nb): Extending the electronic notebook to address the needs of cancer bioinformatics [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 2073.
9
Apollonio N, Blankenberg D, Cumbo F, Franciosa PG, Santoni D. Evaluating homophily in networks via HONTO (HOmophily Network TOol): a case study of chromosomal interactions in human PPI networks. Bioinformatics 2023; 39:6849517. [PMID: 36440918 PMCID: PMC9805585 DOI: 10.1093/bioinformatics/btac763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 11/04/2022] [Accepted: 11/24/2022] [Indexed: 11/30/2022] [Open Access]
Abstract
SUMMARY: Many kinds of networks, such as social or biological ones, exhibit a typical behavior captured by the general principle 'similarity breeds connections'. These networks are called homophilic because nodes belonging to the same class preferentially interact with each other. In this work, we present HONTO (HOmophily Network TOol), a user-friendly open-source Python3 package designed to evaluate and analyze homophily in complex networks. The tool takes as input the network along with a partition of its nodes into classes and yields a matrix whose entries are the homophily/heterophily z-score values. To complement the analysis, the tool also provides z-score values of nodes that do not interact with any other node of the same class. Homophily/heterophily z-score values are presented as a heatmap, allowing an at-a-glance visual interpretation of results. AVAILABILITY AND IMPLEMENTATION: The tool's source code is available at https://github.com/cumbof/honto under the MIT license, installable as a package from PyPI (pip install honto) and conda-forge (conda install -c conda-forge honto), and has a wrapper for the Galaxy platform available on the official Galaxy ToolShed (Blankenberg et al., 2014) at https://toolshed.g2.bx.psu.edu/view/fabio/honto.
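To make the idea of a class-by-class homophily z-score matrix concrete, the following is a minimal sketch and not HONTO's own code or API. It assumes an undirected network given as an edge list and a node-to-class mapping, and it substitutes a label-permutation null model for whatever statistics HONTO uses internally; the function names `class_pair_counts` and `homophily_z_scores` and the toy network are hypothetical.

```python
import random
import statistics
from collections import Counter
from itertools import combinations, combinations_with_replacement


def class_pair_counts(edges, node_class):
    """Count the edges falling between each unordered pair of classes."""
    counts = Counter()
    for u, v in edges:
        pair = tuple(sorted((node_class[u], node_class[v])))
        counts[pair] += 1
    return counts


def homophily_z_scores(edges, node_class, n_permutations=1000, seed=0):
    """Z-score of observed class-pair edge counts against shuffled labels.

    Assumption: the null model keeps the network and the class sizes fixed
    and permutes which node carries which class label. This is a stand-in,
    not necessarily the null model used by HONTO.
    """
    rng = random.Random(seed)
    nodes = sorted(node_class)
    labels = [node_class[n] for n in nodes]
    classes = sorted(set(labels))
    pairs = list(combinations_with_replacement(classes, 2))

    observed = class_pair_counts(edges, node_class)
    null_samples = {pair: [] for pair in pairs}
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        permuted = dict(zip(nodes, shuffled))
        counts = class_pair_counts(edges, permuted)
        for pair in pairs:
            null_samples[pair].append(counts[pair])

    z_scores = {}
    for pair in pairs:
        mean = statistics.fmean(null_samples[pair])
        std = statistics.pstdev(null_samples[pair])
        # A zero spread means the z-score is undefined for this pair.
        z_scores[pair] = (observed[pair] - mean) / std if std > 0 else float("nan")
    return z_scores


# Hypothetical toy network: two 5-node cliques (classes A and B) joined by one edge.
edges = list(combinations(range(5), 2)) + list(combinations(range(5, 10), 2)) + [(4, 5)]
node_class = {n: "A" if n < 5 else "B" for n in range(10)}
z = homophily_z_scores(edges, node_class)
for pair, score in z.items():
    print(pair, round(score, 2))
```

In this toy case the within-class entries come out positive (homophily) and the between-class entry negative (heterophily), which is the pattern a heatmap of such a matrix would make visible at a glance.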
Affiliation(s)
- Nicola Apollonio
  - Institute for Applied Mathematics “Mauro Picone”, National Research Council of Italy, Rome 00185, Italy
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Fabio Cumbo
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Daniele Santoni
  - Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council of Italy, Rome 00185, Italy
10
Vasu K, Khan D, Ramachandiran I, Blankenberg D, Fox P. Analysis of nested alternate open reading frames and their encoded proteins. NAR Genom Bioinform 2022; 4:lqac076. [PMID: 36267124 PMCID: PMC9580016 DOI: 10.1093/nargab/lqac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 08/14/2022] [Accepted: 09/27/2022] [Indexed: 11/22/2022] [Open Access]
Abstract
Transcriptional and post-transcriptional mechanisms diversify the proteome beyond gene number, while maintaining a sequence relationship between original and altered proteins. A new mechanism breaks this paradigm, generating novel proteins by translating alternative open reading frames (Alt-ORFs) within canonical host mRNAs. Uniquely, ‘alt-proteins’ lack sequence homology with host ORF-derived proteins. We show global amino acid frequencies, and consequent biochemical characteristics of Alt-ORFs nested within host ORFs (nAlt-ORFs), are genetically-driven, and predicted by summation of frequencies of hundreds of encompassing host codon-pairs. Analysis of 101 human nAlt-ORFs of length ≥150 codons confirms the theoretical predictions, revealing an extraordinarily high median isoelectric point (pI) of 11.68, due to anomalous charged amino acid levels. Also, nAlt-ORF proteins exhibit a >2-fold preference for reading frame 2 versus 3, predicted mitochondrial and nuclear localization, and elevated codon adaptation index indicative of natural selection. Our results provide a theoretical and conceptual framework for exploration of these largely unannotated, but potentially significant, alternative ORFs and their encoded proteins.
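As an illustration of why an excess of basic residues pushes the isoelectric point so high, here is a minimal sketch of a textbook pI estimate: the net charge is summed from Henderson-Hasselbalch terms and the pH of zero charge is found by bisection. This is not the authors' pipeline. The pKa values are an assumed, commonly quoted set (other published sets shift the result slightly), and the two peptide sequences are hypothetical.

```python
# Assumed side-chain and terminal pKa values; not taken from the article.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch terms)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical peptides: one rich in Arg/Lys, one rich in Asp/Glu.
basic_pi = isoelectric_point("MRRKRSRKLR")
acidic_pi = isoelectric_point("MDEEDSDELD")
print(f"basic peptide pI ~ {basic_pi:.2f}, acidic peptide pI ~ {acidic_pi:.2f}")
```

Under these assumptions the arginine- and lysine-rich peptide stays positively charged until strongly alkaline pH, while the acidic peptide loses its net positive charge at low pH.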
Collapse
Affiliation(s)
- Kommireddy Vasu
  - Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Debjit Khan
  - Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Iyappan Ramachandiran
  - Department of Cardiovascular and Metabolic Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Daniel Blankenberg
  - Correspondence may also be addressed to Daniel Blankenberg. Tel: +1 216 444 4336
- Paul L Fox
  - To whom correspondence should be addressed. Tel: +1 216 444 8053; Fax: +1 216 444 9404
11
Kunduri G, Le SH, Baena V, Vijaykrishna N, Harned A, Nagashima K, Blankenberg D, Yoshihiro I, Narayan K, Bamba T, Acharya U, Acharya JK. Delivery of ceramide phosphoethanolamine lipids to the cleavage furrow through the endocytic pathway is essential for male meiotic cytokinesis. PLoS Biol 2022; 20:e3001599. [PMID: 36170207 PMCID: PMC9550178 DOI: 10.1371/journal.pbio.3001599] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/02/2022] [Revised: 10/10/2022] [Accepted: 08/02/2022] [Indexed: 11/18/2022] Open
Abstract
Cell division, wherein 1 cell divides into 2 daughter cells, is fundamental to all living organisms. Cytokinesis, the final step in cell division, begins with the formation of an actomyosin contractile ring, positioned midway between the segregated chromosomes. Constriction of the ring with concomitant membrane deposition in a specified spatiotemporal manner generates a cleavage furrow that physically separates the cytoplasm. Unique lipids with specific biophysical properties have been shown to localize to intercellular bridges (also called midbody) connecting the 2 dividing cells; however, their biological roles and delivery mechanisms remain largely unknown. In this study, we show that ceramide phosphoethanolamine (CPE), the structural analog of sphingomyelin, has unique acyl chain anchors in Drosophila spermatocytes and is essential for meiotic cytokinesis. The head group of CPE is also important for spermatogenesis. We find that aberrant central spindle and contractile ring behavior but not mislocalization of phosphatidylinositol phosphates (PIPs) at the plasma membrane is responsible for the male meiotic cytokinesis defect in CPE-deficient animals. Further, we demonstrate the enrichment of CPE in multivesicular bodies marked by Rab7, which in turn localize to cleavage furrow. Volume electron microscopy analysis using correlative light and focused ion beam scanning electron microscopy shows that CPE-enriched Rab7 positive endosomes are juxtaposed on contractile ring material. Correlative light and transmission electron microscopy reveal Rab7 positive endosomes as a multivesicular body-like organelle that releases its intraluminal vesicles in the vicinity of ingressing furrows. Genetic ablation of Rab7 or Rab35 or expression of dominant negative Rab11 results in significant meiotic cytokinesis defects. Further, we show that Rab11 function is required for localization of CPE positive endosomes to the cleavage furrow. 
Our results imply that endosomal delivery of CPE to ingressing membranes is crucial for meiotic cytokinesis. During cytokinesis, it is known that unique lipids with specific biophysical properties are localized to intercellular bridges connecting the two dividing cells, but how does this occur? This study shows that multivesicular bodies deliver sphingolipids to the ingressing membranes during male meiotic cytokinesis.
Affiliation(s)
- Govind Kunduri
  - Cancer and Developmental Biology Laboratory, National Cancer Institute, Frederick, Maryland, United States of America
  - * E-mail: (GK); (UA); (JKA)
- Si-Hung Le
  - Division of Metabolomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Valentina Baena
  - Center for Molecular Microscopy, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
  - Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Nagampalli Vijaykrishna
  - Genomic Medicine Institute and Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Adam Harned
  - Center for Molecular Microscopy, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
  - Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Kunio Nagashima
  - Center for Molecular Microscopy, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
  - Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Daniel Blankenberg
  - Genomic Medicine Institute and Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Izumi Yoshihiro
  - Division of Metabolomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Kedar Narayan
  - Center for Molecular Microscopy, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
  - Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
- Takeshi Bamba
  - Division of Metabolomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Usha Acharya
  - Cancer and Developmental Biology Laboratory, National Cancer Institute, Frederick, Maryland, United States of America
  - * E-mail: (GK); (UA); (JKA)
- Jairaj K. Acharya
  - Cancer and Developmental Biology Laboratory, National Cancer Institute, Frederick, Maryland, United States of America
  - * E-mail: (GK); (UA); (JKA)
12
Dubey R, Patra AK, Joshi J, Blankenberg D. Evaluation of vertical and horizontal distribution of particulate matter near an urban roadway using an unmanned aerial vehicle. Sci Total Environ 2022; 836:155600. [PMID: 35504396 DOI: 10.1016/j.scitotenv.2022.155600] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/20/2022] [Revised: 04/19/2022] [Accepted: 04/26/2022] [Indexed: 06/14/2023]
Abstract
Measurement of traffic emissions has gained a lot of interest in recent times due to its contribution to urban pollution. This paper reports the outcome of an unmanned aerial vehicle (UAV) based measurement of PM concentration near an urban roadway in Kolkata, India. A total of 54 flights were carried out for simultaneous measurements of PM1, PM2.5 and PM10 mass concentration and meteorological parameters in the vertical as well as the horizontal direction. Results for the vertical flights up to 100 m showed that the PM1, PM2.5 and PM10 concentrations at higher altitudes are lower (mean: 24.6, 39.9 and 103.8 μg m⁻³) than the respective ground level concentrations (mean: 26.3, 50.4 and 201.9 μg m⁻³). For all three particle sizes, the majority of the cases of higher PM concentration at higher altitudes happened during the evening flight. Low mixing height and low wind speed are suggested to be the reasons for the poor dispersion of pollutants in the evening. While there was a 7-10% fall in fine particle (PM1 and PM2.5) mass concentrations up to 90 m away from the road, no trend could be seen for PM10. The random forest model to predict the UAV/Ground concentration ratio showed high accuracy (R² = 0.82-0.95) for all three particle sizes. This is an important finding from this study, which shows how UAV measurement data can be used to generate models that can predict the higher altitude concentrations from ground based measurements.
Affiliation(s)
- Ravish Dubey
  - School of Environmental Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
- Aditya Kumar Patra
  - School of Environmental Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
  - Department of Mining Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
- Jayadev Joshi
  - Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
13
Afgan E, Nekrutenko A, Grüning BA, Blankenberg D, Goecks J, Schatz MC, Ostrovsky AE, Mahmoud A, Lonie AJ, Syme A, Fouilloux A, Bretaudeau A, Nekrutenko A, Kumar A, Eschenlauer AC, DeSanto AD, Guerler A, Serrano-Solano B, Batut B, Grüning BA, Langhorst BW, Carr B, Raubenolt BA, Hyde CJ, Bromhead CJ, Barnett CB, Royaux C, Gallardo C, Blankenberg D, Fornika DJ, Baker D, Bouvier D, Clements D, de Lima Morais DA, Tabernero DL, Lariviere D, Nasr E, Afgan E, Zambelli F, Heyl F, Psomopoulos F, Coppens F, Price GR, Cuccuru G, Corguillé GL, Von Kuster G, Akbulut GG, Rasche H, Hotz HR, Eguinoa I, Makunin I, Ranawaka IJ, Taylor JP, Joshi J, Hillman-Jackson J, Goecks J, Chilton JM, Kamali K, Suderman K, Poterlowicz K, Yvan LB, Lopez-Delisle L, Sargent L, Bassetti ME, Tangaro MA, van den Beek M, Čech M, Bernt M, Fahrner M, Tekman M, Föll MC, Schatz MC, Crusoe MR, Roncoroni M, Kucher N, Coraor N, Stoler N, Rhodes N, Soranzo N, Pinter N, Goonasekera NA, Moreno PA, Videm P, Melanie P, Mandreoli P, Jagtap PD, Gu Q, Weber RJM, Lazarus R, Vorderman RHP, Hiltemann S, Golitsynskiy S, Garg S, Bray SA, Gladman SL, Leo S, Mehta SP, Griffin TJ, Jalili V, Yves V, Wen V, Nagampalli VK, Bacon WA, de Koning W, Maier W, Briggs PJ. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 2022; 50:W345-W351. [PMID: 35446428 PMCID: PMC9252830 DOI: 10.1093/nar/gkac247] [Citation(s) in RCA: 235] [Impact Index Per Article: 117.5] [Received: 02/03/2022] [Revised: 03/17/2022] [Accepted: 03/30/2022] [Indexed: 01/19/2023] Open
Abstract
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
14
Oh S, Geistlinger L, Ramos M, Blankenberg D, van den Beek M, Taroni JN, Carey VJ, Greene CS, Waldron L, Davis S. GenomicSuperSignature facilitates interpretation of RNA-seq experiments through robust, efficient comparison to public databases. Nat Commun 2022; 13:3695. [PMID: 35760813 PMCID: PMC9237024 DOI: 10.1038/s41467-022-31411-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/26/2021] [Accepted: 06/14/2022] [Indexed: 02/04/2023] Open
Abstract
Millions of transcriptomic profiles have been deposited in public archives, yet remain underused for the interpretation of new experiments. We present a method for interpreting new transcriptomic datasets through instant comparison to public datasets without high-performance computing requirements. We apply Principal Component Analysis on 536 studies comprising 44,890 human RNA sequencing profiles and aggregate sufficiently similar loading vectors to form Replicable Axes of Variation (RAV). RAVs are annotated with metadata of originating studies and by gene set enrichment analysis. Functionality to associate new datasets with RAVs, extract interpretable annotations, and provide intuitive visualization are implemented as the GenomicSuperSignature R/Bioconductor package. We demonstrate the efficient and coherent database search, robustness to batch effects and heterogeneous training data, and transfer learning capacity of our method using TCGA and rare diseases datasets. GenomicSuperSignature aids in analyzing new gene expression data in the context of existing databases using minimal computing resources.
Affiliation(s)
- Sehyun Oh
  - Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY, USA
- Ludwig Geistlinger
  - Center for Computational Biomedicine, Harvard Medical School, Boston, MA, USA
- Marcel Ramos
  - Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Marius van den Beek
  - The Pennsylvania State University, State College, PA, USA
- Jaclyn N. Taroni
  - Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, Bala Cynwyd, PA, USA
- Vincent J. Carey
  - Channing Division of Network Medicine, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Casey S. Greene
  - Center for Health AI, University of Colorado Anschutz School of Medicine, Denver, CO, USA
- Levi Waldron
  - Graduate School of Public Health and Health Policy and Institute for Implementation Sciences in Public Health, City University of New York, New York, NY, USA
- Sean Davis
  - Center for Health AI, University of Colorado Anschutz School of Medicine, Denver, CO, USA
15
Sharma V, Joshi J, Yeh IJ, Doughman Y, Blankenberg D, Wald D, Montano MM. Re-Expression of ERα and AR in Receptor Negative Endocrine Cancers via GSK3 Inhibition. Front Oncol 2022; 12:824594. [PMID: 35402240 PMCID: PMC8988137 DOI: 10.3389/fonc.2022.824594] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/29/2021] [Accepted: 02/28/2022] [Indexed: 01/04/2023] Open
Abstract
DNA methylation, catalyzed by DNA methyltransferase (DNMT), is a well-characterized epigenetic modification in cancer cells. In particular, promoter hypermethylation of AR and ESR1 results in loss of expression of Androgen Receptor (AR) and Estrogen Receptor (ER), respectively, and is associated with a hormone refractory state. We now report that Glycogen Synthase Kinase 3 (GSK3) phosphorylates DNMT1 at S714, which is localized to a 62 amino acid region referred to as the auto-inhibitory linker, which functions to occlude the DNA from the active site of DNMT1 to prevent the methylation of unmethylated DNA. Molecular Dynamics simulation indicates that phosphorylation at S714 resulted in conformational rearrangement of the autoinhibitory domain that inactivated its ability to block the methylation of unmethylated DNA and resulted in enhanced DNA binding. Treatment with a novel and more selective inhibitor of GSK3 resulted in decreased methylation of the promoter region of genes encoding the Androgen Receptor (AR) and Estrogen Receptor alpha (ERα) and re-expression of the AR and ERα in AR negative prostate cancer and ER negative breast cancer cells, respectively. As a result, concurrent treatment with the GSK3 inhibitor resulted in responsiveness of AR negative prostate cancer and ER negative breast cancer cells to inhibitors of the AR or ER, respectively, in in vitro and in vivo experimental models.
Affiliation(s)
- Vikas Sharma
  - Department of Pharmacology, Case Western Reserve University School of Medicine, Cleveland, OH, United States
- Jayadev Joshi
  - Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute, Cleveland, OH, United States
- I-Ju Yeh
  - Department of Pharmacology, Case Western Reserve University School of Medicine, Cleveland, OH, United States
- YongQiu Doughman
  - Department of Pharmacology, Case Western Reserve University School of Medicine, Cleveland, OH, United States
- Daniel Blankenberg
  - Genomic Medicine Institute, Cleveland Clinic Lerner Research Institute, Cleveland, OH, United States
- David Wald
  - Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH, United States
- Monica M. Montano
  - Department of Pharmacology, Case Western Reserve University School of Medicine, Cleveland, OH, United States
  - *Correspondence: Monica M. Montano
16
VijayKrishna N, Joshi J, Coraor N, Hillman-Jackson J, Bouvier D, van den Beek M, Eguinoa I, Coppens F, Davis J, Stolarczyk M, Sheffield NC, Gladman S, Cuccuru G, Grüning B, Soranzo N, Rasche H, Langhorst BW, Bernt M, Fornika D, de Lima Morais DA, Barrette M, van Heusden P, Petrillo M, Puertas-Gallardo A, Patak A, Hotz HR, Blankenberg D. Expanding the Galaxy's reference data. Bioinform Adv 2022; 2:vbac030. [PMID: 35669346 PMCID: PMC9155181 DOI: 10.1093/bioadv/vbac030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 04/01/2022] [Accepted: 04/26/2022] [Indexed: 01/27/2023]
Abstract
Summary: Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation: The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.
Affiliation(s)
- Jayadev Joshi
  - Genomic Medicine Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Nate Coraor
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Jennifer Hillman-Jackson
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Dave Bouvier
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Marius van den Beek
  - Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA
- Ignacio Eguinoa
  - VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Frederik Coppens
  - VIB Center for Plant Systems Biology, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- John Davis
  - Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Michał Stolarczyk
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Björn Grüning
  - University of Freiburg, Freiburg im Breisgau, Germany
- Helena Rasche
  - Clinical Bioinformatics Group, Department of Pathology, Erasmus Medical Center, 3015 CN Rotterdam, The Netherlands
- Matthias Bernt
  - Department Computational Biology, Helmholtz Centre for Environmental Research, UFZ, 04318 Leipzig, Germany
- Dan Fornika
  - BC Centre for Disease Control Public Health Laboratory, Vancouver, BC, Canada
- Michel Barrette
  - Centre de Calcul Scientifique, Université de Sherbrooke, Sherbrooke, QC, Canada
- Peter van Heusden
  - South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa
- Mauro Petrillo
  - European Commission, Joint Research Centre (JRC), Ispra, Italy
- Alex Patak
  - European Commission, Joint Research Centre (JRC), Ispra, Italy
- Hans-Rudolf Hotz
  - Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Daniel Blankenberg
  - Genomic Medicine Institute, Cleveland Clinic, Cleveland, OH 44195, USA
  - Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH 44195, USA
  - To whom correspondence should be addressed.
17
Ostrovsky A, Hillman-Jackson J, Bouvier D, Clements D, Afgan E, Blankenberg D, Schatz MC, Nekrutenko A, Taylor J, Team TG, Lariviere D. Using Galaxy to Perform Large-Scale Interactive Data Analyses-An Update. Curr Protoc 2021; 1:e31. [PMID: 33583104 DOI: 10.1002/cpz1.31] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]
Abstract
Modern biology continues to become increasingly computational. Datasets are becoming progressively larger, more complex, and more abundant. The computational savviness necessary to analyze these data creates an ongoing obstacle for experimental biologists. Galaxy (galaxyproject.org) provides access to computational biology tools in a web-based interface. It also provides access to major public biological data repositories, allowing private data to be combined with public datasets. Galaxy is hosted on high-capacity servers worldwide and is accessible for free, with an option to be installed locally. This article demonstrates how to employ Galaxy to perform biologically relevant analyses on publicly available datasets. These protocols use both standard and custom tools, serving as a tutorial and jumping-off point for more intensive and/or more specific analyses using Galaxy. © 2021 Wiley Periodicals LLC. Basic Protocol 1: Finding human coding exons with highest SNP density. Basic Protocol 2: Calling peaks for ChIP-seq data. Basic Protocol 3: Compare datasets using genomic coordinates. Basic Protocol 4: Working with multiple alignments. Basic Protocol 5: Single cell RNA-seq.
Affiliation(s)
- Dave Bouvier
  - Penn State University, University Park, Pennsylvania
- Enis Afgan
  - Johns Hopkins University, Baltimore, Maryland
- The Galaxy Team
  - Johns Hopkins University, Baltimore, Maryland
  - Penn State University, University Park, Pennsylvania
  - Cleveland Clinic, Lerner Research Institute, Cleveland, Ohio
  - Oregon Health and Science University, Portland, Oregon
18
Macnee M, Pérez-Palma E, Schumacher-Bass S, Dalton J, Leu C, Blankenberg D, Lal D. SimText: A text mining framework for interactive analysis and visualization of similarities among biomedical entities. Bioinformatics 2021; 37:4285-4287. [PMID: 34037702 PMCID: PMC9502138 DOI: 10.1093/bioinformatics/btab365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2020] [Revised: 04/07/2021] [Accepted: 05/24/2021] [Indexed: 11/23/2022] Open
Abstract
Summary: Literature exploration in PubMed on a large number of biomedical entities (e.g. genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among a set of entities based on text. SimText can be used for (i) text collection from PubMed and extraction of words with different text mining approaches, and (ii) interactive analysis and visualization of data using unsupervised learning techniques in an interactive app. Availability and implementation: We developed SimText as an open-source R software and integrated it into Galaxy (https://usegalaxy.eu), an online data analysis platform with supporting self-learning training material available at https://training.galaxyproject.org. A command-line version of the toolset is available for download from GitHub (https://github.com/dlal-group/simtext) or as Docker image (https://hub.docker.com/r/dlalgroup/simtext/tags.). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marie Macnee
  - Cologne Center for Genomics (CCG), Medical Faculty of the University of Cologne, University Hospital of Cologne, Cologne, 50931, Germany
- Eduardo Pérez-Palma
  - Universidad del Desarrollo, Centro de Genética y Genómica, Facultad de Medicina Clínica Alemana, Santiago, Chile
- Jarrod Dalton
  - Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio, 44195, USA
- Costin Leu
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
- Dennis Lal
  - Cologne Center for Genomics (CCG), Medical Faculty of the University of Cologne, University Hospital of Cologne, Cologne, 50931, Germany
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
19
Tekman M, Batut B, Ostrovsky A, Antoniewski C, Clements D, Ramirez F, Etherington GJ, Hotz HR, Scholtalbers J, Manning JR, Bellenger L, Doyle MA, Heydarian M, Huang N, Soranzo N, Moreno P, Mautner S, Papatheodorou I, Nekrutenko A, Taylor J, Blankenberg D, Backofen R, Grüning B. A single-cell RNA-sequencing training and analysis suite using the Galaxy framework. Gigascience 2020; 9:5931798. [PMID: 33079170 PMCID: PMC7574357 DOI: 10.1093/gigascience/giaa102] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/22/2020] [Revised: 08/30/2020] [Indexed: 11/25/2022] Open
Abstract
Background: The vast ecosystem of single-cell RNA-sequencing tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community leans once more towards the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets. Results: Here we outline several Galaxy workflows and learning resources for single-cell RNA-sequencing, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows, and trainings that not only enable users to perform 1-click 10x preprocessing but also empower them to demultiplex raw sequencing from custom tagged and full-length sequencing protocols. The downstream analysis supports a range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal, and clustering. The teaching resources cover concepts from computer science to cell biology. Access to all resources is provided at the singlecell.usegalaxy.eu portal. Conclusions: The reproducible and training-oriented Galaxy framework provides a sustainable high-performance computing environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy community provide a means for users to learn, publish, and teach single-cell RNA-sequencing analysis.
Affiliation(s)
- Mehmet Tekman
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Bérénice Batut
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Alexander Ostrovsky
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Christophe Antoniewski
  - ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France
  - Institut de Biologie Paris Seine, 9 Quai Saint-Bernard Université Pierre et Marie Curie, Campus Jussieu, Bâtiments A-B-C, 75005 Paris, France
- Dave Clements
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Fidel Ramirez
  - Boehringer Ingelheim International GmbH, Binger Strasse 173, 55216 Ingelheim am Rhein, Biberach, Germany
- Hans-Rudolf Hotz
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Maulbeerstrasse 66, 4058 Basel, Switzerland
- Jelle Scholtalbers
  - European Molecular Biology Laboratory, Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Jonathan R Manning
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Lea Bellenger
  - ARTbio, Sorbonne Université, CNRS FR 3631, Inserm US 037, Paris, France
- Maria A Doyle
  - Research Computing Facility, Peter MacCallum Cancer Centre, Melbourne, 305 Grattan Street, Victoria 3000, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Victoria 3010, Australia
- Mohammad Heydarian
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Ni Huang
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK
- Nicola Soranzo
  - Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK
- Pablo Moreno
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Stefan Mautner
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Irene Papatheodorou
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Anton Nekrutenko
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- James Taylor
  - Department of Biology, Johns Hopkins University, Mudd Hall 144, 3400 N. Charles Street, Baltimore, MD 21218, USA
- Daniel Blankenberg
  - Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue, NB21 Cleveland, OH 44195, USA
- Rolf Backofen
  - Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| | - Björn Grüning
- Department of Bioinformatics, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
| |
|
20
|
Baker D, van den Beek M, Blankenberg D, Bouvier D, Chilton J, Coraor N, Coppens F, Eguinoa I, Gladman S, Grüning B, Keener N, Larivière D, Lonie A, Kosakovsky Pond S, Maier W, Nekrutenko A, Taylor J, Weaver S. No more business as usual: Agile and effective responses to emerging pathogen threats require open data and open analytics. PLoS Pathog 2020; 16:e1008643. [PMID: 32790776 PMCID: PMC7425854 DOI: 10.1371/journal.ppat.1008643] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 11/19/2022] Open
Abstract
The current state of much of the Wuhan pneumonia virus (severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]) research shows a regrettable lack of data sharing and considerable analytical obfuscation. This impedes global research cooperation, which is essential for tackling public health emergencies and requires unimpeded access to data, analysis tools, and computational infrastructure. Here, we show that community efforts in developing open analytical software tools over the past 10 years, combined with national investments into scientific computational infrastructure, can overcome these deficiencies and provide an accessible platform for tackling global health emergencies in an open and transparent manner. Specifically, we use all SARS-CoV-2 genomic data available in the public domain so far to (1) underscore the importance of access to raw data and (2) demonstrate that existing community efforts in curation and deployment of biomedical software can reliably support rapid, reproducible research during global health crises. All our analyses are fully documented at https://github.com/galaxyproject/SARS-CoV-2.
Affiliation(s)
- Dannon Baker
- Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Marius van den Beek
- The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | | | - Dave Bouvier
- The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - John Chilton
- The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Nate Coraor
- The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Frederik Coppens
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Ignacio Eguinoa
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Simon Gladman
- University of Melbourne, Melbourne, Australia
- Queensland Cyber Infrastructure Foundation, St. Lucia, Australia
| | - Björn Grüning
- University of Freiburg, Freiburg im Breisgau, Germany
| | - Nicholas Keener
- The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Delphine Larivière
- The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | | | - Sergei Kosakovsky Pond
- Temple University, Philadelphia, Pennsylvania, United States of America
- * E-mail: (AN); (SKP)
| | | | - Anton Nekrutenko
- The Pennsylvania State University, University Park, Pennsylvania, United States of America
- * E-mail: (AN); (SKP)
| | - James Taylor
- Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Steven Weaver
- Temple University, Philadelphia, Pennsylvania, United States of America
| |
|
21
|
Jalili V, Afgan E, Gu Q, Clements D, Blankenberg D, Goecks J, Taylor J, Nekrutenko A. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Res 2020; 48:W395-W402. [PMID: 32479607 PMCID: PMC7319590 DOI: 10.1093/nar/gkaa434] [Citation(s) in RCA: 238] [Impact Index Per Article: 59.5] [Received: 03/11/2020] [Revised: 04/24/2020] [Accepted: 05/11/2020] [Indexed: 12/18/2022] Open
Abstract
Galaxy (https://galaxyproject.org) is a web-based computational workbench used by tens of thousands of scientists across the world to analyze large biomedical datasets. Since 2005, the Galaxy project has fostered a global community focused on achieving accessible, reproducible, and collaborative research. Together, this community develops the Galaxy software framework, integrates analysis tools and visualizations into the framework, runs public servers that make Galaxy available via a web browser, performs and publishes analyses using Galaxy, leads bioinformatics workshops that introduce and use Galaxy, and develops interactive training materials for Galaxy. Over the last two years, all aspects of the Galaxy project have grown: code contributions, tools integrated, users, and training materials. Key advances in Galaxy's user interface include enhancements for analyzing large dataset collections as well as interactive tools for exploratory data analysis. Extensions to Galaxy's framework include support for federated identity and access management and increased ability to distribute analysis jobs to remote resources. New community resources include large public servers in Europe and Australia, an increasing number of regional and local Galaxy communities, and substantial growth in the Galaxy Training Network.
Affiliation(s)
- Vahid Jalili
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Enis Afgan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Qiang Gu
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Dave Clements
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
| | - Jeremy Goecks
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - James Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| |
|
22
|
Jalili V, Afgan E, Gu Q, Clements D, Blankenberg D, Goecks J, Taylor J, Nekrutenko A. Corrigendum: The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Res 2020; 48:8205-8207. [PMID: 32585001 PMCID: PMC7641327 DOI: 10.1093/nar/gkaa554] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022] Open
Affiliation(s)
- Vahid Jalili
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Enis Afgan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Qiang Gu
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - Dave Clements
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
| | - Jeremy Goecks
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
| | - James Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| |
|
23
|
Chapman LM, Spies N, Pai P, Lim CS, Carroll A, Narzisi G, Watson CM, Proukakis C, Clarke WE, Nariai N, Dawson E, Jones G, Blankenberg D, Brueffer C, Xiao C, Kolora SRR, Alexander N, Wolujewicz P, Ahmed AE, Smith G, Shehreen S, Wenger AM, Salit M, Zook JM. A crowdsourced set of curated structural variants for the human genome. PLoS Comput Biol 2020; 16:e1007933. [PMID: 32559231 PMCID: PMC7329145 DOI: 10.1371/journal.pcbi.1007933] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/19/2019] [Revised: 07/01/2020] [Accepted: 05/07/2020] [Indexed: 11/19/2022] Open
Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
Affiliation(s)
- Lesley M. Chapman
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
| | - Noah Spies
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
- Departments of Genetics and Pathology, Stanford University, Stanford, California, United States of America
| | - Patrick Pai
- University of Maryland - College Park, College Park, Maryland, United States of America
| | - Chun Shen Lim
- Department of Biochemistry, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Andrew Carroll
- DNAnexus Inc, Mountain View, California, United States of America
| | - Giuseppe Narzisi
- New York Genome Center, New York, New York, United States of America
| | - Christopher M. Watson
- School of Medicine, University of Leeds, Saint James's University Hospital, Leeds, Leeds, United Kingdom
- Yorkshire Regional Genetics Service, The Leeds Teaching Hospitals NHS Trust, Saint James's University Hospital, Leeds, United Kingdom
| | - Christos Proukakis
- University College London, Institute of Neurology, London, United Kingdom
| | - Wayne E. Clarke
- New York Genome Center, New York, New York, United States of America
| | - Naoki Nariai
- Illumina, Inc. San Diego, California, United States of America
| | - Eric Dawson
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, United States of America
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Garan Jones
- University of Exeter Medical School, Epidemiology and Public Health Group, Barrack Road, Exeter, Devon, United Kingdom
| | - Daniel Blankenberg
- Genomic Medicine Institute Lerner Research Institute Cleveland Clinic, Cleveland, Ohio, United States of America
| | - Christian Brueffer
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Sree Rohit Raj Kolora
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Molecular Evolution and Systematics of Animals, Institute of Biology, University of Leipzig, Leipzig, Germany
| | - Noah Alexander
- Molecular Biology Institute, University of California Los Angeles, Los Angeles, California, United States of America
| | - Paul Wolujewicz
- Weill Cornell, Belfer Research Building, New York, New York, United States of America
| | - Azza E. Ahmed
- Center for Bioinformatics and Systems Biology, Faculty of Science, University of Khartoum and Department of Electrical and Electronic Engineering, Faculty of Engineering, University of Khartoum, Khartoum, Sudan
| | - Graeme Smith
- Guy's Hospital and St Thomas's NHS Foundation Trust Great Maze Pond, London, United Kingdom
| | - Saadlee Shehreen
- Department of Genetic Engineering & Biotechnology, University of Dhaka, Bangladesh
| | - Aaron M. Wenger
- Pacific Biosciences, Menlo Park, California, United States of America
| | - Marc Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
- The Joint Initiative for Metrology in Biology, Stanford University, Stanford, California, United States of America
| | - Justin M. Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland, United States of America
| |
|
24
|
Affiliation(s)
| | - John Chilton
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania, USA
| | - Nate Coraor
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania, USA
| |
|
25
|
Grüning BA, Lampa S, Vaudel M, Blankenberg D. Software engineering for scientific big data analysis. Gigascience 2019; 8:giz054. [PMID: 31121028 PMCID: PMC6532757 DOI: 10.1093/gigascience/giz054] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/16/2018] [Revised: 01/20/2019] [Accepted: 04/18/2019] [Indexed: 11/14/2022] Open
Abstract
The increasing complexity of data and analysis methods has created an environment where scientists, who may not have formal training, are finding themselves playing the impromptu role of software engineer. While several resources are available for introducing scientists to the basics of programming, researchers have been left with little guidance on approaches needed to advance to the next level for the development of robust, large-scale data analysis tools that are amenable to integration into workflow management systems, tools, and frameworks. The integration into such workflow systems necessitates additional requirements on computational tools, such as adherence to standard conventions for robustness, data input, output, logging, and flow control. Here we provide a set of 10 guidelines to steer the creation of command-line computational tools that are usable, reliable, extensible, and in line with standards of modern coding practices.
Affiliation(s)
- Björn A Grüning
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, D-79110 Freiburg, Germany
- Center for Biological Systems Analysis (ZBSA), University of Freiburg, Habsburgerstr. 49, D-79104 Freiburg, Germany
| | - Samuel Lampa
- Pharmaceutical Bioinformatics group, Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden
- Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Svante Arrhenius vag 16C, 106 91, Solna, Sweden
| | - Marc Vaudel
- K.G. Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Postboks 7804, 5020, Bergen, Norway
- Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Postboks 7804, 5020, Bergen, Norway
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, 9500 Euclid Avenue / NE50, Cleveland, OH, USA
| |
|
26
|
Craig SJC, Blankenberg D, Parodi ACL, Paul IM, Birch LL, Savage JS, Marini ME, Stokes JL, Nekrutenko A, Reimherr M, Chiaromonte F, Makova KD. Child Weight Gain Trajectories Linked To Oral Microbiota Composition. Sci Rep 2018; 8:14030. [PMID: 30232389 PMCID: PMC6145887 DOI: 10.1038/s41598-018-31866-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 11/29/2017] [Accepted: 08/27/2018] [Indexed: 12/16/2022] Open
Abstract
Gut and oral microbiota perturbations have been observed in obese adults and adolescents; less is known about their influence on weight gain in young children. Here we analyzed the gut and oral microbiota of 226 two-year-olds with 16S rRNA gene sequencing. Weight and length were measured at seven time points and used to identify children with rapid infant weight gain (a strong risk factor for childhood obesity), and to derive growth curves with innovative Functional Data Analysis (FDA) techniques. We showed that growth curves were associated negatively with diversity, and positively with the Firmicutes-to-Bacteroidetes ratio, of the oral microbiota. We also demonstrated an association between the gut microbiota and child growth, even after controlling for the effect of diet on the microbiota. Lastly, we identified several bacterial genera that were associated with child growth patterns. These results suggest that by the age of two, the oral microbiota of children with rapid infant weight gain may have already begun to establish patterns often seen in obese adults. They also suggest that the gut microbiota at age two, while strongly influenced by diet, does not harbor obesity signatures many researchers identified in later life stages.
Affiliation(s)
- Sarah J C Craig
- Center for Medical Genomics, Penn State University, University Park, PA, 16802, USA.,Department of Biology, Penn State University, University Park, PA, 16802, USA
| | - Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, 16802, USA.,Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44195, USA
| | - Alice Carla Luisa Parodi
- Department of Mathematics, Politecnico di Milano, Piazza Leonardo da Vinci, 32, Milano, 20133, Italy
| | - Ian M Paul
- Center for Medical Genomics, Penn State University, University Park, PA, 16802, USA.,Department of Pediatrics, Penn State College of Medicine, 500 University Drive, Hershey, PA, 17033, USA
| | - Leann L Birch
- Department of Foods and Nutrition, 176 Dawson Hall, University of Georgia, Athens, GA, 30602, USA
| | - Jennifer S Savage
- Center for Childhood Obesity Research, Penn State University, University Park, PA, 16802, USA.,Department of Nutritional Sciences, Penn State University, University Park, PA, 16802, USA
| | - Michele E Marini
- Center for Childhood Obesity Research, Penn State University, University Park, PA, 16802, USA
| | - Jennifer L Stokes
- Department of Pediatrics, Penn State College of Medicine, 500 University Drive, Hershey, PA, 17033, USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, 16802, USA
| | - Matthew Reimherr
- Center for Medical Genomics, Penn State University, University Park, PA, 16802, USA.,Department of Statistics, Penn State University, University Park, PA, 16802, USA
| | - Francesca Chiaromonte
- Center for Medical Genomics, Penn State University, University Park, PA, 16802, USA.,Department of Statistics, Penn State University, University Park, PA, 16802, USA.,EMbeDS, Sant'Anna School of Advanced Studies, Piazza Martiri della Libertà, 33, Pisa, 56127, Italy
| | - Kateryna D Makova
- Center for Medical Genomics, Penn State University, University Park, PA, 16802, USA.,Department of Biology, Penn State University, University Park, PA, 16802, USA
| |
|
27
|
Afgan E, Baker D, Batut B, van den Beek M, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Grüning BA, Guerler A, Hillman-Jackson J, Hiltemann S, Jalili V, Rasche H, Soranzo N, Goecks J, Taylor J, Nekrutenko A, Blankenberg D. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 2018; 46:W537-W544. [PMID: 29790989 PMCID: PMC6030816 DOI: 10.1093/nar/gky379] [Citation(s) in RCA: 2148] [Impact Index Per Article: 358.0] [Received: 02/01/2018] [Revised: 04/25/2018] [Accepted: 05/02/2018] [Indexed: 02/06/2023] Open
Abstract
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially.
Affiliation(s)
- Enis Afgan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Dannon Baker
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Bérénice Batut
- Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany
| | | | - Dave Bouvier
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Martin Čech
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - John Chilton
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Dave Clements
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Nate Coraor
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Björn A Grüning
- Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany
- Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
| | - Aysam Guerler
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer Hillman-Jackson
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Saskia Hiltemann
- Department of Pathology, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Vahid Jalili
- Department of Biomedical Engineering, Oregon Health and Science University, OR, USA
| | - Helena Rasche
- Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany
| | | | - Jeremy Goecks
- Department of Biomedical Engineering, Oregon Health and Science University, OR, USA
| | - James Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
| |
|
28
|
Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC, Perez-Riverol Y. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 2018; 7. [PMID: 31543945 DOI: 10.12688/f1000research.15140.1] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Accepted: 06/01/2018] [Indexed: 11/20/2022] Open
Abstract
Software containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows. Here, we present a set of recommendations developed by the BioContainers Community to produce standardized bioinformatics packages and containers. These recommendations provide practical guidelines to make bioinformatics software more discoverable, reusable and transparent. They aim to guide developers, organisations, journals and funders in increasing the quality and sustainability of research software.
Affiliation(s)
- Bjorn Gruening
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
| | - Olivier Sallou
- Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
| | - Pablo Moreno
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Hervé Ménager
- Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
| | - Dan Søndergaard
- Bioinformatics Research Centre, Aarhus University, Aarhus, DK-8000, Denmark
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
| | - Timo Sachsenberg
- Applied Bioinformatics Group, Wilhelm Schickard Institut für Informatik, Universität Tübingen, Tübingen, D-72076, Germany
| | - Brian O'Connor
- Computational Genomics Lab, UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, USA
| | - Fábio Madeira
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Michael R Crusoe
- Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
| | - Susheel Varma
- EMBL European Bioinformatics Institute, Cambridge, UK
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
| | | | | | | |
|
29
|
Gruening B, Sallou O, Moreno P, da Veiga Leprevost F, Ménager H, Søndergaard D, Röst H, Sachsenberg T, O'Connor B, Madeira F, Dominguez Del Angel V, Crusoe MR, Varma S, Blankenberg D, Jimenez RC, Perez-Riverol Y. Recommendations for the packaging and containerizing of bioinformatics software. F1000Res 2018; 7. [PMID: 31543945 PMCID: PMC6738188 DOI: 10.12688/f1000research.15140.2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Accepted: 03/18/2019] [Indexed: 11/22/2022] Open
Abstract
Software containers are changing the way scientists and researchers develop, deploy and exchange scientific software. They allow labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. However, containers and software packages should be produced under certain rules and standards in order to be reusable, compatible and easy to integrate into pipelines and analysis workflows. Here, we present a set of recommendations developed by the BioContainers Community to produce standardized bioinformatics packages and containers. These recommendations provide practical guidelines to make bioinformatics software more discoverable, reusable and transparent. They aim to guide developers, organisations, journals and funders in increasing the quality and sustainability of research software.
Affiliation(s)
- Bjorn Gruening
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, 79110, Germany
| | - Olivier Sallou
- Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA/INRIA) - GenOuest Platform, Université de Rennes, Rennes, France
| | - Pablo Moreno
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Hervé Ménager
- Center of Bioinformatics, Biostatistics and Integrative Biology, Institut Pasteur, Paris, France
| | - Dan Søndergaard
- Bioinformatics Research Centre, Aarhus University, Aarhus, DK-8000, Denmark
| | - Hannes Röst
- The Donnelly Centre, University of Toronto, Toronto, Ontario, M5S 3E1, Canada
| | - Timo Sachsenberg
- Applied Bioinformatics Group, Wilhelm Schickard Institut für Informatik, Universität Tübingen, Tübingen, D-72076, Germany
| | - Brian O'Connor
- Computational Genomics Lab, UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, California, USA
| | - Fábio Madeira
- EMBL European Bioinformatics Institute, Cambridge, UK
| | | | - Michael R Crusoe
- Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, USA
| | - Susheel Varma
- EMBL European Bioinformatics Institute, Cambridge, UK
| | - Daniel Blankenberg
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
30
Abstract
Research in population genetics and evolutionary biology has always provided a computational backbone for life sciences as a whole. Today, evolutionary and population biology reasoning is essential for the interpretation of the large, complex datasets that are characteristic of all domains of today's life sciences, ranging from cancer biology to microbial ecology. This situation makes algorithms and software tools developed by our community more important than ever before. This means that we, developers of software tools for molecular evolutionary analyses, now have a shared responsibility to make these tools accessible using modern technological developments as well as to provide adequate documentation and training.
Affiliation(s)
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA
| | - Galaxy Team
- The Galaxy Project, https://github.com/orgs/galaxyproject/people
| | - Jeremy Goecks
- Biomedical Engineering Department, Computational Biology Program, Oregon Health and Science University, Portland, OR
| | - James Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD
31
Afgan E, Baker D, van den Beek M, Blankenberg D, Bouvier D, Čech M, Chilton J, Clements D, Coraor N, Eberhard C, Grüning B, Guerler A, Hillman-Jackson J, Von Kuster G, Rasche E, Soranzo N, Turaga N, Taylor J, Nekrutenko A, Goecks J. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res 2016; 44:W3-W10. [PMID: 27137889 PMCID: PMC4987906 DOI: 10.1093/nar/gkw343] [Citation(s) in RCA: 1217] [Impact Index Per Article: 152.1] [Received: 03/01/2016] [Accepted: 04/18/2016] [Indexed: 02/07/2023] Open
Abstract
High-throughput data production technologies, particularly ‘next-generation’ DNA sequencing, have ushered in widespread and disruptive changes to biomedical research. Making sense of the large datasets produced by these technologies requires sophisticated statistical and computational methods, as well as substantial computational power. This has led to an acute crisis in life sciences, as researchers without informatics training attempt to perform computation-dependent analyses. Since 2005, the Galaxy project has worked to address this problem by providing a framework that makes advanced computational tools usable by non-experts. Galaxy seeks to make data-intensive research more accessible, transparent and reproducible by providing a Web-based environment in which users can perform computational analyses and have all of the details automatically tracked for later inspection, publication, or reuse. In this report we highlight recently added features enabling biomedical analyses on a large scale.
Affiliation(s)
- Enis Afgan
- Department of Biology, Johns Hopkins University, Baltimore, MD USA
| | - Dannon Baker
- Department of Biology, Johns Hopkins University, Baltimore, MD USA
| | - Marius van den Beek
- Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Paris, France
| | - Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Dave Bouvier
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Martin Čech
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - John Chilton
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Dave Clements
- Department of Biology, Johns Hopkins University, Baltimore, MD USA
| | - Nate Coraor
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Carl Eberhard
- Department of Biology, Johns Hopkins University, Baltimore, MD USA
| | - Björn Grüning
- Department of Computer Science, Albert-Ludwigs-University, Freiburg, Freiburg, Germany; Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany
| | - Aysam Guerler
- Department of Biology, Johns Hopkins University, Baltimore, MD USA
| | - Jennifer Hillman-Jackson
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Greg Von Kuster
- Academic Computing Services, Penn State University, University Park, PA, USA
| | - Eric Rasche
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, TX, USA
| | | | - Nitesh Turaga
- Department of Biology, Johns Hopkins University, Baltimore, MD USA
| | - James Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
| | - Jeremy Goecks
- The Computational Biology Institute, George Washington University, Washington DC, USA
32
Abstract
The availability of high-throughput sequencing has created enormous possibilities for scientific discovery. However, the massive amount of data being generated has resulted in a severe informatics bottleneck. A large number of tools exist for analyzing next-generation sequencing (NGS) data, yet often there remains a disconnect between these research tools and the ability of many researchers to use them. As a consequence, several online resources and communities have been developed to assist researchers with both the management and the analysis of sequencing data sets. Here we describe the use and applications of common file formats for coding and storing genomic data, consider several web-accessible open-source resources for the visualization and analysis of NGS data, and provide examples of typical analyses with links to further detailed exercises.
Affiliation(s)
- Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania 16802
| | - James Taylor
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 21211
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania 16802
33
Dickins B, Rebolledo-Jaramillo B, Su MSW, Paul IM, Blankenberg D, Stoler N, Makova KD, Nekrutenko A. Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach. Biotechniques 2014; 56:134-141. [PMID: 24641477 DOI: 10.2144/000114146] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 09/30/2013] [Accepted: 01/17/2014] [Indexed: 11/23/2022] Open
Abstract
Polymorphism discovery is a routine application of next-generation sequencing technology where multiple samples are sent to a service provider for library preparation, subsequent sequencing, and bioinformatic analyses. The decreasing cost and advances in multiplexing approaches have made it possible to analyze hundreds of samples at a reasonable cost. However, because of the manual steps involved in the initial processing of samples and handling of sequencing equipment, cross-contamination remains a significant challenge. It is especially problematic in cases where polymorphism frequencies do not adhere to diploid expectation, for example, heterogeneous tumor samples, organellar genomes, as well as during bacterial and viral sequencing. In these instances, low levels of contamination may be readily mistaken for polymorphisms, leading to false results. Here we describe practical steps designed to reliably detect contamination and uncover its origin, and also provide new, Galaxy-based, readily accessible computational tools and workflows for quality control. All results described in this report can be reproduced interactively on the web as described at http://usegalaxy.org/contamination.
Affiliation(s)
- Benjamin Dickins
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA; Department of Biology, Penn State University, University Park, PA
| | - Boris Rebolledo-Jaramillo
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA; Interdisciplinary Graduate Program in BioSciences, Penn State University, University Park, PA
| | | | - Ian M Paul
- Department of Pediatrics, Penn State College of Medicine, Hershey, PA
| | - Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA
| | - Nicholas Stoler
- Interdisciplinary Graduate Program in BioSciences, Penn State University, University Park, PA
| | | | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA
34
Abstract
Summary: The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one’s own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cache of data for Galaxy has been an error-prone manual process lacking reproducibility and provenance. The Galaxy Data Manager framework is an enhancement that changes the management of Galaxy’s built-in data cache from a manual procedure to an automated graphical user interface (GUI) driven process, which contains the same openness, reproducibility and provenance that is afforded to Galaxy’s analysis tools. Data Manager tools allow the Galaxy administrator to download, create and install additional datasets for any type of reference data in real time. Availability and implementation: The Galaxy Data Manager framework is implemented in Python and has been integrated as part of the core Galaxy platform. Individual Data Manager tools can be defined locally or installed from a ToolShed, allowing the Galaxy community to define additional Data Manager tools as needed, with full versioning and dependency support. Contact: dan@bx.psu.edu or anton@bx.psu.edu. Supplementary information: Supplementary data is available at Bioinformatics online.
Affiliation(s)
- Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA, http://www.galaxyproject.org, Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA, Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
| | - James E Johnson
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA, http://www.galaxyproject.org, Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA, Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
| | | | - James Taylor
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA, http://www.galaxyproject.org, Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA, Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA 16802, USA, http://www.galaxyproject.org, Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA, Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA 30322, USA
35
Blankenberg D, Von Kuster G, Bouvier E, Baker D, Afgan E, Stoler N, Taylor J, Nekrutenko A. Dissemination of scientific software with Galaxy ToolShed. Genome Biol 2014; 15:403. [PMID: 25001293 PMCID: PMC4038738 DOI: 10.1186/gb4161] [Citation(s) in RCA: 135] [Impact Index Per Article: 13.5] [Indexed: 12/19/2022] Open
Abstract
The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it also revoked the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.
36
Abstract
The extraordinary throughput of next-generation sequencing (NGS) technology is outpacing our ability to analyze and interpret the data. This chapter will focus on practical informatics methods, strategies, and software tools for transforming NGS data into usable information through the use of a web-based platform, Galaxy. The Galaxy interface is explored through several different types of example analyses. Instructions for running one's own Galaxy server on local hardware or on cloud computing resources are provided. Installing new tools into a personal Galaxy instance is also demonstrated.
Affiliation(s)
- Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, 505 Wartik Laboratory, University Park, PA, 16802, USA
37
Hillman-Jackson J, Clements D, Blankenberg D, Taylor J, Nekrutenko A. Using Galaxy to perform large-scale interactive data analyses. Curr Protoc Bioinformatics 2012; Chapter 10:10.5.1-10.5.47. [PMID: 22700312 DOI: 10.1002/0471250953.bi1005s38] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Abstract
Innovations in biomedical research technologies continue to provide experimental biologists with novel and increasingly large genomic and high-throughput data resources to be analyzed. As creating and obtaining data has become easier, the key decision faced by many researchers is a practical one: where and how should an analysis be performed? Datasets are large and analysis tool set-up and use is riddled with complexities outside of the scope of core research activities. The authors believe that Galaxy provides a powerful solution that simplifies data acquisition and analysis in an intuitive Web application, granting all researchers access to key informatics tools previously only available to computational specialists working in Unix-based environments. We will demonstrate through a series of biomedically relevant protocols how Galaxy specifically brings together (1) data retrieval from public and private sources, for example, UCSC's Eukaryote and Microbial Genome Browsers, (2) custom tools (wrapped Unix functions, format standardization/conversions, interval operations), and 3rd-party analysis tools.
38
Stamatoyannopoulos JA, Snyder M, Hardison R, Ren B, Gingeras T, Gilbert DM, Groudine M, Bender M, Kaul R, Canfield T, Giste E, Johnson A, Zhang M, Balasundaram G, Byron R, Roach V, Sabo PJ, Sandstrom R, Stehling AS, Thurman RE, Weissman SM, Cayting P, Hariharan M, Lian J, Cheng Y, Landt SG, Ma Z, Wold BJ, Dekker J, Crawford GE, Keller CA, Wu W, Morrissey C, Kumar SA, Mishra T, Jain D, Byrska-Bishop M, Blankenberg D, Lajoie BR, Jain G, Sanyal A, Chen KB, Denas O, Taylor J, Blobel GA, Weiss MJ, Pimkin M, Deng W, Marinov GK, Williams BA, Fisher-Aylor KI, Desalvo G, Kiralusha A, Trout D, Amrhein H, Mortazavi A, Edsall L, McCleary D, Kuan S, Shen Y, Yue F, Ye Z, Davis CA, Zaleski C, Jha S, Xue C, Dobin A, Lin W, Fastuca M, Wang H, Guigo R, Djebali S, Lagarde J, Ryba T, Sasaki T, Malladi VS, Cline MS, Kirkup VM, Learned K, Rosenbloom KR, Kent WJ, Feingold EA, Good PJ, Pazin M, Lowdon RF, Adams LB. An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biol 2012; 13:418. [PMID: 22889292 PMCID: PMC3491367 DOI: 10.1186/gb-2012-13-8-418] [Citation(s) in RCA: 343] [Impact Index Per Article: 28.6] [Indexed: 12/04/2022] Open
Abstract
To complement the human Encyclopedia of DNA Elements (ENCODE) project and to enable a broad range of mouse genomics efforts, the Mouse ENCODE Consortium is applying the same experimental pipelines developed for human ENCODE to annotate the mouse genome.
Affiliation(s)
- John A Stamatoyannopoulos
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Ross Hardison
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Bing Ren
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Thomas Gingeras
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, Florida, USA
| | - Mark Groudine
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
| | - Michael Bender
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
| | - Rajinder Kaul
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Theresa Canfield
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Erica Giste
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Audra Johnson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Mia Zhang
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
| | - Gayathri Balasundaram
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
| | - Rachel Byron
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
| | - Vaughan Roach
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Peter J Sabo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - A Sandra Stehling
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | - Robert E Thurman
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
| | | | - Philip Cayting
- Department of Genetics, Yale University, New Haven, Connecticut, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
| | - Manoj Hariharan
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Jin Lian
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
| | - Yong Cheng
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Stephen G Landt
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Zhihai Ma
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Barbara J Wold
- Div. of Biology, California Institute of Technology, Pasadena, California, USA
| | - Job Dekker
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
| | - Gregory E Crawford
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA
- Department of Pediatrics, Duke University, Durham, North Carolina, USA
| | - Cheryl A Keller
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Weisheng Wu
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Christopher Morrissey
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Swathi A Kumar
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Tejaswini Mishra
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Deepti Jain
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Marta Byrska-Bishop
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Daniel Blankenberg
- Center for Comparative Genomics and Bioinformatics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Bryan R Lajoie
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Gaurav Jain
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
| | - Amartya Sanyal
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
| | - Kaun-Bei Chen
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA
| | - Olgert Denas
- Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, USA
| | - James Taylor
- Department of Mathematics and Computer Science, Emory University, Atlanta, Georgia, USA
| | - Gerd A Blobel
- Div. of Hematology, Children's Hospital of Philadelphia, Abramson Research Center, Philadelphia, Pennsylvania, USA
| | - Mitchell J Weiss
- Div. of Hematology, Children's Hospital of Philadelphia, Abramson Research Center, Philadelphia, Pennsylvania, USA
| | - Max Pimkin
- Div. of Hematology, Children's Hospital of Philadelphia, Abramson Research Center, Philadelphia, Pennsylvania, USA
| | - Wulan Deng
- Div. of Hematology, Children's Hospital of Philadelphia, Abramson Research Center, Philadelphia, Pennsylvania, USA
| | - Georgi K Marinov
- Div. of Biology, California Institute of Technology, Pasadena, California, USA
| | - Brian A Williams
- Div. of Biology, California Institute of Technology, Pasadena, California, USA
| | | | - Gilberto Desalvo
- Div. of Biology, California Institute of Technology, Pasadena, California, USA
| | - Anthony Kiralusha
- Div. of Biology, California Institute of Technology, Pasadena, California, USA
| | - Diane Trout
- Div. of Biology, California Institute of Technology, Pasadena, California, USA
| | - Henry Amrhein
- Div. of Biology, California Institute of Technology, Pasadena, California, USA
| | - Ali Mortazavi
- Dept. of Developmental and Cell Biology, University of California Irvine, Irvine California, USA
| | - Lee Edsall
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - David McCleary
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Samantha Kuan
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Yin Shen
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Feng Yue
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Zhen Ye
- Department of Cellular and Molecular Medicine, Institute of Genomic Medicine, University of California San Diego, La Jolla, California, USA
| | - Carrie A Davis
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Chris Zaleski
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Sonali Jha
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Chenghai Xue
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Alex Dobin
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Wei Lin
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Meagan Fastuca
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Huaien Wang
- Dept. of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
| | - Roderic Guigo
- Division of Bioinformatics and Genomics, Center for Genomic Regulation, Barcelona, Catalunya, Spain
| | - Sarah Djebali
- Division of Bioinformatics and Genomics, Center for Genomic Regulation, Barcelona, Catalunya, Spain
| | - Julien Lagarde
- Division of Bioinformatics and Genomics, Center for Genomic Regulation, Barcelona, Catalunya, Spain
| | - Tyrone Ryba
- Department of Biological Science, Florida State University, Tallahassee, Florida, USA
| | - Takayo Sasaki
- Department of Biological Science, Florida State University, Tallahassee, Florida, USA
| | - Venkat S Malladi
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California, USA
| | - Melissa S Cline
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California, USA
| | - Vanessa M Kirkup
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California, USA
| | - Katrina Learned
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California, USA
| | - Kate R Rosenbloom
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California, USA
| | - W James Kent
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California, USA
| | - Elise A Feingold
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Peter J Good
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Michael Pazin
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Rebecca F Lowdon
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
| | - Leslie B Adams
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
39
Blankenberg D, Taylor J, Nekrutenko A. Making whole genome multiple alignments usable for biologists. Bioinformatics 2011; 27:2426-8. [PMID: 21775304 PMCID: PMC3157923 DOI: 10.1093/bioinformatics/btr398] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 03/22/2011] [Revised: 05/18/2011] [Accepted: 06/27/2011] [Indexed: 12/04/2022] Open
Abstract
Summary: Here we describe a set of tools implemented within the Galaxy platform designed to make analysis of multiple genome alignments truly accessible for biologists. These tools are available through both a web-based graphical user interface and a command-line interface. Availability and implementation: This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Additional help is available as a live supplement from http://usegalaxy.org/u/dan/p/maf. Contact: james.taylor@emory.edu; anton@bx.psu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Blankenberg
- The Huck Institutes for the Life Sciences and Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA, USA
|
40
|
Blankenberg D, Coraor N, Von Kuster G, Taylor J, Nekrutenko A. Integrating diverse databases into an unified analysis framework: a Galaxy approach. Database (Oxford) 2011; 2011:bar011. [PMID: 21531983 PMCID: PMC3092608 DOI: 10.1093/database/bar011] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 12/10/2010] [Revised: 03/16/2011] [Accepted: 03/17/2011] [Indexed: 11/22/2022]
Abstract
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. DATABASE URL: http://usegalaxy.org.
Affiliation(s)
- Daniel Blankenberg
- The Galaxy Project, http://usegalaxy.org, The Huck Institutes for the Life Sciences, Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA and Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - Nathan Coraor
- The Galaxy Project, http://usegalaxy.org, The Huck Institutes for the Life Sciences, Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA and Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - Gregory Von Kuster
- The Galaxy Project, http://usegalaxy.org, The Huck Institutes for the Life Sciences, Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA and Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - James Taylor
- The Galaxy Project, http://usegalaxy.org, The Huck Institutes for the Life Sciences, Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA and Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - Anton Nekrutenko
- The Galaxy Project, http://usegalaxy.org, The Huck Institutes for the Life Sciences, Department of Biochemistry and Molecular Biology, Penn State University, University Park, PA and Department of Biology and Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
|
41
|
Abstract
Summary: Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next generation sequencing data taken from a sequencing machine all the way through the quality filtering steps. Availability and Implementation: This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Two short movies that highlight the functionality of tools described in this manuscript as well as results from testing components of this tool suite against a set of previously published files are available at http://usegalaxy.org/u/dan/p/fastq. Contact: james.taylor@emory.edu; anton@bx.psu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Blankenberg
- Huck Institute for the Life Sciences, Penn State University, University Park, PA 16803, USA
|
42
|
Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J. Galaxy: a web-based genome analysis tool for experimentalists. Curr Protoc Mol Biol 2010; Chapter 19:Unit 19.10.1-21. [PMID: 20069535 DOI: 10.1002/0471142727.mb1910s89] [Citation(s) in RCA: 911] [Impact Index Per Article: 65.1] [Indexed: 11/07/2022]
Abstract
High-throughput data production has revolutionized molecular biology. However, massive increases in data generation capacity require analysis approaches that are more sophisticated, and often very computationally intensive. Thus, making sense of high-throughput data requires informatics support. Galaxy (http://galaxyproject.org) is a software system that provides this support through a framework that gives experimentalists simple interfaces to powerful tools, while automatically managing the computational details. Galaxy is distributed both as a publicly available Web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, and as a downloadable package that can be deployed in individual laboratories. Either way, it allows experimentalists without informatics or programming expertise to perform complex large-scale analysis with just a Web browser.
Affiliation(s)
- Daniel Blankenberg
- The Huck Institutes for the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA
|
43
|
Miller W, Rosenbloom K, Hardison RC, Hou M, Taylor J, Raney B, Burhans R, King DC, Baertsch R, Blankenberg D, Kosakovsky Pond SL, Nekrutenko A, Giardine B, Harris RS, Tyekucheva S, Diekhans M, Pringle TH, Murphy WJ, Lesk A, Weinstock GM, Lindblad-Toh K, Gibbs RA, Lander ES, Siepel A, Haussler D, Kent WJ. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res 2007; 17:1797-808. [PMID: 17984227 PMCID: PMC2099589 DOI: 10.1101/gr.6761107] [Citation(s) in RCA: 207] [Impact Index Per Article: 12.2] [Received: 06/04/2007] [Accepted: 08/30/2007] [Indexed: 01/17/2023]
Abstract
This article describes a set of alignments of 28 vertebrate genome sequences that is provided by the UCSC Genome Browser. The alignments can be viewed on the Human Genome Browser (March 2006 assembly) at http://genome.ucsc.edu, downloaded in bulk by anonymous FTP from http://hgdownload.cse.ucsc.edu/goldenPath/hg18/multiz28way, or analyzed with the Galaxy server at http://g2.bx.psu.edu. This article illustrates the power of this resource for exploring vertebrate and mammalian evolution, using three examples. First, we present several vignettes involving insertions and deletions within protein-coding regions, including a look at some human-specific indels. Then we study the extent to which start codons and stop codons in the human sequence are conserved in other species, showing that start codons are in general more poorly conserved than stop codons. Finally, an investigation of the phylogenetic depth of conservation for several classes of functional elements in the human genome reveals striking differences in the rates and modes of decay in alignability. Each functional class has a distinctive period of stringent constraint, followed by decays that allow (for the case of regulatory regions) or reject (for coding regions and ultraconserved elements) insertions and deletions.
Affiliation(s)
- Webb Miller
- Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania 16802, USA.
|
44
|
Blankenberg D, Taylor J, Schenck I, He J, Zhang Y, Ghent M, Veeraraghavan N, Albert I, Miller W, Makova KD, Hardison RC, Nekrutenko A. A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly. Genome Res 2007; 17:960-4. [PMID: 17568012 PMCID: PMC1891355 DOI: 10.1101/gr.5578007] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.5] [Indexed: 11/24/2022]
Abstract
The standardization and sharing of data and tools are the biggest challenges of large collaborative projects such as the Encyclopedia of DNA Elements (ENCODE). Here we describe a compact Web application, Galaxy2(ENCODE), that effectively addresses these issues. It provides an intuitive interface for the deposition and access of data, and features a vast number of analysis tools including operations on genomic intervals, utilities for manipulation of multiple sequence alignments, and molecular evolution algorithms. By providing a direct link between data and analysis tools, Galaxy2(ENCODE) allows addressing biological questions that are beyond the reach of existing software. We use Galaxy2(ENCODE) to show that the ENCODE regions contain >2000 unannotated transcripts under strong purifying selection that are likely functional. We also show that the ENCODE regions are representative of the entire genome by estimating the rate of nucleotide substitution and comparing it to published data. Although each of these analyses is complex, none takes more than 15 min from beginning to end. Finally, we demonstrate how new tools can be added to Galaxy2(ENCODE) with almost no effort. Every section of the manuscript is supplemented with QuickTime screencasts. Galaxy2(ENCODE) and the screencasts can be accessed at http://g2.bx.psu.edu.
Affiliation(s)
- Daniel Blankenberg
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - James Taylor
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Ian Schenck
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Jianbin He
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Yi Zhang
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Matthew Ghent
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Narayanan Veeraraghavan
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Istvan Albert
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Webb Miller
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Kateryna D. Makova
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Ross C. Hardison
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
| | - Anton Nekrutenko
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
- Corresponding author. Fax (814) 863-6699
|
45
|
Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ, Nekrutenko A. Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005; 15:1451-5. [PMID: 16169926 PMCID: PMC1240089 DOI: 10.1101/gr.4086505] [Citation(s) in RCA: 1395] [Impact Index Per Article: 73.4] [Indexed: 11/25/2022]
Abstract
Accessing and analyzing the exponentially expanding genomic sequence and functional data pose a challenge for biomedical researchers. Here we describe an interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results. The heart of Galaxy is a flexible history system that stores the queries from each user; performs operations such as intersections, unions, and subtractions; and links to other computational tools. Galaxy can be accessed at http://g2.bx.psu.edu.
Affiliation(s)
- Belinda Giardine
- Center for Comparative Genomics and Bioinformatics, Huck Institutes for Life Sciences, Penn State University, University Park, Pennsylvania 16802, USA
|