1
|
Abstract
ELIXIR is a pan-European intergovernmental organisation for life science that aims to coordinate bioinformatics resources in a single infrastructure across Europe; bioinformatics training is central to its strategy, which aims to develop a training community that spans all ELIXIR member states. In an evidence-based approach for strengthening bioinformatics training programmes across Europe, the ELIXIR Training Platform, led by the ELIXIR EXCELERATE Quality and Impact Assessment Subtask in collaboration with the ELIXIR Training Coordinators Group, has implemented an assessment strategy to measure quality and impact of its entire training portfolio. Here, we present ELIXIR’s framework for assessing training quality and impact, which includes the following: specifying assessment aims, determining what data to collect in order to address these aims, and our strategy for centralised data collection to allow for ELIXIR-wide analyses. In addition, we present an overview of the ELIXIR training data collected over the past 4 years. We highlight the importance of a coordinated and consistent data collection approach and the relevance of defining specific metrics and answer scales for consortium-wide analyses as well as for comparison of data across iterations of the same course.
|
2
|
Abstract
Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.
|
3
|
Microarray-Based Quality Assessment as a Supporting Criterion for de novo Transcriptome Assembly Selection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:198-206. [PMID: 30059314 DOI: 10.1109/tcbb.2018.2860997] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
RNA-Sequencing and de novo assembly have enabled the analysis of species that lack a reference transcriptome, although intrinsic features (biological and technical) induce errors in the reconstruction. One strategy to resolve these errors is to vary the assembly parameters to generate multiple reconstructions. However, selecting the best assembly remains a challenge. Quantitative metrics for quality assessment have been inconsistent when compared with pertinent references. In this paper, a criterion for supporting assembly selection based on mapping DNA microarray hybridized probes to assembly sets is proposed. Mouse and fruit fly RNA-Seq datasets were assembled with standard de novo procedures. Quality was assessed using quantitative metrics and the proposed criterion. The assembly that best mapped to the available reference transcriptomes of these model species provided the highest quality assembly. The hybridized probes identified the best assemblies, whereas quantitative metrics remained inconsistent. For example, a subtle but statistically significant (ANOVA, p < 0.05) probe mapping difference of 0.25 percent enabled an assembly selection that identified 3,719 more contigs and yielded 1,049 more contigs mapped to the mouse reference transcriptome. The availability of microarray data for non-model species makes the proposed criterion suitable for quality assessment of multiple de novo assembly strategies.
|
4
|
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled “An intrinsically disordered protein user community proposal for ELIXIR” held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.
|
5
|
A new pan-European Train-the-Trainer programme for bioinformatics: pilot results on feasibility, utility and sustainability of learning. Brief Bioinform 2019; 20:405-415. [PMID: 29028883 PMCID: PMC6433894 DOI: 10.1093/bib/bbx112] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/30/2017] [Revised: 07/26/2017] [Indexed: 11/22/2022]
Abstract
Demand for training life scientists in bioinformatics methods, tools and resources and computational approaches is urgent and growing. To meet this demand, new trainers must be prepared with effective teaching practices for delivering short hands-on training sessions—a specific type of education that is not typically part of professional preparation of life scientists in many countries. A new Train-the-Trainer (TtT) programme was created by adapting existing models, using input from experienced trainers and experts in bioinformatics, and from educational and cognitive sciences. This programme was piloted across Europe from May 2016 to January 2017. Preparation included drafting the training materials, organizing sessions to pilot them and studying this paradigm for its potential to support the development and delivery of future bioinformatics training by participants. Seven pilot TtT sessions were carried out, and this manuscript describes the results of the pilot year. Lessons learned include (i) support is required for logistics, so that new instructors can focus on their teaching; (ii) institutions must provide incentives to include training opportunities for those who want/need to become new or better instructors; (iii) formal evaluation of the TtT materials is now a priority; (iv) a strategy is needed to recruit, train and certify new instructor trainers (faculty); and (v) future evaluations must assess utility. Additionally, defining a flexible but rigorous and reliable process of TtT ‘certification’ may incentivize participants and will be considered in future.
|
6
|
The development and application of bioinformatics core competencies to improve bioinformatics training and education. PLoS Comput Biol 2018; 14:e1005772. [PMID: 29390004 PMCID: PMC5794068 DOI: 10.1371/journal.pcbi.1005772] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Indexed: 11/30/2022]
Abstract
Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society for Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies, developed from surveys of employers and training programs, have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans.
As data size and complexity increase in life science research, so the need for bioinformatics training has increased. This training is required across a wide variety of audiences, but varies in the level of detail and content that needs to be delivered. A scientist wishing to use some bioinformatics tools to analyse their specific dataset will require different competencies from one who provides support in a bioinformatics services environment. The Curriculum Task Force of the International Society for Computational Biology (ISCB) Education Committee has attempted to address this by developing a set of bioinformatics core competencies and mapping these to ten different user profiles across the spectrum of potential trainees. Here we present the final iteration of the competencies and some examples to demonstrate how they have been used to drive bioinformatics curriculum development and training in different settings.
|
7
|
The ELIXIR-EXCELERATE Train-the-Trainer pilot programme: empower researchers to deliver high-quality training. F1000Res 2017; 6:ELIXIR-1557. [PMID: 28928938 PMCID: PMC5596339 DOI: 10.12688/f1000research.12332.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 08/18/2017] [Indexed: 11/20/2022]
Abstract
One of the main goals of the ELIXIR-EXCELERATE project from the European Union's Horizon 2020 programme is to support a pan-European training programme to increase bioinformatics capacity and competency across ELIXIR Nodes. To this end, a Train-the-Trainer (TtT) programme has been developed by the TtT subtask of EXCELERATE's Training Platform, to try to expose bioinformatics instructors to aspects of pedagogy and evidence-based learning principles, to help them better design, develop and deliver high-quality training in future. As a first step towards such a programme, an ELIXIR-EXCELERATE TtT (EE-TtT) pilot was developed, drawing on existing 'instructor training' models, using input both from experienced instructors and from experts in bioinformatics, the cognitive sciences and educational psychology. This manuscript describes the process of defining the pilot programme, illustrates its goals, structure and contents, and discusses its outcomes. From Jan 2016 to Jan 2017, we carried out seven pilot EE-TtT courses (training more than sixty new instructors), collaboratively drafted the training materials, and started establishing a network of trainers and instructors within the ELIXIR community. The EE-TtT pilot represents an essential step towards the development of a sustainable and scalable ELIXIR TtT programme. Indeed, the lessons learned from the pilot, the experience gained, the materials developed, and the analysis of the feedback collected throughout the seven pilot courses have both positioned us to consolidate the programme in the coming years, and contributed to the development of an enthusiastic and expanding ELIXIR community of instructors and trainers.
|
8
|
Abstract
ELIXIR-UK is the UK node of ELIXIR, the European infrastructure for life science data. Since its foundation in 2014, ELIXIR-UK has played a leading role in training both within the UK and in the ELIXIR Training Platform, which coordinates and delivers training across all ELIXIR members. ELIXIR-UK contributes to the Training Platform’s coordination and supports the development of training to address key skill gaps amongst UK scientists. As part of this work it acts as a conduit for nationally-important bioinformatics training resources to promote their activities to the ELIXIR community. ELIXIR-UK also leads ELIXIR’s flagship Training Portal, TeSS, which collects information about a diverse range of training and makes it easily accessible to the community. ELIXIR-UK also works with others to provide key digital skills training, partnering with the Software Sustainability Institute to provide Software Carpentry training to the ELIXIR community and to establish the Data Carpentry initiative, and taking a lead role amongst national stakeholders to deliver the StaTS project – a coordinated effort to drive engagement with training in statistics.
|
9
|
Abstract
This Resource describes the Image Data Resource (IDR), a prototype online system for biological image data that links experimental and analytic data across multiple data sets and promotes image data sharing and reanalysis. Access to primary research data is vital for the advancement of science. To extend the data types supported by community repositories, we built a prototype Image Data Resource (IDR). IDR links data from several imaging modalities, including high-content screening, multi-dimensional microscopy and digital pathology, with public genetic or chemical databases and cell and tissue phenotypes expressed using controlled ontologies. Using this integration, IDR facilitates the analysis of gene networks and reveals functional interactions that are inaccessible to individual studies. To enable reanalysis, we also established a computational resource based on Jupyter notebooks that allows remote access to the entire IDR. IDR is also an open-source platform for publishing imaging data. Thus IDR provides an online resource and a software infrastructure that promotes and extends publication and reanalysis of scientific image data.
|
10
|
Training in High-Throughput Sequencing: Common Guidelines to Enable Material Sharing, Dissemination, and Reusability. PLoS Comput Biol 2016; 12:e1004937. [PMID: 27309738 PMCID: PMC4910983 DOI: 10.1371/journal.pcbi.1004937] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/03/2022]
Abstract
The advancement of high-throughput sequencing (HTS) technologies and the rapid development of numerous analysis algorithms and pipelines in this field have resulted in an unprecedentedly high demand for training scientists in HTS data analysis. Embarking on developing new training materials is challenging for many reasons. Trainers often do not have prior experience in preparing or delivering such materials and struggle to keep them up to date. A repository of curated HTS training materials would support trainers in materials preparation, reduce the duplication of effort by increasing the usage of existing materials, and allow for the sharing of teaching experience among the HTS trainers’ community. To achieve this, we have developed a strategy for materials’ curation and dissemination. Standards for describing training materials have been proposed and applied to the curation of existing materials. A Git repository has been set up for sharing annotated materials that can now be reused, modified, or incorporated into new courses. This repository uses Git; hence, it is decentralized and self-managed by the community and can be forked and built upon by all users. The repository is accessible at http://bioinformatics.upsc.se/htmr.
In recent years, the advancement of high-throughput sequencing (HTS) and the rapid development of numerous analysis algorithms and pipelines in this field have resulted in an unprecedentedly high demand for training scientists in HTS data analysis. Generating effective training materials is time-consuming, and a large body of training materials on HTS data analysis has already been generated but is rarely shared among trainers. In this paper we provide guidelines to trainers for describing training materials to increase their reusability. The best-practice standards proposed here have been used to annotate a collection of HTS training materials, which is now available to the trainers’ community in Git and discoverable through the ELIXIR and GOBLET portals. Efforts are now underway to use the strategy presented in this paper to annotate a wider collection of training materials and define a generic approach for the curation and dissemination of materials that should be adopted by existing training portals and new emerging initiatives.
|
11
|
Abstract
Background: Phenotypic data derived from high content screening is currently annotated using free text, thus preventing the integration of independent datasets, including those generated in different biological domains, such as cell lines, mouse and human tissues. Description: We present the Cellular Microscopy Phenotype Ontology (CMPO), a species neutral ontology for describing phenotypic observations relating to the whole cell, cellular components, cellular processes and cell populations. CMPO is compatible with related ontology efforts, allowing for future cross-species integration of phenotypic data. CMPO was developed following a curator-driven approach where phenotype data were annotated by expert biologists following the Entity-Quality (EQ) pattern. These EQs were subsequently transformed into new CMPO terms following an established post composition process. Conclusion: CMPO is currently being utilized to annotate phenotypes associated with high content screening datasets stored in several image repositories including the Image Data Repository (IDR), MitoSys project database and the Cellular Phenotype Database to facilitate data browsing and discoverability.
|
12
|
Applying, Evaluating and Refining Bioinformatics Core Competencies (An Update from the Curriculum Task Force of ISCB's Education Committee). PLoS Comput Biol 2016; 12:e1004943. [PMID: 27175996 PMCID: PMC4866758 DOI: 10.1371/journal.pcbi.1004943] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/20/2023]
|
13
|
Abstract
High content screening (HCS) experiments create a classic data management challenge: multiple, large sets of heterogeneous structured and unstructured data that must be integrated and linked to produce a set of "final" results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible for users, scientists, collaborators and, where appropriate, the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data, experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org.
|
14
|
Abstract
Phenotypes have gained increased attention in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support 'bench to bedside' efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and their storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state of the art in digital phenotyping, covering the representation, acquisition and analysis of phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.
|
15
|
Cellular phenotype database: a repository for systems microscopy data. Bioinformatics 2015; 31:2736-40. [PMID: 25861964 PMCID: PMC4528631 DOI: 10.1093/bioinformatics/btv199] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 11/27/2014] [Accepted: 04/01/2015] [Indexed: 12/02/2022]
Abstract
Motivation: The Cellular Phenotype Database (CPD) is a repository for data derived from high-throughput systems microscopy studies. The aims of this resource are: (i) to provide easy access to cellular phenotype and molecular localization data for the broader research community; (ii) to facilitate integration of independent phenotypic studies by means of data aggregation techniques, including use of an ontology and (iii) to facilitate development of analytical methods in this field. Results: In this article we present CPD, its data structure and user interface, propose a minimal set of information describing RNA interference experiments, and suggest a generic schema for management and aggregation of outputs from phenotypic or molecular localization experiments. The database has a flexible structure for management of data from heterogeneous sources of systems microscopy experimental outputs generated by a variety of protocols and technologies and can be queried by gene, reagent, gene attribute, study keywords, phenotype or ontology terms. Availability and implementation: CPD is developed as part of the Systems Microscopy Network of Excellence and is accessible at http://www.ebi.ac.uk/fg/sym. Contact: jes@ebi.ac.uk or ugis@ebi.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
|
16
|
Tumor cell migration screen identifies SRPK1 as breast cancer metastasis determinant. J Clin Invest 2015; 125:1648-64. [PMID: 25774502 DOI: 10.1172/jci74440] [Citation(s) in RCA: 93] [Impact Index Per Article: 10.3] [Received: 11/25/2013] [Accepted: 01/29/2015] [Indexed: 01/14/2023]
Abstract
Tumor cell migration is a key process for cancer cell dissemination and metastasis that is controlled by signal-mediated cytoskeletal and cell matrix adhesion remodeling. Using a phagokinetic track assay with migratory H1299 cells, we performed an siRNA screen of almost 1,500 genes encoding kinases/phosphatases and adhesome- and migration-related proteins to identify genes that affect tumor cell migration speed and persistence. Thirty candidate genes that altered cell migration were validated in live tumor cell migration assays. Eight were associated with metastasis-free survival in breast cancer patients, with integrin β3-binding protein (ITGB3BP), MAP3K8, NIMA-related kinase (NEK2), and SHC-transforming protein 1 (SHC1) being the most predictive. Examination of genes that modulate migration indicated that SRPK1, encoding the splicing factor kinase SRSF protein kinase 1, is relevant to breast cancer outcomes, as it was highly expressed in basal breast cancer. Furthermore, high SRPK1 expression correlated with poor breast cancer disease outcome and preferential metastasis to the lungs and brain. In 2 independent murine models of breast tumor metastasis, stable shRNA-based SRPK1 knockdown suppressed metastasis to distant organs, including lung, liver, and spleen, and inhibited focal adhesion reorganization. Our study provides comprehensive information on the molecular determinants of tumor cell migration and suggests that SRPK1 has potential as a drug target for limiting breast cancer metastasis.
|
17
|
Abstract
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42 000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.
|
18
|
Expression Atlas update--a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Nucleic Acids Res 2013; 42:D926-32. [PMID: 24304889 PMCID: PMC3964963 DOI: 10.1093/nar/gkt1270] [Citation(s) in RCA: 251] [Impact Index Per Article: 22.8] [Indexed: 01/16/2023]
Abstract
Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets, as well as genes and a more powerful search interface designed to maximize the biological value provided to the user.
|
19
|
Data Mining and Meta-Analysis on DNA Microarray Data. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Microarray technology enables high-throughput parallel gene expression analysis, and its use has grown exponentially thanks to the development of a variety of applications for expression, genetics and epigenetic studies. A wealth of data is now available from public repositories, providing unprecedented opportunities for meta-analysis approaches, which could generate new biological information, unrelated to the original scope of individual studies. This study provides a guideline for identifying the biological significance of statistically selected, differentially expressed genes derived from gene expression arrays and suggests further analysis pathways. The authors review the prerequisites for data mining and meta-analysis, summarize the conceptual methods to derive biological information from microarray data and suggest software for each category of data mining or meta-analysis.
|
20
|
The challenges of delivering bioinformatics training in the analysis of high-throughput data. Brief Bioinform 2013; 14:538-47. [PMID: 23543353 PMCID: PMC3771233 DOI: 10.1093/bib/bbt018] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/27/2023]
Abstract
High-throughput technologies are widely used in functional genomics and in an increasing number of applications. For many ‘wet lab’ scientists, the analysis of the large amount of data generated by such technologies is a major bottleneck that can only be overcome through very specialized training in advanced data analysis methodologies and the use of dedicated bioinformatics software tools. In this article, we discuss the challenges related to delivering training in the analysis of high-throughput sequencing data and how we addressed these challenges in the hands-on training courses that we have developed at the European Bioinformatics Institute.
|
21
|
Abstract
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
|
22
|
Semantic integration of physiology phenotypes with an application to the Cellular Phenotype Ontology. Bioinformatics 2012; 28:1783-9. [PMID: 22539675 DOI: 10.1093/bioinformatics/bts250] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 11/15/2022]
Abstract
MOTIVATION The systematic observation of phenotypes has become a crucial tool of functional genomics, and several large international projects are currently underway to identify and characterize the phenotypes that are associated with genotypes in several species. To integrate phenotype descriptions within and across species, phenotype ontologies have been developed. Applying ontologies to unify phenotype descriptions in the domain of physiology has been a particular challenge due to the high complexity of the underlying domain. RESULTS In this study, we present the outline of a theory and its implementation for an ontology of physiology-related phenotypes. We provide a formal description of process attributes and relate them to the attributes of their temporal parts and participants. We apply our theory to create the Cellular Phenotype Ontology (CPO). The CPO is an ontology of morphological and physiological phenotypic characteristics of cells, cell components and cellular processes. Its prime application is to provide terms and uniform definition patterns for the annotation of cellular phenotypes. The CPO can be used for the annotation of observed abnormalities in domains, such as systems microscopy, in which cellular abnormalities are observed and for which no phenotype ontology has been created. AVAILABILITY AND IMPLEMENTATION The CPO and the source code we generated to create the CPO are freely available on http://cell-phenotype.googlecode.com.
|
23
|
Gene Expression Atlas update--a value-added database of microarray and sequencing-based functional genomics experiments. Nucleic Acids Res 2011; 40:D1077-81. [PMID: 22064864 PMCID: PMC3245177 DOI: 10.1093/nar/gkr913] [Citation(s) in RCA: 124] [Impact Index Per Article: 9.5] [Indexed: 12/16/2022]
Abstract
Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes or by biological conditions, e.g. diseases, organism parts or cell types. Since our previous report, we have made 20 monthly releases and, as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies.
|
24
|
ArrayExpress update--an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res 2010; 39:D1002-4. [PMID: 21071405 PMCID: PMC3013660 DOI: 10.1093/nar/gkq1040] [Citation(s) in RCA: 271] [Impact Index Per Article: 19.4] [Indexed: 12/25/2022]
Abstract
The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.
|
25
|
Abstract
The Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive of Functional Genomics Data. A simple interface allows the user to query for differential gene expression either (i) by gene names or attributes such as Gene Ontology terms, or (ii) by biological conditions, e.g. diseases, organism parts or cell types. The gene queries return the conditions where expression has been reported, while condition queries return which genes are reported to be expressed in these conditions. A combination of both query types is possible. The query results are ranked using various statistical measures and by how many independent studies in the database show the particular gene-condition association. Currently, the database contains information about more than 200 000 genes from nine species and almost 4500 biological conditions studied in over 30 000 assays from over 1000 independent studies.
|
26
|
The fission yeast homeodomain protein Yox1p binds to MBF and confines MBF-dependent cell-cycle transcription to G1-S via negative feedback. PLoS Genet 2009; 5:e1000626. [PMID: 19714215 PMCID: PMC2726434 DOI: 10.1371/journal.pgen.1000626] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 04/23/2009] [Accepted: 07/31/2009] [Indexed: 12/31/2022]
Abstract
The regulation of the G1- to S-phase transition is critical for cell-cycle progression. This transition is driven by a transient transcriptional wave regulated by transcription factor complexes termed MBF/SBF in yeast and E2F-DP in mammals. Here we apply genomic, genetic, and biochemical approaches to show that the Yox1p homeodomain protein of fission yeast plays a critical role in confining MBF-dependent transcription to the G1/S transition of the cell cycle. The yox1 gene is an MBF target, and Yox1p accumulates and preferentially binds to MBF-regulated promoters, via the MBF components Res2p and Nrm1p, when they are transcriptionally repressed during the cell cycle. Deletion of yox1 results in constitutively high transcription of MBF target genes and loss of their cell cycle–regulated expression, similar to deletion of nrm1. Genome-wide location analyses of Yox1p and the MBF component Cdc10p reveal dozens of genes whose promoters are bound by both factors, including their own genes and histone genes. In addition, Cdc10p shows promiscuous binding to other sites, most notably close to replication origins. This study establishes Yox1p as a new regulatory MBF component in fission yeast, which is transcriptionally induced by MBF and in turn inhibits MBF-dependent transcription. Yox1p may function together with Nrm1p to confine MBF-dependent transcription to the G1/S transition of the cell cycle via negative feedback. Compared to the orthologous budding yeast Yox1p, which indirectly functions in a negative feedback loop for cell-cycle transcription, similarities but also notable differences in the wiring of the regulatory circuits are evident.
Cells proliferate by growth and division, which is supported by different gene groups that are periodically induced at specific times when they are required during the cell cycle. These genes not only need to be induced at the right time but also repressed when they are no longer required; mistakes in gene regulation can lead to problems in cell proliferation and diseases such as cancer. A well-known regulatory complex functions just before cells replicate their DNA to induce genes required for this important transition. We show that in fission yeast this regulatory complex (MBF) induces a gene whose encoded protein (Yox1p) in turn binds to MBF and represses MBF-regulated genes. In the absence of Yox1p, the MBF-regulated genes do not fluctuate during the cell cycle but remain constantly induced. Thus, MBF sets up not only the induction but also the timely repression of its target genes via Yox1p. We also provide a global analysis of all the genes regulated by Yox1p and MBF. Together, our data uncover a new negative control loop, further highlighting the sophistication of gene regulation during the cell cycle, and illustrating regulatory similarities and differences between organisms.
|
27
|
ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res 2009; 37:D868-72. [PMID: 19015125 PMCID: PMC2686529 DOI: 10.1093/nar/gkn889] [Citation(s) in RCA: 346] [Impact Index Per Article: 23.1] [Received: 09/12/2008] [Revised: 10/17/2008] [Accepted: 10/20/2008] [Indexed: 11/13/2022]
Abstract
ArrayExpress (http://www.ebi.ac.uk/arrayexpress) consists of three components: the ArrayExpress Repository, a public archive of functional genomics experiments and supporting data; the ArrayExpress Warehouse, a database of gene expression profiles and other bio-measurements; and the ArrayExpress Atlas, a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200,000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently ultra-high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
|
28
|
Abstract
ArrayExpress at the European Bioinformatics Institute is a public database for MIAME-compliant microarray and transcriptomics data. It consists of two parts: the ArrayExpress Repository, which is a public archive of microarray data, and the ArrayExpress Warehouse of Gene Expression Profiles, which contains additionally curated subsets of data from the Repository. Archived experiments can be queried by experimental attributes, such as keywords, species, array platform, publication details, or accession numbers. Gene expression profiles can be queried by gene names and properties, such as Gene Ontology terms, and the resulting profiles can be visualized. The data can be exported and analyzed using the online data analysis tool Expression Profiler. Data analysis components, such as data preprocessing, filtering, differentially expressed gene finding, clustering methods, and ordination-based techniques, as well as other statistical tools, are all available in Expression Profiler via integration with the statistical package R.
|
29
|
Global transcriptional responses of fission and budding yeast to changes in copper and iron levels: a comparative study. Genome Biol 2007; 8:R73. [PMID: 17477863 PMCID: PMC1929147 DOI: 10.1186/gb-2007-8-5-r73] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.8] [Received: 07/28/2006] [Revised: 01/31/2007] [Accepted: 05/03/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Recent studies in comparative genomics demonstrate that interspecies comparison represents a powerful tool for identifying both conserved and specialized biologic processes across large evolutionary distances. All cells must adjust to environmental fluctuations in metal levels, because levels that are too low or too high can be detrimental. Here we explore the conservation of metal homoeostasis in two distantly related yeasts. RESULTS We examined genome-wide gene expression responses to changing copper and iron levels in budding and fission yeast using DNA microarrays. The comparison reveals conservation of only a small core set of genes, defining the copper and iron regulons, with a larger number of additional genes being specific for each species. Novel regulatory targets were identified in Schizosaccharomyces pombe for Cuf1p (pex7 and SPAC3G6.05) and Fep1p (srx1, sib1, sib2, rds1, isu1, SPBC27B12.03c, SPAC1F8.02c, and SPBC947.05c). We also present evidence refuting a direct role of Cuf1p in the repression of genes involved in iron uptake. Remarkable differences were detected in responses of the two yeasts to excess copper, probably reflecting evolutionary adaptation to different environments. CONCLUSION The considerable evolutionary distance between budding and fission yeast resulted in substantial diversion in the regulation of copper and iron homeostasis. Despite these differences, the conserved regulation of a core set of genes involved in the uptake of these metals provides valuable clues to key features of metal metabolism.
|
30
|
Abstract
Using a novel cell-based assay to profile transcriptional pathway targeting, we have identified a new functional class of thalidomide analogs with distinct and selective antileukemic activity. These agents activate nuclear factor of activated T cells (NFAT) transcriptional pathways while simultaneously repressing nuclear factor-kappaB (NF-kappaB) via a rapid intracellular amplification of reactive oxygen species (ROS). The elevated ROS is associated with increased intracellular free calcium, rapid dissipation of the mitochondrial membrane potential, disrupted mitochondrial structure, and caspase-independent cell death. This cytotoxicity is highly selective for transformed lymphoid cells, is reversed by free radical scavengers, synergizes with the antileukemic activity of other redox-directed compounds, and preferentially targets cells in the S phase of the cell cycle. Live-cell imaging reveals a rapid drug-induced burst of ROS originating in the endoplasmic reticulum and associated mitochondria just prior to spreading throughout the cell. As members of a novel functional class of "redoxreactive" thalidomides, these compounds provide a new tool through which selective cellular properties of redox status and intracellular bioactivation can be leveraged by rational combinatorial therapeutic strategies and appropriate drug design to exploit cell-specific vulnerabilities for maximum drug efficacy.
|
31
|
Abstract
The cellular response to the antitumor drug cisplatin is complex, and resistance is widespread. To gain insights into the global transcriptional response and mechanisms of resistance, we used microarrays to examine the fission yeast cell response to cisplatin. In two isogenic strains with differing drug sensitivity, cisplatin activated a stress response involving glutathione-S-transferase, heat shock, and recombinational repair genes. Genes required for proteasome-mediated protein degradation were up-regulated in the sensitive strain, whereas genes for DNA damage recognition/repair and for mitotic progression were induced in the resistant strain. The response to cisplatin overlaps in part with the responses to cadmium and the DNA-damaging agent methylmethane sulfonate. The different gene groups involved in the cellular response to cisplatin help the cells to tolerate and repair DNA damage and to overcome cell cycle blocks. These findings are discussed with respect to known cisplatin response pathways in human cells.
|
32
|
Periodic gene expression program of the fission yeast cell cycle. Nat Genet 2004; 36:809-17. [PMID: 15195092 DOI: 10.1038/ng1377] [Citation(s) in RCA: 354] [Impact Index Per Article: 17.7] [Received: 02/02/2004] [Accepted: 05/18/2004] [Indexed: 01/28/2023]
Abstract
Cell-cycle control of transcription seems to be universal, but little is known about its global conservation and biological significance. We report on the genome-wide transcriptional program of the Schizosaccharomyces pombe cell cycle, identifying 407 periodically expressed genes of which 136 show high-amplitude changes. These genes cluster in four major waves of expression. The forkhead protein Sep1p regulates mitotic genes in the first cluster, including Ace2p, which activates transcription in the second cluster during the M-G1 transition and cytokinesis. Other genes in the second cluster, which are required for G1-S progression, are regulated by the MBF complex independently of Sep1p and Ace2p. The third cluster coincides with S phase and a fourth cluster contains genes weakly regulated during G2 phase. Despite conserved cell-cycle transcription factors, differences in regulatory circuits between fission and budding yeasts are evident, revealing evolutionary plasticity of transcriptional control. Periodic transcription of most genes is not conserved between the two yeasts, except for a core set of approximately 40 genes that seem to be universally regulated during the eukaryotic cell cycle and may have key roles in cell-cycle progression.
|
33
|
Whole-genome microarrays of fission yeast: characteristics, accuracy, reproducibility, and processing of array data. BMC Genomics 2003; 4:27. [PMID: 12854975 PMCID: PMC179895 DOI: 10.1186/1471-2164-4-27] [Citation(s) in RCA: 181] [Impact Index Per Article: 8.6] [Received: 03/20/2003] [Accepted: 07/10/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND The genome of the fission yeast Schizosaccharomyces pombe has recently been sequenced, setting the stage for the post-genomic era of this increasingly popular model organism. We have built fission yeast microarrays, optimised protocols to improve array performance, and carried out experiments to assess various characteristics of microarrays. RESULTS We designed PCR primers to amplify specific probes (180-500 bp) for all known and predicted fission yeast genes, which are printed in duplicate onto separate regions of glass slides together with control elements (approximately 13,000 spots/slide). Fluorescence signal intensities depended on the size and intragenic position of the array elements, whereas the signal ratios were largely independent of element properties. Only the coding strand is covalently linked to the slides, and our array elements can discriminate transcriptional direction. The microarrays can distinguish sequences with up to 70% identity, above which cross-hybridisation contributes to the signal intensity. We tested the accuracy of signal ratios and measured the reproducibility of array data caused by biological and technical factors. Because the technical variability is lower, it is best to use samples prepared from independent biological experiments to obtain repeated measurements with swapping of fluorochromes to prevent dye bias. We also developed a script that discards unreliable data and performs a normalization to correct spatial artefacts. CONCLUSIONS This paper provides data for several microarray properties that are rarely measured. The results define critical parameters for microarray design and experiments and provide a framework to optimise and interpret array data. Our arrays give reproducible and accurate expression ratios with high sensitivity. The scripts for primer design and initial data processing as well as primer sequences and detailed protocols are available from our website.
|
34
|
Nucleotide sequence, genome organisation and phylogenetic analysis of Indian citrus ringspot virus. Brief report. Arch Virol 2002; 147:2215-24. [PMID: 12417955 DOI: 10.1007/s00705-002-0875-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/26/2022]
Abstract
The sequence of the single-stranded RNA genome of Indian citrus ringspot virus (ICRSV) consists of 7560 nucleotides. It contains six open reading frames (ORFs) which encode putative proteins of 187.3, 25, 12, 6.4, 34 and 23 kDa respectively. ORF1 encodes a polypeptide that contains all the elements of a replicase; ORFs 2, 3 and 4 compose a triple-gene block; ORF5 encodes the capsid protein; the function of ORF6 is unknown. Phylogenetic analysis of the complete genome and each ORF separately, and database searches indicate that ICRSV, though showing some similarities to potexviruses, is significantly different, as in the presence of ORF6, the genome and CP sizes, and particle morphology. These differences favour its inclusion in a new virus genus.
|
35
|
Indian citrus ringspot virus: a proposed new species with some affinities to potex-, carla-, fovea- and allexiviruses. Arch Virol 2001; 145:1895-908. [PMID: 11043949 DOI: 10.1007/s007050070064] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
Abstract
An isolate of Indian citrus ringspot virus from Kinnow mandarin in northern India had flexuous particles with evident cross-banding and a modal length of 650 nm. It was mechanically transmitted to five herbaceous hosts including Phaseolus vulgaris cv Saxa, in which it became systemic. In thin sections, virus particles were observed in the cytoplasm of parenchyma cells but no specific inclusions were seen. The virus was purified from infected Saxa bean leaves and an antiserum prepared. There was no serological cross-reaction with representative allexi-, capillo-, potex- and trichoviruses, except a faint one-way reaction with Potato virus X. Purified virus yielded a major band, the presumed coat protein (CP), of about 34 kDa, and a single ssRNA of about 7.5 kb, which was infectious. Two ORFs encoding putative proteins of 34 kDa and 23 kDa were located in the 3' part of the RNA. The product of the 34 kDa ORF was confirmed as the CP by expression in E. coli. The derived amino acid sequence of the CP contained some short motifs similar to those of potex-, fovea-, carla- and allexiviruses but otherwise there was no strong similarity to any of these. The 23 kDa ORF contained a zinc finger-like sequence, as in similar ORFs in carla- and allexiviruses but overall amino acid homology with these was low. The virus does not appear to fall into any known genus. A new species is proposed. Serological and molecular diagnostic reagents were prepared.
|