1
Du P, Fan R, Zhang N, Wu C, Zhang Y. Advances in Integrated Multi-omics Analysis for Drug-Target Identification. Biomolecules 2024; 14:692. [PMID: 38927095 PMCID: PMC11201992 DOI: 10.3390/biom14060692] [Received: 05/11/2024] [Revised: 06/08/2024] [Accepted: 06/12/2024]
Abstract
As an essential component of modern drug discovery, the role of drug-target identification is growing increasingly prominent. Additionally, single-omics technologies have been widely utilized in the process of discovering drug targets. However, it is difficult for any single-omics level to clearly expound the causal connection between drugs and how they give rise to the emergence of complex phenotypes. With the progress of large-scale sequencing and the development of high-throughput technologies, the tendency in drug-target identification has shifted towards integrated multi-omics techniques, gradually replacing traditional single-omics techniques. Herein, this review centers on the recent advancements in the domain of integrated multi-omics techniques for target identification, highlights the common multi-omics analysis strategies, briefly summarizes the selection of multi-omics analysis tools, and explores the challenges of existing multi-omics analyses, as well as the applications of multi-omics technology in drug-target identification.
Affiliation(s)
- Peiling Du
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Rui Fan
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Nana Zhang
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Chenyuan Wu
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Yingqian Zhang
- School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China
- Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China

2
Sistrom M, Andrews H, Edwards DL. Comparative genomics of Japanese encephalitis virus shows low rates of recombination and a small subset of codon positions under episodic diversifying selection. PLoS Negl Trop Dis 2024; 18:e0011459. [PMID: 38295106 PMCID: PMC10861042 DOI: 10.1371/journal.pntd.0011459] [Received: 06/15/2023] [Revised: 02/12/2024] [Accepted: 01/11/2024]
Abstract
Orthoflavivirus japonicum (JEV) is the dominant cause of viral encephalitis in the Asian region, with 100,000 cases and 25,000 deaths reported annually. The genome encodes a single polyprotein that is processed into three structural and seven non-structural proteins. We collated a dataset of 349 complete genomes from a number of public databases, and analysed the data for recombination, evolutionary selection and phylogenetic structure. Rates of recombination in JEV are low; consequently, recombination is not a major evolutionary force shaping JEV. We found a strong overall signal of purifying selection in the genome, which is the main force affecting the evolutionary dynamics in JEV. A small number of genomic sites are also under episodic diversifying selection, especially in the envelope protein and non-structural proteins 3 and 5. Overall, these results support previous analyses of JEV evolutionary genomics and provide additional insight into the evolutionary processes shaping the distribution and adaptation of this important pathogenic arbovirus.
Affiliation(s)
- Mark Sistrom
- Department of Industry, Trade and Tourism, Berrimah Veterinary Laboratories, Darwin, Australia
- Research Institute for the Environment and Livelihoods, Faculty of Science and Technology, Charles Darwin University, Casuarina, Australia
- Hannah Andrews
- Department of Industry, Trade and Tourism, Berrimah Veterinary Laboratories, Darwin, Australia
- Danielle L. Edwards
- Research Institute for the Environment and Livelihoods, Faculty of Science and Technology, Charles Darwin University, Casuarina, Australia
- Department of Natural Sciences, Museum and Art Gallery of the Northern Territory, Darwin, Australia

3
Deng CH, Naithani S, Kumari S, Cobo-Simón I, Quezada-Rodríguez EH, Skrabisova M, Gladman N, Correll MJ, Sikiru AB, Afuwape OO, Marrano A, Rebollo I, Zhang W, Jung S. Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database (Oxford) 2023; 2023:baad088. [PMID: 38079567 PMCID: PMC10712715 DOI: 10.1093/database/baad088] [Received: 04/13/2023] [Revised: 10/17/2023] [Accepted: 11/28/2023]
Abstract
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we identify a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
Affiliation(s)
- Cecilia H Deng
- Molecular and Digital Breeding, New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, 120 Mt Albert Road, Auckland 1025, New Zealand
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- Irene Cobo-Simón
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Institute of Forest Science (ICIFOR-INIA, CSIC), Madrid, Spain
- Elsa H Quezada-Rodríguez
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México, México
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México
- Maria Skrabisova
- Department of Biochemistry, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Nick Gladman
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- U.S. Department of Agriculture-Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853, USA
- Melanie J Correll
- Agricultural and Biological Engineering Department, University of Florida, 1741 Museum Rd, Gainesville, FL 32611, USA
- Annarita Marrano
- Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA 94560, USA
- Wentao Zhang
- National Research Council Canada, 110 Gymnasium Pl, Saskatoon, Saskatchewan S7N 0W9, Canada
- Sook Jung
- Department of Horticulture, Washington State University, 303c Plant Sciences Building, Pullman, WA 99164-6414, USA

4
Plessis C, Jeanne T, Dionne A, Vivancos J, Droit A, Hogue R. ASVmaker: A New Tool to Improve Taxonomic Identifications for Amplicon Sequencing Data. PLANTS (BASEL, SWITZERLAND) 2023; 12:3678. [PMID: 37960035 PMCID: PMC10647208 DOI: 10.3390/plants12213678] [Received: 08/22/2023] [Revised: 10/20/2023] [Accepted: 10/22/2023]
Abstract
The taxonomic assignment of sequences obtained by high throughput amplicon sequencing poses a limitation for various applications in the biomedical, environmental, and agricultural fields. Identifications are constrained by the length of the obtained sequences and the computational processes employed to efficiently assign taxonomy. Arriving at a consensus is often preferable to uncertain identification for ecological purposes. To address this issue, a new tool called "ASVmaker" has been developed to facilitate the creation of custom databases, thereby enhancing the precision of specific identifications. ASVmaker is specifically designed to generate reference databases for the taxonomic assignment of amplicon sequencing data. It uses publicly available reference data and generates specific sequences derived from the primers used to create amplicon sequencing libraries. This versatile tool can complement taxonomic assignments performed with pre-trained classifiers from the SILVA and UNITE databases. Moreover, it enables the generation of comprehensive reference databases for specific genes in cases where no directly applicable database exists for taxonomic classification tools.
Affiliation(s)
- Clément Plessis
- Institut de Recherche et de Développement en Agroenvironnement, Québec, QC G1P 3W8, Canada
- Computational Biology Laboratory, CHU de Québec—Université Laval Research Center, Québec City, QC G1V 4G2, Canada
- Thomas Jeanne
- Institut de Recherche et de Développement en Agroenvironnement, Québec, QC G1P 3W8, Canada
- Computational Biology Laboratory, CHU de Québec—Université Laval Research Center, Québec City, QC G1V 4G2, Canada
- Antoine Dionne
- Laboratoire d’Expertise et de Diagnostic en Phytoprotection, Ministère de l’Agriculture, des Pêcheries et de l’Alimentation du Québec (MAPAQ), Québec City, QC G1P 3W6, Canada
- Julien Vivancos
- Laboratoire d’Expertise et de Diagnostic en Phytoprotection, Ministère de l’Agriculture, des Pêcheries et de l’Alimentation du Québec (MAPAQ), Québec City, QC G1P 3W6, Canada
- Arnaud Droit
- Computational Biology Laboratory, CHU de Québec—Université Laval Research Center, Québec City, QC G1V 4G2, Canada
- Richard Hogue
- Institut de Recherche et de Développement en Agroenvironnement, Québec, QC G1P 3W8, Canada

5
Caño De Las Heras S, Gargalo CL, Caccavale F, Gernaey KV, Krühne U. NyctiDB: A non-relational bioprocesses modeling database supported by an ontology. FRONTIERS IN CHEMICAL ENGINEERING 2022. [DOI: 10.3389/fceng.2022.1036867]
Abstract
Strategies to exploit and enable the digitalization of industrial processes are on course to become game-changers in optimizing (bio)chemical facilities. To achieve this, these industries face an increasing need for process models and, as importantly, an efficient way to store the models and data/information. Therefore, this work proposes developing an online information storage system that can facilitate the reuse and expansion of process models and make them available to the digitalization cycle. This system is named NyctiDB, and it is a novel non-relational database coupled with a bioprocess ontology. The ontology supports the selection and classification of information focused on bioprocess models, while the database is in charge of the online storage of said information. Through a series of online collections, NyctiDB contains essential knowledge for the design, monitoring, control, and optimization of a bioprocess based on its mathematical model. Once NyctiDB has been implemented, its applicability and usefulness are demonstrated through two applications. Application A shows how NyctiDB is integrated inside the software architecture of an online educational bioprocess simulator. In this setting, NyctiDB provides the information for the visualization of different bioprocess behaviours and the modifications of the models in the software. Moreover, the information related to the parameters and conditions of each model is used to support the users’ understanding of the process. Additionally, application B illustrates that NyctiDB can be used as an AI enabler to further the research in this field through open-source and reliable data. This can, in fact, be used as the information source for the AI frameworks when developing, for example, hybrid models or smart expert systems for bioprocesses.
Overall, this work aims to provide a blueprint for collecting bioprocess modeling information and connecting it to facilitate and empower the Internet-of-Things paradigm and the digitalization of the biomanufacturing industries.
6
Singh A, Ambaru B, Bandsode V, Ahmed N. Panomics to decode virulence and fitness in Gram-negative bacteria. Front Cell Infect Microbiol 2022; 12:1061596. [PMID: 36478674 PMCID: PMC9719987 DOI: 10.3389/fcimb.2022.1061596] [Received: 10/04/2022] [Accepted: 10/26/2022]
7
Lobanov V, Gobet A, Joyce A. Ecosystem-specific microbiota and microbiome databases in the era of big data. ENVIRONMENTAL MICROBIOME 2022; 17:37. [PMID: 35842686 PMCID: PMC9287977 DOI: 10.1186/s40793-022-00433-1] [Received: 03/10/2022] [Accepted: 06/29/2022]
Abstract
The rapid development of sequencing methods over the past decades has accelerated both the potential scope and depth of microbiota and microbiome studies. Recent developments in the field have been marked by an expansion away from purely categorical studies towards a greater investigation of community functionality. As in-depth genomic and environmental coverage is often distributed unequally across major taxa and ecosystems, it can be difficult to identify or substantiate relationships within microbial communities. Generic databases containing datasets from diverse ecosystems have opened a new era of data accessibility despite costs in terms of data quality and heterogeneity. This challenge is readily embodied in the integration of meta-omics data alongside habitat-specific standards which help contextualise datasets both in terms of sample processing and background within the ecosystem. A special case of large genomic repositories, ecosystem-specific databases (ES-DBs), have emerged to consolidate and better standardise sample processing and analysis protocols around individual ecosystems under study, allowing independent studies to produce comparable datasets. Here, we provide a comprehensive review of this emerging tool for microbial community analysis in relation to current trends in the field. We focus on the factors leading to the formation of ES-DBs, their comparison to traditional microbial databases, the potential for ES-DB integration with meta-omics platforms, as well as inherent limitations in the applicability of ES-DBs.
Affiliation(s)
- Victor Lobanov
- Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
- Alyssa Joyce
- Department of Marine Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden

8
Industrially Important Genes from Trichoderma. Fungal Biol 2022. [DOI: 10.1007/978-3-030-91650-3_16]
9
Pathak RK, Singh DB, Singh R. Introduction to basics of bioinformatics. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00006-7]
10
Almerekova S, Genievskaya Y, Abugalieva S, Sato K, Turuspekov Y. Population Structure and Genetic Diversity of Two-Rowed Barley Accessions from Kazakhstan Based on SNP Genotyping Data. PLANTS 2021; 10:plants10102025. [PMID: 34685834 PMCID: PMC8540147 DOI: 10.3390/plants10102025] [Received: 08/06/2021] [Revised: 09/17/2021] [Accepted: 09/24/2021]
Abstract
The genetic relationship and population structure of two-rowed barley accessions from Kazakhstan were assessed using single-nucleotide polymorphism (SNP) markers. Two different approaches were employed in the analysis: (1) the accessions from Kazakhstan were compared with barley samples from six different regions around the world using 1955 polymorphic SNPs, and (2) 94 accessions collected from six breeding programs in Kazakhstan were studied using 5636 polymorphic SNPs genotyped with a 9K Illumina Infinium assay. In the first approach, the neighbor-joining tree showed that the majority of the accessions from Kazakhstan were grouped in a separate subcluster with a common ancestral node; there was a sister subcluster that comprised mainly barley samples that originated in Europe. The Pearson’s correlation analysis suggested that Kazakh accessions were genetically close to samples from Africa and Europe. In the second approach, the application of the STRUCTURE package using 5636 polymorphic SNPs suggested that Kazakh barley samples consisted of five subclusters in three major clusters. The principal coordinate analysis plot showed that, among six breeding origins in Kazakhstan, the Krasnovodopad (KV) and Karaganda (KA) samples were the most distant groups. The assessment of the pedigrees in the KV and KA samples showed that the hybridization schemes in these breeding stations heavily used accessions from Ethiopia and Ukraine, respectively. The comparative analysis of the KV and KA samples allowed us to identify 214 SNPs with opposite allele frequencies that were tightly linked to 60 genes/gene blocks associated with plant adaptation traits, such as the heading date and plant height. The identified SNP markers can be efficiently used in studies of barley adaptation and deployed in breeding projects to develop new competitive cultivars.
Affiliation(s)
- Shyryn Almerekova
- Laboratory of Molecular Genetics, Institute of Plant Biology and Biotechnology, Almaty 050040, Kazakhstan
- Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty 050038, Kazakhstan
- Yuliya Genievskaya
- Laboratory of Molecular Genetics, Institute of Plant Biology and Biotechnology, Almaty 050040, Kazakhstan
- Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty 050038, Kazakhstan
- Saule Abugalieva
- Laboratory of Molecular Genetics, Institute of Plant Biology and Biotechnology, Almaty 050040, Kazakhstan
- Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty 050038, Kazakhstan
- Kazuhiro Sato
- Institute of Plant Science and Resources, Okayama University, Kurashiki 710-0046, Japan
- Yerlan Turuspekov
- Laboratory of Molecular Genetics, Institute of Plant Biology and Biotechnology, Almaty 050040, Kazakhstan
- Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty 050038, Kazakhstan

11
Proteomic Tools for the Analysis of Cytoskeleton Proteins. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2364:363-425. [PMID: 34542864 DOI: 10.1007/978-1-0716-1661-1_19]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence, as well as for predicting interactions with other proteins or macromolecules. Although a wealth of structural and functional information is available for many cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice while retaining advanced functionality for more experienced computational biologists.
12
Jain S, Saxena A, Hesarur S, Bhadhadhara K, Bharti N, Kasibhatla SM, Sonavane U, Joshi R. GenoVault: a cloud based genomics repository. BioData Min 2021; 14:36. [PMID: 34325724 PMCID: PMC8319889 DOI: 10.1186/s13040-021-00268-5] [Received: 11/07/2020] [Accepted: 07/02/2021]
Abstract
GenoVault is a cloud-based repository for handling Next Generation Sequencing (NGS) data. It is built on an OpenStack-based private cloud using services such as Keystone for authentication, Cinder for block storage, Neutron for networking and Nova for managing compute instances. GenoVault uses object-based storage, which enables data to be stored as objects instead of files or blocks for faster retrieval from different distributed object nodes. Along with a web-based interface, a JavaFX-based desktop client has also been developed to meet the requirements of large file uploads that are usually seen in NGS datasets. Users can store files in their respective object-based storage areas, and the metadata provided by the user during file uploads is used for querying the database. The GenoVault repository is designed with future needs in mind and hence can scale both vertically and horizontally using OpenStack-based cloud features. Users have an option to make the data shareable to the public or restrict the access as private. Data security is ensured as every container is a separate entity in object-based storage architecture which is also supported by Secure File Transfer Protocol (SFTP) for data upload and download. The data is uploaded by the user in individual containers that include raw read files (fastq), processed alignment files (bam, sam, bed) and the output of variation detection (vcf). GenoVault architecture allows verification of the data in terms of integrity and authentication before making it available to collaborators as per the user’s permissions. GenoVault is useful for maintaining organization-wide NGS data generated in various labs that has not yet been published or submitted to public repositories such as NCBI. GenoVault also provides support to share NGS data among the collaborating institutions. GenoVault can thus manage vast volumes of NGS data on any OpenStack-based private cloud.
Affiliation(s)
- Sankalp Jain
- HPC-M&BA Group, Centre for Development of Advanced Computing (C-DAC), Pune, MH, 411008, India
- Amit Saxena
- HPC-M&BA Group, Centre for Development of Advanced Computing (C-DAC), Pune, MH, 411008, India
- Suprit Hesarur
- HPC-M&BA Group, Centre for Development of Advanced Computing (C-DAC), Pune, MH, 411008, India
- Kirti Bhadhadhara
- HPC-M&BA Group, Centre for Development of Advanced Computing (C-DAC), Pune, MH, 411008, India
- Neeraj Bharti
- HPC-M&BA Group, Centre for Development of Advanced Computing (C-DAC), Pune, MH, 411008, India
- Uddhavesh Sonavane
- HPC-M&BA Group, Centre for Development of Advanced Computing (C-DAC), Pune, MH, 411008, India
- Rajendra Joshi
- HPC-M&BA Group, Centre for Development of Advanced Computing (C-DAC), Pune, MH, 411008, India

13
Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate working of systems biology using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
Affiliation(s)
- Parminder S Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Smarti Reel
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Ewan Pearson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Emanuele Trucco
- VAMPIRE project, Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Emily Jefferson
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom

14
Abstract
The patent literature should reflect the past 30 years of engineering efforts directed toward developing monoclonal antibody therapeutics. Such information is potentially valuable for rational antibody design. Patents, however, are designed not to convey scientific knowledge, but to provide legal protection. It is not obvious whether antibody information from patent documents, such as antibody sequences, is useful in conveying engineering know-how, rather than as a legal reference only. To assess the utility of patent data for therapeutic antibody engineering, we quantified the number of antibody sequences in patents destined for medicinal purposes and how well they reflect the primary sequences of therapeutic antibodies in clinical use. We identified 16,526 patent families covering major jurisdictions (e.g., US Patent and Trademark Office (USPTO) and World Intellectual Property Organization) that contained antibody sequences. These families held 245,109 unique antibody chains (135,397 heavy chains and 109,712 light chains) that we compiled in our Patented Antibody Database (PAD, http://naturalantibody.com/pad). We find that antibodies make up a non-trivial proportion of all patent amino acid sequence depositions (e.g., 11% of USPTO Full Text database). Our analysis of the 16,526 families demonstrates that the volume of patent documents with antibody sequences is growing, with the majority of documents classified as containing antibodies for medicinal purposes. Further analysis of the 245,109 antibody chains from the patent literature revealed that they closely reflect the primary sequences of antibody therapeutics in clinical use. This suggests that the patent literature could serve as a reference for previous engineering efforts to improve rational antibody design.
Affiliation(s)
- Konrad Krawczyk
- Research and Development, Natural Antibody, Hamburg, Germany
- Andrew Buchanan
- Department of Bio and Health Informatics, R&D, AstraZeneca, Cambridge, UK

15
Ali Shah SM, Taju SW, Ho QT, Nguyen TTD, Ou YY. GT-Finder: Classify the family of glucose transporters with pre-trained BERT language models. Comput Biol Med 2021; 131:104259. [PMID: 33581474 DOI: 10.1016/j.compbiomed.2021.104259] [Received: 11/04/2020] [Revised: 02/04/2021] [Accepted: 02/04/2021]
Abstract
Recently, language representation models have drawn considerable attention in the field of natural language processing (NLP) because of their remarkable results. Among them, BERT (Bidirectional Encoder Representations from Transformers) has proven to be a simple yet powerful language model that has achieved new state-of-the-art performance. BERT adopts the concept of contextualized word embeddings to capture the semantics and context in which words appear. We utilized pre-trained BERT models to extract features from protein sequences for discriminating three families of glucose transporters: the major facilitator superfamily of glucose transporters (GLUTs), the sodium-glucose linked transporters (SGLTs), and the sugars will eventually be exported transporters (SWEETs). We treated protein sequences as sentences and transformed them into fixed-length feature vectors, in which each amino acid is represented by a 768- or 1024-dimensional vector. We observed that the BERT-Base and BERT-Large models improved performance by more than 4% in terms of average sensitivity and Matthews correlation coefficient (MCC), indicating the efficiency of this approach. We also developed a bidirectional transformer-based protein model (TransportersBERT) for comparison with existing pre-trained BERT models.
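The abstract reports gains in average sensitivity and MCC across three transporter families. The sketch below shows one way to compute such scores. It is not the authors' evaluation code. It scores each family one-vs-rest and then averages the per-family values. The macro-averaging scheme, the function names and the toy labels are all assumptions made for illustration.

```python
import math
from typing import Hashable, Sequence, Tuple


def one_vs_rest_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                       positive: Hashable) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) treating `positive` as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fn - fp
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    """Recall of the positive class; 0.0 when the class is absent."""
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_scores(y_true, y_pred, classes) -> Tuple[float, float]:
    """Average the per-class (one-vs-rest) sensitivity and MCC."""
    sens, mccs = [], []
    for c in classes:
        tp, tn, fp, fn = one_vs_rest_counts(y_true, y_pred, c)
        sens.append(sensitivity(tp, fn))
        mccs.append(mcc(tp, tn, fp, fn))
    return sum(sens) / len(classes), sum(mccs) / len(classes)


if __name__ == "__main__":
    families = ["GLUT", "SGLT", "SWEET"]
    # Toy labels for illustration only; not data from the study.
    truth = ["GLUT", "GLUT", "SGLT", "SGLT", "SWEET", "SWEET"]
    preds = ["GLUT", "SGLT", "SGLT", "SGLT", "SWEET", "GLUT"]
    avg_sens, avg_mcc = macro_scores(truth, preds, families)
    print(f"average sensitivity={avg_sens:.3f}, average MCC={avg_mcc:.3f}")
```

On these toy labels the script prints an average sensitivity of 0.667 and an average MCC of 0.530. Macro-averaging gives each family equal weight regardless of its size. With unbalanced families, that can differ noticeably from a single pooled multiclass MCC.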
Affiliation(s)
- Syed Muazzam Ali Shah
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
- Semmy Wellem Taju
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
- Quang-Thai Ho
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
- Yu-Yen Ou
- Department of Computer Science & Engineering, Yuan Ze University, Chungli, 32003, Taiwan
16
Padmavathi P, Setlur AS, Chandrashekar K, Niranjan V. A comprehensive in-silico computational analysis of twenty cancer exome datasets and identification of associated somatic variants reveals potential molecular markers for detection of varied cancer types. Inform Med Unlocked 2021. [DOI: 10.1016/j.imu.2021.100762] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/25/2023]
17
König P, Beier S, Basterrechea M, Schüler D, Arend D, Mascher M, Stein N, Scholz U, Lange M. BRIDGE - A Visual Analytics Web Tool for Barley Genebank Genomics. Front Plant Sci 2020; 11:701. [PMID: 32595658 PMCID: PMC7300248 DOI: 10.3389/fpls.2020.00701] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/30/2020] [Accepted: 05/04/2020] [Indexed: 05/05/2023]
Abstract
Genebanks harbor a treasure trove of untapped plant genetic diversity. A growing world population and a changing climate require an increase in the production and development of stress-resistant plant cultivars while decreasing the acreage. These requirements for improved plant cultivars can be supported by broader exploitation of plant genetic resources (PGR) as inputs for genomics-assisted breeding. To support this process, we have developed BRIDGE, a data warehouse and exploratory data analysis tool for genebank genomics of barley (Hordeum vulgare L.). Using efficient technologies for data storage, data transfer and web development, we facilitate access to digital genebank resources of barley by prioritizing the interactive and visual analysis of integrated genotypic and phenotypic data. The underlying data resulted from a barley genebank genomics study cataloging sequence and morphological data of 22,626 barley accessions, mainly from the German Federal ex situ genebank. BRIDGE consists of interactively coupled modules to visualize integrated, curated and quality-checked data, such as variation data, results of dimensionality reduction and genome-wide association studies (GWAS), phenotyping results and passport data, as well as the geographic distribution of germplasm samples. The core component is a manager for custom collections of germplasm. A search module to find and select germplasm by passport and phenotypic attributes is included, as are modules to export genotypic data in gzip-compressed variant call format (VCF) files and phenotypic data in MIAPPE-compliant ISA-Tab files. BRIDGE is accessible at the following URL: https://bridge.ipk-gatersleben.de.
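BRIDGE exports genotypic data as gzip-compressed VCF. The sketch below takes a first look at such an export using only the Python standard library. It is not part of BRIDGE, and the function name is hypothetical. It assumes a standard VCF layout: `##` meta-information lines, then a `#CHROM` header line whose first nine columns are fixed, with one column per sample after those.

```python
import gzip
from typing import List, Tuple


def summarize_vcf_gz(path: str) -> Tuple[List[str], int]:
    """Return (sample IDs, number of variant records) for a .vcf.gz file."""
    samples: List[str] = []
    n_records = 0
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Columns 0-8 are CHROM, POS, ID, REF, ALT, QUAL, FILTER,
                # INFO and FORMAT; sample (accession) IDs follow.
                samples = line.rstrip("\n").split("\t")[9:]
                continue
            if line.strip():
                n_records += 1  # one data line per variant site
    return samples, n_records
```

For a downloaded export, a call such as `summarize_vcf_gz("bridge_export.vcf.gz")` (hypothetical file name) returns the accession IDs and the number of variant records. Streaming the file line by line keeps memory use flat even for large genebank panels.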
Affiliation(s)
- Patrick König
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Sebastian Beier
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Martin Basterrechea
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Danuta Schüler
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Daniel Arend
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Martin Mascher
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Leipzig, Germany
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Center for Integrated Breeding Research, Georg-August University, Göttingen, Germany
- Uwe Scholz
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
- Matthias Lange
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, Germany
18
Grimshaw SG, Smith AM, Arnold DS, Xu E, Hoptroff M, Murphy B. The diversity and abundance of fungi and bacteria on the healthy and dandruff affected human scalp. PLoS One 2019; 14:e0225796. [PMID: 31851674 PMCID: PMC6919596 DOI: 10.1371/journal.pone.0225796] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 08/05/2019] [Accepted: 11/12/2019] [Indexed: 12/22/2022]
Abstract
Dandruff is a skin condition that affects the scalp of up to half the world's population; it is characterised by an itchy, flaky scalp and is associated with colonisation of the skin by Malassezia spp. Management of this condition is typically via antifungal therapies; however, the precise role of microbes in the aggravation of the condition is incompletely characterised. Here, a combination of 454 sequencing and qPCR techniques was used to compare the scalp microbiota of dandruff-affected and non-dandruff-affected Chinese subjects. Based on 454 sequencing of the scalp microbiome, the two most abundant bacterial genera found on the scalp surface were Cutibacterium (formerly Propionibacterium) and Staphylococcus, while Malassezia was the main fungal inhabitant. Quantitative PCR (qPCR) analysis was additionally carried out on four scalp taxa (M. restricta, M. globosa, C. acnes and Staphylococcus spp.) believed to represent the bulk of the overall population. Metataxonomic and qPCR analyses were performed on healthy and lesional buffer scrub samples to assess whether the scalp condition is associated with differential microbial communities on the sampled skin. Dandruff was associated with greater frequencies of M. restricta and Staphylococcus spp. compared with the healthy population (p<0.05). Analysis also revealed the presence of an unclassified fungal taxon that could represent a novel Malassezia species.
Affiliation(s)
- Sally G. Grimshaw
- Unilever Research & Development, Port Sunlight, England, United Kingdom
- Adrian M. Smith
- Unilever Research & Development, Colworth, England, United Kingdom
- David S. Arnold
- Unilever Research & Development, Port Sunlight, England, United Kingdom
- Elaine Xu
- Unilever Research & Development, Shanghai, China
- Michael Hoptroff
- Unilever Research & Development, Port Sunlight, England, United Kingdom
- Barry Murphy
- Unilever Research & Development, Port Sunlight, England, United Kingdom
19
Godini R, Fallahi H. A brief overview of the concepts, methods and computational tools used in phylogenetic tree construction and gene prediction. Meta Gene 2019. [DOI: 10.1016/j.mgene.2019.100586] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/21/2022]
20
Wahyuni RM, Utsumi T, Juniastuti, Yano Y, Murti IS, Amin M, Yamani LN, Istimagfiroh A, Purwono PB, Soetjipto, Lusida MI, Hayashi Y. Analysis of hepatitis B virus genotype and gene mutation in patients with advanced liver disease in East Kalimantan, Indonesia. Biomed Rep 2019; 10:303-310. [PMID: 31086664 PMCID: PMC6489537 DOI: 10.3892/br.2019.1202] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/28/2018] [Accepted: 03/12/2019] [Indexed: 12/18/2022]
Abstract
Liver cirrhosis (LC) and hepatocellular carcinoma (HCC) are life-threatening conditions frequently associated with chronic hepatitis B virus (HBV) infection in Asian countries, including Indonesia. HBV genotypes and several specific mutations are associated with disease progression. To clarify the geographical variation in viral characteristics, HBV genotypes and gene mutations were investigated in patients with advanced liver disease (ALD) in Samarinda, East Kalimantan, Indonesia. Sera were collected from 41 patients with ALD at Abdul Wahab Sjahranie Hospital and from HBV carriers at the Red Cross Center blood bank in Samarinda, and were screened for hepatitis B surface antigen and hepatitis B e-antigen. Liver function data were obtained from the medical records of each patient. HBV genotype and gene mutations were determined by polymerase chain reaction sequencing. Analysis of HBV isolates indicated that genotype B was the most frequent genotype (85.4% in patients with ALD and 97.8% in HBV carriers), followed by genotype C (14.6% and 2.2%, respectively). The C1505A mutation in the X region, the T1753V and A1762T/G1764A mutations in the basal core promoter region, and C1858T in the precore (PC) region were frequent and were detected only in patients with ALD (28.9, 40, 73.5 and 17.6%, respectively), whereas the G1896A mutation in the PC region was frequently detected in HBV carriers. The presence of HBV genotype B and of certain HBV gene mutations was characteristic of patients with ALD in East Kalimantan.
Affiliation(s)
- Rury Mega Wahyuni
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Takako Utsumi
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Center for Infectious Diseases, Kobe University Graduate School of Medicine, Kobe, Hyogo 650-0017, Japan
- Juniastuti
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Department of Microbiology, Faculty of Medicine, Airlangga University, Campus A, Surabaya 60131, Indonesia
- Yoshihiko Yano
- Center for Infectious Diseases, Kobe University Graduate School of Medicine, Kobe, Hyogo 650-0017, Japan
- Ignatia Sinta Murti
- Department of Internal Medicine, Faculty of Medicine, Mulawarman University, Samarinda 75119, Indonesia
- Mochamad Amin
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Laura Navika Yamani
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Department of Epidemiology, Faculty of Public Health, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Anittaqwa Istimagfiroh
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Priyo Budi Purwono
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Department of Microbiology, Faculty of Medicine, Airlangga University, Campus A, Surabaya 60131, Indonesia
- Soetjipto
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Department of Biochemistry, Faculty of Medicine, Airlangga University, Campus A, Surabaya 60131, Indonesia
- Maria Inge Lusida
- Indonesia-Japan Collaborative Research Center for Emerging and Re-emerging Infectious Diseases, Institute of Tropical Disease, Airlangga University, Campus C, Surabaya 60115, Indonesia
- Department of Microbiology, Faculty of Medicine, Airlangga University, Campus A, Surabaya 60131, Indonesia
- Yoshitake Hayashi
- Center for Infectious Diseases, Kobe University Graduate School of Medicine, Kobe, Hyogo 650-0017, Japan
21
Zhang L, Xiao M, Zhou J, Yu J. Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA). Bioinformatics 2018; 34:3624-3630. [DOI: 10.1093/bioinformatics/bty392] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 01/03/2018] [Accepted: 05/09/2018] [Indexed: 12/25/2022]
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
- School of Computer and Information Science, Southwest University, Chongqing, China
- Ming Xiao
- School of Computer and Information Science, Southwest University, Chongqing, China
- College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing, China
- Jingsong Zhou
- College of Computer Science, Sichuan University, Chengdu, China
- Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
22
A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection. mSphere 2018; 3:mSphere00069-18. [PMID: 29564396 PMCID: PMC5853486 DOI: 10.1128/mspheredirect.00069-18] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Received: 02/06/2018] [Accepted: 02/16/2018] [Indexed: 12/20/2022]
Abstract
Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because no public database contains all viral sequences without also containing abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% identity by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases.
The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available to facilitate HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank.
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.
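RVDB's redundancy reduction relied on CD-HIT-EST at 98% identity. CD-HIT's core strategy is greedy incremental clustering. Sequences are processed from longest to shortest. Each one joins the first existing representative it matches at or above the threshold, or else it becomes a new representative. The sketch below illustrates only that strategy and is not CD-HIT itself. In place of CD-HIT's k-mer filtering and alignment it uses Python's `difflib.SequenceMatcher`, which is a crude, non-optimal matcher. Its identity values can therefore differ from CD-HIT-EST's, and it would be far too slow for a database of RVDB's size. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(short: str, long_: str) -> float:
    """Matched bases divided by the length of the shorter sequence.

    difflib's matching blocks serve as a rough stand-in for an alignment.
    """
    if not short:
        return 0.0
    matcher = SequenceMatcher(None, short, long_, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(short)


def greedy_cluster(seqs: Dict[str, str],
                   threshold: float = 0.98) -> Dict[str, List[str]]:
    """Greedy incremental clustering in the spirit of CD-HIT.

    Returns a mapping from representative ID to member IDs
    (the representative is listed first).
    """
    # Longest sequences are seen first, so they become representatives.
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for sid in order:
        for rep in clusters:
            # Representatives are never shorter than the current sequence.
            if identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)  # join the first matching cluster
                break
        else:
            clusters[sid] = [sid]  # no match: start a new cluster
    return clusters
```

For example, `greedy_cluster({"a": "ACGTTGCAAG", "b": "ACGTTGCAAC", "c": "TTTTTTTT"}, threshold=0.9)` places `b` in the cluster represented by `a` (9 of 10 bases match) and leaves `c` as its own representative. Keeping the longest sequence of each cluster as its representative retains maximal sequence content while discarding near-duplicates.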
23
Bhandary P, Seetharam AS, Arendsee ZW, Hur M, Wurtele ES. Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data. Plant Sci 2018; 267:32-47. [PMID: 29362097 DOI: 10.1016/j.plantsci.2017.10.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/17/2017] [Revised: 10/07/2017] [Accepted: 10/15/2017] [Indexed: 05/19/2023]
Abstract
More than 15 petabases of raw RNA-seq data are now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data reuse provides a tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA-seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality through greater use of controlled vocabularies and through metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system.
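The recommendation to rely on controlled vocabularies and metadata review can be made concrete with a small automated check. The sketch below is illustrative only. The required field names, the tissue vocabulary and the function name are hypothetical placeholders, not a community standard. A real pipeline would draw its terms from an established ontology.

```python
from typing import Dict, List, Optional

# Hypothetical minimum fields for one RNA-seq sample record.
REQUIRED_FIELDS = ("organism", "tissue", "genotype", "growth_condition")
# Hypothetical controlled vocabulary for the "tissue" field.
TISSUE_TERMS = {"leaf", "root", "seed", "shoot apex"}


def review_metadata(record: Dict[str, Optional[str]]) -> List[str]:
    """Return human-readable problems found in one sample's metadata."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing: {field}")
    tissue = record.get("tissue")
    # Free-text tissue labels outside the vocabulary are flagged for review.
    if tissue and tissue.strip().lower() not in TISSUE_TERMS:
        problems.append(f"uncontrolled term in tissue: {tissue!r}")
    return problems
```

For example, `review_metadata({"organism": "Zea mays", "tissue": "Leaves", "genotype": "B73"})` reports both the missing growth condition and the free-text tissue label "Leaves". Running a check like this at submission time catches the gaps before they reach a public repository.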
Affiliation(s)
- Priyanka Bhandary
- Dept. of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Arun S Seetharam
- Genome Informatics Facility, Office of Biotechnology, Iowa State University, Ames, IA 50011, USA
- Zebulun W Arendsee
- Dept. of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Manhoi Hur
- Dept. of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
- Eve Syrkin Wurtele
- Dept. of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50010, USA
- Center for Metabolic Biology, Iowa State University, Ames, IA 50011, USA
24
Global Distribution Patterns and Pangenomic Diversity of the Candidate Phylum "Latescibacteria" (WS3). Appl Environ Microbiol 2017; 83:AEM.00521-17. [PMID: 28314726 DOI: 10.1128/aem.00521-17] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Received: 03/03/2017] [Accepted: 03/11/2017] [Indexed: 01/01/2023]
Abstract
We investigated the global distribution patterns and pangenomic diversity of the candidate phylum "Latescibacteria" (WS3) in 16S rRNA gene as well as metagenomic data sets. We document distinct distribution patterns for various "Latescibacteria" orders in 16S rRNA gene data sets, with prevalence of orders sediment_1 in terrestrial, PBSIII_9 in groundwater and temperate freshwater, and GN03 in pelagic marine, saline-hypersaline, and wastewater habitats. Using a fragment recruitment approach, we identified 68.9 Mb of "Latescibacteria"-affiliated contigs in publicly available metagenomic data sets comprising 73,079 proteins. Metabolic reconstruction suggests a prevalent saprophytic lifestyle in all "Latescibacteria" orders, with marked capacities for the degradation of proteins, lipids, and polysaccharides predominant in plant, bacterial, fungal/crustacean, and eukaryotic algal cell walls. Extensive transport and central metabolic pathways for the metabolism of imported monomers were also identified. Interestingly, genes and domains suggestive of the production of a cellulosome (e.g., protein-coding genes harboring dockerin I domains attached to a glycosyl hydrolase, and scaffoldin-encoding genes harboring cohesin I and CBM37 domains) were identified in fragments of orders PBSIII_9, GN03, and MSB-4E2 recovered from four anoxic aquatic habitats, hence extending the cellulosomal production capabilities in Bacteria beyond the Gram-positive Firmicutes. In addition to fermentative pathways, a complete electron transport chain with terminal cytochrome c oxidases Caa3 (for operation under high oxygen tension) and Cbb3 (for operation under low oxygen tension) was identified in PBSIII_9 and GN03 fragments recovered from oxygenated and partially/seasonally oxygenated aquatic habitats.
Our metagenomic recruitment effort hence represents a comprehensive pangenomic view of this yet-uncultured phylum and provides insights broader than and complementary to those gained from genome recovery initiatives focusing on a single or few sampled environments.
IMPORTANCE Our understanding of the phylogenetic diversity, metabolic capabilities, and ecological roles of yet-uncultured microorganisms is rapidly expanding. However, recent efforts have mainly focused on recovering genomes of novel microbial lineages from a specific sampling site, rather than from a wide range of environmental habitats. To comprehensively evaluate the genomic landscape, putative metabolic capabilities, and ecological roles of yet-uncultured candidate phyla, efforts are needed that focus on the recovery of genomic fragments from a wide range of habitats and that adequately sample the intraphylum diversity within a specific target lineage. Here, we investigated the global distribution patterns and pangenomic diversity of the candidate phylum "Latescibacteria". Our results document the preference of specific "Latescibacteria" orders for specific habitats, the prevalence of plant polysaccharide degradation abilities within all "Latescibacteria" orders, the occurrence of all genes/domains necessary for the production of cellulosomes within three "Latescibacteria" orders (GN03, PBSIII_9, and MSB-4E2) in data sets recovered from anaerobic locations, and the identification of the components of an aerobic respiratory chain, as well as the occurrence of multiple O2-dependent metabolic reactions in "Latescibacteria" orders GN03 and PBSIII_9 recovered from oxygenated habitats.
The results demonstrate the value of phylocentric pangenomic surveys for understanding the global ecological distribution and panmetabolic abilities of yet-uncultured microbial lineages since they provide broader and more complementary insights than those gained from single-cell genomic and/or metagenomic-enabled genome recovery efforts focusing on a single sampling site.
25
Urdidiales‐Nieto D, Navas‐Delgado I, Aldana‐Montes JF. Biological Web Service Repositories Review. Mol Inform 2017; 36:1600035. [PMID: 27783459 PMCID: PMC5434852 DOI: 10.1002/minf.201600035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/07/2016] [Accepted: 09/27/2016] [Indexed: 12/26/2022]
Abstract
Web services play a key role in bioinformatics, enabling the integration of database access and analysis algorithms. However, Web service repositories do not usually publish information on the changes made to their registered Web services. Dynamism is directly related to changes at the repository level (services registered or unregistered) and at the service level (annotation changes). Thus, users, software clients, and workflow-based approaches lack sufficient relevant information to decide when they should review or re-execute a Web service or workflow to obtain updated or improved results. The dynamism of a repository could serve workflow developers as a signal to re-check service availability and annotation changes in the services of interest to them. This paper presents a review of the best-known Web service repositories in the life sciences, including an analysis of their dynamism. The paper introduces freshness and uses it as the measure of the dynamism of these repositories.
Affiliation(s)
- David Urdidiales‐Nieto
- Department of Computer Languages and Computing Science, Higher Technical School of Computer Science Engineering, University of Malaga, Malaga 29071, Spain
- Ismael Navas‐Delgado
- Department of Computer Languages and Computing Science, Higher Technical School of Computer Science Engineering, University of Malaga, Malaga 29071, Spain
- José F. Aldana‐Montes
- Department of Computer Languages and Computing Science, Higher Technical School of Computer Science Engineering, University of Malaga, Malaga 29071, Spain
26
Bermudez-Santana CI. Aplicaciones de la bioinformática en la medicina: el genoma humano. ¿Cómo podemos ver tanto detalle? [Applications of bioinformatics in medicine: the human genome. How can we see so much detail?] Acta Biol Colomb 2016. [DOI: 10.15446/abc.v21n1supl.51233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
Bioinformatics is a new field that supports part of the biological research aimed at identifying gene variants that can be discovered through whole-genome studies. With this motivation, this article presents an overview of the main contributions of bioinformatics to the sequencing of the first human genome. It also summarizes the main computational advances developed to meet the demands of re-sequencing a human genome with next-generation sequencing technologies. Finally, it introduces some of the new challenges that must be faced to apply personalized genomics to the development of medicine.
27
Somvanshi VS, Gahoi S, Banakar P, Thakur PK, Kumar M, Sajnani M, Pandey P, Rao U. A transcriptomic insight into the infective juvenile stage of the insect parasitic nematode, Heterorhabditis indica. BMC Genomics 2016; 17:166. [PMID: 26931371 PMCID: PMC4774024 DOI: 10.1186/s12864-016-2510-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/17/2015] [Accepted: 02/22/2016] [Indexed: 01/02/2023]
Abstract
Background Nematodes are the most numerous animals in the soil. Insect parasitic nematodes of the genus Heterorhabditis are capable of selectively seeking, infecting and killing their insect hosts in the soil. The infective juvenile (IJ) stage of Heterorhabditis nematodes is analogous to the Caenorhabditis elegans dauer juvenile stage, which remains in ‘arrested development’ until it finds and infects a new insect host in the soil. H. indica is the most prevalent species of Heterorhabditis in India. To understand the genes and molecular processes that govern the biology of the IJ stage, and to create a resource to facilitate functional genomics and genetic exploration, we sequenced the transcriptome of H. indica IJs. Results De novo sequence assembly using the Velvet-Oases pipeline resulted in 13,593 unique transcripts at an N50 of 1,371 bp, of which 53 % were annotated by blastx. H. indica transcripts showed higher orthology with parasitic nematodes than with free-living nematodes. In silico expression analysis showed that 30 % of transcripts were expressed at ≥100 FPKM. All four canonical dauer formation pathways (cGMP-PKG, insulin, dafachronic acid and TGF-β) were active in the IJ stage. Several other signaling pathways were highly represented in the transcriptome. Twenty-four orthologs of C. elegans RNAi pathway effector genes were discovered in H. indica, including nrde-3, which is reported here for the first time in any parasitic nematode. An ortholog of C. elegans tol-1 was also identified. Further, 272 kinases belonging to 137 groups, and several previously unidentified members of important gene classes, were identified. Conclusions We generated high-quality transcriptome sequence data from H. indica IJs for the first time. The transcripts showed high similarity with the parasitic nematodes M. hapla and A. suum rather than with C. elegans, a species to which H. indica is more closely related.
The high representation of transcripts from several signaling pathways in the IJs indicates that, despite being a developmentally arrested stage, IJs are a hotbed of signaling and are actively interacting with their environment. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2510-z) contains supplementary material, which is available to authorized users.
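The assembly above is summarized as 13,593 transcripts at an N50 of 1,371 bp. N50 is the length L such that transcripts of length L or longer together contain at least half of all assembled bases. The following is a minimal Python sketch of that standard definition; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or transcript lengths.

    Sort lengths from longest to shortest and accumulate them; the length at
    which the running total first reaches half of all bases is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty collection")


if __name__ == "__main__":
    # Hypothetical toy assembly: 1,000 bases in total, so half is 500.
    print(n50([100, 200, 300, 400]))
```

With the toy lengths shown, 400 + 300 = 700 bases already exceeds half of the 1,000-base total, so the sketch returns 300.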
Affiliation(s)
- Vishal S Somvanshi
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
- Shachi Gahoi
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
- Prakash Banakar
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
- Prasoon Kumar Thakur
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
- Mukesh Kumar
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
- Manisha Sajnani
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
- Priyatama Pandey
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
- Uma Rao
- Division of Nematology, ICAR-Indian Agricultural Research Institute, New Delhi, 110012, India.
28
Scarpati M, Heavner ME, Wiech E, Singh S. Proteomic Tools for the Analysis of Cytoskeleton Proteins. Methods Mol Biol 2016; 1365:385-413. [PMID: 26498799 DOI: 10.1007/978-1-4939-3124-8_23]
Abstract
Proteomic analyses have become an essential part of the toolkit of the molecular biologist, given the widespread availability of genomic data and open source or freely accessible bioinformatics software. Tools are available for detecting homologous sequences, recognizing functional domains, and modeling the three-dimensional structure for any given protein sequence. Although a wealth of structural and functional information is available for a large number of cytoskeletal proteins, with representatives spanning all of the major subfamilies, the majority of cytoskeletal proteins remain partially or totally uncharacterized. Moreover, bioinformatics tools provide a means for studying the effects of synthetic mutations or naturally occurring variants of these cytoskeletal proteins. This chapter discusses various freely available proteomic analysis tools, with a focus on in silico prediction of protein structure and function. The selected tools are notable for providing an easily accessible interface for the novice, while retaining advanced functionality for more experienced computational biologists.
Affiliation(s)
- Michael Scarpati
- Biology Program, The Graduate Center, City University of New York, New York, NY, USA
- Mary Ellen Heavner
- Biochemistry Program, The Graduate Center, City University of New York, New York, NY, USA
- Eliza Wiech
- Biology Program, The Graduate Center, City University of New York, New York, NY, USA
- Shaneen Singh
- Biochemistry Program, The Graduate Center, City University of New York, New York, NY, USA.
- Department of Biology, Brooklyn College, City University of New York, 209 Ingersoll Hall Extension, 2900 Bedford Ave., Brooklyn, NY, 11210, USA.
- Biology Program, The Graduate Center, City University of New York, New York, NY, USA.
29
Vaudel M, Barsnes H, Ræder H, Berven FS. Using Proteomics Bioinformatics Tools and Resources in Proteogenomic Studies. Adv Exp Med Biol 2016; 926:65-75. [DOI: 10.1007/978-3-319-42316-6_5]
30
Abstract
Metagenomic data, which contain sequenced DNA reads of uncultured microbial species from environmental samples, provide a unique opportunity to thoroughly analyze microbial species that have never been identified before. Reconstructing 16S ribosomal RNA, a phylogenetic marker gene, is usually required to analyze the composition of metagenomic data. However, the massive volume of the data, high sequence similarity between related species, skewed microbial abundance and the lack of reference genes make 16S rRNA reconstruction difficult. Generic de novo assembly tools are not optimized for assembling 16S rRNA genes. In this work, we introduce a targeted rRNA assembly tool, REAGO (REconstruct 16S ribosomal RNA Genes from metagenOmic data). It addresses the above challenges by combining secondary structure-aware homology search, properties of rRNA genes and de novo assembly. Our experimental results show that our tool can correctly recover more rRNA genes than several popular generic metagenomic assembly tools and specially designed rRNA construction tools. AVAILABILITY AND IMPLEMENTATION The source code of REAGO is freely available at https://github.com/chengyuan/reago.
Affiliation(s)
- Cheng Yuan
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
- Jikai Lei
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
- James Cole
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
- Yanni Sun
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
31
Muthamilarasan M, Bonthala VS, Khandelwal R, Jaishankar J, Shweta S, Nawaz K, Prasad M. Global analysis of WRKY transcription factor superfamily in Setaria identifies potential candidates involved in abiotic stress signaling. Front Plant Sci 2015; 6:910. [PMID: 26635818 PMCID: PMC4654423 DOI: 10.3389/fpls.2015.00910] [Received: 07/14/2015] [Accepted: 10/12/2015]
Abstract
Transcription factors (TFs) are major players in stress signaling and constitute an integral part of signaling networks. Among the major TFs, WRKY proteins play pivotal roles in the regulation of transcriptional reprogramming associated with stress responses. In view of this, genome- and transcriptome-wide identification of the WRKY TF family was performed in the C4 model plants Setaria italica (SiWRKY) and S. viridis (SvWRKY), respectively. The study identified 105 SiWRKY and 44 SvWRKY proteins that were computationally analyzed for their physicochemical properties. Sequence alignment and phylogenetic analysis classified these proteins into three major groups, namely I, II, and III, with the majority of WRKY proteins belonging to group II (53 SiWRKY and 23 SvWRKY), followed by group III (39 SiWRKY and 11 SvWRKY) and group I (10 SiWRKY and 6 SvWRKY). Group II proteins were further classified into 5 subgroups (IIa to IIe) based on their phylogeny. Domain analysis showed the presence of the WRKY motif and zinc finger-like structures in these proteins, along with additional domains in a few proteins. All SiWRKY genes were physically mapped on the S. italica genome, and their duplication analysis revealed that 10 and 8 gene pairs underwent tandem and segmental duplications, respectively. Comparative mapping of SiWRKY and SvWRKY genes in related C4 panicoid genomes demonstrated the orthologous relationships between these genomes. In silico expression analysis of SiWRKY and SvWRKY genes showed their differential expression patterns in different tissues and stress conditions. Expression profiling of candidate SiWRKY genes in response to stress (dehydration and salinity) and hormone treatments (abscisic acid, salicylic acid, and methyl jasmonate) suggested the putative involvement of SiWRKY066 and SiWRKY082 in stress and hormone signaling. These genes could be potential candidates for further characterization to delineate their functional roles in abiotic stress signaling.
Affiliation(s)
- Manoj Prasad
- National Institute of Plant Genome Research, New Delhi, India
32
Pedro H, Maheswari U, Urban M, Irvine AG, Cuzick A, McDowall MD, Staines DM, Kulesha E, Hammond-Kosack KE, Kersey PJ. PhytoPath: an integrative resource for plant pathogen genomics. Nucleic Acids Res 2015; 44:D688-93. [PMID: 26476449 PMCID: PMC4702788 DOI: 10.1093/nar/gkv1052] [Received: 08/17/2015] [Accepted: 10/01/2015]
Abstract
PhytoPath (www.phytopathdb.org) is a resource for genomic and phenotypic data from plant pathogen species, that integrates phenotypic data for genes from PHI-base, an expertly curated catalog of genes with experimentally verified pathogenicity, with the Ensembl tools for data visualization and analysis. The resource is focused on fungi, protists (oomycetes) and bacterial plant pathogens that have genomes that have been sequenced and annotated. Genes with associated PHI-base data can be easily identified across all plant pathogen species using a BioMart-based query tool and visualized in their genomic context on the Ensembl genome browser. The PhytoPath resource contains data for 135 genomic sequences from 87 plant pathogen species, and 1364 genes curated for their role in pathogenicity and as targets for chemical intervention. Support for community annotation of gene models is provided using the WebApollo online gene editor, and we are working with interested communities to improve reference annotation for selected species.
Affiliation(s)
- Helder Pedro
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
- Uma Maheswari
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
- Martin Urban
- Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK
- Alistair George Irvine
- Department of Computational and Systems Biology, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK
- Alayne Cuzick
- Department of Plant Biology and Crop Science, Rothamsted Research, Harpenden, Herts, AL5 2JQ, UK
- Mark D McDowall
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
- Daniel M Staines
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
- Eugene Kulesha
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
- Paul Julian Kersey
- The European Molecular Biology Laboratory, The European Bioinformatics Institute, Hinxton, Cambridgeshire, CB10 1SD, UK
33
Moss WN, Steitz JA. In silico discovery and modeling of non-coding RNA structure in viruses. Methods 2015; 91:48-56. [PMID: 26116541 DOI: 10.1016/j.ymeth.2015.06.015] [Received: 04/06/2015] [Revised: 06/17/2015] [Accepted: 06/22/2015]
Abstract
This review covers several computational methods for discovering structured non-coding RNAs in viruses and modeling their putative secondary structures. Here we will use examples from two target viruses to highlight these approaches: influenza A virus, a relatively small, segmented RNA virus; and Epstein-Barr virus, a relatively large DNA virus with a complex transcriptome. Each system has unique challenges to overcome and unique characteristics to exploit. From these particular cases, generically useful approaches can be derived for the study of additional viral targets.
Affiliation(s)
- Walter N Moss
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA
- Joan A Steitz
- Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA.
34
Greenfeld M, van de Meent JW, Pavlichin DS, Mabuchi H, Wiggins CH, Gonzalez RL, Herschlag D. Single-molecule dataset (SMD): a generalized storage format for raw and processed single-molecule data. BMC Bioinformatics 2015; 16:3. [PMID: 25591752 PMCID: PMC4384321 DOI: 10.1186/s12859-014-0429-4] [Received: 05/14/2014] [Accepted: 12/11/2014]
Abstract
Background Single-molecule techniques have emerged as incisive approaches for addressing a wide range of questions arising in contemporary biological research [Trends Biochem Sci 38:30–37, 2013; Nat Rev Genet 14:9–22, 2013; Curr Opin Struct Biol 2014, 28C:112–121; Annu Rev Biophys 43:19–39, 2014]. The analysis and interpretation of raw single-molecule data benefits greatly from the ongoing development of sophisticated statistical analysis tools that enable accurate inference at the low signal-to-noise ratios frequently associated with these measurements. While a number of groups have released analysis toolkits as open source software [J Phys Chem B 114:5386–5403, 2010; Biophys J 79:1915–1927, 2000; Biophys J 91:1941–1951, 2006; Biophys J 79:1928–1944, 2000; Biophys J 86:4015–4029, 2004; Biophys J 97:3196–3205, 2009; PLoS One 7:e30024, 2012; BMC Bioinformatics 288 11(8):S2, 2010; Biophys J 106:1327–1337, 2014; Proc Int Conf Mach Learn 28:361–369, 2013], it remains difficult to compare analysis for experiments performed in different labs due to a lack of standardization. Results Here we propose a standardized single-molecule dataset (SMD) file format. SMD is designed to accommodate a wide variety of computer programming languages, single-molecule techniques, and analysis strategies. To facilitate adoption of this format we have made two existing data analysis packages that are used for single-molecule analysis compatible with this format. Conclusion Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field, which will benefit both sophisticated users and non-specialists by allowing standardized, transparent, and reproducible analysis practices. Electronic supplementary material The online version of this article (doi:10.1186/s12859-014-0429-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Max Greenfeld
- Department of Chemical Engineering, Stanford University, Stanford, CA, 94305, USA; Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA.
- Hideo Mabuchi
- Department of Applied Physics, Stanford University, Stanford, CA, 94305, USA.
- Chris H Wiggins
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY, 10027, USA.
- Ruben L Gonzalez
- Department of Chemistry, Columbia University, New York, NY, 10027, USA.
- Daniel Herschlag
- Department of Chemical Engineering, Stanford University, Stanford, CA, 94305, USA; Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA; Department of Biochemistry, B400, Stanford University, Stanford, CA, 94305, USA.
35
Cantacessi C, Hofmann A, Campbell BE, Gasser RB. Impact of next-generation technologies on exploring socioeconomically important parasites and developing new interventions. Methods Mol Biol 2015; 1247:437-474. [PMID: 25399114 DOI: 10.1007/978-1-4939-2004-4_31]
Abstract
High-throughput molecular and computer technologies have become instrumental for systems biological explorations of pathogens, including parasites. For instance, investigations of the transcriptomes of different developmental stages of parasitic nematodes give insights into gene expression, regulation and function in a parasite, which is a significant step to understanding their biology, as well as interactions with their host(s) and disease. This chapter (1) gives a background on some key parasitic nematodes of socioeconomic importance, (2) describes sequencing and bioinformatic technologies for large-scale studies of the transcriptomes and genomes of these parasites, (3) provides some recent examples of applications and (4) emphasizes the prospects of fundamental biological explorations of parasites using these technologies for the development of new interventions to combat parasitic diseases.
Affiliation(s)
- Cinzia Cantacessi
- Department of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
36
Petit D, Teppa E, Mir AM, Vicogne D, Thisse C, Thisse B, Filloux C, Harduin-Lepers A. Integrative view of α2,3-sialyltransferases (ST3Gal) molecular and functional evolution in deuterostomes: significance of lineage-specific losses. Mol Biol Evol 2014; 32:906-27. [PMID: 25534026 PMCID: PMC4379398 DOI: 10.1093/molbev/msu395]
Abstract
Sialyltransferases are responsible for the synthesis of a diverse range of sialoglycoconjugates predicted to be pivotal to deuterostomes’ evolution. In this work, we reconstructed the evolutionary history of the metazoan α2,3-sialyltransferase family (ST3Gal), a subset of sialyltransferases encompassing six subfamilies (ST3Gal I–ST3Gal VI) functionally characterized in mammals. Exploration of genomic and expressed sequence tag databases and a search for conserved sialylmotifs led to the identification of a large data set of st3gal-related gene sequences. Molecular phylogeny and large-scale sequence similarity network analysis identified four new vertebrate subfamilies, called ST3Gal III-r, ST3Gal VII, ST3Gal VIII, and ST3Gal IX. To address the origin and evolutionary relationships of the st3gal-related genes, we performed comparative syntenic mapping of st3gal gene loci combined with ancestral genome reconstruction. The ten vertebrate ST3Gal subfamilies originated from genome duplication events at the base of vertebrates and are organized in three distinct and ancient groups of genes predating the early deuterostomes. Inferring st3gal gene family history also identified several lineage-specific gene losses, the significance of which was explored in a functional context. Toward this aim, the spatiotemporal distribution of st3gal genes was analyzed in zebrafish and bovine tissues. In addition, molecular evolutionary analyses using specificity-determining position and coevolved amino acid predictions led to the identification of amino acid residues with potential implication in the functional divergence of vertebrate ST3Gal. We propose a detailed scenario of the evolutionary relationships of st3gal genes coupled to a conceptual framework of the evolution of ST3Gal functions.
Affiliation(s)
- Daniel Petit
- INRA, UMR 1061, Unité Génétique Moléculaire Animale, F-87060 Limoges Cedex, France; Université de Limoges, UMR 1061, Unité Génétique Moléculaire Animale, 123 avenue Albert Thomas, F-87060 Limoges Cedex, France
- Elin Teppa
- Bioinformatics Unit, Fundación Instituto Leloir, Buenos Aires, Argentina
- Anne-Marie Mir
- Laboratoire de Glycobiologie Structurale et Fonctionnelle, UMR 8576 CNRS, Université Lille Nord de France, Lille1, Villeneuve d'Ascq, France
- Dorothée Vicogne
- Laboratoire de Glycobiologie Structurale et Fonctionnelle, UMR 8576 CNRS, Université Lille Nord de France, Lille1, Villeneuve d'Ascq, France
- Christine Thisse
- Department of Cell Biology, School of Medicine, University of Virginia
- Bernard Thisse
- Department of Cell Biology, School of Medicine, University of Virginia
- Cyril Filloux
- INRA, UMR 1061, Unité Génétique Moléculaire Animale, F-87060 Limoges Cedex, France; Université de Limoges, UMR 1061, Unité Génétique Moléculaire Animale, 123 avenue Albert Thomas, F-87060 Limoges Cedex, France
- Anne Harduin-Lepers
- Laboratoire de Glycobiologie Structurale et Fonctionnelle, UMR 8576 CNRS, Université Lille Nord de France, Lille1, Villeneuve d'Ascq, France
37
Lopez R, Cowley A, Li W, McWilliam H. Using EMBL-EBI Services via Web Interface and Programmatically via Web Services. Curr Protoc Bioinformatics 2014; 48:3.12.1-3.12.50. [PMID: 25501941 DOI: 10.1002/0471250953.bi0312s48]
Abstract
The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows. This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages.
Affiliation(s)
- Rodrigo Lopez
- EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
38
Krug K, Popic S, Carpy A, Taumer C, Macek B. Construction and assessment of individualized proteogenomic databases for large-scale analysis of nonsynonymous single nucleotide variants. Proteomics 2014; 14:2699-708. [DOI: 10.1002/pmic.201400219] [Received: 05/16/2014] [Revised: 08/02/2014] [Accepted: 09/19/2014]
Affiliation(s)
- Karsten Krug
- Proteome Center Tuebingen; University of Tuebingen; Germany
- Sasa Popic
- Proteome Center Tuebingen; University of Tuebingen; Germany
- Boris Macek
- Proteome Center Tuebingen; University of Tuebingen; Germany
39
De novo transcriptome sequencing and analysis of the cereal cyst nematode, Heterodera avenae. PLoS One 2014; 9:e96311. [PMID: 24802510 PMCID: PMC4011697 DOI: 10.1371/journal.pone.0096311] [Received: 11/19/2013] [Accepted: 04/07/2014]
Abstract
The cereal cyst nematode (CCN, Heterodera avenae) is a major pest of wheat (Triticum spp.) that reduces crop yields in many countries. Cyst nematodes are obligate sedentary endoparasites that reproduce by amphimixis. Here, we report the first transcriptome analysis of two stages of H. avenae. Sequencing RNA extracted from the pre-parasitic infective juvenile and adult stages of the life cycle yielded 131 million high-quality Illumina paired-end reads, which generated 27,765 contigs with an N50 of 1,028 base pairs, of which 10,452 were annotated. Comparative analyses were undertaken to evaluate H. avenae sequences against those of other plant, animal and free-living nematodes to identify differences in expressed genes. There were 4,431 transcripts common to H. avenae and the free-living nematode Caenorhabditis elegans, and 9,462 in common with the more closely related potato cyst nematode, Globodera pallida. Annotation of H. avenae carbohydrate-active enzymes (CAZy) revealed fewer glycoside hydrolases (GHs) but more glycosyl transferases (GTs) and carbohydrate esterases (CEs) than in M. incognita. In total, 1,280 transcripts were found to have a secretory signature (presence of a signal peptide and absence of a transmembrane domain). In a comparison of genes expressed in the pre-parasitic juvenile and feeding female stages, the expression levels of 30 genes with high RPKM (reads per kilobase per million mapped reads) values were analysed by qRT-PCR, which confirmed the observed differences in their expression levels. In addition, we have developed a user-friendly resource, the Heterodera transcriptome database (HATdb), for public access to the data generated in this study. The new data on the transcriptome of H. avenae add to the genetic resources available to study plant parasitic nematodes and provide an opportunity to seek new effectors that are specifically involved in the H. avenae-cereal host interaction.
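The candidate genes above were selected by RPKM, which normalizes a gene's mapped read count both by transcript length (in kilobases) and by sequencing depth (in millions of mapped reads): RPKM = 10^9 × C / (L × N), where C is the number of reads mapped to the gene, L its length in bp and N the total number of mapped reads. The sketch below implements only that standard formula; the function name and the example counts are hypothetical and do not reproduce the study's actual quantification pipeline. FPKM, used in the H. indica entry, follows the same formula with fragments counted instead of reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Equivalent to (read_count / (gene_length_bp / 1e3)) / (total_mapped_reads / 1e6),
    written as a single product to keep the arithmetic simple.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical gene: 500 reads on a 2 kb transcript in a 10-million-read library.
    print(rpkm(500, 2000, 10_000_000))
```

In the hypothetical example, 500 reads per 2 kb gives 250 reads per kilobase, and dividing by 10 million mapped reads gives an RPKM of 25.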
40
Russanov K, Lefort F, Atanassov A, Atanassov I. The Bulgarian Plant Genomics Database: A Web-Backed Molecular Genetics Database for Plant Biotechnology and Management of Plant Genetic Resources in Bulgaria. Biotechnol Biotechnol Equip 2014. [DOI: 10.1080/13102818.2003.10817050]
41
Abstract
Initially designed to infer evolutionary relationships based on morphological and physiological characters, phylogenetic reconstruction methods have greatly benefited from recent developments in molecular biology and sequencing technologies with a number of powerful methods having been developed specifically to infer phylogenies from macromolecular data. This chapter, while presenting an overview of basic concepts and methods used in phylogenetic reconstruction, is primarily intended as a simplified step-by-step guide to the construction of phylogenetic trees from nucleotide sequences using fairly up-to-date maximum likelihood methods implemented in freely available computer programs. While the analysis of chloroplast sequences from various Vanilla species is used as an illustrative example, the techniques covered here are relevant to the comparative analysis of homologous sequences datasets sampled from any group of organisms.
Affiliation(s)
- Alexandre De Bruyn
- Pôle de Protection des Plantes, CIRAD, UMR PVBMT, Université de la Réunion, Saint-Pierre, France
42
Archivierung von Genomdaten [Archiving of genome data]. Med Genet (Berlin) 2013. [DOI: 10.1007/s11825-013-0403-y]
Abstract
In view of the growing flood of data in genome research, efficient research data management, combined with secure and sustainable archiving, is becoming increasingly important in this field of science as well. The last of three articles in the series "Research data management of genome data" gives a general description of the life cycle of research data, from planning, through the selection and ingestion of data for storage, to the necessary preservation measures and access by data users. Archives play a role in almost every phase of this cycle and therefore form an important component of genome data processing. Three public European archives for genome data are presented as examples: the database of the European Molecular Biology Laboratory (EMBL), the Sequence Read Archive and the Trace Archive. However, because each of these facilities specializes in a particular type of data, there remains a need for additional long-term archives that can handle different data types flexibly or be extended to additional data types. A generic concept for such archives is described, together with recommendations for its practical implementation.
43
Marx H, Lemeer S, Klaeger S, Rattei T, Kuster B. MScDB: a mass spectrometry-centric protein sequence database for proteomics. J Proteome Res 2013; 12:2386-98. [PMID: 23627461 DOI: 10.1021/pr400215r]
Abstract
Protein sequence databases are indispensable tools for life science research, including mass spectrometry (MS)-based proteomics. In current database construction processes, sequence similarity clustering is used to reduce redundancies in the source data. Albeit powerful, it ignores the peptide-centric nature of proteomic data and the fact that MS is able to distinguish similar sequences. Therefore, we introduce an approach that structures the protein sequence space at the peptide level using theoretical and empirical information from large-scale proteomic data to generate a mass spectrometry-centric protein sequence database (MScDB). The core modules of MScDB are an in-silico proteolytic digest and a peptide-centric clustering algorithm that groups protein sequences that are indistinguishable by mass spectrometry. Analysis of various MScDB use cases against five complex human proteomes resulted in 69 peptide identifications not present in UniProtKB as well as 79 putative single amino acid polymorphisms. MScDB retains ~99% of the identifications in comparison to common databases despite a 3-48% increase in the theoretical peptide search space (but comparable protein sequence space). In addition, MScDB enables cross-species applications such as human/mouse graft models, and our results suggest that the uncertainty in protein assignments to one species can be smaller than 20%.
Affiliation(s)
- Harald Marx
- Technische Universität München, Emil-Erlenmeyer-Forum 5, 85354 Freising, Germany
44
A practical approach to reconstruct evolutionary history of animal sialyltransferases and gain insights into the sequence-function relationships of Golgi-glycosyltransferases. Methods Mol Biol 2013; 1022:73-97. [PMID: 23765655 DOI: 10.1007/978-1-62703-465-4_7]
Abstract
In higher vertebrates, sialyltransferases catalyze the transfer of sialic acid residues (Neu5Ac, Neu5Gc or KDN) from an activated sugar donor, which is mainly CMP-Neu5Ac in human tissues, to the hydroxyl group of another saccharide acceptor. In the human genome, 20 unique genes have been described that encode enzymes with remarkable specificity with regard to their acceptor substrates and the glycosidic linkage formed. A systematic search for sialyltransferase-related sequences in genome and EST databases and the use of bioinformatic tools enabled us to investigate the evolutionary history of animal sialyltransferases and propose original models of their divergent evolution. In this chapter, we extend our phylogenetic studies to the comparative analysis of the environment of sialyltransferase gene loci (synteny and paralogy studies), the variations in tissue expression of these genes and the analysis of amino acid position evolution after gene duplications, in order to assess their sequence-function relationships and the molecular basis underlying their functional divergence.
45
Cantacessi C, Campbell BE, Gasser RB. Key strongylid nematodes of animals: Impact of next-generation transcriptomics on systems biology and biotechnology. Biotechnol Adv 2012; 30:469-88. [DOI: 10.1016/j.biotechadv.2011.08.016] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 07/19/2011] [Revised: 08/09/2011] [Accepted: 08/19/2011] [Indexed: 10/17/2022]
46
Cantacessi C, Campbell BE, Jex AR, Young ND, Hall RS, Ranganathan S, Gasser RB. Bioinformatics meets parasitology. Parasite Immunol 2012; 34:265-75. [DOI: 10.1111/j.1365-3024.2011.01304.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/25/2023]
47
Gasser RB, Cantacessi C. Heartworm genomics: unprecedented opportunities for fundamental molecular insights and new intervention strategies. Top Companion Anim Med 2012; 26:193-9. [PMID: 22152607 DOI: 10.1053/j.tcam.2011.09.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]
Abstract
Vector-borne diseases, including canine heartworm disease (CHWD), are of major socioeconomic and canine health importance worldwide. Although many studies have provided insights into CHWD, advanced -omic technologies have so far been used only to a limited extent to study fundamental molecular aspects of Dirofilaria immitis itself, its relationship with the canine host and its vectors, and the potential for drug resistance to emerge. This article takes a prospective view of the benefits that advanced -omic technologies could bring to understanding D. immitis and CHWD. Tackling key biological questions using these technologies will provide a "systems biology" context and could lead to radically new intervention and management strategies against heartworm.
Affiliation(s)
- Robin B Gasser
- Faculty of Veterinary Science, The University of Melbourne, Parkville, Victoria 3010, Australia.
48
Gasser RB, Cantacessi C, Campbell BE, Hofmann A, Otranto D. Major prospects for exploring canine vector borne diseases and novel intervention methods using 'omic technologies. Parasit Vectors 2011; 4:53. [PMID: 21489242 PMCID: PMC3095997 DOI: 10.1186/1756-3305-4-53] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/24/2011] [Accepted: 04/13/2011] [Indexed: 11/26/2022]
Abstract
Canine vector-borne diseases (CVBDs) are of major socioeconomic importance worldwide. Although many studies have provided insights into CVBDs, there has been limited exploration, using advanced 'omic technologies, of fundamental molecular aspects of most pathogens, their vectors, pathogen-host relationships, disease and drug resistance. The aim of the present article is to take a prospective view of the impact that next-generation 'omic technologies could have, with an emphasis on describing the principles of transcriptomic/genomic sequencing and bioinformatic technologies and their implications for both fundamental and applied areas of CVBD research. Tackling key biological questions employing these technologies will provide a 'systems biology' context and could lead to radically new intervention and management strategies against CVBDs.
Affiliation(s)
- Robin B Gasser
- Department of Veterinary Science, The University of Melbourne, 250 Princes Highway, Werribee, Victoria 3030, Australia
- Cinzia Cantacessi
- Department of Veterinary Science, The University of Melbourne, 250 Princes Highway, Werribee, Victoria 3030, Australia
- Bronwyn E Campbell
- Department of Veterinary Science, The University of Melbourne, 250 Princes Highway, Werribee, Victoria 3030, Australia
- Andreas Hofmann
- Structural Chemistry Program, Eskitis Institute for Cell & Molecular Therapies, Griffith University, Brisbane, Queensland, Australia
- Domenico Otranto
- Dipartimento di Sanità Pubblica e Zootecnia, Facoltà di Medicina Veterinaria, Università di Bari, Str. prov.le per Casamassima Km 3, 70010, Valenzano, Bari, Italy
49
Jones MO, Koutsovoulos GD, Blaxter ML. iPhy: an integrated phylogenetic workbench for supermatrix analyses. BMC Bioinformatics 2011; 12:30. [PMID: 21261969 PMCID: PMC3037854 DOI: 10.1186/1471-2105-12-30] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/03/2010] [Accepted: 01/24/2011] [Indexed: 11/11/2022]
Abstract
Background: The increasing availability of molecular sequence data means that the accuracy of future phylogenetic studies is likely to be limited by systematic bias and taxon choice rather than by data availability. In order to take advantage of increasing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use.
Results: Here we present iPhy, a web application that lets non-technical users assemble, share and analyse DNA sequence datasets for multigene phylogenetic investigations. Built on a simple client-server architecture, iPhy eases the collection of gene sets for analysis, facilitates alignment and reliably generates phylogenetic analysis-ready data files. Phylogenetic trees generated in external programs can be imported and stored, and iPhy integrates with iTOL to allow trees to be displayed with rich data annotation. The datasets collated in iPhy can be shared through the client interface. We show how systematic biases can be addressed by using explicit criteria when selecting sequences for analysis from a large dataset. A representative instance of iPhy can be accessed at iphy.bio.ed.ac.uk, but the toolkit can also be deployed on a local server by advanced users.
Conclusions: iPhy provides an easy-to-use environment for the assembly, analysis and sharing of large phylogenetic datasets, while encouraging best practices in terms of phylogenetic analysis and taxon selection.
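The abstract does not describe how iPhy builds its analysis-ready files. As a hedged illustration of what assembling a multigene "supermatrix" generally involves, the Python sketch below concatenates per-gene alignments, pads taxa missing from a gene with "?" (missing data), and records 1-based column ranges of the kind commonly written to partition files. The function name, input layout and toy sequences are hypothetical and are not iPhy's API.

```python
def build_supermatrix(gene_alignments, missing="?"):
    """gene_alignments maps gene name -> {taxon: aligned sequence}.

    Returns (matrix, partitions): matrix maps taxon -> concatenated row;
    partitions lists (gene, start, end) as 1-based inclusive columns.
    Hypothetical helper for illustration only.
    """
    # Union of all taxa, so every row ends up with the same length
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    rows = {t: [] for t in taxa}
    partitions = []
    start = 1
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not equally long (or none given)")
        length = lengths.pop()
        for t in taxa:
            # Taxa absent from this gene are padded with missing-data characters
            rows[t].append(aln.get(t, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in rows.items()}, partitions


genes = {
    "cox1": {"taxonA": "ACGT", "taxonB": "ACGA"},
    "rbcL": {"taxonA": "TT-", "taxonC": "TTA"},
}
matrix, parts = build_supermatrix(genes)
for taxon, row in matrix.items():
    print(taxon, row)  # taxonA ACGTTT-, taxonB ACGA???, taxonC ????TTA
print(parts)  # [('cox1', 1, 4), ('rbcL', 5, 7)]
```

Keeping alignment gaps ("-") distinct from whole-gene missing data ("?") makes it easy to see, per taxon, how much of the matrix is actually sampled, which is the kind of explicit taxon-selection criterion the abstract argues for.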
Affiliation(s)
- Martin O Jones
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK.
50
Lim SJ, Tan TW, Tong JC. Computational Epigenetics: the new scientific paradigm. Bioinformation 2010; 4:331-7. [PMID: 20978607 PMCID: PMC2957762 DOI: 10.6026/97320630004331] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 12/15/2009] [Revised: 01/13/2010] [Accepted: 01/21/2010] [Indexed: 12/25/2022]
Abstract
Epigenetics has recently emerged as a critical field for studying how factors other than the DNA sequence itself can influence the traits and functions of an organism. At the core of this new wave of research is the use of computational tools that play critical roles not only in directing the selection of key experiments, but also in formulating new testable hypotheses through detailed analysis of complex genomic information that is not achievable using traditional approaches alone. Epigenomics, which combines traditional genomics with computer science, mathematics, chemistry, biochemistry and proteomics for the large-scale analysis of heritable changes in phenotype, gene function or gene expression that do not depend on gene sequence, offers new opportunities to further our understanding of transcriptional regulation, nuclear organization, development and disease. This article examines existing computational strategies for the study of epigenetic factors and reviews the most important databases and bioinformatic tools in this rapidly growing field.
Affiliation(s)
- Shen Jean Lim
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore 117597