1. Branco I, Choupina A. Bioinformatics: new tools and applications in life science and personalized medicine. Appl Microbiol Biotechnol 2021; 105:937-951. [PMID: 33404829 DOI: 10.1007/s00253-020-11056-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/29/2020] [Revised: 11/29/2020] [Accepted: 12/09/2020] [Indexed: 11/28/2022]
Abstract
While we have a basic understanding of how genes encode specific proteins, we lack information on the role that DNA plays in specific diseases and on the functions of the thousands of proteins that are produced. Bioinformatics combines the methods used in the collection, storage, identification, analysis, and correlation of this huge and complex body of information. All this work produces an "ocean" of information that can only be "sailed" with the help of computerized methods. The goal is to provide scientists with the right means to explain normal biological processes, the dysfunctions of these processes that give rise to disease, and approaches that allow the discovery of new medical cures. Recently, sequencing platforms that produce genomes and transcriptomes on a large scale have created new challenges, not only for genomics but especially for bioinformatics. This article compiles a list of tools and information resources that scientists use to handle the data from massive sequencing on recent and new-generation platforms, and reviews the applications of this information in different areas of the life sciences, including medicine. KEY POINTS: • Biological data mining • Omic approaches • From genotype to phenotype.
Affiliation(s)
- Iuliia Branco, Centro de Investigação de Montanha (CIMO), Instituto Politécnico de Bragança, Campus de Santa Apolónia, 5300-253, Bragança, Portugal
- Altino Choupina, Centro de Investigação de Montanha (CIMO), Instituto Politécnico de Bragança, Campus de Santa Apolónia, 5300-253, Bragança, Portugal
2. Lawrence TJ, Kauffman KT, Amrine KCH, Carper DL, Lee RS, Becich PJ, Canales CJ, Ardell DH. FAST: FAST Analysis of Sequences Toolbox. Front Genet 2015; 6:172. [PMID: 26042145 PMCID: PMC4437040 DOI: 10.3389/fgene.2015.00172] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 02/09/2015] [Accepted: 04/20/2015] [Indexed: 11/13/2022] [Open access]
Abstract
FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought.
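The grep-like and cut-like record operations described in this abstract can be illustrated with a minimal Python sketch. This is not FAST's implementation and does not reproduce its command-line options; the function names (`parse_fasta`, `grep_records`, `cut_records`, `gc_fraction`) and the sample records are hypothetical and exist only for illustration.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# A record is (description line without the leading ">", sequence).
Record = Tuple[str, str]


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (description, sequence) pairs from Multi-FastA text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def grep_records(records: Iterable[Record], pattern: str) -> Iterator[Record]:
    """Keep records whose description matches a regular expression."""
    rx = re.compile(pattern)
    return (rec for rec in records if rx.search(rec[0]))


def cut_records(records: Iterable[Record], start: int, end: int) -> Iterator[Record]:
    """Extract sites start..end (1-based, inclusive) from each sequence."""
    return ((header, seq[start - 1:end]) for header, seq in records)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C characters in a sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# Hypothetical sample data, not taken from any real dataset.
EXAMPLE = """>seq1 Halobacterium sample
ATGGCGCGC
GGCTAA
>seq2 Escherichia sample
ATGAAATTTTAA
"""

if __name__ == "__main__":
    # Compose the steps like a shell pipeline: parse | grep | cut.
    hits = cut_records(grep_records(parse_fasta(EXAMPLE), r"Halobacterium"), 1, 6)
    for header, seq in hits:
        print(f">{header}\n{seq}\tGC={gc_fraction(seq):.2f}")
```

Each step consumes and produces a stream of records, so the steps compose in the same way that text utilities chain in a shell pipeline.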
Affiliation(s)
- Travis J Lawrence, Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA
- Kyle T Kauffman, Molecular Cell Biology Unit, School of Natural Sciences, University of California, Merced, Merced, CA, USA
- Katherine C H Amrine, Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA; Department of Viticulture and Enology, University of California, Davis, Davis, CA, USA
- Dana L Carper, Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA
- Raymond S Lee, School of Engineering, University of California, Merced, Merced, CA, USA
- Peter J Becich, Molecular Cell Biology Unit, School of Natural Sciences, University of California, Merced, Merced, CA, USA
- Claudia J Canales, School of Engineering, University of California, Merced, Merced, CA, USA
- David H Ardell, Quantitative and Systems Biology Program, University of California, Merced, Merced, CA, USA; Molecular Cell Biology Unit, School of Natural Sciences, University of California, Merced, Merced, CA, USA
3. Dyall-Smith ML, Pfeiffer F, Klee K, Palm P, Gross K, Schuster SC, Rampp M, Oesterhelt D. Haloquadratum walsbyi: limited diversity in a global pond. PLoS One 2011; 6:e20968. [PMID: 21701686 PMCID: PMC3119063 DOI: 10.1371/journal.pone.0020968] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.9] [Received: 03/28/2011] [Accepted: 05/14/2011] [Indexed: 12/03/2022] [Open access]
Abstract
Background: Haloquadratum walsbyi commonly dominates the microbial flora of hypersaline waters. Its cells are extremely fragile squares requiring >14% (w/v) salt for growth, properties that should limit its dispersal and promote geographical isolation and divergence. To assess this, the genome sequences of two isolates recovered from sites at near maximum distance on Earth were compared. Principal Findings: Both chromosomes are 3.1 Mb in size, and 84% of each sequence was highly similar to the other (98.6% identity), comprising the core sequence. ORFs of this shared sequence were completely syntenic (conserved in genomic orientation and order), without inversion or rearrangement. Strain-specific insertions/deletions could be precisely mapped, often allowing the genetic events to be inferred. Many inferred deletions were associated with short direct repeats (4–20 bp). Deletion-coupled insertions are frequent, producing different sequences at identical positions. In cases where the inserted and deleted sequences are homologous, this leads to variant genes in a common syntenic background (as already described by others). Cas/CRISPR systems are present in C23T but have been lost in HBSQ001 except for a few spacer remnants. Numerous types of mobile genetic elements occur in both strains, most of which appear to be active, and some of which specifically target others. Strain C23T carries two ∼6 kb plasmids that show similarity to halovirus His1 and to sequences near halovirus/plasmid gene clusters commonly found in haloarchaea. Conclusions: Deletion-coupled insertions show that Hqr. walsbyi evolves by uptake and precise integration of foreign DNA, probably originating from close relatives. Change is also driven by mobile genetic elements, but these do not by themselves explain the atypically low gene coding density found in this species. The remarkable genome conservation despite the presence of active systems for genome rearrangement implies both an efficient global dispersal system and a high selective fitness for this species.
Affiliation(s)
- Mike L Dyall-Smith, Department of Membrane Biochemistry, Max-Planck-Institute of Biochemistry, Martinsried, Germany
4. Schwibbert K, Marin-Sanguino A, Bagyan I, Heidrich G, Lentzen G, Seitz H, Rampp M, Schuster SC, Klenk HP, Pfeiffer F, Oesterhelt D, Kunte HJ. A blueprint of ectoine metabolism from the genome of the industrial producer Halomonas elongata DSM 2581T. Environ Microbiol 2010; 13:1973-94. [PMID: 20849449 PMCID: PMC3187862 DOI: 10.1111/j.1462-2920.2010.02336.x] [Citation(s) in RCA: 173] [Impact Index Per Article: 12.4] [Indexed: 11/28/2022]
Abstract
The halophilic γ-proteobacterium Halomonas elongata DSM 2581T thrives at high salinity by synthesizing and accumulating the compatible solute ectoine. Ectoine levels are highly regulated according to external salt levels but the overall picture of its metabolism and control is not well understood. Apart from its critical role in cell adaptation to halophilic environments, ectoine can be used as a stabilizer for enzymes and as a cell protectant in skin and health care applications and is thus produced annually on a scale of tons in an industrial process using H. elongata as producer strain. This paper presents the complete genome sequence of H. elongata (4 061 296 bp) and includes experiments and analysis identifying and characterizing the entire ectoine metabolism, including a newly discovered pathway for ectoine degradation and its cyclic connection to ectoine synthesis. The degradation of ectoine (doe) proceeds via hydrolysis of ectoine (DoeA) to Nα-acetyl-l-2,4-diaminobutyric acid, followed by deacetylation to diaminobutyric acid (DoeB). In H. elongata, diaminobutyric acid can either flow off to aspartate or re-enter the ectoine synthesis pathway, forming a cycle of ectoine synthesis and degradation. Genome comparison revealed that the ectoine degradation pathway exists predominantly in non-halophilic bacteria unable to synthesize ectoine. Based on the resulting genetic and biochemical data, a metabolic flux model of ectoine metabolism was derived that can be used to understand the way H. elongata survives under varying salt stresses and that provides a basis for a model-driven improvement of industrial ectoine production.
Affiliation(s)
- Karin Schwibbert, Materials and Environment Division, Federal Institute for Materials Research and Testing (BAM), Berlin, Germany
5. Barrantes I, Glockner G, Meyer S, Marwan W. Transcriptomic changes arising during light-induced sporulation in Physarum polycephalum. BMC Genomics 2010; 11:115. [PMID: 20163733 PMCID: PMC2837032 DOI: 10.1186/1471-2164-11-115] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 10/30/2009] [Accepted: 02/17/2010] [Indexed: 12/21/2022] [Open access]
Abstract
BACKGROUND: Physarum polycephalum is a free-living amoebozoan protist displaying a complex life cycle, including alternation between single- and multinucleate stages through sporulation, a simple form of cell differentiation. Sporulation in Physarum can be experimentally induced by several external factors, and Physarum displays many biochemical features typical of metazoan cells, including metazoan-type signaling pathways, which makes this organism a model for studying the cell cycle, cell differentiation and cellular reprogramming. RESULTS: To identify the genes associated with light-induced sporulation in Physarum, especially those related to signal transduction, we isolated RNA from sporulation-competent cells before and after photoinduction and used these RNAs to synthesize cDNAs, which were then analyzed using 454 sequencing technology. We obtained 16,669 cDNAs that were annotated at every computational level. 13,169 transcripts included hit count data, of which 2,772 displayed significant differential expression (upregulated: 1,623; downregulated: 1,149). Transcripts with valid annotations and significant differential expression were later integrated into putative networks using interaction information from orthologs. CONCLUSIONS: Gene ontology analysis suggested that most significantly downregulated genes are linked to DNA repair, cell division, inhibition of cell migration, and calcium release, while highly upregulated genes were involved in cell death, cell polarization, maintenance of integrity, and differentiation. In addition, cell death-associated transcripts were overrepresented among the upregulated transcripts. These changes are associated with a network of actin-binding proteins encoded by genes that are differentially regulated before and after light induction.
Affiliation(s)
- Israel Barrantes, Max Planck Institute for Dynamics of Complex Technical Systems and Magdeburg Centre for Systems Biology (MaCS), Otto von Guericke University, Magdeburg, Germany
6. Pfeiffer F, Broicher A, Gillich T, Klee K, Mejía J, Rampp M, Oesterhelt D. Genome information management and integrated data analysis with HaloLex. Arch Microbiol 2008; 190:281-99. [PMID: 18592220 PMCID: PMC2516542 DOI: 10.1007/s00203-008-0389-z] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 01/07/2008] [Revised: 04/01/2008] [Accepted: 05/08/2008] [Indexed: 11/30/2022]
Abstract
HaloLex is a software system for the central management, integration, curation, and web-based visualization of genomic and other -omics data for any given microorganism. The system has been employed for the manual curation of three haloarchaeal genomes, namely Halobacterium salinarum (strain R1), Natronomonas pharaonis, and Haloquadratum walsbyi. In particular, HaloLex enables the integrated analysis of genome-wide proteomic results with the underlying genomic data. This has proven indispensable for generating reliable gene predictions for GC-rich genomes, which, due to their characteristically low abundance of stop codons, are known to be hard targets for standard gene finders, especially concerning start codon assignment. The proteomic identification of more than 600 N-terminal peptides has greatly increased the reliability of the start codon assignment for Halobacterium salinarum. Application of homology-based methods to the published genome of Haloarcula marismortui allowed the detection of 47 previously unidentified genes (a problem that is particularly serious for short protein sequences) and the correction of more than 300 start codon misassignments.
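The statement that GC-rich genomes have a low abundance of stop codons can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of HaloLex. It assumes, purely for illustration, that bases occur independently with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, and the function names are hypothetical.

```python
def stop_codon_probability(gc: float) -> float:
    """Chance that a random codon is TAA, TAG or TGA.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2 (a simplifying assumption).
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    at = (1.0 - gc) / 2.0  # probability of A, and of T
    g = gc / 2.0           # probability of G
    # TAA = T*A*A; TAG = T*A*G; TGA = T*G*A
    return at ** 3 + 2.0 * at * at * g


def expected_codons_to_stop(gc: float) -> float:
    """Mean number of random codons read until a stop codon appears."""
    p = stop_codon_probability(gc)
    if p == 0.0:
        return float("inf")
    return 1.0 / p


if __name__ == "__main__":
    for gc in (0.40, 0.50, 0.65):
        p = stop_codon_probability(gc)
        print(f"GC={gc:.2f}  P(stop)={p:.4f}  "
              f"mean codons to stop={expected_codons_to_stop(gc):.1f}")
```

Under this simple model, raising the GC fraction lowers the per-codon stop probability, so spurious open reading frames run longer by chance and offer more candidate start codons, which is consistent with the difficulty described in the abstract.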
Affiliation(s)
- Friedhelm Pfeiffer, Department of Membrane Biochemistry, Max-Planck-Institute of Biochemistry, Am Klopferspitz 18, 82152 Martinsried, Germany
7. Givan SA, Sullivan CM, Carrington JC. The Personal Sequence Database: a suite of tools to create and maintain web-accessible sequence databases. BMC Bioinformatics 2007; 8:479. [PMID: 18088438 PMCID: PMC2225426 DOI: 10.1186/1471-2105-8-479] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/28/2007] [Accepted: 12/18/2007] [Indexed: 11/18/2022] [Open access]
Abstract
Background: Large molecular sequence databases are fundamental resources for modern bioscientists. Whether for project-specific purposes or sharing data with colleagues, it is often advantageous to maintain smaller sequence databases. However, this is usually not an easy task for the average bench scientist. Results: We present the Personal Sequence Database (PSD), a suite of tools to create and maintain small- to medium-sized web-accessible sequence databases. All interactions with PSD tools occur via the internet with a web browser. Users may define sequence groups within their database that can be maintained privately or published to the web for public use. A sequence group can be downloaded, browsed, searched by keyword or searched for sequence similarities using BLAST. Publishing a sequence group extends these capabilities to colleagues and collaborators. In addition to being able to manage their own sequence databases, users can enroll sequences in BLASTAgent, a BLAST hit tracking system, to monitor NCBI databases for new entries displaying a specified level of nucleotide or amino acid similarity. Conclusion: The PSD offers a valuable set of resources unavailable elsewhere. In addition to managing sequence data and BLAST search results, it facilitates data sharing with colleagues, collaborators and public users. The PSD is hosted by the authors and is available at .
Affiliation(s)
- Scott A Givan, Center for Genome Research and Biocomputing, Oregon State University, Corvallis, Oregon, USA