1
|
The Relationship of the Mechanisms of the Pathogenesis of Multiple Sclerosis and the Expression of Endogenous Retroviruses. Biology 2020; 9:464. [PMID: 33322628 PMCID: PMC7764762 DOI: 10.3390/biology9120464] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/25/2020] [Revised: 12/07/2020] [Accepted: 12/10/2020] [Indexed: 12/16/2022]
Abstract
Simple Summary: Multiple sclerosis is a neurodegenerative disease of the central nervous system that develops at an early age and often leads to disability. The etiological cause of the disease has not been fully elucidated, and as a result, no effective treatment is available. This review summarizes the current knowledge about the relationship between the expression of human endogenous retroviruses and the pathogenesis of multiple sclerosis. The epigenetic mechanisms of transcriptional regulation and the roles of transcription factors, cytokines, and exogenous viruses are also addressed. Elucidating the mechanisms that increase endogenous retrovirus expression in multiple sclerosis could help to develop therapeutic strategies and novel methods for early diagnosis and treatment of the disease.
Abstract: Two human endogenous retroviruses of the HERV-W family can act as cofactors triggering multiple sclerosis (MS): MS-associated retrovirus (MSRV) and ERVWE1. Endogenous retroviral elements are believed to have integrated into our ancestors’ DNA millions of years ago. Their involvement in the pathogenesis of various diseases, including neurodegenerative pathologies, has been demonstrated. Numerous studies have shown a correlation between the deterioration of patients’ health and increased expression of endogenous retroviruses. The exact causes and mechanisms of endogenous retrovirus activation remain unknown, which hampers the development of therapeutics. In this review, we summarize the main characteristics of human endogenous W retroviruses and describe the putative mechanisms of activation, including epigenetic mechanisms, humoral factors, and the role of exogenous viral infections.
|
2
|
Gifford RJ, Blomberg J, Coffin JM, Fan H, Heidmann T, Mayer J, Stoye J, Tristem M, Johnson WE. Nomenclature for endogenous retrovirus (ERV) loci. Retrovirology 2018; 15:59. [PMID: 30153831 PMCID: PMC6114882 DOI: 10.1186/s12977-018-0442-1] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 02/24/2018] [Accepted: 08/20/2018] [Indexed: 11/10/2022]
Abstract
Retroviral integration into germline DNA can result in the formation of a vertically inherited proviral sequence called an endogenous retrovirus (ERV). Over the course of their evolution, vertebrate genomes have accumulated many thousands of ERV loci. These sequences provide useful retrospective information about ancient retroviruses, and have also played an important role in shaping the evolution of vertebrate genomes. There is an immediate need for a unified system of nomenclature for ERV loci, not only to assist genome annotation, but also to facilitate research on ERVs and their impact on genome biology and evolution. In this review, we examine how ERV nomenclatures have developed, and consider the possibilities for the implementation of a systematic approach for naming ERV loci. We propose that such a nomenclature should not only provide unique identifiers for individual loci, but also denote orthologous relationships between ERVs in different species. In addition, we propose that, where possible, mnemonic links to previous, well-established names for ERV loci and groups should be retained. We show how this approach can be applied and integrated into existing taxonomic and nomenclature schemes for retroviruses, ERVs and transposable elements.
Affiliation(s)
- Robert J Gifford: MRC-University of Glasgow Centre for Virus Research, Glasgow, UK.
- Jonas Blomberg: Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
- John M Coffin: Department of Molecular Biology and Microbiology, Tufts University, Boston, MA, USA.
- Hung Fan: Department of Molecular Biology and Biochemistry and Cancer Research Institute, University of California, Irvine, CA, 92697, USA.
- Thierry Heidmann: Department of Molecular Physiology and Pathology of Infectious and Endogenous Retroviruses, CNRS UMR 9196, Institut Gustave Roussy, 94805, Villejuif, France.
- Jens Mayer: Department of Human Genetics, Center of Human and Molecular Biology, Medical Faculty, University of Saarland, Homburg, Germany.
- Jonathan Stoye: The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway, Mill Hill, London, UK.
- Michael Tristem: Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot, Berkshire, SL5 7PY, UK.
- Welkin E Johnson: Biology Department, Boston College, Chestnut Hill, Massachusetts, 02467, USA.
|
3
|
Gatherer D. Genome Signatures, Self-Organizing Maps and Higher Order Phylogenies: A Parametric Analysis. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
Genome signatures are data vectors derived from the compositional statistics of DNA. The self-organizing map (SOM) is a neural network method for the conceptualisation of relationships within complex data, such as genome signatures. The various parameters of the SOM training phase are investigated for their effect on the accuracy of the resulting output map. It is concluded that larger SOMs, as well as taking longer to train, are less sensitive in phylogenetic classification of unknown DNA sequences. However, where a classification can be made, a larger SOM is more accurate. Increasing the number of iterations in the training phase of the SOM only slightly increases accuracy, without improving sensitivity. The optimal length of the DNA sequence k-mer from which the genome signature should be derived is 4 or 5, but shorter values are almost as effective. In general, these results indicate that small, rapidly trained SOMs are generally as good as larger, longer trained ones for the analysis of genome signatures. These results may also be more generally applicable to the use of SOMs for other complex data sets, such as microarray data.
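As a rough illustration of what a genome signature can look like, the sketch below builds a k-mer frequency vector from a DNA string. It assumes plain relative frequencies of overlapping k-mers; the abstract does not say which normalisation the paper uses, and the function name `genome_signature` is hypothetical.

```python
from itertools import product

def genome_signature(sequence, k=4):
    """Return {k-mer: relative frequency} over all 4**k DNA k-mers.

    Assumption: the signature is the plain relative frequency of each
    overlapping k-mer; other normalisations are possible.
    """
    sequence = sequence.upper()
    counts = {"".join(kmer): 0 for kmer in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid k-mers of length %d" % k)
    return {kmer: n / total for kmer, n in counts.items()}

# Toy input; a real signature would be computed over a genome or long contig.
signature = genome_signature("ACGTACGT", k=2)
```

With k = 4 or 5, the range the abstract reports as optimal, the vector has 256 or 1024 components, and these vectors would be the inputs presented to a SOM.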
Affiliation(s)
- Derek Gatherer: MRC Virology Unit, Institute of Virology, Church Street, Glasgow G11 5JR, UK.
|
4
|
Vargiu L, Rodriguez-Tomé P, Sperber GO, Cadeddu M, Grandi N, Blikstad V, Tramontano E, Blomberg J. Classification and characterization of human endogenous retroviruses; mosaic forms are common. Retrovirology 2016; 13:7. [PMID: 26800882 PMCID: PMC4724089 DOI: 10.1186/s12977-015-0232-y] [Citation(s) in RCA: 182] [Impact Index Per Article: 22.8] [Received: 06/19/2015] [Accepted: 12/16/2015] [Indexed: 02/06/2023]
Abstract
Background: Human endogenous retroviruses (HERVs) represent the inheritance of ancient germ-line cell infections by exogenous retroviruses and the subsequent transmission of the integrated proviruses to the descendants. ERVs have the same internal structure as exogenous retroviruses. While no replication-competent HERVs have been recognized, some retain up to three of four intact ORFs. HERVs have been classified before, with varying scope and depth, notably in the RepBase/RepeatMasker system. However, existing classifications are bewildering. There is a need for a systematic, unifying and simple classification. We strove for a classification that is traceable to previous classifications and that encompasses HERV variation within a limited number of clades.
Results: The human genome assembly GRCh37/hg19 was analyzed with RetroTector, which primarily detects relatively complete Class I and II proviruses. A total of 3173 HERV sequences were identified. The structure of, and relations between, these proviruses were resolved through a multi-step classification procedure that involved a novel type of similarity image analysis (“Simage”), which allowed discrimination of heterogeneous (noncanonical) from homogeneous (canonical) HERVs. Of the 3173 HERVs, 1214 were canonical and segregated into 39 canonical clades (groups), belonging to class I (Gamma- and Epsilon-like), II (Beta-like) and III (Spuma-like). The groups were chosen based on (1) sequence (nucleotide and Pol amino acid) similarity, (2) degree of fit to previously published clades, often from RepBase, and (3) taxonomic markers. The groups fell into 11 supergroups. The 1959 noncanonical HERVs contained 31 additional, less well-defined groups. Simage analysis revealed several types of mosaicism, notably recombination and secondary integration. By comparing flanking sequences, LTRs and completeness of gene structure, we deduced that some noncanonical HERVs proliferated after the recombination event. Groups were further divided into envelope subgroups (altogether 94) based on sequence similarity and characteristic “immunosuppressive domain” motifs. Intra- and inter(super)group, as well as intraclass, recombination involving envelope genes (“env snatching”) was a common event. LTR divergence indicated that HERV-K(HML2) and HERVFC had the most recent integrations, and HERVL and HUERSP3 the oldest.
Conclusions: A comprehensive HERV classification and characterization approach was undertaken. It should be applicable for classification of all ERVs. Recombination was common among HERV ancestors.
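The use of LTR divergence as a proxy for integration age can be sketched as follows. The two LTRs of a provirus are identical at integration and then diverge independently, so a common estimate is age = divergence / (2 × substitution rate). This is a minimal sketch under that assumption, not necessarily the calculation used in the paper; the function names and the default rate of 0.002 substitutions per site per million years are hypothetical placeholders.

```python
def ltr_divergence(ltr5, ltr3):
    """Proportion of mismatches between two aligned LTRs, ignoring gap columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared

def integration_age_myr(divergence, rate_per_site_per_myr=0.002):
    """Age in million years; the default rate is an assumed placeholder."""
    # Factor 2: both LTRs accumulate substitutions independently.
    return divergence / (2.0 * rate_per_site_per_myr)

# Toy aligned LTR pair with one mismatch in ten columns.
divergence = ltr_divergence("ACGTACGTAC", "ACGTACGTAT")
age = integration_age_myr(divergence)
```

Under this reading, groups with the lowest 5′/3′ LTR divergence correspond to the most recent integrations.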
Affiliation(s)
- Laura Vargiu: Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy; Center for Advanced Studies, Research and Development in Sardinia, CRS4, Pula, Italy; Nurideas S.r.l., Cagliari, Italy.
- Patricia Rodriguez-Tomé: Center for Advanced Studies, Research and Development in Sardinia, CRS4, Pula, Italy; Nurideas S.r.l., Cagliari, Italy.
- Göran O Sperber: Physiology Unit, Department of Neuroscience, Uppsala University, Uppsala, Sweden.
- Marta Cadeddu: Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy.
- Nicole Grandi: Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy.
- Vidar Blikstad: Department of Medical Sciences, Uppsala University Hospital, Dag Hammarskjölds Väg 17, Uppsala, 751 85, Sweden.
- Enzo Tramontano: Department of Life and Environmental Sciences, University of Cagliari, Cagliari, Italy.
- Jonas Blomberg: Department of Medical Sciences, Uppsala University Hospital, Dag Hammarskjölds Väg 17, Uppsala, 751 85, Sweden.
|
5
|
Multiple groups of endogenous epsilon-like retroviruses conserved across primates. J Virol 2014; 88:12464-71. [PMID: 25142585 DOI: 10.1128/jvi.00966-14] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
Abstract
Several types of cancer in fish are caused by retroviruses, including those responsible for major outbreaks of disease, such as walleye dermal sarcoma virus and salmon swim bladder sarcoma virus. These viruses form a phylogenetic group often described as the epsilonretrovirus genus. Epsilon-like retroviruses have become endogenous retroviruses (ERVs) on several occasions, integrating into germ line cells to become part of the host genome, and sections of fish and amphibian genomes are derived from epsilon-like retroviruses. However, epsilon-like ERVs have been identified in very few mammals. We have developed a pipeline to screen full genomes for ERVs, and using this pipeline, we have located over 800 endogenous epsilon-like ERV fragments in primate genomes. Genomes from 32 species of mammals and birds were screened, and epsilon-like ERV fragments were found in all primate and tree shrew genomes but no others. These viruses appear to have entered the genome of a common ancestor of Old and New World monkeys between 42 million and 65 million years ago. Based on these results, there is an ancient evolutionary relationship between epsilon-like retroviruses and primates. Clearly, these viruses had the potential to infect the ancestors of primates and were at some point a common pathogen in these hosts. Therefore, this result raises questions about the potential of epsilonretroviruses to infect humans and other primates and about the evolutionary history of these retroviruses.
Importance: Epsilonretroviruses are a group of retroviruses that cause several important diseases in fish. Retroviruses have the ability to become a permanent part of the DNA of their host by entering the germ line as endogenous retroviruses (ERVs), where they lose their infectivity over time but can be recognized as retroviruses for millions of years. Very few mammals are known to have epsilon-like ERVs; however, we have identified over 800 fragments of endogenous epsilon-like ERVs in the genomes of all major groups of primates, including humans. These viruses seem to have circulated and infected primate ancestors 42 to 65 million years ago. We are now interested in how these viruses have evolved and whether they have the potential to infect modern humans or other primates.
|
6
|
Repeated comprehensibility maximization in competitive learning. Neural Comput Appl 2013. [DOI: 10.1007/s00521-011-0785-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
7
|
Kohonen T. Essentials of the self-organizing map. Neural Netw 2012; 37:52-65. [PMID: 23067803 DOI: 10.1016/j.neunet.2012.09.018] [Citation(s) in RCA: 354] [Impact Index Per Article: 29.5] [Received: 05/08/2012] [Revised: 09/05/2012] [Accepted: 09/24/2012] [Indexed: 11/19/2022]
Abstract
The self-organizing map (SOM) is an automatic data-analysis method. It is widely applied to clustering problems and data exploration in industry, finance, natural sciences, and linguistics. The most extensive applications, exemplified in this paper, can be found in the management of massive textual databases and in bioinformatics. The SOM is related to the classical vector quantization (VQ), which is used extensively in digital signal processing and transmission. Like in VQ, the SOM represents a distribution of input data items using a finite set of models. In the SOM, however, these models are automatically associated with the nodes of a regular (usually two-dimensional) grid in an orderly fashion such that more similar models become automatically associated with nodes that are adjacent in the grid, whereas less similar models are situated farther away from each other in the grid. This organization, a kind of similarity diagram of the models, makes it possible to obtain an insight into the topographic relationships of data, especially of high-dimensional data items. If the data items belong to certain predetermined classes, the models (and the nodes) can be calibrated according to these classes. An unknown input item is then classified according to that node, the model of which is most similar with it in some metric used in the construction of the SOM. A new finding introduced in this paper is that an input item can even more accurately be represented by a linear mixture of a few best-matching models. This becomes possible by a least-squares fitting procedure where the coefficients in the linear mixture of models are constrained to nonnegative values.
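The core idea described above, models attached to grid nodes so that similar models end up on adjacent nodes, can be sketched with a small online SOM. This is a minimal illustration, not Kohonen's reference implementation: the linear decay schedules, the Gaussian neighbourhood, the 0.5 floor on its width, and all function and parameter names are assumptions chosen for brevity.

```python
import math
import random

def best_matching_unit(models, x):
    """Index of the model vector closest to x (squared Euclidean distance)."""
    return min(range(len(models)),
               key=lambda i: sum((m - v) ** 2 for m, v in zip(models[i], x)))

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; node i sits at grid cell (i // cols, i % cols)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    models = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays to 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighbourhood shrinks
            b_row, b_col = divmod(best_matching_unit(models, x), cols)
            for i, model in enumerate(models):
                row, col = divmod(i, cols)
                grid_d2 = (row - b_row) ** 2 + (col - b_col) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for j in range(dim):
                    # Pull the model toward x, more strongly near the winner.
                    model[j] += lr * h * (x[j] - model[j])
            step += 1
    return models

# Toy data: two small clusters in the unit square.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
models = train_som(data, rows=2, cols=2)
```

Classifying an unknown item then amounts to calling `best_matching_unit(models, item)` and reading off the class label with which that node was calibrated.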
Affiliation(s)
- Teuvo Kohonen: Aalto University, School of Science, P.O. Box 15400, FI-00076 AALTO, Finland.
|
8
|
Herrero Á, Zurutuza U, Corchado E. A neural-visualization IDS for honeynet data. Int J Neural Syst 2012; 22:1250005. [DOI: 10.1142/s0129065712500050] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 11/18/2022]
Abstract
Neural intelligent systems can provide a visualization of the network traffic for security staff, in order to reduce the widely known high false-positive rate associated with misuse-based Intrusion Detection Systems (IDSs). Unlike previous work, this study proposes unsupervised neural models that generate an intuitive visualization of the captured traffic, rather than network statistics. These snapshots of network events are immensely useful for security personnel who monitor network behavior. The system is based on the use of different neural projection and unsupervised methods for the visual inspection of honeypot data, and may be seen as a complementary network security tool that sheds light on internal data structures through visual inspection of the traffic itself. Furthermore, it is intended to facilitate verification and assessment of the performance of Snort (a well-known and widely used misuse-based IDS) through the visualization of attack patterns. Empirical verification and comparison of the proposed projection methods are performed in a real domain, where two different case studies are defined and analyzed.
Affiliation(s)
- Álvaro Herrero: Department of Civil Engineering, University of Burgos, Burgos, Spain.
- Urko Zurutuza: Electronics and Computing Department, Mondragon University, Arrasate-Mondragon, Spain.
- Emilio Corchado: Departamento de Informática y Automática, Universidad de Salamanca, Salamanca, Spain; Department of Computer Science, VŠB-Technical University of Ostrava, Ostrava, Czech Republic.
|
9
|
Kamimura R. Relative information maximization and its application to the extraction of explicit class structure in SOM. Neurocomputing 2012. [DOI: 10.1016/j.neucom.2011.09.031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
|
10
|
Borrajo ML, Baruque B, Corchado E, Bajo J, Corchado JM. Hybrid neural intelligent system to predict business failure in small-to-medium-size enterprises. Int J Neural Syst 2011; 21:277-96. [PMID: 21809475 DOI: 10.1142/s0129065711002833] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 11/18/2022]
Abstract
In recent years there has been a growing need to develop innovative tools that can help small to medium-sized enterprises predict business failure as well as financial crisis. In this study we present a novel hybrid intelligent system aimed at monitoring the modus operandi of the companies and predicting possible failures. This system is implemented by means of a neural-based multi-agent system that models the different actors of the companies as agents. The core of the multi-agent system is a type of agent that incorporates a case-based reasoning system and automates the business control process and failure prediction. The stages of the case-based reasoning system are implemented by means of web services: the retrieval stage uses an innovative method based on weighted voting summarization of ensembles of self-organizing maps, and the reuse stage is implemented by means of a radial basis function neural network. An initial prototype was developed, and the results obtained for small and medium enterprises in a real scenario are presented.
Affiliation(s)
- M. Lourdes Borrajo: Department of Informática, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense, 32004, Spain.
- Bruno Baruque: Departamento de Ingeniería Civil, University of Burgos, Esc. Politécnica Superior, Edificio C, C/Francisco de Vitoria, 09006, Burgos, Spain.
- Emilio Corchado: Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain.
- Javier Bajo: Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain.
- Juan M. Corchado: Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain.
|
11
|
Kamimura R. Double enhancement learning for explicit internal representations: unifying self-enhancement and information enhancement to incorporate information on input variables. Appl Intell 2011. [DOI: 10.1007/s10489-011-0300-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
|
12
|
Kamimura R. Self-enhancement learning: target-creating learning and its application to self-organizing maps. Biol Cybern 2011; 104:305-338. [PMID: 21594651 DOI: 10.1007/s00422-011-0434-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/08/2010] [Accepted: 04/23/2011] [Indexed: 05/30/2023]
Abstract
In this article, we propose a new learning method called "self-enhancement learning." In this method, targets for learning are not given from the outside, but they can be spontaneously created within a neural network. To realize the method, we consider a neural network with two different states, namely, an enhanced and a relaxed state. The enhanced state is one in which the network responds very selectively to input patterns, while in the relaxed state, the network responds almost equally to input patterns. The gap between the two states can be reduced by minimizing the Kullback-Leibler divergence between the two states with free energy. To demonstrate the effectiveness of this method, we applied self-enhancement learning to the self-organizing maps, or SOM, in which lateral interactions were added to an enhanced state. We applied the method to the well-known Iris, wine, housing and cancer machine learning database problems. In addition, we applied the method to real-life data, a student survey. Experimental results showed that the U-matrices obtained were similar to those produced by the conventional SOM. Class boundaries were made clearer in the housing and cancer data. For all the data, except for the cancer data, better performance could be obtained in terms of quantitative and topological errors. In addition, we could see that the trustworthiness and continuity, referring to the quality of neighborhood preservation, could be improved by the self-enhancement learning. Finally, we used modern dimensionality reduction methods and compared their results with those obtained by the self-enhancement learning. The results obtained by the self-enhancement were not superior to but comparable with those obtained by the modern dimensionality reduction methods.
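The gap between an "enhanced" and a "relaxed" state can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not the paper's formulation (which also involves free energy): it models the two states as softmax responses at a low and a high temperature and measures their gap with the plain Kullback-Leibler divergence. The scores, temperatures and function names are hypothetical.

```python
import math

def softmax(scores, temperature):
    """Response distribution over units; a low temperature sharpens it."""
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same units."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

scores = [2.0, 1.0, 0.5]                      # hypothetical unit activations
enhanced = softmax(scores, temperature=0.2)   # very selective response
relaxed = softmax(scores, temperature=5.0)    # almost uniform response
gap = kl_divergence(enhanced, relaxed)        # quantity that learning would shrink
```

The divergence is zero only when the two response distributions coincide, which is why minimizing it pulls the relaxed state toward the selective one.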
Affiliation(s)
- Ryotaro Kamimura: IT Education Center, Tokai University, 1117 Kitakaname, Hiratsuka, Kanagawa 259-1292, Japan.
|
13
|
|
14
|
Blomberg J, Benachenhou F, Blikstad V, Sperber G, Mayer J. Classification and nomenclature of endogenous retroviral sequences (ERVs): problems and recommendations. Gene 2009; 448:115-23. [PMID: 19540319 DOI: 10.1016/j.gene.2009.06.007] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.5] [Received: 05/03/2009] [Revised: 06/09/2009] [Accepted: 06/12/2009] [Indexed: 01/27/2023]
Abstract
The genomes of many species are crowded with repetitive mobile sequences. In the case of endogenous retroviruses (ERVs) there is, for various reasons, considerable confusion regarding names assigned to families/groups of ERVs as well as individual ERV loci. Human ERVs have been studied in greater detail, and naming of HERVs in the scientific literature is somewhat confusing not just to the outsider. Without guidelines, confusion for ERVs in other species will also probably increase if those ERVs are studied in greater detail. Based on previous experience, this review highlights some of the problems when naming and classifying ERVs, and provides some guidance for detecting and characterizing ERV sequences. Because of the close relationship between ERVs and exogenous retroviruses (XRVs) it is reasonable to reconcile their classification with that of XRVs. We here argue that classification should be based on a combination of similarity, structural features, (inferred) function, and previous nomenclature. Because the RepBase system is widely employed in genome annotation, RepBase designations should be considered in further taxonomic efforts. To lay a foundation for a phylogenetically based taxonomy, further analyses of ERVs in many hosts are needed. A dedicated, permanent, international consortium would best be suited to integrate and communicate our current and future knowledge on repetitive, mobile elements in general to the scientific community.
Affiliation(s)
- Jonas Blomberg: Section of Virology, Department of Medical Sciences, Academic Hospital, 75185 Uppsala, Sweden.
|
15
|
Benachenhou F, Jern P, Oja M, Sperber G, Blikstad V, Somervuo P, Kaski S, Blomberg J. Evolutionary conservation of orthoretroviral long terminal repeats (LTRs) and ab initio detection of single LTRs in genomic data. PLoS One 2009; 4:e5179. [PMID: 19365549 PMCID: PMC2664473 DOI: 10.1371/journal.pone.0005179] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 12/04/2008] [Accepted: 03/10/2009] [Indexed: 01/06/2023]
Abstract
Background: Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousand endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect from genomic sequences without recourse to repetitiveness or presence in a proviral structure. Understanding of LTR structure increases understanding of LTR function, and of functional genomics. Here we develop models of orthoretroviral LTRs useful for detection in genomes and for structural analysis.
Principal Findings: Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML- (human MMTV-like), general-beta-, gamma- and lentiretrovirus-like LTRs, plus a general-vertebrate LTR model. Training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was most sensitive and detected 87% of the HML LTRs in human chromosome 19 at 96% specificity. By combining all HMMs with a low cutoff, for screening, 71% of all LTRs found by RepeatMasker in chromosome 19 were found. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A-rich module. HMM consensus alignment allowed comparison of functional features like transcriptional start sites (sense and antisense) between XRVs and ERVs.
Conclusion: The modular conserved and redundant orthoretroviral LTR structure with three A-rich regions is reminiscent of structurally relaxed Giardia promoters. The five HMMs provided a novel broad range, repeat-independent, ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals.
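How an HMM assigns a score to a candidate sequence can be sketched with the forward algorithm. The model below is a deliberately tiny two-state toy with made-up parameters (a T-rich "ltr" state and a uniform "background" state); it is not one of the five LTR HMMs described in the abstract, and all names and probabilities are hypothetical.

```python
from itertools import product

def forward_probability(seq, start, trans, emit):
    """P(seq | HMM) via the forward algorithm; seq must be non-empty."""
    states = list(start)
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][symbol]
            for t in states
        }
    return sum(alpha.values())

# Hypothetical toy parameters, chosen only for illustration.
START = {"ltr": 0.5, "background": 0.5}
TRANS = {"ltr": {"ltr": 0.9, "background": 0.1},
         "background": {"ltr": 0.1, "background": 0.9}}
EMIT = {"ltr": {"A": 0.2, "C": 0.1, "G": 0.1, "T": 0.6},
        "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}

t_rich = forward_probability("TTTTAT", START, TRANS, EMIT)
gc_rich = forward_probability("GCGCGC", START, TRANS, EMIT)

# Sanity check: probabilities of all sequences of one length sum to 1.
total_len3 = sum(forward_probability("".join(s), START, TRANS, EMIT)
                 for s in product("ACGT", repeat=3))
```

In a detection setting, a score such as the log ratio of this probability to a background-only probability would be compared against a cutoff; lowering the cutoff trades specificity for sensitivity, as in the screening mode mentioned above.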
Affiliation(s)
- Farid Benachenhou: Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
- Patric Jern: Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
- Merja Oja: Helsinki Institute for Information Technology, Department of Computer Science, University of Helsinki and Laboratory of Computer and Information Science, Helsinki University of Technology, Helsinki, Finland.
- Göran Sperber: Unit of Physiology, Department of Neuroscience, Uppsala University, Uppsala, Sweden.
- Vidar Blikstad: Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
- Panu Somervuo: Helsinki Institute for Information Technology, Department of Computer Science, University of Helsinki and Laboratory of Computer and Information Science, Helsinki University of Technology, Helsinki, Finland.
- Samuel Kaski: Helsinki Institute for Information Technology, Department of Computer Science, University of Helsinki and Laboratory of Computer and Information Science, Helsinki University of Technology, Helsinki, Finland.
- Jonas Blomberg: Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
|
16
|
Sperber GO, Airola T, Jern P, Blomberg J. Automated recognition of retroviral sequences in genomic data--RetroTector. Nucleic Acids Res 2007; 35:4964-76. [PMID: 17636050 PMCID: PMC1976444 DOI: 10.1093/nar/gkm515] [Citation(s) in RCA: 120] [Impact Index Per Article: 7.1] [Indexed: 12/04/2022]
Abstract
Eukaryotic genomes contain many endogenous retroviral sequences (ERVs). ERVs are often severely mutated and therefore difficult to detect. A platform-independent (Java) program package, RetroTector© (ReTe), was constructed. It has three basic modules: (i) detection of candidate long terminal repeats (LTRs), (ii) detection of chains of conserved retroviral motifs fulfilling distance constraints and (iii) attempted reconstruction of original retroviral protein sequences, combining alignment, codon statistics and properties of protein ends. Other features are prediction of additional open reading frames, automated database collection, graphical presentation and automatic classification. ReTe favors elements >1000 bp long due to its dependence on the order of, and distances between, retroviral fragments. It detects single or low-copy-number elements. ReTe assigned a ‘retroviral’ score of 890–2827 to 10 exogenous retroviruses from seven genera, and accurately predicted their genes. In a simulated model, ReTe was robust against mutational decay. The human genome was analyzed in 1–2 days on a Linux cluster. Retroviral sequences were detected in divergent vertebrate genomes. Most ReTe-detected chains were coincident with RepeatMasker output and the HERVd database. ReTe did not report most of the evolutionarily old HERV-L-related and MaLR sequences, and is not yet tailored for single LTR detection. Nevertheless, ReTe rationally detects and annotates many retroviral sequences.
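The idea behind module (ii), chaining motif hits that appear in the expected order and within plausible distances, can be sketched with a small dynamic program. This is a hypothetical simplification and not RetroTector's algorithm: it uses a single gap window for all motif pairs, and the function name, motif labels and thresholds are made up for illustration.

```python
def longest_motif_chain(hits, motif_order, min_gap, max_gap):
    """Longest chain of hits in motif_order whose consecutive gaps fit the window.

    hits: list of (motif_name, position) pairs; unknown motif names are ignored.
    """
    rank = {name: i for i, name in enumerate(motif_order)}
    ordered = sorted((pos, name) for name, pos in hits if name in rank)
    if not ordered:
        return []
    best = [1] * len(ordered)      # best[i]: longest chain ending at hit i
    prev = [None] * len(ordered)   # back-pointers for chain recovery
    for i, (pos_i, name_i) in enumerate(ordered):
        for j in range(i):
            pos_j, name_j = ordered[j]
            gap = pos_i - pos_j
            if (rank[name_j] < rank[name_i]
                    and min_gap <= gap <= max_gap
                    and best[j] + 1 > best[i]):
                best[i] = best[j] + 1
                prev[i] = j
    i = max(range(len(ordered)), key=lambda k: best[k])
    chain = []
    while i is not None:
        pos, name = ordered[i]
        chain.append((name, pos))
        i = prev[i]
    return chain[::-1]

# Hypothetical motif hits along a contig (positions in bp).
hits = [("gag", 100), ("pol", 2000), ("env", 5000), ("pol", 90000), ("env", 50)]
chain = longest_motif_chain(hits, ["gag", "pol", "env"], min_gap=500, max_gap=5000)
```

Here the stray `env` hit at position 50 and the distant `pol` hit at 90000 are left out because they violate the order or distance constraints, which mirrors why such an approach favours longer, reasonably intact elements over isolated fragments.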
Affiliation(s)
- Göran O. Sperber: Department of Neuroscience, Physiology and Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala and Department of Biology and Chemical Engineering, Mälardalens Högskola, Eskilstuna, Sweden.
- Tove Airola: Department of Neuroscience, Physiology and Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala and Department of Biology and Chemical Engineering, Mälardalens Högskola, Eskilstuna, Sweden.
- Patric Jern: Department of Neuroscience, Physiology and Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala and Department of Biology and Chemical Engineering, Mälardalens Högskola, Eskilstuna, Sweden.
- Jonas Blomberg: Department of Neuroscience, Physiology and Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala and Department of Biology and Chemical Engineering, Mälardalens Högskola, Eskilstuna, Sweden. *To whom correspondence should be addressed: +46 18 611 55 93; +46 18 55 10 12.
|
17
|
Oja M, Peltonen J, Blomberg J, Kaski S. Methods for estimating human endogenous retrovirus activities from EST databases. BMC Bioinformatics 2007; 8 Suppl 2:S11. [PMID: 17493249 PMCID: PMC1892069 DOI: 10.1186/1471-2105-8-s2-s11] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/23/2022]
Abstract
Background: Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown. Results: We introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of the HERV activities were previously unknown. Conclusion: (i) Our methods estimate activity accurately based on experiments on simulated data. (ii) Our estimate on real data shows that 7% of the HERVs are active. The active ones are spread unevenly into HERV groups and relatively uniformly in terms of estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.
Affiliation(s)
- Merja Oja
- Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland
- Helsinki Institute for Information Technology, Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
- Jaakko Peltonen
- Helsinki Institute for Information Technology, Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
- Jonas Blomberg
- Section of Virology, Department of Medical Sciences, Uppsala University, Academic Hospital, 751 85 Uppsala, Sweden
- Samuel Kaski
- Helsinki Institute for Information Technology, Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
|
18
|
Mahony S, Benos PV, Smith TJ, Golden A. Self-organizing neural networks to support the discovery of DNA-binding motifs. Neural Netw 2006; 19:950-62. [PMID: 16839740 DOI: 10.1016/j.neunet.2006.05.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/24/2022]
Abstract
Identification of the short DNA sequence motifs that serve as binding targets for transcription factors is an important challenge in bioinformatics. Unsupervised techniques from the statistical learning theory literature have often been applied to motif discovery, but effective solutions for large genomic datasets have yet to be found. We present here three self-organizing neural networks that have applicability to the motif-finding problem. The core system in this study is a previously described SOM-based motif-finder named SOMBRERO. The motif-finder is integrated in this work with a SOM-based method that automatically constructs generalized models for structurally related motifs and initializes SOMBRERO with relevant biological knowledge. A self-organizing tree method that displays the relationships between various motifs is also presented, and it is shown that such a method can act as an effective structural classifier of novel motifs. The performance of the three self-organizing neural networks is evaluated here using various datasets.
Affiliation(s)
- Shaun Mahony
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
|
19
|
Jern P, Sperber GO, Blomberg J. Divergent patterns of recent retroviral integrations in the human and chimpanzee genomes: probable transmissions between other primates and chimpanzees. J Virol 2006; 80:1367-75. [PMID: 16415014 PMCID: PMC1346942 DOI: 10.1128/jvi.80.3.1367-1375.2006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
The human genome is littered by endogenous retrovirus sequences (HERVs), which constitute up to 8% of the total genomic sequence. The sequencing of the human (Homo sapiens) and chimpanzee (Pan troglodytes) genomes has facilitated the evolutionary study of ERVs and related sequences. We screened both the human genome (version hg16) and the chimpanzee genome (version PanTro1) for ERVs and conducted a phylogenetic analysis of recent integrations. We found a number of recent integrations within both genomes. They segregated into four groups. Two larger gammaretrovirus-like groups (PtG1 and PtG2) occurred in chimpanzees but not in humans. The PtG sequences were most similar to two baboon ERVs and a macaque sequence but neither to other chimpanzee ERVs nor to any human gammaretrovirus-like ERVs. The pattern was consistent with cross-species transfer via predation. This appears to be an example of horizontal transfer of retroviruses with occasional fixation in the germ line.
Affiliation(s)
- Patric Jern
- Section of Virology, Department of Medical Sciences, Uppsala University, Academic Hospital, Dag Hammarskjolds v. 17, SE-751 85 Uppsala, Sweden.
|
20
|
Jern P, Sperber GO, Blomberg J. Use of endogenous retroviral sequences (ERVs) and structural markers for retroviral phylogenetic inference and taxonomy. Retrovirology 2005; 2:50. [PMID: 16092962 PMCID: PMC1224870 DOI: 10.1186/1742-4690-2-50] [Citation(s) in RCA: 121] [Impact Index Per Article: 6.4] [Received: 07/14/2005] [Accepted: 08/10/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Endogenous retroviral sequences (ERVs) are integral parts of most eukaryotic genomes and vastly outnumber exogenous retroviruses (XRVs). ERVs with a relatively complete structure were retrieved from the genetic archives of humans and chickens, diametrically opposite representatives of vertebrate retroviruses (over 3300 proviruses), and analyzed, using a bioinformatic program, RetroTector, developed by us. This rich source of proviral information, accumulated in a local database, and a collection of XRV sequences from the literature, allowed the reconstruction of a Pol-based phylogenetic tree, more extensive than previously possible. The aim was to find traits useful for classification and evolutionary studies of retroviruses. Some of these traits have been used by others, but they are here tested in a wider context than before. RESULTS: In the ERV collection we found sequences similar to the XRV-based genera: alpha-, beta-, gamma-, epsilon- and spumaretroviruses. However, the occurrence of intermediates between them indicated an evolutionary continuum and suggested that taxonomic changes eventually will be necessary. No delta or lentivirus representatives were found among ERVs. Classification based on Pol similarity is congruent with a number of structural traits. Acquisition of dUTPase occurred three times in retroviral evolution. Loss of one or two NC zinc fingers appears to have occurred several times during evolution. Nucleotide biases have been described earlier for lenti-, delta- and betaretroviruses and were here confirmed in a larger context. CONCLUSION: Pol similarities and other structural traits contribute to a better understanding of retroviral phylogeny. "Global" genomic properties useful in phylogenies are (i) translational strategy, (ii) number of Gag NC zinc finger motifs, (iii) presence of Pro N-terminal dUTPase (dUTPasePro), (iv) presence of Pro C-terminal G-patch and (v) presence of a GPY/F motif in the Pol integrase (IN) C-terminal domain. "Local" retroviral genomic properties useful for delineation of lower level taxa are (i) host species range, (ii) nucleotide compositional bias and (iii) LTR lengths.
Affiliation(s)
- Patric Jern
- Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Göran O Sperber
- Unit of Physiology, Department of Neuroscience, Uppsala University, Uppsala, Sweden
- Jonas Blomberg
- Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
|