1
Abstract
In Cassandra: A Novel and Four Essays (1984), Wolf retells the Trojan War story from the perspective of the seer Cassandra, taking the Trojan War as a parallel to issues of her day. She uses the Amazons as important secondary characters, representing them as both woman-loving women and warriors. Wolf believes their valor in battle is only a version of men's militarism and thus provides no solution to the problem of war, her primary concern.
2
Kojima KK. Helenus and Ajax, Two Groups of Non-Autonomous LTR Retrotransposons, Represent a New Type of Small RNA Gene-Derived Mobile Elements. Biology (Basel) 2024; 13:119. [PMID: 38392337 PMCID: PMC10886601 DOI: 10.3390/biology13020119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 02/06/2024] [Accepted: 02/10/2024] [Indexed: 02/24/2024]
Abstract
Terminal repeat retrotransposons in miniature (TRIMs) are short non-autonomous long terminal repeat (LTR) retrotransposons found in various eukaryotes. Cassandra is a unique TRIM lineage that contains a 5S rRNA-derived sequence in its LTRs. Here, two new groups of TRIMs, designated Helenus and Ajax, are reported based on bioinformatics analysis and the use of Repbase. Helenus is found in fungi, animals, and plants, and its LTRs contain a tRNA-like sequence. It comprises two LTRs with a primer-binding site (PBS) and a polypurine tract (PPT) between them. Fungal and plant Helenus generate 5 bp target site duplications (TSDs) upon integration, while animal Helenus generates 4 bp TSDs. Ajax includes a 5S rRNA-derived sequence in its LTR and is found in two nemertean genomes. Ajax generates 5 bp TSDs upon integration. These results suggest that, despite their unique promoters, Helenus and Ajax are TRIMs whose transposition depends on autonomous LTR retrotransposons. These TRIMs can originate through the insertion of a SINE into an LTR of a TRIM. The discovery of Helenus and Ajax suggests the presence of TRIMs with a promoter for RNA polymerase III derived from a small RNA gene, which are here collectively termed TRIMp3.
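The target site duplications mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a TSD appears as a short identical sequence directly upstream of the 5' LTR and directly downstream of the 3' LTR. The function name `find_tsd`, its parameters, and the example flanking sequences are hypothetical and are not taken from the article or from Repbase.

```python
def find_tsd(upstream: str, downstream: str, min_len: int = 3, max_len: int = 6) -> str:
    """Return the longest candidate target site duplication (TSD).

    upstream:   genomic sequence ending right before the 5' LTR
    downstream: genomic sequence starting right after the 3' LTR
    A candidate TSD of length k is found when the last k bases of
    `upstream` equal the first k bases of `downstream`.
    Returns an empty string when no candidate of an allowed length exists.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    # Try the longest allowed length first so that a 5 bp TSD is not
    # reported as a shorter partial match.
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


# Made-up flanking sequences, for illustration only.
print(find_tsd("ACGTTGCAT", "TGCATGGA"))  # TGCAT (5 bp)
print(find_tsd("GGACCTGA", "CTGATTT"))    # CTGA (4 bp)
```

Exact matching is a simplification: TSDs in real genomes can accumulate mutations after integration, so a practical check would probably need to tolerate mismatches.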
Affiliation(s)
- Kenji K Kojima
- Genetic Information Research Institute, Cupertino, CA 95014, USA
3
Maiwald S, Mann L, Garcia S, Heitkam T. Evolving Together: Cassandra Retrotransposons Gradually Mirror Promoter Mutations of the 5S rRNA Genes. Mol Biol Evol 2024; 41:msae010. [PMID: 38262464 PMCID: PMC10853983 DOI: 10.1093/molbev/msae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/26/2023] [Accepted: 12/11/2023] [Indexed: 01/25/2024]
Abstract
The 5S rRNA genes are among the most conserved nucleotide sequences across all species. Similar to the 5S preservation we observe the occurrence of 5S-related nonautonomous retrotransposons, so-called Cassandras. Cassandras harbor highly conserved 5S rDNA-related sequences within their long terminal repeats, advantageously providing them with the 5S internal promoter. However, the dynamics of Cassandra retrotransposon evolution in the context of 5S rRNA gene sequence information and structural arrangement are still unclear, especially: (1) do we observe repeated or gradual domestication of the highly conserved 5S promoter by Cassandras and (2) do changes in 5S organization such as in the linked 35S-5S rDNA arrangements impact Cassandra evolution? Here, we show evidence for gradual co-evolution of Cassandra sequences with their corresponding 5S rDNAs. To follow the impact of 5S rDNA variability on Cassandra TEs, we investigate the Asteraceae family where highly variable 5S rDNAs, including 5S promoter shifts and both linked and separated 35S-5S rDNA arrangements have been reported. Cassandras within the Asteraceae mirror 5S rDNA promoter mutations of their host genome, likely as an adaptation to the host's specific 5S transcription factors and hence compensating for evolutionary changes in the 5S rDNA sequence. Changes in the 5S rDNA sequence and in Cassandras seem uncorrelated with linked/separated rDNA arrangements. We place all these observations into the context of angiosperm 5S rDNA-Cassandra evolution, discuss Cassandra's origin hypotheses (single or multiple) and Cassandra's possible impact on rDNA and plant genome organization, giving new insights into the interplay of ribosomal genes and transposable elements.
Affiliation(s)
- Sophie Maiwald
- Faculty of Biology, Technische Universität Dresden, 01069 Dresden, Germany
- Ludwig Mann
- Faculty of Biology, Technische Universität Dresden, 01069 Dresden, Germany
- Sònia Garcia
- Institut Botànic de Barcelona, IBB (CSIC-MCNB), 08038 Barcelona, Catalonia, Spain
- Tony Heitkam
- Faculty of Biology, Technische Universität Dresden, 01069 Dresden, Germany
- Institute of Biology, NAWI Graz, Karl-Franzens-Universität, 8010 Graz, Austria
4
Silva-Muñoz M, Franzin A, Bersini H. Automatic configuration of the Cassandra database using irace. PeerJ Comput Sci 2021; 7:e634. [PMID: 34435094 PMCID: PMC8356662 DOI: 10.7717/peerj-cs.634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2021] [Accepted: 06/18/2021] [Indexed: 06/13/2023]
Abstract
Database systems play a central role in modern data-centered applications. Their performance is thus a key factor in the efficiency of data processing pipelines. Modern database systems expose several parameters that users and database administrators can configure to tailor the database settings to the specific application considered. While this task has traditionally been performed manually, in recent years several methods have been proposed to automatically find the best parameter configuration for a database. Many of these methods, however, use statistical models that require large amounts of data and fail to represent all the factors that impact the performance of a database, or implement complex algorithmic solutions. In this work we study the potential of a simple, model-free, general-purpose configuration tool to automatically find the best parameter configuration of a database. We use the irace configurator to automatically find the best parameter configuration for the Cassandra NoSQL database using the YCSB benchmark under different scenarios. We establish a reliable experimental setup and obtain speedups of up to 30% over the default configuration in terms of throughput, and we provide an analysis of the configurations obtained.
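The idea of model-free configuration described in this abstract can be sketched as follows. This is plain random search, swapped in for illustration only; it is not irace's iterated racing procedure and not the authors' setup. The parameter names in `SPACE`, the `DEFAULT` values, and the `toy_throughput` function are all hypothetical stand-ins: they are not real Cassandra settings, and a real experiment would run a benchmark instead of evaluating a formula.

```python
import random

# Hypothetical parameter space; the names are illustrative only.
SPACE = {
    "writer_threads": [8, 16, 32, 64],
    "cache_mb": [128, 256, 512, 1024],
}
DEFAULT = {"writer_threads": 32, "cache_mb": 256}


def toy_throughput(cfg):
    """Stand-in for one benchmark run, returning operations per second."""
    return 1000.0 + 2.0 * cfg["writer_threads"] + 0.1 * cfg["cache_mb"]


def relative_speedup(candidate, baseline):
    """Throughput gain of `candidate` over `baseline` (0.30 means 30%)."""
    return (candidate - baseline) / baseline


def random_search(measure, space, default, budget, seed=0):
    """Sample `budget` random configurations and keep the best one seen.

    No statistical model of the database is built: each configuration is
    simply measured and compared with the best result so far.
    """
    rng = random.Random(seed)
    best_cfg, best_score = dict(default), measure(default)
    for _ in range(budget):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = measure(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


best_cfg, best_score = random_search(toy_throughput, SPACE, DEFAULT, budget=20)
gain = relative_speedup(best_score, toy_throughput(DEFAULT))
print(best_cfg, f"{gain:.1%}")
```

Because real benchmark runs are noisy, a practical configurator would repeat measurements and compare configurations statistically, which is the part that racing methods address and this sketch omits.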
Affiliation(s)
- Alberto Franzin
- IRIDIA-CoDE, Université Libre de Bruxelles (ULB), Brussels, Belgium
- Hugues Bersini
- IRIDIA-CoDE, Université Libre de Bruxelles (ULB), Brussels, Belgium
5
Maiwald S, Weber B, Seibt KM, Schmidt T, Heitkam T. The Cassandra retrotransposon landscape in sugar beet (Beta vulgaris) and related Amaranthaceae: recombination and re-shuffling lead to a high structural variability. Ann Bot 2021; 127:91-109. [PMID: 33009553 PMCID: PMC7750724 DOI: 10.1093/aob/mcaa176] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/28/2020] [Accepted: 09/28/2020] [Indexed: 05/26/2023]
Abstract
BACKGROUND AND AIMS: Plant genomes contain many retrotransposons and their derivatives, which are subject to rapid sequence turnover. As non-autonomous retrotransposons do not encode any proteins, they experience reduced selective constraints leading to their diversification into multiple families, usually limited to a few closely related species. In contrast, the non-coding Cassandra terminal repeat retrotransposons in miniature (TRIMs) are widespread in many plants. Their hallmark is a conserved 5S rDNA-derived promoter in their long terminal repeats (LTRs). As sugar beet (Beta vulgaris) has a well-described LTR retrotransposon landscape, we aim to characterize TRIMs in beet and related genomes. METHODS: We identified Cassandra retrotransposons in the sugar beet reference genome and characterized their structural relationships. Genomic organization, chromosomal localization, and distribution of Cassandra-TRIMs across the Amaranthaceae were verified by Southern and fluorescent in situ hybridization. KEY RESULTS: All 638 Cassandra sequences in the sugar beet genome contain conserved LTRs and thus constitute a single family. Nevertheless, variable internal regions required a subdivision into two Cassandra subfamilies within B. vulgaris. The related Chenopodium quinoa harbours a third subfamily. These subfamilies vary in their distribution within Amaranthaceae genomes, their insertion times and the degree of silencing by small RNAs. Cassandra retrotransposons gave rise to many structural variants, such as solo LTRs or tandemly arranged Cassandra retrotransposons. These Cassandra derivatives point to an interplay of template switch and recombination processes, mechanisms that likely caused Cassandra's subfamily formation and diversification. CONCLUSIONS: We traced the evolution of Cassandra in the Amaranthaceae and detected a considerable variability within the short internal regions, whereas the LTRs are strongly conserved in sequence and length. Presumably these hallmarks make Cassandra a prime target for unequal recombination, resulting in the observed structural diversity, an example of the impact of LTR-mediated evolutionary mechanisms on the host genome.
Affiliation(s)
- Sophie Maiwald
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Beatrice Weber
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Kathrin M Seibt
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Thomas Schmidt
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
- Tony Heitkam
- Institute of Botany, Technische Universität Dresden, Dresden, Germany
6
Abstract
The growth of pathogen genomics shows no signs of abating. Whole-genome sequencing of clinical viral and bacterial isolates continues to grow at a nearly exponential rate. Reductions in cost driven by new technology have created a seamless environment for generating, sharing, and analyzing pathogen genomes. The high-resolution view of infectious disease transmission dynamics offered by analyzing whole genomes from pathogens, coupled with the genomicist ethic of widespread data sharing, has created a veritable Internet of pathogens, which inadvertently produces new threats to patient privacy and protected health information. The health care system, and society more generally, have yet to explore the far-reaching privacy concerns raised by readily accessible pathogen genomic data. The recent use of human genomic databases, the existence of freely available alternative data and metadata sources, and lax regulation of collecting publicly available genomes to identify individuals in a criminal context raise concerning parallels about what is possible with pathogen genomics. The growing ability to ascertain culpability for infectious disease transmission at a nearly individual level could change our perspective on disease outbreaks from one based on public health to one based on individual liability. These technological breakthroughs, in the absence of an understanding of potential privacy and liability issues, lead to questions about the dominant paradigm of better living through pathogen genomics.
7
McClay W. A Magnetoencephalographic/Encephalographic (MEG/EEG) Brain-Computer Interface Driver for Interactive iOS Mobile Videogame Applications Utilizing the Hadoop Ecosystem, MongoDB, and Cassandra NoSQL Databases. Diseases 2018; 6:diseases6040089. [PMID: 30274210 PMCID: PMC6313514 DOI: 10.3390/diseases6040089] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/03/2018] [Revised: 08/08/2018] [Accepted: 08/21/2018] [Indexed: 11/17/2022]
Abstract
In Phase I, we collected data on five subjects, yielding over 90% positive performance in Magnetoencephalographic (MEG) mid- and post-movement activity. In addition, a driver was developed that substituted the actions of the Brain-Computer Interface (BCI) as mouse button presses for real-time use in visual simulations. The process was interfaced to a flight visualization demonstration in which, using left or right brainwave thought movement, the user experiences the aircraft turning in the chosen direction, and to an iOS Mobile Warfighter Videogame application. The BCI's data analytics of a subject's MEG brain waves and the flight visualization performance videogame analytics were stored and analyzed using the Hadoop Ecosystem as a quick-retrieval data warehouse. The Phase II portion of the project involves Emotiv Encephalographic (EEG) wireless Brain-Computer Interfaces (BCIs), which allow people to establish a novel communication channel between the human brain and a machine, in this case an iOS Mobile Application. The EEG BCI utilizes advanced and novel machine learning algorithms, as well as the Spark Directed Acyclic Graph (DAG), the Cassandra NoSQL database environment, and the competing NoSQL MongoDB database for housing BCI analytics of subjects' responses and users' intent, illustrated for both MEG and EEG brainwave signal acquisition. The wireless EEG signals acquired from OpenVibe and the Emotiv EPOC headset can be transmitted via Bluetooth to an iPhone utilizing a thin-client architecture. NoSQL databases were chosen because of their schema-less architecture and MapReduce computational paradigm for housing a user's brain signals from each referencing sensor. Thus, in the near future, if multiple users are playing over an online network connection and an MEG/EEG sensor fails, or if the connection between the smartphone and the web server is lost due to low battery power or failed data transmission, this will not nullify the document-oriented (MongoDB) or column-oriented (Cassandra) NoSQL databases. Additionally, NoSQL databases have fast querying and indexing methodologies, which are well suited to online game analytics and technology. In Phase II, we collected data on five MEG subjects, yielding over 90% positive performance on iOS Mobile Applications with Objective-C and C++. However, on EEG signals from three subjects with the Emotiv wireless headsets and (n < 10) subjects from the OpenVibe EEG database, the Variational Bayesian Factor Analysis (VBFA) algorithm yielded below 60% performance. We are currently extending the VBFA algorithm to work in the time-frequency domain, referred to as VBFA-TF, to enhance EEG performance in the near future. The novel usage of the NoSQL databases Cassandra and MongoDB was the primary enhancement of the BCI Phase II MEG/EEG brain signal data acquisition, queries, and rapid analytics, with MapReduce and Spark DAG demonstrating future implications for next-generation biometric MEG/EEG NoSQL databases.
Affiliation(s)
- Wilbert McClay
- Information Systems Department, Northeastern University, Boston, MA 02115, USA.
- Department School of Social Work, Tulane University School of Medicine, New Orleans, LA 70112, USA.
- Department of Information Assurance, Northeastern University, Boston, MA 02115, USA.
- Lawrence Livermore National Laboratory, Livermore, CA 94550, USA.
- Department of Information Assurance, Brandeis University, Waltham, MA 02453, USA.
- Department of Mathematics, Brandeis University, Waltham, MA 02453, USA.
8
Abstract
Next-generation sequencing, also known as high-throughput sequencing, has increased the volume of genetic data processed by sequencers. In bioinformatics, highly rated multiple sequence alignment tools, such as MAFFT, ProbCons, and T-Coffee (TC), use probabilistic consistency as a step prior to the progressive alignment stage to improve the final accuracy. However, such methods are severely limited by the memory required to store the consistency information. Big data processing and persistence techniques are used to manage and store the huge amount of information that is generated. Although these techniques have significant advantages, few biological applications have adopted them. In this article, a novel approach named big data tree-based consistency objective function for alignment evaluation (BDT-Coffee) is presented. BDT-Coffee integrates consistency information, previously generated with the MapReduce processing paradigm, into TC through a Cassandra database, enabling large data sets to be processed with the aim of improving the performance and scalability of the original algorithm.
Affiliation(s)
- Jordi Lladós
- INSPIRES Research Center, Universitat de Lleida, Lleida, Spain
- Fernando Cores
- INSPIRES Research Center, Universitat de Lleida, Lleida, Spain
9
Harrison C, Keleş S, Hudson R, Shin S, Dutra I. atSNPInfrastructure, a case study for searching billions of records while providing significant cost savings over cloud providers. IEEE Int Symp Parallel Distrib Process Workshops Phd Forum 2018; 2018:497-506. [PMID: 30349760 PMCID: PMC6195815 DOI: 10.1109/ipdpsw.2018.00086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
We explore the feasibility of a database storage engine housing up to 307 billion genetic Single Nucleotide Polymorphisms (SNPs) for online access. We evaluate database storage engines and implement a solution that takes into account factors such as dataset size, information gain, cost and hardware constraints. Our solution provides a fully featured functional model for scalable storage and query-ability for researchers exploring the SNPs in the human genome. We address the scalability problem by building physical infrastructure and comparing final costs to a major cloud provider.
Affiliation(s)
- Christopher Harrison
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin - Madison, Madison, Wisconsin USA
- Departamento de Ciência de Computadores, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Sündüz Keleş
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin - Madison, Madison, Wisconsin USA
- Rebecca Hudson
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin - Madison, Madison, Wisconsin USA
- Sunyoung Shin
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin - Madison, Madison, Wisconsin USA
- Inês Dutra
- Departamento de Ciência de Computadores, Faculdade de Ciências, Universidade do Porto, Porto, Portugal