1
Doorenweerd C, San Jose M, Leblanc L, Barr N, Geib SM, Chung AYC, Dupuis JR, Ekayanti A, Fiegalan E, Hemachandra KS, Aftab Hossain M, Huang CL, Hsu YF, Morris KY, Maryani A Mustapeng A, Niogret J, Pham TH, Thi Nguyen N, Sirisena UGAI, Todd T, Rubinoff D. Towards a better future for DNA barcoding: Evaluating monophyly- and distance-based species identification using COI gene fragments of Dacini fruit flies. Mol Ecol Resour 2024; 24:e13987. [PMID: 38956928 DOI: 10.1111/1755-0998.13987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 05/14/2024] [Accepted: 06/17/2024] [Indexed: 07/04/2024]
Abstract
The utility of a universal DNA 'barcode' fragment (658 base pairs of the Cytochrome C Oxidase I [COI] gene) has been established as a useful tool for species identification, and widely criticized as one for understanding the evolutionary history of a group. Large amounts of COI sequence data have been produced that hold promise for rapid species identification, for example, for biosecurity. The fruit fly tribe Dacini holds about a thousand species, of which 80 are pests of economic concern. We generated a COI reference library for 265 species of Dacini containing 5601 sequences that span most of the COI gene using circular consensus sequencing. We compared distance metrics versus monophyly assessments for species identification and although we found a 'soft' barcode gap around 2% pairwise distance, the exceptions to this rule dictate that a monophyly assessment is the only reliable method for species identification. We found that all fragments regularly used for Dacini fruit fly identification >450 base pairs long provide similar resolution. 11.3% of the species in our dataset were non-monophyletic in a COI tree, which is mostly due to species complexes. We conclude with recommendations for the future generation and use of COI libraries. We revise the generic assignment of Dacus transversus stat. rev. Hardy 1982, and Dacus perpusillus stat. rev. Drew 1971 and we establish Dacus maculipterus White 1998 syn. nov. as a junior synonym of Dacus satanas Liang et al. 1993.
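The distance side of the comparison described in this abstract can be illustrated with a small sketch. It assumes an uncorrected pairwise distance (p-distance) over pre-aligned sequences and treats the roughly 2% 'soft' gap as a simple cutoff. The function names, the handling of gaps and N, and the toy sequences are illustrative assumptions, not the authors' pipeline. As the abstract stresses, a distance cutoff alone is not a reliable identification method.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # site not comparable
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def within_soft_gap(seq_a: str, seq_b: str, threshold: float = 0.02) -> bool:
    """True when the p-distance falls below the assumed 2% cutoff."""
    return p_distance(seq_a, seq_b) < threshold


if __name__ == "__main__":
    query = "ACGTACGTAC"      # toy sequences, not real COI data
    reference = "ACGTACGTAT"
    print(p_distance(query, reference))
    print(within_soft_gap(query, reference))
```

A pair passing this check would still need a tree-based (monophyly) assessment before being assigned to a species.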
Affiliation(s)
- Camiel Doorenweerd
- Entomology Section, Department of Plant and Environmental Protection Sciences, College of Tropical Agriculture and Human Resources, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA
- Michael San Jose
- Entomology Section, Department of Plant and Environmental Protection Sciences, College of Tropical Agriculture and Human Resources, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA
- Luc Leblanc
- Department of Entomology, Plant Pathology and Nematology, University of Idaho, Moscow, Idaho, USA
- Norman Barr
- United States Department of Agriculture, Animal and Plant Health Inspection Service, Plant Protection and Quarantine, Science & Technology, Insect Management and Molecular Diagnostics Laboratory, Edinburg, Texas, USA
- Scott M Geib
- Tropical Pest Genetics and Molecular Biology Research Unit, Daniel K. Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, Hawaii, USA
- Arthur Y C Chung
- Forest Research Centre, Sabah Forestry Department, Sandakan, Sabah, Malaysia
- Julian R Dupuis
- Department of Entomology, University of Kentucky, Lexington, Kentucky, USA
- Arni Ekayanti
- Niogret Ecology Consulting LLC, Wotu, Luwu Timur, Sulawesi Selatan, Indonesia
- Elaida Fiegalan
- Department of Crop Protection, College of Agriculture, Central Luzon State University, Science City of Muñoz, Nueva Ecija, Philippines
- Mohammad Aftab Hossain
- Insect Biotechnology Division, Institute of Food and Radiation Biology, Bangladesh Atomic Energy Commission, Dhaka, Bangladesh
- Chia-Lung Huang
- Institute of Oceanography, Minjiang University, Fuzhou, Fujian, China
- Yu-Feng Hsu
- Department of Life Science, National Taiwan Normal University, Taipei, Taiwan, ROC
- Kimberly Y Morris
- Tropical Pest Genetics and Molecular Biology Research Unit, Daniel K. Inouye U.S. Pacific Basin Agricultural Center, USDA Agricultural Research Services, Hilo, Hawaii, USA
- Jerome Niogret
- Centre for Tropical Environmental & Sustainability Science, Nguma-Bada Campus, James Cook University, Smithfield, Queensland, Australia
- Thai Hong Pham
- Mientrung Institute for Scientific Research, Vietnam Academy of Science and Technology (VAST), Hue, Vietnam
- Vietnam National Museum of Nature & Graduate School of Science and Technology, VAST, Hanoi, Vietnam
- Nhien Thi Nguyen
- Faculty of Biotechnology, Vietnam National University of Agriculture, Hanoi, Vietnam
- Uda G A I Sirisena
- Department of Plant Sciences, Faculty of Agriculture, Rajarata University of Sri Lanka, Mihintale, Sri Lanka
- Terrence Todd
- United States Department of Agriculture, Animal and Plant Health Inspection Service, Plant Protection and Quarantine, Science & Technology, Insect Management and Molecular Diagnostics Laboratory, Edinburg, Texas, USA
- Daniel Rubinoff
- Entomology Section, Department of Plant and Environmental Protection Sciences, College of Tropical Agriculture and Human Resources, University of Hawai'i at Mānoa, Honolulu, Hawaii, USA
2
Sebastião CS, Abecasis AB, Jandondo D, Sebastião JMK, Vigário J, Comandante F, Pingarilho M, Pocongo B, Cassinela E, Gonçalves F, Gomes P, Giovanetti M, Francisco NM, Sacomboio E, Brito M, Neto de Vasconcelos J, Morais J, Pimentel V. HIV-1 diversity and pre-treatment drug resistance in the era of integrase inhibitor among newly diagnosed ART-naïve adult patients in Luanda, Angola. Sci Rep 2024; 14:15893. [PMID: 38987263 PMCID: PMC11237101 DOI: 10.1038/s41598-024-66905-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 07/05/2024] [Indexed: 07/12/2024] Open
Abstract
The surveillance of drug resistance in the HIV-1 naïve population remains critical to optimizing the effectiveness of antiretroviral therapy (ART), mainly in the era of integrase strand transfer inhibitor (INSTI) regimens. Currently, there is no data regarding resistance to INSTI in Angola since Dolutegravir-DTG was included in the first-line ART regimen. Herein, we investigated the HIV-1 genetic diversity and pretreatment drug resistance (PDR) profile against nucleoside/tide reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse transcriptase inhibitors (NNRTIs), protease inhibitors (PIs), and INSTIs, using a next-generation sequencing (NGS) approach with MinION, established to track and survey DRMs in Angola. This was a cross-sectional study comprising 48 newly HIV-diagnosed patients from Luanda, Angola, screened between March 2022 and May 2023. PR, RT, and IN fragments were sequenced for drug resistance and molecular transmission cluster analysis. A total of 45 out of the 48 plasma samples were successfully sequenced. Of these, 10/45 (22.2%) presented PDR to PIs/NRTIs/NNRTIs. Major mutations for NRTIs (2.2%), NNRTIs (20%), PIs (2.2%), and accessory mutations against INSTIs (13.3%) were detected. No major mutations against INSTIs were detected. M41L (2%) and I85V (2%) mutations were detected for NRTI and PI, respectively. K103N (7%), Y181C (7%), and K101E (7%) mutations were frequently observed in NNRTI. The L74M (9%) accessory mutation was frequently observed in the INSTI class. HIV-1 pure subtypes C (33%), F1 (17%), G (15%), A1 (10%), H (6%), and D (4%), CRF01_AG (4%) were observed, while about 10% were recombinant strains. About 31% of detected HIV-1C sequences were in clusters, suggesting small-scale local transmission chains. No major mutations against integrase inhibitors were detected, supporting the continued use of INSTI in the country. Further studies assessing the HIV-1 epidemiology in the era of INSTI-based ART regimens are needed in Angola.
Affiliation(s)
- Cruz S Sebastião
- Centro de Investigação em Saúde de Angola (CISA), Caxito, Angola.
- Instituto Nacional de Investigação em Saúde (INIS), Luanda, Angola.
- Instituto de Ciências da Saúde (ICISA), Universidade Agostinho Neto (UAN), Luanda, Angola.
- Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation Towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Rua da Junqueira 100, 1349-008, Lisboa, Portugal.
- Ana B Abecasis
- Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation Towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Rua da Junqueira 100, 1349-008, Lisboa, Portugal
- João Vigário
- Instituto Nacional de Sangue (INS), Ministério da Saúde, Luanda, Angola
- Marta Pingarilho
- Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation Towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Rua da Junqueira 100, 1349-008, Lisboa, Portugal
- Bárbara Pocongo
- Instituto Nacional de Luta contra SIDA (INLS), Ministério da Saúde, Luanda, Angola
- Edson Cassinela
- Centro Nacional de Investigação Científica (CNIC), Luanda, Angola
- Fátima Gonçalves
- Laboratório de Biologia Molecular (LMCBM, SPC, CHLO-HEM), 1349-019, Lisbon, Portugal
- Perpétua Gomes
- Laboratório de Biologia Molecular (LMCBM, SPC, CHLO-HEM), 1349-019, Lisbon, Portugal
- Egas Moniz Center for Interdisciplinary Research (CiiEM), Egas Moniz School of Health & Science, Caparica, Almada, Portugal
- Marta Giovanetti
- Department of Science and Technology for Humans and the Environment, University of Campus Bio-Medico di Roma, Rome, Italy
- Euclides Sacomboio
- Instituto de Ciências da Saúde (ICISA), Universidade Agostinho Neto (UAN), Luanda, Angola
- Miguel Brito
- Centro de Investigação em Saúde de Angola (CISA), Caxito, Angola
- H&TRC-Health & Technology Research Center, ESTeSL-Escola Superior de Tecnologia da Saúde, Instituto Politécnico de Lisboa, Lisbon, Portugal
- Joana Morais
- Instituto Nacional de Investigação em Saúde (INIS), Luanda, Angola
- Victor Pimentel
- Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation Towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, UNL, Rua da Junqueira 100, 1349-008, Lisboa, Portugal.
3
Spilsberg B, Leithaug M, Christiansen DH, Dahl MM, Petersen PE, Lagesen K, Fiskebeck EMLZ, Moldal T, Boye M. Development and application of a whole genome amplicon sequencing method for infectious salmon anemia virus (ISAV). Front Microbiol 2024; 15:1392607. [PMID: 38873156 PMCID: PMC11169708 DOI: 10.3389/fmicb.2024.1392607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Accepted: 05/07/2024] [Indexed: 06/15/2024] Open
Abstract
Infectious salmon anemia (ISA) is an infectious disease primarily affecting farmed Atlantic salmon, Salmo salar, which is caused by the ISA virus (ISAV). ISAV belongs to the Orthomyxoviridae family. The disease is a serious condition resulting in reduced fish welfare and high mortality. In this study, we designed an amplicon-based sequencing protocol for whole genome sequencing of ISAV. The method consists of 80 ISAV-specific primers that cover 92% of the virus genome and was designed to be used on an Illumina MiSeq platform. The sequencing accuracy was investigated by comparing sequences with previously published Sanger sequences. The sequences obtained were nearly identical to those obtained by Sanger sequencing, thus demonstrating that sequences produced by this amplicon sequencing protocol had an acceptable accuracy. The amplicon-based sequencing method was used to obtain the whole genome sequence of 12 different ISAV isolates from a small local epidemic in the northern part of Norway. Analysis of the whole genome sequences revealed that segment reassortment took place between some of the isolates and identified which segments had been reassorted.
Affiliation(s)
- Bjørn Spilsberg
- Department of Analysis and Diagnostics, Norwegian Veterinary Institute, Ås, Norway
- Magnus Leithaug
- Department of Analysis and Diagnostics, Norwegian Veterinary Institute, Ås, Norway
- Maria Marjunardóttir Dahl
- National Reference Laboratory for Fish and Animal Diseases, Faroese Food and Veterinary Authority, Torshavn, Faroe Islands
- Petra Elisabeth Petersen
- National Reference Laboratory for Fish and Animal Diseases, Faroese Food and Veterinary Authority, Torshavn, Faroe Islands
- Karin Lagesen
- Department of Animal Health and Food Safety, Norwegian Veterinary Institute, Ås, Norway
- Torfinn Moldal
- Department of Aquatic Animal Health and Welfare, Norwegian Veterinary Institute, Ås, Norway
- Mette Boye
- Department of Analysis and Diagnostics, Norwegian Veterinary Institute, Ås, Norway
4
Nawaz MS, Fournier-Viger P, Nawaz S, Zhu H, Yun U. SPM4GAC: SPM based approach for genome analysis and classification of macromolecules. Int J Biol Macromol 2024; 266:130984. [PMID: 38513910 DOI: 10.1016/j.ijbiomac.2024.130984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 03/16/2024] [Indexed: 03/23/2024]
Abstract
Genome sequence analysis and classification play critical roles in properly understanding an organism's main characteristics, functionalities, and changing (evolving) nature. However, the rapid expansion of genomic data makes genome sequence analysis and classification a challenging task due to the high computational requirements, proper management, and understanding of genomic data. Recently proposed models yielded promising results for the task of genome sequence classification. Nevertheless, these models often ignore the sequential nature of nucleotides, which is crucial for revealing their underlying structure and function. To address this limitation, we present SPM4GAC, a sequential pattern mining (SPM)-based framework to analyze and classify the macromolecule genome sequences of viruses. First, a large dataset containing the genome sequences of various RNA viruses is developed and transformed into a suitable format. On the transformed dataset, algorithms for SPM are used to identify frequent sequential patterns of nucleotide bases. The obtained frequent sequential patterns of bases are then used as features to classify different viruses. Ten classifiers are employed, and their performance is assessed by using several evaluation measures. Finally, a performance comparison of SPM4GAC with state-of-the-art methods for genome sequence classification/detection reveals that SPM4GAC performs better than those methods.
Affiliation(s)
- M Saqib Nawaz
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
- Shoaib Nawaz
- Department of Pharmacy, The University of Lahore, Sargodha Campus, Pakistan.
- Haowei Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China.
- Unil Yun
- Sejong University, Seoul, Republic of Korea.
5
Bianchessi L, Flach E, Monacchia G, Dagleish M, Maley M, Turin L, Rocchi MS. Identification and characterisation of Gamma-herpesviruses in zoo artiodactyla. Virol J 2024; 21:49. [PMID: 38395934 PMCID: PMC10893651 DOI: 10.1186/s12985-024-02311-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 02/04/2024] [Indexed: 02/25/2024] Open
Abstract
BACKGROUND Viruses within the γ-herpesviruses subfamily include the causative agents of Malignant Catarrhal Fever (MCF) in several species of the order Artiodactyla. MCF is a usually fatal lymphoproliferative disease affecting non-adapted host species. In adapted host species these viruses become latent and recrudesce and transmit during times of stress or immunosuppression. The undetected presence of MCF-causing viruses (MCFVs) is a risk to non-adapted hosts, especially within non-sympatric zoological collections. This study investigated the presence of MCFVs in six different zoological collections in the UK, to evaluate the presence of subclinical/latent MCFVs in carrier animals.
METHODS One hundred and thirty-eight samples belonging to 54 different species of Artiodactyla were tested by Consensus Pan-herpes PCR. The positive samples were sequenced and subjected to phylogenetic analyses to understand their own evolutionary relationships and those with their hosts.
RESULTS Twenty-five samples from 18 different species tested positive. All viruses but one clustered in the γ-herpesvirus family and within the Macavirus as well as the non-Macavirus groups (caprinae and alcelaphinae/hippotraginae clusters, respectively). A strong association between virus and host species was evident in the Macavirus group and clustering within the caprinae group indicated potential pathogenicity.
CONCLUSION This study shows the presence of pathogenic and non-pathogenic MCFVs, as well as other γ-herpesviruses, in Artiodactyla species of conservation importance and allowed the identification of new herpesviruses in some non-adapted species.
Affiliation(s)
- Laura Bianchessi
- Department of Veterinary Medicine and Animal Sciences, University of Milan, Via dell'Università 6, 26900, Milan, Italy
- Edmund Flach
- Wildlife Health Services, Zoological Society of London (retired), Regents Park, NW1 4RY, London, UK
- Giulia Monacchia
- CIRM Italian Malaria Network, University of Perugia, Perugia, Italy
- Mark Dagleish
- Division of Veterinary Pathology, Public Health and Disease Investigation, School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, 464 Bearsden Road, G61 1QH, Glasgow, UK
- Madeleine Maley
- Moredun Research Institute, Pentlands Science Park, Bush Loan, EH26 0PZ, Penicuik, UK
- Lauretta Turin
- Department of Veterinary Medicine and Animal Sciences, University of Milan, Via dell'Università 6, 26900, Milan, Italy.
- Mara Silvia Rocchi
- Moredun Research Institute, Pentlands Science Park, Bush Loan, EH26 0PZ, Penicuik, UK
6
Zheludev IN, Edgar RC, Lopez-Galiano MJ, de la Peña M, Babaian A, Bhatt AS, Fire AZ. Viroid-like colonists of human microbiomes. bioRxiv 2024:2024.01.20.576352. [PMID: 38293115 PMCID: PMC10827157 DOI: 10.1101/2024.01.20.576352] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/01/2024]
Abstract
Here, we describe the "Obelisks," a previously unrecognised class of viroid-like elements that we first identified in human gut metatranscriptomic data. "Obelisks" share several properties: (i) apparently circular RNA ~1kb genome assemblies, (ii) predicted rod-like secondary structures encompassing the entire genome, and (iii) open reading frames coding for a novel protein superfamily, which we call the "Oblins". We find that Obelisks form their own distinct phylogenetic group with no detectable sequence or structural similarity to known biological agents. Further, Obelisks are prevalent in tested human microbiome metatranscriptomes with representatives detected in ~7% of analysed stool metatranscriptomes (29/440) and in ~50% of analysed oral metatranscriptomes (17/32). Obelisk compositions appear to differ between the anatomic sites and are capable of persisting in individuals, with continued presence over >300 days observed in one case. Large scale searches identified 29,959 Obelisks (clustered at 90% nucleotide identity), with examples from all seven continents and in diverse ecological niches. From this search, a subset of Obelisks are identified to code for Obelisk-specific variants of the hammerhead type-III self-cleaving ribozyme. Lastly, we identified one case of a bacterial species (Streptococcus sanguinis) in which a subset of defined laboratory strains harboured a specific Obelisk RNA population. As such, Obelisks comprise a class of diverse RNAs that have colonised, and gone unnoticed in, human, and global microbiomes.
Affiliation(s)
- Ivan N Zheludev
- Stanford University, Department of Biochemistry, Stanford, CA, USA
- Maria Jose Lopez-Galiano
- Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
- Marcos de la Peña
- Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
- Artem Babaian
- University of Toronto, Department of Molecular Genetics, Ontario, Canada
- University of Toronto, Donnelly Centre for Cellular and Biomolecular Research, Ontario, Canada
- Ami S Bhatt
- Stanford University, Department of Genetics, Stanford, CA, USA
- Stanford University, Department of Medicine, Division of Hematology, Stanford, CA, USA
- Andrew Z Fire
- Stanford University, Department of Genetics, Stanford, CA, USA
- Stanford University, Department of Pathology, Stanford, CA, USA
7
Štambuk N, Konjevoda P, Štambuk A. How ambiguity codes specify molecular descriptors and information flow in Code Biology. Biosystems 2023; 233:105034. [PMID: 37739308 DOI: 10.1016/j.biosystems.2023.105034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 09/12/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023]
Abstract
The article presents IUPAC ambiguity codes for incomplete nucleic acid specification, and their use in Code Biology. It is shown how to use this nomenclature in order to extract accurate information on different properties of the biological systems. We investigated the use of ambiguity codes, as mathematical and logical operators and truth table elements, for the encoding of amino acids by means of the Standard Genetic Code. It is explained how to use ambiguity codes and truth functions in order to obtain accurate information on different properties of the biological systems. Nucleotide ambiguity codes could be applied to: 1. encoding descriptive information of nucleotides, amino acids and proteins (e.g., of polarity, relative solvent accessibility, atom depth, etc.), and 2. system modelling ranging from standard bioinformatics tools to classic evolutionary models (i.e. from Miyazawa-Jernigan statistical potential to Kimura three-substitution-type model, respectively). It is shown that algorithms based on IUPAC ambiguity codes, Boolean functions and truth tables, the Probabilistic Square of Opposition/Semiotic Square, and Klein 4-groups could be used for bioinformatics analyses and relational data modelling in natural science. Underlying mathematical, logical and semiotic concepts of interest are presented and addressed.
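As a rough illustration of treating ambiguity codes as logical operands, the sketch below encodes the standard IUPAC nucleotide ambiguity table as sets of bases and combines codes with set union (OR) and intersection (AND). The helper names `expand`, `encode`, `code_or` and `code_and` are hypothetical, and the article's own truth-table formalism is not reproduced here.

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> single-letter code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def expand(code: str) -> frozenset:
    """Set of concrete bases represented by one ambiguity code."""
    return frozenset(IUPAC[code.upper()])


def encode(bases) -> str:
    """Single ambiguity code representing a non-empty set of bases."""
    return BASES_TO_CODE[frozenset(bases)]


def code_or(a: str, b: str) -> str:
    """Logical OR of two codes: bases allowed by either one."""
    return encode(expand(a) | expand(b))


def code_and(a: str, b: str):
    """Logical AND of two codes: shared bases, or None if there are none."""
    common = expand(a) & expand(b)
    return encode(common) if common else None


if __name__ == "__main__":
    print(code_or("R", "Y"))   # purine OR pyrimidine
    print(code_and("R", "M"))  # bases shared by A/G and A/C
```

Representing each code as a set keeps the fifteen symbols closed under union, which is what lets them behave like truth-table elements in this toy setting.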
Affiliation(s)
- Nikola Štambuk
- Centre for Nuclear Magnetic Resonance, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Paško Konjevoda
- Laboratory for Epigenomics, Division of Molecular Medicine, Ruđer Bošković Institute, Bijenička cesta 54, HR-10000, Zagreb, Croatia.
- Albert Štambuk
- Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, HR-10000 Zagreb, Croatia
8
Nishiyama N, Shinozawa A, Matsumoto T, Izawa T. High genome heterozygosity revealed vegetative propagation over the sea in Moso bamboo. BMC Genomics 2023; 24:348. [PMID: 37355596 DOI: 10.1186/s12864-023-09428-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 06/04/2023] [Indexed: 06/26/2023] Open
Abstract
BACKGROUND Moso bamboo (Phyllostachys edulis) is a typical East Asian bamboo that does not flower for > 60 years and propagates without seed reproduction. Thus, Moso bamboo can be propagated vegetatively, possibly resulting in highly heterozygous genetic inheritance. Recently, a draft genome of Moso bamboo was reported, followed by whole genome single nucleotide polymorphisms (SNP) analysis, which showed that the genome of Moso bamboo in China has regional characteristics. Moso bamboo in Japan is thought to have been introduced from China over the sea in 1736. However, it is unclear where and how Moso bamboo was introduced in Japan from China. Here, based on detailed analysis of heterozygosity in genome diversity, we estimate the spread of genome diversity and its pedigree of Moso bamboo.
RESULTS We sequenced the whole genome of Moso bamboo in Japan and compared it with data reported previously from 15 regions of China. Only 4.1 million loci (0.37% of the analyzed genomic region) were identified as polymorphic loci. We next narrowed down the number of polymorphic loci using several filters and extracted more reliable SNPs. Among the 414,952 high-quality SNPs, 319,431 (77%) loci were identified as heterozygous common to all tested samples. The result suggested that all tested samples were clones via vegetative reproduction. Somatic mutations may accumulate in a heterozygous manner within a single clone. We examined common heterozygous loci between samples from Japan and elsewhere, from which we inferred that an individual closely related to the sample from Fujian, China, was introduced to Japan across the sea without seed reproduction. In addition, we collected 16 samples from four nearby bamboo forests in Japan and performed SNP and insertion/deletion analyses using a genotyping by sequencing (GBS) method. The results suggested that a small number of somatic mutations would spread within and between bamboo groves.
CONCLUSIONS High heterozygosity in the genome-wide diversity of Moso bamboo implies the vegetative propagation of Moso bamboo from China to Japan, the pedigree of Moso bamboo in Japan, and becomes a useful marker to approach the spread of genome diversity in clonal plants.
Affiliation(s)
- Norihide Nishiyama
- Department of Agricultural and Environmental Biology, Laboratory of Plant Breeding & Genetics, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan
- Akihisa Shinozawa
- Department of Bioscience, Faculty of Life Sciences, Tokyo University of Agriculture, 1-1-1 Sakuragaoka Setagaya-Ku, Tokyo, 156-8502, Japan
- The NODAI Genome Research Center (NGRC), Tokyo University of Agriculture, 1-1-1 Sakuragaoka Setagaya-Ku, Tokyo, 156-8502, Japan
- Takashi Matsumoto
- Department of Bioscience, Faculty of Life Sciences, Tokyo University of Agriculture, 1-1-1 Sakuragaoka Setagaya-Ku, Tokyo, 156-8502, Japan
- Takeshi Izawa
- Department of Agricultural and Environmental Biology, Laboratory of Plant Breeding & Genetics, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-Ku, Tokyo, 113-8657, Japan.
9
Leski TA, Spangler JR, Wang Z, Schultzhaus Z, Taitt CR, Dean SN, Stenger DA. Machine learning for design of degenerate Cas13a crRNAs using lassa virus as a model of highly variable RNA target. Sci Rep 2023; 13:6506. [PMID: 37081092 PMCID: PMC10119381 DOI: 10.1038/s41598-023-33494-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 04/13/2023] [Indexed: 04/22/2023] Open
Abstract
The design of minimum CRISPR RNA (crRNA) sets for detection of diverse RNA targets using sequence degeneracy has not been systematically addressed. We tested candidate degenerate Cas13a crRNA sets designed for detection of diverse RNA targets (Lassa virus). A decision tree machine learning (ML) algorithm (RuleFit) was applied to define the top attributes that determine the specificity of degenerate crRNAs to elicit collateral nuclease activity. Although the total number of mismatches (0-4) is important, the specificity depends as well on the spacing of mismatches, and their proximity to the 5' end of the spacer. We developed a predictive algorithm for design of candidate degenerate crRNA sets, allowing improved discrimination between "included" and "excluded" groups of related target sequences. A single degenerate crRNA set adhering to these rules detected representatives of all Lassa lineages. Our general ML approach may be applied to the design of degenerate crRNA sets for any CRISPR/Cas system.
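The three attributes named in this abstract (number of mismatches, their spacing, and their proximity to the 5' end of the spacer) can be computed with a short sketch. It assumes, for simplicity, that the degenerate spacer and the target are written in the same sense, 5' to 3', in the DNA alphabet with IUPAC ambiguity codes in the spacer. The function name `mismatch_features` and the toy sequences are hypothetical, and this is not the authors' RuleFit model.

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatch_features(spacer: str, target: str) -> dict:
    """Describe mismatches between a degenerate spacer and one target.

    Both strings are assumed to be the same length and written 5'->3'
    in the same sense; U is treated as T.
    """
    spacer = spacer.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    if len(spacer) != len(target):
        raise ValueError("spacer and target must have the same length")
    # A position mismatches when the target base is not allowed by the code.
    positions = [
        i for i, (s, t) in enumerate(zip(spacer, target)) if t not in IUPAC[s]
    ]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "count": len(positions),
        "positions": positions,
        "min_spacing": min(gaps) if gaps else None,
        "first_from_5prime": positions[0] if positions else None,
    }


if __name__ == "__main__":
    print(mismatch_features("ACRTNGGA", "ACGTTGCA"))
    print(mismatch_features("ACRTNGGA", "UCGUUGCA"))
```

Features of this kind could feed a rule-based classifier, but the thresholds that separate "included" from "excluded" targets would have to come from experimental data such as that described above.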
Affiliation(s)
- T A Leski
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, USA.
- J R Spangler
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, USA
- Z Wang
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, USA
- Z Schultzhaus
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, USA
- U.S. Department of Agriculture, Riverdale, MD, USA
- C R Taitt
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, USA
- Nova Research Inc., Alexandria, VA, USA
- S N Dean
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, USA
- D A Stenger
- Center for Bio/Molecular Science & Engineering, U.S. Naval Research Laboratory, Washington, USA
10
Regueira-Iglesias A, Vázquez-González L, Balsa-Castro C, Blanco-Pintos T, Vila-Blanco N, Carreira MJ, Tomás I. Impact of 16S rRNA Gene Redundancy and Primer Pair Selection on the Quantification and Classification of Oral Microbiota in Next-Generation Sequencing. Microbiol Spectr 2023; 11:e0439822. [PMID: 36779795 PMCID: PMC10101033 DOI: 10.1128/spectrum.04398-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 01/16/2023] [Indexed: 02/14/2023] Open
Abstract
This study aimed to evaluate the number of 16S rRNA genes in the complete genomes of the bacterial and archaeal species inhabiting the human mouth and to assess how the use of different primer pairs would affect the detection and classification of redundant amplicons and matching amplicons (MAs) from different taxa. A total of 518 oral-bacterial and 191 oral-archaeal complete genomes were downloaded from the NCBI database, and their complete 16S rRNA genes were extracted. The numbers of genes and variants per genome were calculated. Next, 39 primer pairs were used to search for matches in the genomes and obtain amplicons. For each primer, we calculated the number of gene amplicons, variants, genomes, and species detected and the percentage of coverage at the species level with no MAs (SC-NMA). The results showed that 94.09% of oral bacteria and 52.59% of oral archaea had more than one intragenomic 16S rRNA gene. From 1.29% to 46.70% of bacterial species and from 4.65% to 38.89% of archaea detected by the primers had MAs. The best primers were the following (SC-NMA; region; position for Escherichia coli [GenBank version no. J01859.1]): KP_F048-OP_R030 for bacteria (93.55%; V3 to V7; 342 to 1079), KP_F018-KP_R063 for archaea (89.63%; V3 to V9; undefined to 1506), and OP_F114-OP_R121 for both domains (92.52%; V3 to V9; 340 to 1405). In addition to 16S rRNA gene redundancy, the presence of MAs must be controlled to ensure an accurate interpretation of microbial diversity data. The SC-NMA is a more useful parameter than the conventional coverage percentage for selecting the best primer pairs. The pairs used the most in the oral microbiome literature were not among the best performers. IMPORTANCE Hundreds of publications have studied the oral microbiome through 16S rRNA gene sequencing. 
However, none have assessed the number of 16S rRNA genes in the genomes of oral microbes, or how the use of primer pairs targeting different regions affects the detection of MAs from different taxa. Here, we found that almost all oral bacteria and more than half of oral archaea have more than one intragenomic 16S rRNA gene. The ability of a primer pair to avoid detecting MAs improves as amplicon length increases. As none of the primer pairs most employed in the oral microbiome literature were among the best performers, we selected a series of primers to detect bacteria and/or archaea based on their percentage of species detected without MAs. The intragenomic 16S rRNA gene redundancy and the presence of MAs between distinct taxa need to be considered to ensure an accurate interpretation of microbial diversity data.
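The species-level coverage with no matching amplicons (SC-NMA) described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: a matching amplicon is taken to be an identical amplicon sequence recovered from two or more different species, and the denominator is taken to be every species in the genome collection. The function name `sc_nma` and its input layout are hypothetical, not the authors' implementation.

```python
from collections import defaultdict


def sc_nma(amplicons, all_species):
    """Percentage of species detected by one primer pair with no matching amplicons.

    amplicons   -- list of (species, amplicon_sequence) pairs (hypothetical layout)
    all_species -- set of every species in the genome collection (assumed denominator)
    """
    # Group species by the exact amplicon sequence they produced.
    species_by_sequence = defaultdict(set)
    for species, sequence in amplicons:
        species_by_sequence[sequence].add(species)

    # A species has a matching amplicon (MA) if any of its amplicons
    # is also produced by a different species.
    species_with_ma = set()
    for species_set in species_by_sequence.values():
        if len(species_set) > 1:
            species_with_ma |= species_set

    detected = {species for species, _ in amplicons}
    detected_without_ma = detected - species_with_ma
    return 100.0 * len(detected_without_ma) / len(all_species)
```

Under these assumptions, a primer pair that detects many species but yields identical amplicons for several of them scores lower than its conventional coverage percentage would suggest, which is the distinction the abstract draws.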
Affiliation(s)
- Alba Regueira-Iglesias
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute Foundation of Santiago (FIDIS), Santiago de Compostela, Spain
- Lara Vázquez-González
  - Centro Singular de Investigación en Tecnoloxías Intelixentes and Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Health Research Institute Foundation of Santiago (FIDIS), Santiago de Compostela, Spain
- Carlos Balsa-Castro
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute Foundation of Santiago (FIDIS), Santiago de Compostela, Spain
- Triana Blanco-Pintos
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute Foundation of Santiago (FIDIS), Santiago de Compostela, Spain
- Nicolás Vila-Blanco
  - Centro Singular de Investigación en Tecnoloxías Intelixentes and Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Health Research Institute Foundation of Santiago (FIDIS), Santiago de Compostela, Spain
- Maria José Carreira
  - Centro Singular de Investigación en Tecnoloxías Intelixentes and Departamento de Electrónica e Computación, Universidade de Santiago de Compostela, Health Research Institute Foundation of Santiago (FIDIS), Santiago de Compostela, Spain
- Inmaculada Tomás
  - Oral Sciences Research Group, Special Needs Unit, Department of Surgery and Medical-Surgical Specialties, School of Medicine and Dentistry, Universidade de Santiago de Compostela, Health Research Institute Foundation of Santiago (FIDIS), Santiago de Compostela, Spain
|
11
|
Abstract
Bifidobacteria naturally inhabit diverse environments, including the gastrointestinal tracts of humans and animals. Members of the genus are of considerable scientific interest due to their beneficial effects on health and, hence, their potential to be used as probiotics. By definition, probiotic cells need to be viable despite being exposed to several stressors in the course of their production, storage, and administration. Examples of common stressors encountered by probiotic bifidobacteria include oxygen, acid, and bile salts. As bifidobacteria are highly heterogeneous in terms of their tolerance to these stressors, poor stability and/or robustness can hamper the industrial-scale production and commercialization of many strains. Therefore, interest in the stress physiology of bifidobacteria has intensified in recent decades, and many studies have been conducted to obtain insights into the molecular mechanisms underlying their stability and robustness. By complementing traditional methodologies, omics technologies have opened new avenues for enhancing the understanding of the defense mechanisms of bifidobacteria against stress. In this review, we summarize and evaluate the current knowledge on the multilayered responses of bifidobacteria to stressors, including the most recent insights and hypotheses. We address the prevailing stressors that may affect cell viability during production and use as probiotics. Besides phenotypic effects, molecular mechanisms that have been found to underlie the stress response are described. We further discuss strategies that can be applied to improve the stability of probiotic bifidobacteria and highlight knowledge gaps that should be addressed in future studies.
Affiliation(s)
- Marie Schöpping
  - Systems Biology, Discovery, Chr. Hansen A/S, Hørsholm, Denmark
  - Division of Industrial Biotechnology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Ahmad A. Zeidan
  - Systems Biology, Discovery, Chr. Hansen A/S, Hørsholm, Denmark
- Carl Johan Franzén
  - Division of Industrial Biotechnology, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
|
12
|
Arbuscular Mycorrhiza and Nitrification: Disentangling Processes and Players by Using Synthetic Nitrification Inhibitors. Appl Environ Microbiol 2022; 88:e0136922. [PMID: 36190238 PMCID: PMC9599619 DOI: 10.1128/aem.01369-22] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022]
Abstract
Both plants and their associated arbuscular mycorrhizal (AM) fungi require nitrogen (N) for their metabolism and growth. This can result in both positive and negative effects of AM symbiosis on plant N nutrition. Either way, the demand for and efficiency of uptake of mineral N from the soil by mycorrhizal plants are often higher than those of nonmycorrhizal plants. In consequence, the symbiosis of plants with AM fungi exerts important feedbacks on soil processes in general and N cycling in particular. Here, we investigated the role of the AM symbiosis in N uptake by Andropogon gerardii from an organic source (15N-labeled plant litter) that was provided beyond the direct reach of roots. In addition, we tested if pathways of 15N uptake from litter by mycorrhizal hyphae were affected by amendment with different synthetic nitrification inhibitors (dicyandiamide [DCD], nitrapyrin, or 3,4-dimethylpyrazole phosphate [DMPP]). We observed efficient acquisition of 15N by mycorrhizal plants through the mycorrhizal pathway, independent of nitrification inhibitors. These results were in stark contrast to 15N uptake by nonmycorrhizal plants, which generally took up much less 15N, and the uptake was further suppressed by nitrapyrin or DMPP amendments. Quantitative real-time PCR analyses showed that bacteria involved in the rate-limiting step of nitrification, ammonia oxidation, were suppressed similarly by the presence of AM fungi and by nitrapyrin or DMPP (but not DCD) amendments. On the other hand, abundances of ammonia-oxidizing archaea were not strongly affected by either the AM fungi or the nitrification inhibitors. IMPORTANCE Nitrogen is one of the most important elements for all life on Earth. In soil, N is present in various chemical forms and is fiercely competed for by various microorganisms as well as plants. Here, we address competition for reduced N (ammonia) between ammonia-oxidizing prokaryotes and arbuscular mycorrhizal fungi. 
These two functionally important groups of soil microorganisms, participating in nitrification and plant mineral nutrient acquisition, respectively, have often been studied separately in the past. Here, we showed, using various biochemical and molecular approaches, that the fungi systematically suppress ammonia-oxidizing bacteria to an extent similar to that of some widely used synthetic nitrification inhibitors, whereas they have only a limited impact on the abundance of ammonia-oxidizing archaea. Competition for free ammonium is a plausible explanation here, but it is also possible that the fungi produce some compounds acting as so-called biological nitrification inhibitors.
|
13
|
Jacobson D, Zheng Y, Plucinski MM, Qvarnstrom Y, Barratt JLN. Evaluation of various distance computation methods for construction of haplotype-based phylogenies from large MLST dataset. Mol Phylogenet Evol 2022; 177:107608. [PMID: 35963590 PMCID: PMC10127246 DOI: 10.1016/j.ympev.2022.107608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Revised: 06/30/2022] [Accepted: 08/05/2022] [Indexed: 11/24/2022]
Abstract
Multi-locus sequence typing (MLST) is widely used to investigate genetic relationships among eukaryotic taxa, including parasitic pathogens. MLST analysis workflows typically involve construction of alignment-based phylogenetic trees - i.e., where tree structures are computed from nucleotide differences observed in a multiple sequence alignment (MSA). Notably, alignment-based phylogenetic methods require that all isolates/taxa are represented by a single sequence. When multiple loci are sequenced these sequences may be concatenated to produce one tree that includes information from all loci. Alignment-based phylogenetic techniques are robust and widely used yet possess some shortcomings, including how heterozygous sites are handled, intolerance for missing data (i.e., partial genotypes), and differences in the way insertions-deletions (indels) are scored/treated during tree construction. In certain contexts, 'haplotype-based' methods may represent a viable alternative to alignment-based techniques, as they do not possess the aforementioned limitations. This is namely because haplotype-based methods assess genetic similarity based on numbers of shared (i.e., intersecting) haplotypes as opposed to similarities in nucleotide composition observed in an MSA. For haplotype-based comparisons, choosing an appropriate distance statistic is fundamental, and several statistics are available to choose from. However, a comprehensive assessment of various available statistics for their ability to produce a robust haplotype-based phylogenetic reconstruction has not yet been performed. We evaluated seven distance statistics by applying them to extant MLST datasets from the gastrointestinal parasite Cyclospora cayetanensis and two species of pathogenic nematode of the genus Strongyloides. We compare the genetic relationships identified using each statistic to epidemiologic, geographic, and host metadata. 
We show that Barratt's heuristic definition of genetic distance was the most robust among the statistics evaluated. Consequently, it is proposed that Barratt's heuristic represents a useful approach for use in the context of challenging MLST datasets possessing features (i.e., high heterozygosity, partial genotypes, and indel or repeat-based polymorphisms) that confound or preclude the use of alignment-based methods.
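To make the idea of a haplotype-based distance concrete, here is a minimal Python sketch that scores two isolates by the haplotypes they share rather than by aligned nucleotides. It uses a plain Jaccard distance pooled over the loci typed in both isolates. This is a generic stand-in chosen for illustration and is explicitly not Barratt's heuristic, whose definition is not given in the abstract. The function name and the dictionary layout are hypothetical.

```python
def jaccard_haplotype_distance(isolate_a, isolate_b):
    """Jaccard distance between two isolates based on shared haplotypes.

    Each isolate is a dict mapping a locus name to the set of haplotype
    labels observed at that locus (hypothetical layout). Heterozygous
    loci simply carry more than one label. Loci missing from either
    isolate are skipped, so partial genotypes remain comparable.
    Returns None when the isolates have no typed locus in common.
    """
    shared_loci = [
        locus for locus in isolate_a
        if locus in isolate_b and isolate_a[locus] and isolate_b[locus]
    ]
    if not shared_loci:
        return None

    intersection = 0
    union = 0
    for locus in shared_loci:
        intersection += len(isolate_a[locus] & isolate_b[locus])
        union += len(isolate_a[locus] | isolate_b[locus])
    return 1.0 - intersection / union
```

Because the comparison works on haplotype labels, indels and repeat-length variants need no special scoring: two haplotypes are either the same label or not. Whether pooling over loci is preferable to averaging per-locus distances is a design choice this sketch does not settle.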
Affiliation(s)
- David Jacobson
  - Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA; Oak Ridge Associated Universities, Oak Ridge, TN, USA
- Yueli Zheng
  - Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA; Eagle Global Scientific, San Antonio, TX, USA
- Mateusz M Plucinski
  - Malaria Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA; U.S. President's Malaria Initiative, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yvonne Qvarnstrom
  - Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Joel L N Barratt
  - Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Centers for Disease Control and Prevention, Atlanta, GA, USA
|
14
|
Eales O, de Oliveira Martins L, Page AJ, Wang H, Bodinier B, Tang D, Haw D, Jonnerby J, Atchison C, Ashby D, Barclay W, Taylor G, Cooke G, Ward H, Darzi A, Riley S, Elliott P, Donnelly CA, Chadeau-Hyam M. Dynamics of competing SARS-CoV-2 variants during the Omicron epidemic in England. Nat Commun 2022; 13:4375. [PMID: 35902613 PMCID: PMC9330949 DOI: 10.1038/s41467-022-32096-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/30/2022] [Accepted: 07/14/2022] [Indexed: 12/15/2022]
Abstract
The SARS-CoV-2 pandemic has been characterised by the regular emergence of genomic variants. With natural and vaccine-induced population immunity at high levels, evolutionary pressure favours variants better able to evade SARS-CoV-2 neutralising antibodies. The Omicron variant (first detected in November 2021) exhibited a high degree of immune evasion, leading to increased infection rates worldwide. However, estimates of the magnitude of this Omicron wave have often relied on routine testing data, which are prone to several biases. Using data from the REal-time Assessment of Community Transmission-1 (REACT-1) study, a series of cross-sectional surveys assessing prevalence of SARS-CoV-2 infection in England, we estimated the dynamics of England's Omicron wave (from 9 September 2021 to 1 March 2022). We estimate an initial peak in national Omicron prevalence of 6.89% (5.34%, 10.61%) during January 2022, followed by a resurgence in SARS-CoV-2 infections as the more transmissible Omicron sub-lineage, BA.2 replaced BA.1 and BA.1.1. Assuming the emergence of further distinct variants, intermittent epidemics of similar magnitudes may become the 'new normal'.
Affiliation(s)
- Oliver Eales
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, UK
- Haowei Wang
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, UK
- Barbara Bodinier
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
- David Tang
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
- David Haw
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, UK
- Jakob Jonnerby
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, UK
- Deborah Ashby
  - School of Public Health, Imperial College London, London, UK
- Wendy Barclay
  - Department of Infectious Disease, Imperial College London, London, UK
- Graham Taylor
  - Department of Infectious Disease, Imperial College London, London, UK
- Graham Cooke
  - Department of Infectious Disease, Imperial College London, London, UK
  - Imperial College Healthcare NHS Trust, London, UK
  - National Institute for Health Research Imperial Biomedical Research Centre, London, UK
- Helen Ward
  - School of Public Health, Imperial College London, London, UK
  - Imperial College Healthcare NHS Trust, London, UK
  - National Institute for Health Research Imperial Biomedical Research Centre, London, UK
- Ara Darzi
  - Imperial College Healthcare NHS Trust, London, UK
  - National Institute for Health Research Imperial Biomedical Research Centre, London, UK
  - Institute of Global Health Innovation, Imperial College London, London, UK
- Steven Riley
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, UK
- Paul Elliott
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, UK
  - Imperial College Healthcare NHS Trust, London, UK
  - National Institute for Health Research Imperial Biomedical Research Centre, London, UK
  - Health Data Research (HDR) UK, Imperial College London, London, UK
  - UK Dementia Research Institute Centre at Imperial, Imperial College London, London, UK
- Christl A Donnelly
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Global Infectious Disease Analysis and Jameel Institute, Imperial College London, London, UK
  - Department of Statistics, University of Oxford, Oxford, UK
- Marc Chadeau-Hyam
  - School of Public Health, Imperial College London, London, UK
  - MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
|
15
|
A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains. Genes (Basel) 2022; 13:genes13081330. [PMID: 35893066 PMCID: PMC9394340 DOI: 10.3390/genes13081330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 07/21/2022] [Accepted: 07/23/2022] [Indexed: 02/04/2023]
Abstract
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a newly emerged virus and the cause of the worldwide Coronavirus Disease 2019 (COVID-19) pandemic. Major advances in the Next Generation Sequencing (NGS) field followed the first release of a full-length SARS-CoV-2 genome on 10 January 2020, with the hope of turning the tide against the worsening pandemic. Previous studies characterizing respiratory viruses required mapping raw sequences to the human genome in the downstream bioinformatics pipeline, following metagenomic principles. Illumina, a major player in the NGS arena, released guidelines for an improved enrichment kit, the Respiratory Virus Oligo Panel (RVOP), based on a hybridization capture method capable of capturing targeted respiratory viruses, including SARS-CoV-2, thereby allowing raw sequence data to be mapped directly to the SARS-CoV-2 genome in the downstream bioinformatics pipeline. Consequently, two bioinformatics pipelines emerged, and no previous study has benchmarked them. This study aims to gain insight into Illumina's target enrichment workflow by applying two bioinformatics pipelines, named 'Fast Pipeline' and 'Normal Pipeline', to SARS-CoV-2 strains isolated from Yogyakarta and Central Java, Indonesia. Overall, both pipelines worked well in characterizing SARS-CoV-2 samples, including identifying the major studied nucleotide substitutions and amino acid mutations. More reads mapped to the SARS-CoV-2 genome in the Fast Pipeline, which was identified as a contributing factor in its greater coverage depth and higher number of identified variations (SNPs, insertions, and deletions). The Fast Pipeline works well where time is a critical factor. The Normal Pipeline, on the other hand, requires more time because it maps reads to the human genome. Certain limitations were identified in the pipeline algorithms, and future studies are strongly encouraged to design an integrated pipeline, for instance by using Nextflow, a workflow framework, to combine all scripts into one.
|
16
|
Linheiro R, Sabatino S, Lobo D, Archer J. CView: A network based tool for enhanced alignment visualization. PLoS One 2022; 17:e0259726. [PMID: 35696379 PMCID: PMC9191720 DOI: 10.1371/journal.pone.0259726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Accepted: 05/31/2022] [Indexed: 11/19/2022]
Abstract
To date, basic visualization of sequence alignments has largely focused on displaying per-site columns of nucleotide, or amino acid, residues along with associated frequency summarizations. The persistence of this tendency in recent tools designed for viewing mapped read data indicates that such a perspective not only provides a reliable visualization of per-site alterations, but also offers implicit reassurance to the end-user in relation to data accessibility. However, the initial insight gained is limited, something that is especially true when viewing alignments consisting of many sequences representing differing factors such as location, date and subtype. A basic alignment viewer has the potential to increase initial insight through visual enhancement, whilst not delving into the realms of complex sequence analysis. We present CView, a visualizer that expands on the per-site representation of residues through the incorporation of a dynamic network that is based on the summarization of diversity present across different regions of the alignment. Within the network, nodes are based on the clustering of sequence fragments that span windows placed consecutively along the alignment. Edges are placed between nodes of neighbouring windows where they share sequence identification(s), i.e. different regions of the same sequence(s). Thus, if a node is selected on the network, then the relationship that sequences passing through that node have to other regions of diversity within the alignment can be observed through path tracing. In addition to augmenting visual insight, CView provides export features including variant summarization, per-site residue and kmer frequencies, consensus sequence, alignment dissection as well as clustering; each useful across a range of research areas. The software has been designed to be user friendly, intuitive and interactive. 
It is open source and an executable jar, source code, quick start, usage tutorial and test data are available (under the GNU General Public License) from https://sourceforge.net/projects/cview/.
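The window-based network described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CView's code (CView itself is distributed as a Java jar): fragments are clustered here by exact identity, since the abstract does not say which clustering method is used, windows are non-overlapping, and the input is a non-empty dict of equal-length aligned sequences. The function name `build_window_network` is hypothetical.

```python
from collections import defaultdict


def build_window_network(alignment, window):
    """Build a window-based diversity network from an alignment.

    alignment -- dict mapping sequence ID to an aligned sequence string
                 (all strings assumed to have the same length)
    window    -- width of each consecutive, non-overlapping window

    Returns (nodes, edges). nodes[w] maps each distinct fragment in
    window w to the set of sequence IDs carrying it. Each edge is a tuple
    ((w, fragment_a), (w + 1, fragment_b), shared_id_count).
    """
    length = len(next(iter(alignment.values())))

    # One node per distinct fragment per window (exact-identity clustering,
    # an assumption made for this sketch).
    nodes = []
    for start in range(0, length, window):
        clusters = defaultdict(set)
        for seq_id, sequence in alignment.items():
            clusters[sequence[start:start + window]].add(seq_id)
        nodes.append(dict(clusters))

    # Connect nodes in neighbouring windows that share at least one sequence ID.
    edges = []
    for w in range(len(nodes) - 1):
        for frag_a, ids_a in nodes[w].items():
            for frag_b, ids_b in nodes[w + 1].items():
                shared = ids_a & ids_b
                if shared:
                    edges.append(((w, frag_a), (w + 1, frag_b), len(shared)))
    return nodes, edges
```

Following the edges out of one node traces where the sequences in that cluster go in the next window, which corresponds to the path tracing the abstract mentions.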
Affiliation(s)
- Raquel Linheiro
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão, Portugal
- Stephen Sabatino
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão, Portugal
  - BIOPOLIS, Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão, Portugal
- Diana Lobo
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão, Portugal
  - BIOPOLIS, Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- John Archer
  - CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, Vairão, Portugal
  - BIOPOLIS, Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, Vairão, Portugal
|
17
|
Busch A, Becker A, Schotte U, Plötz M, Abdulmawjood A. Mpl-Gene-Based Loop-Mediated Isothermal Amplification Assay for Specific and Rapid Detection of Listeria monocytogenes in Various Food Samples. Foodborne Pathog Dis 2022; 19:463-472. [PMID: 35099299 DOI: 10.1089/fpd.2021.0080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
Listeria monocytogenes represents a high risk in food and can trigger potentially fatal listeriosis. The objective of this study was to detect L. monocytogenes in food using the loop-mediated isothermal amplification (LAMP) method in a fast, specific, and sensitive manner and thus to preventively test food for the presence of the target species. The reaction was performed and established using the portable real-time fluorometer Genie® II (OptiGene Ltd., Horsham, United Kingdom). In this new assay, six LAMP primers targeted the mpl-gene sequence of L. monocytogenes. A total of 148 different isolates, including 105 L. monocytogenes and 43 non-L. monocytogenes strains, were tested. Analytical sensitivity was determined based on different DNA and cell concentrations. The detection limit with a detection rate of 100% was 5 pg of DNA or 275 colony-forming units (CFU) per reaction. Artificially contaminated minced beef and grated mozzarella were also tested. The assay detected an initial bacterial contamination of 0.4-4 CFU g⁻¹ of food in 100% of cases after 24 h enrichment in half-Fraser broth. Finally, natively contaminated samples were tested in comparison to the microbiological reference method and real-time polymerase chain reaction. Native sample testing revealed 100% consistent findings between LAMP and the standard culture method after first enrichment for 24 h. In addition, a rapid colony-confirmation method was established that enabled reliable identification of L. monocytogenes isolates on different selective culture media using a simplified DNA extraction by boiling. This study showed that the developed assay was able to determine whether a food is safe with respect to the food-safety criterion of 100 CFU per gram for L. monocytogenes, according to standards of the European Union, and provided faster results than the cultural reference method.
Affiliation(s)
- Annemarie Busch
  - Institute of Food Quality and Food Safety, University of Veterinary Medicine Hannover, Hannover, Germany
- André Becker
  - Institute of Food Quality and Food Safety, University of Veterinary Medicine Hannover, Hannover, Germany
- Ulrich Schotte
  - Department A (Veterinary Medicine), Central Institute of the Bundeswehr Medical Service Kiel, Kronshagen, Germany
- Madeleine Plötz
  - Institute of Food Quality and Food Safety, University of Veterinary Medicine Hannover, Hannover, Germany
- Amir Abdulmawjood
  - Institute of Food Quality and Food Safety, University of Veterinary Medicine Hannover, Hannover, Germany
|
18
|
Stott CJ, Sawattrakool K, Saeng-Chuto K, Tantituvanont A, Nilubol D. The phylodynamics of emerging porcine deltacoronavirus (PDCoV) in Southeast Asia. Transbound Emerg Dis 2021; 69:2816-2827. [PMID: 34928072 DOI: 10.1111/tbed.14434] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 11/17/2021] [Accepted: 12/05/2021] [Indexed: 11/29/2022]
Abstract
Porcine deltacoronavirus (PDCoV), a recently emerged pathogen, causes diarrhea in pigs. A previous phylogenetic analysis based on spike genes suggested that PDCoV is divided into 3 groups: China, US, and Southeast Asia (SEA). SEA PDCoV is genetically separated from China and US PDCoV but shares a common ancestor with them. Its origin and evolution have yet to be identified. Herein, phylodynamic analyses based on the full-length genome were performed to investigate the origin and evolution of SEA PDCoV. In the study, 18 full-length genome sequences of SEA PDCoV identified in 2013-2016, together with PDCoV from other regions, were used in the analyses. The results demonstrated that PDCoV is classified into 2 genogroups, G1 and G2. G1 has further evolved into G1a (China) and G1b (US). G2 (SEA) has further evolved into 3 clades: SEA-1 (Thailand), SEA-2 (Vietnam), and SEA-2r (Vietnam recombinant). The time to the most recent common ancestor (MRCA) of global PDCoV was estimated to be approximately 1989-1990, and PDCoV has possibly been circulating in SEA for more than a decade. SEA PDCoV is genetically diverse compared to China and US PDCoV. The substitution rate of SEA PDCoV was lower than those of China and US PDCoV, but its recombination rate was higher. Recombination analyses revealed 4 potential recombination events in SEA PDCoV, suggesting that they were derived from the same ancestor as China PDCoV. The SEA-2r subgroup was potentially a recombinant between SEA-2 and US strains. In conclusion, the major mechanisms driving the complex evolution and genetic diversity of SEA PDCoV were multiple introductions of exotic PDCoV strains followed by recombination.
Affiliation(s)
- Christopher James Stott
  - Department of Veterinary Microbiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand
  - Akkhraratchakumari Veterinary College, Walailak University, Nakhon Si Thammarat, 84000, Thailand
- Kanokon Sawattrakool
  - Department of Veterinary Microbiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand
- Kepalee Saeng-Chuto
  - Department of Veterinary Microbiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand
- Angkana Tantituvanont
  - Department of Pharmaceutics and Industrial Pharmacy, Faculty of Pharmaceutical Sciences, Chulalongkorn University, Bangkok, 10330, Thailand
- Dachrit Nilubol
  - Department of Veterinary Microbiology, Faculty of Veterinary Science, Chulalongkorn University, Bangkok, 10330, Thailand
|
19
|
Sanderson LA, Caron CT, Tan RL, Bett KE. A PostgreSQL Tripal solution for large-scale genotypic and phenotypic data. Database (Oxford) 2021; 2021:baab051. [PMID: 34389844 PMCID: PMC8363843 DOI: 10.1093/database/baab051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/05/2021] [Revised: 05/11/2021] [Accepted: 08/03/2021] [Indexed: 11/15/2022]
Abstract
Researchers are seeking cost-effective solutions for management and analysis of large-scale genotypic and phenotypic data. Open-source software is uniquely positioned to fill this need through user-focused, crowd-sourced development. Tripal, an open-source toolkit for developing biological data web portals, uses the GMOD Chado database schema to achieve flexible, ontology-driven storage in PostgreSQL. Tripal also aids research-focused web portals in providing data according to findable, accessible, interoperable, reusable (FAIR) principles. We describe here a fully relational PostgreSQL solution to handle large-scale genotypic and phenotypic data that is implemented as a collection of freely available, open-source modules. These Tripal extension modules provide a holistic approach for importing, storage, display and analysis within a relational database schema. Furthermore, they embody the Tripal approach to FAIR data by providing multiple search tools and ensuring metadata is fully described and interoperable. Our solution focuses on data integrity, as well as optimizing performance to provide a fully functional system that is currently being used in the production of Tripal portals for crop species. We fully describe the implementation of our solution and discuss why a PostgreSQL-powered web portal provides an efficient environment for researcher-driven genotypic and phenotypic data analysis.
Affiliation(s)
- Lacey-Anne Sanderson
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
- Carolyn T Caron
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
- Reynold L Tan
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
- Kirstin E Bett
- Department of Plant Sciences, University of Saskatchewan, 51 Campus Drive, Saskatoon SK S7N 5A8, Canada
|
20
|
Oliveira J, Antunes M, Godinho CP, Teixeira MC, Sá-Correia I, Monteiro PT. From a genome assembly to full regulatory network prediction: the case study of Rhodotorula toruloides putative Haa1-regulon. BMC Bioinformatics 2021; 22:399. [PMID: 34376148 PMCID: PMC8353774 DOI: 10.1186/s12859-021-04312-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/07/2020] [Accepted: 07/30/2021] [Indexed: 12/19/2022] Open
Abstract
Numerous genomes are sequenced and made available to the community through the NCBI portal. However, unlike gene function annotation, the annotation of promoter sequences and the underlying prediction of regulatory associations is mostly unavailable, severely limiting the ability to interpret genome sequences from a functional genomics perspective. Here we present an approach where one can download a genome of interest from NCBI in the GenBank Flat File (.gbff) format and, with a minimum set of commands, have all the information parsed, organized and made available through the platform web interface. The new genomes are also compared with a given reference genome in search of homologous genes, shared regulatory elements and predicted transcription associations. We present this approach within the context of Community YEASTRACT of the YEASTRACT+ portal, thus benefiting from immediate access to all the comparative genomics queries offered in the YEASTRACT+ portal. Besides the yeast community, other communities can install the platform independently, without any constraints. In this work, we exemplify the usefulness of the presented tool, within Community YEASTRACT, in constructing a dedicated database and analysing the genome of the highly promising oleaginous red yeast species Rhodotorula toruloides, which is currently poorly studied at the genome and transcriptome levels and has limited genome editing tools. Regulatory prediction is based on the conservation of promoter sequences and available regulatory networks. The case study examined is focused on the Haa1 transcription factor, a key regulator of yeast resistance to acetic acid, an important inhibitor of industrial bioconversion of lignocellulosic hydrolysates. The new tool described here led to the prediction of a RtHaa1 regulon with expected impact in the optimization of R. toruloides robustness for lignocellulosic and pectin-rich residue biorefinery processes.
Affiliation(s)
- Miguel Antunes
- iBB - Institute for Bioengineering and Biosciences/ i4HB - Associate Laboratory Institute for Health and Bioeconomy, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal; Department of Bioengineering, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Claudia P Godinho
- iBB - Institute for Bioengineering and Biosciences/ i4HB - Associate Laboratory Institute for Health and Bioeconomy, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Miguel C Teixeira
- iBB - Institute for Bioengineering and Biosciences/ i4HB - Associate Laboratory Institute for Health and Bioeconomy, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal; Department of Bioengineering, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Isabel Sá-Correia
- iBB - Institute for Bioengineering and Biosciences/ i4HB - Associate Laboratory Institute for Health and Bioeconomy, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal; Department of Bioengineering, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal
- Pedro T Monteiro
- INESC-ID, Lisbon, Portugal; Department of Computer Science and Engineering, Instituto Superior Técnico (IST), Universidade de Lisboa, Lisbon, Portugal
|
21
|
Tan WH, Talla V, Mongue AJ, de Roode JC, Gerardo NM, Walters JR. Population genomics reveals variable patterns of immune gene evolution in monarch butterflies (Danaus plexippus). Mol Ecol 2021; 30:4381-4391. [PMID: 34245613 DOI: 10.1111/mec.16071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2019] [Revised: 07/07/2021] [Accepted: 07/08/2021] [Indexed: 11/27/2022]
Abstract
Humoral and cellular immune responses provide animals with major defences against harmful pathogens. While it is often assumed that immune genes undergo rapid diversifying selection, this assumption has not been tested in many species. Moreover, it is likely that different classes of immune genes experience different levels of evolutionary constraint, resulting in varying selection patterns. We examined the evolutionary patterns for a set of 91 canonical immune genes of North American monarch butterflies (Danaus plexippus), using as an outgroup the closely related soldier butterfly (Danaus eresimus). As a comparison to these immune genes, we selected a set of control genes that were paired with each immune gene for approximate size and genomic location. As a whole, these immune genes had a significant but modest reduction in Tajima's D relative to paired-control genes, but otherwise did not show distinct patterns of population genetic variation or evolutionary rates. When further partitioning these immune genes into four functional classes (recognition, signalling, modulation, and effector), we found distinct differences among these groups. Relative to control genes, recognition genes exhibit increased nonsynonymous diversity and divergence, suggesting reduced constraints on evolution, and supporting the notion that coevolution with pathogens results in diversifying selection. In contrast, signalling genes showed an opposite pattern of reduced diversity and divergence, suggesting evolutionary constraints and conservation. Modulator and effector genes showed no statistical differences from controls. These results are consistent with patterns found in immune genes in fruit flies and Pieris butterflies, suggesting that consistent selective pressures on different classes of immune genes broadly govern the evolution of innate immunity among insects.
Affiliation(s)
- Wen-Hao Tan
- Department of Biology, Emory University, Atlanta, GA, USA
- Venkat Talla
- Department of Biology, Emory University, Atlanta, GA, USA
- Andrew J Mongue
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
- James R Walters
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA
|
22
|
Sabary O, Orlev Y, Shafir R, Anavy L, Yaakobi E, Yakhini Z. SOLQC: Synthetic Oligo Library Quality Control tool. Bioinformatics 2021; 37:720-722. [PMID: 32840559 DOI: 10.1093/bioinformatics/btaa740] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/26/2019] [Revised: 07/27/2020] [Accepted: 08/19/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Recent years have seen a growing number and an expanding scope of studies using synthetic oligo libraries for a range of applications in synthetic biology. As experiments are growing in number and complexity, analysis tools can facilitate quality control and support better assessment and inference. RESULTS: We present a novel analysis tool, called SOLQC, which enables fast and comprehensive analysis of synthetic oligo libraries, based on NGS analysis performed by the user. SOLQC provides statistical information such as the distribution of variant representation, different error rates and their dependence on sequence or library properties. SOLQC produces graphical reports from the analysis, in a flexible format. We demonstrate SOLQC by analyzing literature libraries. We also discuss the potential benefits and relevance of the different components of the analysis. AVAILABILITY AND IMPLEMENTATION: SOLQC is free software for non-commercial use, available at https://app.gitbook.com/@yoav-orlev/s/solqc/. For commercial use please contact the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Omer Sabary
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Haifa, 3200003, Israel
- Yoav Orlev
- School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 4610101, Israel
- Roy Shafir
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Haifa, 3200003, Israel; School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 4610101, Israel
- Leon Anavy
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Haifa, 3200003, Israel
- Eitan Yaakobi
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Haifa, 3200003, Israel
- Zohar Yakhini
- The Henry and Marilyn Taub Faculty of Computer Science, Technion, Haifa, 3200003, Israel; School of Computer Science, Herzliya Interdisciplinary Center, Herzliya 4610101, Israel
|
23
|
Sivaprakasam B, Sadagopan P. Development of shiny dashboard application for “genome-wide association study on analysis of SNPs injected in Homo sapiens genome (snips-HsG)”. Gene Reports 2021. [DOI: 10.1016/j.genrep.2021.101033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
24
|
New thermostable endoglucanase from Spirochaeta thermophila and its mutants with altered substrate preferences. Appl Microbiol Biotechnol 2021; 105:1133-1145. [PMID: 33427929 DOI: 10.1007/s00253-020-11077-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/19/2020] [Revised: 11/30/2020] [Accepted: 12/27/2020] [Indexed: 10/22/2022]
Abstract
Endoglucanases are key elements in several industrial applications, such as cellulosic biomass hydrolysis, cellulose fiber modification for the production of paper and composite materials, and nanocellulose production. In all of these applications, the desired function of the endoglucanase is to create nicks in the amorphous regions of the cellulose. However, endoglucanase can be diverted from its activity on the fibers by other substrates, namely soluble oligosaccharides. This issue was addressed in the current study using enzyme engineering and an enzyme evolution approach. To this end, a hypothetical endoglucanase from the thermophilic bacterium Spirochaeta thermophila was cloned and characterized for the first time. The wild-type enzyme was used as a starting point for mutagenesis and molecular evolution toward a preference for the higher molecular weight substrates. The best of the evolved enzymes was more active than the wild-type enzyme toward high molecular weight substrate at temperatures below 45 °C (3-fold more active at 30 °C) and showed little or no activity with low molecular weight substrates. These findings can be instrumental in bioeconomy sectors, such as second-generation biofuels and biomaterials from lignocellulosic biomass. KEY POINTS: • A new thermostable endoglucanase was characterized. • The substrate specificity of this endoglucanase was changed by means of genetic engineering. • A mutant with a preference for high molecular weight substrate was obtained and proposed to be beneficial for cellulose fiber modification.
|
25
|
Wang M, Wang D, Zhang K, Ngo V, Fan S, Wang W. Motto: Representing Motifs in Consensus Sequences with Minimum Information Loss. Genetics 2020; 216:353-358. [PMID: 32816922 PMCID: PMC7536857 DOI: 10.1534/genetics.120.303597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2020] [Accepted: 08/17/2020] [Indexed: 11/21/2022] Open
Abstract
Sequence analysis frequently requires intuitive understanding and convenient representation of motifs. Typically, motifs are represented as position weight matrices (PWMs) and visualized using sequence logos. However, in many scenarios, in order to interpret the motif information or search for motif matches, it is compact and sufficient to represent motifs by wildcard-style consensus sequences (such as [GC][AT]GATAAG[GAC]). Based on mutual information theory and Jensen-Shannon divergence, we propose a mathematical framework to minimize the information loss in converting PWMs to consensus sequences. We name this representation as sequence Motto and have implemented an efficient algorithm with flexible options for converting motif PWMs into Motto from nucleotides, amino acids, and customized characters. We show that this representation provides a simple and efficient way to identify the binding sites of 1156 common transcription factors (TFs) in the human genome. The effectiveness of the method was benchmarked by comparing sequence matches found by Motto with PWM scanning results found by FIMO. On average, our method achieves a 0.81 area under the precision-recall curve, significantly (P-value < 0.01) outperforming all existing methods, including maximal positional weight, Cavener's method, and minimal mean square error. We believe this representation provides a distilled summary of a motif, as well as the statistical justification.
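The idea of turning a PWM into a wildcard-style consensus can be illustrated with a small sketch. This is not the authors' Motto algorithm: it assumes, purely for illustration, that each PWM column is replaced by the nucleotide subset whose uniform distribution has the smallest Jensen-Shannon divergence from that column. The function names `js_divergence`, `best_subset` and `pwm_to_consensus` are hypothetical.

```python
import math
from itertools import combinations

ALPHABET = "ACGT"


def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two discrete distributions."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b is the mixture, so b > 0 whenever a > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def best_subset(column):
    """Pick the nucleotide subset whose uniform distribution is closest to the column.

    Illustrative assumption: closeness is measured by Jensen-Shannon divergence.
    """
    best = None
    for size in range(1, len(ALPHABET) + 1):
        for subset in combinations(range(len(ALPHABET)), size):
            q = [1 / size if i in subset else 0.0 for i in range(len(ALPHABET))]
            d = js_divergence(column, q)
            if best is None or d < best[0]:
                best = (d, subset)
    return "".join(ALPHABET[i] for i in best[1])


def pwm_to_consensus(pwm):
    """Convert a PWM (list of [A, C, G, T] probability columns) to a wildcard string."""
    parts = []
    for column in pwm:
        letters = best_subset(column)
        parts.append(letters if len(letters) == 1 else f"[{letters}]")
    return "".join(parts)


toy_pwm = [
    [0.5, 0.0, 0.5, 0.0],      # A or G, equally likely
    [0.0, 0.0, 0.0, 1.0],      # always T
    [0.97, 0.01, 0.01, 0.01],  # almost always A
]
print(pwm_to_consensus(toy_pwm))
```

The exhaustive search over subsets is cheap for a four-letter alphabet (15 candidates per column) but would need a smarter strategy for amino acids or custom alphabets.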
Affiliation(s)
- Mengchi Wang
- Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, California 92093
- David Wang
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093
- Kai Zhang
- Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, California 92093
- Vu Ngo
- Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, California 92093
- Shicai Fan
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China 610054
- Wei Wang
- Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, California 92093
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, California 92093
- Department of Cellular and Molecular Medicine, University of California at San Diego, La Jolla, California 92093
|
26
|
Abstract
Base pairing plays a pivotal role in DNA functions and replication fidelity. But while the complementarity between Watson-Crick matched bases is generally believed to arise from the different number of hydrogen bonds in G|C pairs versus A|T, the energetics of these interactions are heavily renormalized by the aqueous solvent. Employing large-scale Monte Carlo simulations, we have extracted the solvent contribution to the free energy for canonical and some noncanonical and stacked base pairs. For all of them, the solvent's contribution to the base pairing free energy is exclusively destabilizing. While the direct hydrogen bonding interactions in the G|C pair are much stronger than in A|T, the thermodynamic resistance produced by the solvent also pushes back much more strongly against G|C than against A|T, generating a free energy difference of only ∼1 kcal/mol between them. We have profiled the density of water molecules in the solvent adjacent to the bases and observed a "freezing" behavior where waters are recruited into the gap between the bases to compensate for the unsatisfied hydrogen bonds between them. A very small number of water molecules that are associated with the Watson-Crick donor/acceptor atoms turn out to be responsible for the majority of the solvent's thermodynamic resistance to base pairing. The absence or presence of these near-field waters can be used to enhance fidelity during DNA replication.
|
27
|
Soltész B, Pikó P, Sándor J, Kósa Z, Ádány R, Fiatal S. The genetic risk for hypertension is lower among the Hungarian Roma population compared to the general population. PLoS One 2020; 15:e0234547. [PMID: 32555714 PMCID: PMC7299387 DOI: 10.1371/journal.pone.0234547] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/06/2019] [Accepted: 05/28/2020] [Indexed: 01/11/2023] Open
Abstract
Estimating the prevalence of cardiovascular diseases (CVDs) and risk factors among the Roma population, the largest minority in Europe, and investigating the role of genetic or environmental/behavioral risk factors in CVD development are important issues in countries where they are a significant minority. This study was designed to estimate the genetic susceptibility of the Hungarian Roma (HR) population to essential hypertension (EH) and compare it to that of the general (HG) population. Twenty EH-associated SNPs (in AGT, FMO3, MTHFR-NPPB, NPPA, NPPA-AS1, AGTR1, ADD1, NPR3-C5orf23, NOS3, CACNB2, PLCE1, ATP2B1, GNB3, CYP1A1-ULK3, UMOD and GNAS-EDN3) were genotyped using DNA samples obtained from HR (N = 1176) and HG population (N = 1178) subjects assembled by cross-sectional studies. Allele frequencies and genetic risk scores (unweighted and weighted genetic risk scores, GRS and wGRS, respectively) were calculated for the study groups and compared to examine the joint effects of the SNPs. The susceptibility alleles were more frequent in the HG population, and both GRS and wGRS were found to be higher in the HG population than in the HR population (GRS: 18.98 ± 3.05 vs. 18.25 ± 2.97, p<0.001; wGRS: 1.52 [IQR: 0.99–2.00] vs. 1.4 [IQR: 0.93–1.89], p<0.01). Twenty-seven percent of subjects in the HR population were in the bottom fifth (GRS ≤ 16) of the risk allele count compared with 21% of those in the HG population. Thirteen percent of people in the HR group were in the top fifth (GRS ≥ 22) of the GRS compared with 21% of those in the HG population (p<0.001), i.e., the distribution of GRS was found to be left-shifted in the HR population compared to the HG population. The Roma population seems to be genetically less susceptible to EH than the general one. These results support preventive efforts to lower the risk of developing hypertension by encouraging a healthy lifestyle.
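For readers unfamiliar with genetic risk scores, the following minimal sketch shows the usual definitions: the unweighted GRS counts risk alleles across SNPs, and the weighted score multiplies each count by a per-SNP weight before summing. The abstract does not state how the study derived its weights, so the `weights` values below are hypothetical placeholders, and the function name `genetic_risk_scores` is an assumption.

```python
def genetic_risk_scores(risk_allele_counts, weights):
    """Return (GRS, wGRS) for one individual.

    risk_allele_counts: number of risk alleles (0, 1 or 2) carried at each SNP.
    weights: per-SNP effect sizes (hypothetical values here; the study's
             weighting scheme is not given in the abstract).
    """
    if len(risk_allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("risk allele counts must be 0, 1 or 2")

    grs = sum(risk_allele_counts)  # unweighted: plain risk allele count
    wgrs = sum(c * w for c, w in zip(risk_allele_counts, weights))  # weighted sum
    return grs, wgrs


# Toy individual genotyped at three SNPs.
counts = [2, 1, 0]
weights = [0.1, 0.2, 0.3]  # hypothetical effect sizes
grs, wgrs = genetic_risk_scores(counts, weights)
print(grs, round(wgrs, 3))
```

With 20 biallelic SNPs the unweighted score ranges from 0 to 40, which is consistent with the group means near 18 to 19 reported above.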
Affiliation(s)
- Beáta Soltész
- Doctoral School of Health Sciences, Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- Péter Pikó
- MTA-DE Public Health Research Group of the Hungarian Academy of Sciences, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- János Sándor
- Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- WHO Collaborating Centre on Vulnerability and Health, Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- Zsigmond Kósa
- Department of Health Visitor Methodology and Public Health, Faculty of Health, University of Debrecen, Nyíregyháza, Hungary
- Róza Ádány
- MTA-DE Public Health Research Group of the Hungarian Academy of Sciences, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- WHO Collaborating Centre on Vulnerability and Health, Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- Szilvia Fiatal
- Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
- WHO Collaborating Centre on Vulnerability and Health, Department of Preventive Medicine, Faculty of Public Health, University of Debrecen, Debrecen, Hungary
|
28
|
Cancellieri S, Canver MC, Bombieri N, Giugno R, Pinello L. CRISPRitz: rapid, high-throughput and variant-aware in silico off-target site identification for CRISPR genome editing. Bioinformatics 2020; 36:2001-2008. [PMID: 31764961 PMCID: PMC7141852 DOI: 10.1093/bioinformatics/btz867] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 05/06/2019] [Revised: 11/16/2019] [Accepted: 11/21/2019] [Indexed: 12/26/2022] Open
Abstract
MOTIVATION: Clustered regularly interspaced short palindromic repeats (CRISPR) technologies allow for facile genomic modification in a site-specific manner. A key step in this process is the in silico design of single guide RNAs to efficiently and specifically target a site of interest. To this end, it is necessary to enumerate all potential off-target sites within a given genome that could be inadvertently altered by nuclease-mediated cleavage. Currently available software for this task is limited by computational efficiency, variant support or annotation, and assessment of the functional impact of potential off-target effects. RESULTS: To overcome these limitations, we have developed CRISPRitz, a suite of software tools to support the design and analysis of CRISPR/CRISPR-associated (Cas) experiments. Using efficient data structures combined with parallel computation, we offer a rapid, reliable, and exhaustive search mechanism to enumerate a comprehensive list of putative off-target sites. As proof-of-principle, we performed a head-to-head comparison with other available tools on several datasets. This analysis highlighted the unique features and superior computational performance of CRISPRitz including support for genomic searching with DNA/RNA bulges and mismatches of arbitrary size as specified by the user as well as consideration of genetic variants (variant-aware). In addition, graphical reports are offered for coding and non-coding regions that annotate the potential impact of putative off-target sites that lie within regions of functional genomic annotation (e.g. insulator and chromatin accessible sites from the ENCyclopedia Of DNA Elements [ENCODE] project). AVAILABILITY AND IMPLEMENTATION: The software is freely available at https://github.com/pinellolab/CRISPRitz and https://github.com/InfOmics/CRISPRitz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
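To make the off-target enumeration task concrete, here is a deliberately naive sketch that scans a sequence for sites within a given number of mismatches of a guide and followed by a PAM. It is not CRISPRitz code and does not use its data structures: it assumes a forward-strand-only search, an `NGG` PAM, no bulges and no variant awareness, and the function name `find_off_targets` is hypothetical.

```python
def find_off_targets(genome, guide, max_mismatches, pam="NGG"):
    """Naively list candidate sites as (start, site, mismatches) tuples.

    Simplifying assumptions: forward strand only, mismatches only (no DNA/RNA
    bulges), and a PAM immediately 3' of the site where 'N' matches any base.
    """
    genome = genome.upper()
    guide = guide.upper()
    n, k = len(guide), len(pam)
    hits = []
    for start in range(len(genome) - n - k + 1):
        site = genome[start:start + n]
        pam_site = genome[start + n:start + n + k]
        # Skip windows whose downstream bases do not satisfy the PAM pattern.
        if not all(p == "N" or p == b for p, b in zip(pam, pam_site)):
            continue
        # Hamming distance between the guide and the candidate site.
        mismatches = sum(1 for a, b in zip(guide, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits


# Toy example with a shortened 8-nt guide (real guides are typically about 20 nt).
toy_genome = "TTACGTACGTAGGCCACGAACGTTGGAA"
toy_guide = "ACGTACGT"
for hit in find_off_targets(toy_genome, toy_guide, max_mismatches=1):
    print(hit)
```

This scan costs time proportional to genome length times guide length for every guide, which is the kind of cost that indexed data structures and parallel computation, as mentioned in the abstract, are meant to reduce.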
Affiliation(s)
- Matthew C Canver
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA
- Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Nicola Bombieri
- Computer Science Department, University of Verona, Verona 37134, Italy
- Rosalba Giugno
- Computer Science Department, University of Verona, Verona 37134, Italy
- Luca Pinello
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129, USA
- Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
29
|
Zhang K, Pan X, Yang Y, Shen HB. CRIP: predicting circRNA-RBP-binding sites using a codon-based encoding and hybrid deep neural networks. RNA (New York, N.Y.) 2019; 25:1604-1615. [PMID: 31537716 PMCID: PMC6859861 DOI: 10.1261/rna.070565.119] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 02/18/2019] [Accepted: 08/21/2019] [Indexed: 05/21/2023]
Abstract
Circular RNAs (circRNAs), with their crucial roles in gene regulation and disease development, have become rising stars in the RNA world. To understand the regulatory function of circRNAs, many studies focus on the interactions between circRNAs and RNA-binding proteins (RBPs). Recently, the abundant CLIP-seq experimental data has enabled the large-scale identification and analysis of circRNA-RBP interactions, whereas, as far as we know, no computational tool based on machine learning has been proposed yet. We develop CRIP (CircRNAs Interact with Proteins) for the prediction of RBP-binding sites on circRNAs using RNA sequences alone. CRIP consists of a stacked codon-based encoding scheme and a hybrid deep learning architecture, in which a convolutional neural network (CNN) learns high-level abstract features and a recurrent neural network (RNN) learns long dependency in the sequences. We construct 37 data sets including sequence fragments of binding sites on circRNAs, and each set corresponds to an RBP. The experimental results show that the new encoding scheme is superior to the existing feature representation methods for RNA sequences, and the hybrid network outperforms conventional classifiers by a large margin, where both the CNN and RNN components contribute to the performance improvement.
Affiliation(s)
- Kaiming Zhang
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
- Department of Medical Informatics, Erasmus Medical Center, Rotterdam 3015 CE, Netherlands
- Yang Yang
- Center for Brain-Like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, 200240, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
|
30
|
RNApolis: Computational Platform for RNA Structure Analysis. Foundations of Computing and Decision Sciences 2019. [DOI: 10.2478/fcds-2019-0012] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022] Open
Abstract
In the 1970s, computer scientists began to engage in research in the field of structural biology. The first structural databases, as well as models and methods supporting the analysis of biomolecule structures, started to be created. RNA was put at the centre of scientific interest quite late. However, more and more methods dedicated to this molecule are currently being developed. This paper presents RNApolis - a new computing platform, which offers access to seven bioinformatic tools developed to support the RNA structure study. The set of tools include a structural database and systems for predicting, modelling, annotating and evaluating the RNA structure. RNApolis supports research at different structural levels and allows the discovery, establishment, and validation of relationships between the primary, secondary and tertiary structure of RNAs. The platform is freely available at http://rnapolis.pl
|
31
|
Martin MA, Lee RS, Cowley LA, Gardy JL, Hanage WP. Within-host Mycobacterium tuberculosis diversity and its utility for inferences of transmission. Microb Genom 2018; 4. [PMID: 30303479 PMCID: PMC6249434 DOI: 10.1099/mgen.0.000217] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022] Open
Abstract
Whole genome sequencing in conjunction with traditional epidemiology has been used to reconstruct transmission networks of Mycobacterium tuberculosis during outbreaks. Given its low mutation rate, genetic diversity within M. tuberculosis outbreaks can be extremely limited - making it difficult to determine precisely who transmitted to whom. In addition to consensus SNPs (cSNPs), examining heterogeneous alleles (hSNPs) has been proposed to improve resolution. However, few studies have examined the potential biases in detecting these hSNPs. Here, we analysed genome sequence data from 25 specimens from British Columbia, Canada. Specimens were sequenced to a depth of 112-296×. We observed biases in read depth, base quality, strand distribution and read placement where possible hSNPs were initially identified, so we applied conservative filters to reduce false positives. Overall, there was phylogenetic concordance between the observed 2542 cSNP and 63 hSNP loci. Furthermore, we identified hSNPs shared exclusively by epidemiologically linked patients, supporting their use in transmission inferences. We conclude that hSNPs may add resolution to transmission networks, particularly where the overall genetic diversity is low.
Affiliation(s)
- Michael A Martin
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Robyn S Lee
- Department of Epidemiology, Harvard University, Boston, MA 02115, USA
- Lauren A Cowley
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Jennifer L Gardy
- School of Population and Public Health, University of British Columbia, Vancouver, Canada; British Columbia Centre for Disease Control, Vancouver, Canada
- William P Hanage
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
|
32
|
Höft N, Dally N, Hasler M, Jung C. Haplotype Variation of Flowering Time Genes of Sugar Beet and Its Wild Relatives and the Impact on Life Cycle Regimes. Frontiers in Plant Science 2018; 8:2211. [PMID: 29354149 PMCID: PMC5758561 DOI: 10.3389/fpls.2017.02211] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/25/2017] [Accepted: 12/15/2017] [Indexed: 05/24/2023]
Abstract
The species Beta vulgaris encompasses wild and cultivated members with a broad range of phenological development. The annual life cycle is commonly found in sea beets (ssp. maritima) from Mediterranean environments which germinate, bolt, and flower within one season under long day conditions. Biennials such as the cultivated sugar beet (B. vulgaris ssp. vulgaris) as well as sea beets from northern latitudes require prolonged exposure to cold temperature over winter to acquire floral competence. Sugar beet is mainly cultivated for sugar production in Europe and is likely to have originated from sea beet. Flowering time strongly affects seed yield and yield potential and is thus a trait of high agronomic relevance. Besides environmental cues, there are complex genetic networks known to impact life cycle switch in flowering plants. In sugar beet, BTC1, BvBBX19, BvFT1, and BvFT2 are major flowering time regulators. In this study, we phenotyped plants from a diversity Beta panel encompassing cultivated and wild species from different geographical origins. Plants were grown under different day length regimes with and without vernalization. Haplotype analysis of BTC1, BvBBX19, BvFT1, and BvFT2 was performed to identify natural diversity of these genes and their impact on flowering. We found that accessions from northern latitudes flowered significantly later than those from southern latitudes. Some plants did not flower at all, indicating a strong impact of latitude of origin on life cycle. Haplotype analysis revealed a high conservation of the CCT-, REC-, BBX-, and PEBP-domains with regard to SNP occurrence. We identified sequence variation which may impact life cycle adaptation in beet. Our data endorse the importance of BTC1 in the domestication process of cultivated beets and contribute to the understanding of distribution and adaptation of Beta species to different life cycle regimes in response to different environments. Moreover, our data provide a resource for haplotypes identified for the major floral regulators in beet.
Affiliation(s)
- Nadine Höft
- Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany
- Nadine Dally
- Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany
- Mario Hasler
- Lehrfach Variationsstatistik, Christian-Albrechts-University of Kiel, Kiel, Germany
- Christian Jung
- Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany
|
33
|
Krause R, Halwachs B, Thallinger GG, Klymiuk I, Gorkiewicz G, Hoenigl M, Prattes J, Valentin T, Heidrich K, Buzina W, Salzer HJF, Rabensteiner J, Prüller F, Raggam RB, Meinitzer A, Moissl-Eichinger C, Högenauer C, Quehenberger F, Kashofer K, Zollner-Schwetz I. Characterisation of Candida within the Mycobiome/Microbiome of the Lower Respiratory Tract of ICU Patients. PLoS One 2016; 11:e0155033. [PMID: 27206014 PMCID: PMC4874575 DOI: 10.1371/journal.pone.0155033] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 03/06/2016] [Accepted: 04/22/2016] [Indexed: 02/06/2023]
Abstract
Whether the presence of Candida spp. in lower respiratory tract (LRT) secretions is a marker of underlying disease, intensive care unit (ICU) treatment and antibiotic therapy, or contributes to poor clinical outcome, is unclear. We investigated healthy controls, patients with proposed risk factors for Candida growth in the LRT (antibiotic therapy, ICU treatment with and without antibiotic therapy), ICU patients with pneumonia and antibiotic therapy, and candidemic patients (for comparison of truly invasive and colonizing Candida spp.). Fungal patterns were determined by conventional culture-based microbiology combined with molecular approaches (next generation sequencing, multilocus sequence typing) to describe the fungal and concomitant bacterial microbiota in the LRT, and host and fungal biomarkers were investigated. Admission to and treatment on ICUs shifted the LRT fungal microbiota to Candida spp.-dominated fungal profiles, but antibiotic therapy did not. Compared to controls, Candida was part of the fungal microbiota in the LRT of ICU patients without pneumonia with and without antibiotic therapy (63% and 50% of total fungal genera) and of ICU patients with pneumonia with antibiotic therapy (73%) (p<0.05). No case of invasive candidiasis originating from Candida in the LRT was detected. There was no common bacterial microbiota profile associated or dissociated with Candida spp. in the LRT. Colonizing and invasive Candida strains (from candidemic patients) did not match to certain clades, ruling out the presence of a particular pathogenic and invasive clade. The presence of Candida spp. in the LRT reflected rapidly occurring LRT dysbiosis driven by ICU-related factors rather than an association with invasive candidiasis.
Affiliation(s)
- Robert Krause
- Section of Infectious Diseases and Tropical Medicine, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Bettina Halwachs
- Bioinformatics, Institute for Knowledge Discovery, University of Technology, Graz, Austria and OMICS Center Graz, Graz, Austria
- BioTechMed-Graz, Graz, Austria
- Institute of Pathology, Medical University of Graz, Graz, Austria
- Ingeborg Klymiuk
- Center for Medical Research, Medical University of Graz, Graz, Austria
- Martin Hoenigl
- Section of Infectious Diseases and Tropical Medicine, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Jürgen Prattes
- Section of Infectious Diseases and Tropical Medicine, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Thomas Valentin
- Section of Infectious Diseases and Tropical Medicine, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Katharina Heidrich
- Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
- Department of Medicine I, University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Walter Buzina
- Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
- Helmut J. F. Salzer
- Division of Clinical Infectious Diseases, Research Center Borstel, Leibnitz Center for Medicine and Biosciences, Borstel, Germany
- Jasmin Rabensteiner
- Clinical Institute of Medical and Chemical Laboratory Diagnostics, Medical University of Graz, Graz, Austria
- Florian Prüller
- Clinical Institute of Medical and Chemical Laboratory Diagnostics, Medical University of Graz, Graz, Austria
- Reinhard B. Raggam
- Division of Angiology, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Andreas Meinitzer
- Clinical Institute of Medical and Chemical Laboratory Diagnostics, Medical University of Graz, Graz, Austria
- Christine Moissl-Eichinger
- Section of Infectious Diseases and Tropical Medicine, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Christoph Högenauer
- Theodor Escherich Laboratory for Microbiome Research, Division of Gastroenterology and Hepatology, Department of Internal Medicine, Medical University of Graz, Graz, Austria
- Franz Quehenberger
- Institute for Medical Informatics, Statistics, and Documentation, Medical University of Graz, Graz, Austria
- Karl Kashofer
- Institute of Pathology, Medical University of Graz, Graz, Austria
- Ines Zollner-Schwetz
- Section of Infectious Diseases and Tropical Medicine, Department of Internal Medicine, Medical University of Graz, Graz, Austria
|
34
|
Lichtenstein F, Antoneli F, Briones MRS. MIA: Mutual Information Analyzer, a graphic user interface program that calculates entropy, vertical and horizontal mutual information of molecular sequence sets. BMC Bioinformatics 2015; 16:409. [PMID: 26652707 PMCID: PMC4676106 DOI: 10.1186/s12859-015-0837-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/03/2015] [Accepted: 12/02/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Short and long range correlations in biological sequences are central in genomic studies of covariation. These correlations can be studied using mutual information because it measures the amount of information one random variable contains about the other. Here we present MIA (Mutual Information Analyzer), a user-friendly graphic interface pipeline that calculates spectra of vertical entropy (VH), vertical mutual information (VMI) and horizontal mutual information (HMI), since there is currently no user-friendly integrated platform that performs all these calculations in a single package. MIA also calculates the Jensen-Shannon Divergence (JSD) between pairs of different species spectra, herein called informational distances. The resulting distance matrices can be presented as distance histograms and informational dendrograms, supporting the discrimination of closely related species. RESULTS: In order to test MIA we analyzed sequences from the Drosophila Adh locus, because the taxonomy and evolutionary patterns of different Drosophila species are well established and the gene Adh is extensively studied. The search retrieved 959 sequences of 291 species. From the total, 450 sequences of 17 species were selected. With this dataset MIA performed all tasks in less than three hours: gathering, storing and aligning fasta files; calculating VH, VMI and HMI spectra; and calculating JSD between pairs of different species spectra. For each task MIA saved tables and graphics to the local disk, easily accessible for future analysis. CONCLUSIONS: Our tests revealed that the "informational model free" spectra may represent species signatures. Since JSD applied to Horizontal Mutual Information spectra resulted in statistically significant distances between species, we could calculate respective hierarchical clusters, herein called Informational Dendrograms (ID). When compared to phylogenetic trees, all Informational Dendrograms presented similar taxonomy and species clusterization.
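The quantities named in this abstract can be illustrated with a minimal Python sketch. It is not MIA's code and does not reproduce its VH, VMI or HMI definitions; it only shows the generic textbook forms of Shannon entropy for an alignment column, mutual information between two columns, and Jensen-Shannon divergence between two discrete distributions. The function names and the four-sequence toy alignment are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def mutual_information(col_i, col_j):
    """Mutual information (bits): I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    joint = list(zip(col_i, col_j))
    return shannon_entropy(col_i) + shannon_entropy(col_j) - shannon_entropy(joint)


def jensen_shannon_divergence(p, q):
    """JSD (bits) between two distributions given as equal-length lists."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy alignment: four aligned sequences of length four.
alignment = ["ACGT", "ACGA", "TCGA", "TCGT"]
columns = list(zip(*alignment))  # one tuple of residues per alignment column

entropy_spectrum = [shannon_entropy(col) for col in columns]
mi_first_last = mutual_information(columns[0], columns[3])
jsd_disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])
```

With base-2 logarithms the JSD is bounded between 0 and 1, which makes it convenient as a distance-like score when comparing spectra from different species.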
Affiliation(s)
- Flavio Lichtenstein
- Departamento de Informática em Saúde, Escola Paulista de Medicina, Universidade Federal de Sao Paulo, Rua Botucatu, 862, Ed. José Leal Prado, andar térreo, Vila Clementino, CEP 04023-062, Sao Paulo, SP, Brazil; Laboratory of Evolutionary Genomics and Biocomplexity, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Pedro de Toledo, 669, 4 andar L4E, CEP 04039-032, São Paulo, SP, Brazil.
- Fernando Antoneli
- Departamento de Informática em Saúde, Escola Paulista de Medicina, Universidade Federal de Sao Paulo, Rua Botucatu, 862, Ed. José Leal Prado, andar térreo, Vila Clementino, CEP 04023-062, Sao Paulo, SP, Brazil; Laboratory of Evolutionary Genomics and Biocomplexity, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Pedro de Toledo, 669, 4 andar L4E, CEP 04039-032, São Paulo, SP, Brazil.
- Marcelo R S Briones
- Departamento de Microbiologia, Immunologia and Parasitologia, Escola Paulista de Medicina, Universidade Federal de Sao Paulo, Rua Botucatu, 862, Ed. Ciências Biomédicas, 3 andar, Vila Clementino, CEP 04023-062, Sao Paulo, SP, Brazil; Laboratory of Evolutionary Genomics and Biocomplexity, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Pedro de Toledo, 669, 4 andar L4E, CEP 04039-032, São Paulo, SP, Brazil.
|
35
|
Pironti A, Sierra S, Kaiser R, Lengauer T, Pfeifer N. Effects of sequence alterations on results from genotypic tropism testing. J Clin Virol 2015; 65:68-73. [DOI: 10.1016/j.jcv.2015.02.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/03/2014] [Revised: 01/28/2015] [Accepted: 02/06/2015] [Indexed: 11/30/2022]
|
36
|
Rozak DA, Rozak AJ. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes. BMC Genomics 2014; 15:52. [PMID: 24447494 PMCID: PMC3916809 DOI: 10.1186/1471-2164-15-52] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/15/2013] [Accepted: 01/08/2014] [Indexed: 11/29/2022]
Abstract
Background: Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results: We have implemented this color-coding approach by creating an Adobe Flash® application (http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion: Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte.
|