1
Elrashedy A, Nayel M, Salama A, Zaghawa A, Abdelsalam NR, Hasan ME. Phylogenetic Analysis and Comparative Genomics of Brucella abortus and Brucella melitensis Strains in Egypt. J Mol Evol 2024; 92:338-357. [PMID: 38809331 PMCID: PMC11169049 DOI: 10.1007/s00239-024-10173-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 05/02/2024] [Indexed: 05/30/2024]
Abstract
Brucellosis is a notifiable disease caused by facultative intracellular pathogens of the genus Brucella. In this study, eight Brucella abortus and eighteen Brucella melitensis strains from Egypt were annotated and compared with the RB51 and REV1 vaccine strains, respectively. The RAST toolkit on the BV-BRC server was used for annotation, revealing genome lengths of 3,250,377 bp and 3,285,803 bp, 3289 and 3323 CDS, 48 and 49 tRNA genes, the same number of rRNA genes (3), 583 and 586 hypothetical proteins, and 2697 and 2726 functional proteins for B. abortus and B. melitensis, respectively. B. abortus strains exhibited a similar number of candidate genes, while B. melitensis strains showed some differences, especially in the SRR19520422 Faiyum strain. B. melitensis also showed differences in antimicrobial resistance genes: KatG, FabL, MtrA, MtrB, OxyR, and VanO-type in the SRR19520319 Faiyum strain, and Erm C and Tet K in the SRR19520422 Faiyum strain. Additionally, whole-genome phylogenetic analysis showed that all B. abortus strains were related to vaccinated animals, and that all B. melitensis strains from Menoufia clustered together and were closely related to those from Gharbia, Dameitta, and Kafr Elshiek. Using the Bowtie2 tool, 338 single nucleotide polymorphisms (SNPs) were identified across the eight B. abortus genomes and 4271 across the eighteen B. melitensis genomes. These variants were annotated according to type and impact. Moreover, thirty candidate genes were predicted and submitted to GenBank (24 in B. abortus and 6 in B. melitensis). This study contributes significant insights into genetic variation, virulence factors, and vaccine-related associations of Brucella pathogens, enhancing our knowledge of brucellosis epidemiology and evolution in Egypt.
Affiliation(s)
- Alyaa Elrashedy
  - Department of Animal Medicine and Infectious Diseases (Infectious Diseases), Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Egypt.
- Mohamed Nayel
  - Department of Animal Medicine and Infectious Diseases (Infectious Diseases), Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Egypt
- Akram Salama
  - Department of Animal Medicine and Infectious Diseases (Infectious Diseases), Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Egypt
- Ahmed Zaghawa
  - Department of Animal Medicine and Infectious Diseases (Infectious Diseases), Faculty of Veterinary Medicine, University of Sadat City, Sadat City, Egypt
- Nader R Abdelsalam
  - Agricultural Botany Department, Faculty of Agriculture (Saba Basha), Alexandria University, Alexandria, 21531, Egypt
- Mohamed E Hasan
  - Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, University of Sadat City, Sadat City, Egypt
2
Oury N, Magalon H. Investigating the potential roles of intra-colonial genetic variability in Pocillopora corals using genomics. Sci Rep 2024; 14:6437. [PMID: 38499737 PMCID: PMC10948807 DOI: 10.1038/s41598-024-57136-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Accepted: 03/14/2024] [Indexed: 03/20/2024] Open
Abstract
Intra-colonial genetic variability (IGV), the presence of more than one genotype in a single colony, has been increasingly studied in scleractinians, revealing its high prevalence. Several studies hypothesised that IGV brings benefits, but few have investigated its roles from a genetic perspective. Here, using genomic data (SNPs), we investigated these potential benefits in populations of the coral Pocillopora acuta from Reunion Island (southwestern Indian Ocean). As the detection of IGV depends on sequencing and bioinformatics errors, we first explored the impact of the bioinformatics pipeline on its detection. Then, SNPs and genes variable within colonies were characterised. While most of the tested bioinformatics parameters did not significantly impact the detection of IGV, filtering on genotype depth of coverage strongly improved its detection by reducing genotyping errors. Mosaicism and chimerism, the two processes leading to IGV (the first through somatic mutations, the second through fusion of distinct organisms), were found in 7% and 12% of the colonies, respectively. Both processes led to several intra-colonial allelic differences, but most were non-coding or silent. However, 7% of the differences were non-silent and found in genes involved in a high diversity of biological processes, some of which were directly linked to responses to environmental stresses. IGV, therefore, appears as a source of genetic diversity and genetic plasticity, increasing the adaptive potential of colonies. Such benefits undoubtedly play an important role in the maintenance and the evolution of scleractinian populations and appear crucial for the future of coral reefs in the context of ongoing global changes.
Affiliation(s)
- Nicolas Oury
  - UMR ENTROPIE (Université de La Réunion, IRD, IFREMER, Université de Nouvelle-Calédonie, CNRS), Université de La Réunion, 97744, St Denis Cedex 09, La Réunion, France.
  - Laboratoire Cogitamus, Paris, France.
  - KAUST Red Sea Research Center and Marine Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia.
- Hélène Magalon
  - UMR ENTROPIE (Université de La Réunion, IRD, IFREMER, Université de Nouvelle-Calédonie, CNRS), Université de La Réunion, 97744, St Denis Cedex 09, La Réunion, France
  - Laboratoire Cogitamus, Paris, France
  - Laboratoire d'Excellence CORAIL, Perpignan, France
3
Xie S, Isaacs K, Becker G, Murdoch BM. A computational framework for improving genetic variants identification from 5,061 sheep sequencing data. J Anim Sci Biotechnol 2023; 14:127. [PMID: 37779189 PMCID: PMC10544426 DOI: 10.1186/s40104-023-00923-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Accepted: 08/01/2023] [Indexed: 10/03/2023] Open
Abstract
BACKGROUND Pan-genomics is a recently emerging strategy that can be utilized to provide a more comprehensive characterization of genetic variation. Joint calling is routinely used to combine identified variants across multiple related samples. However, the improvement in variant identification gained from the mutual support information of multiple samples remains quite limited for population-scale genotyping. RESULTS In this study, we developed a computational framework for joint calling of genetic variants from 5,061 sheep by incorporating the sequencing error and optimizing mutual support information from multiple samples' data. The variants were accurately identified from multiple samples in four steps: (1) probabilities of variants from two widely used algorithms, GATK and Freebayes, were calculated by a Poisson model incorporating base sequencing error potential; (2) variants with high mapping quality, or consistently identified in at least two samples by GATK and Freebayes, were used to construct the raw high-confidence identification (rHID) variants database; (3) the high-confidence variants identified in a single sample were ordered by probability value and controlled by false discovery rate (FDR) using the rHID database; (4) to avoid eliminating potentially true variants from the rHID database, the variants that failed FDR were re-examined to rescue potentially true variants and ensure highly accurate variant identification. The results indicated that the percentage of concordant SNPs and indels from Freebayes and GATK after applying our new method was significantly improved, by 12%-32% compared with raw variants, and that the method advantageously found low-frequency variants in individual sheep involved in several traits, including nipple number (GPC5), scrapie pathology (PAPSS2), seasonal reproduction and litter size (GRM1), coat color (RAB27A), and lentivirus susceptibility (TMEM154).
CONCLUSION The new method used the computational strategy to reduce the number of false positives, and simultaneously improve the identification of genetic variants. This strategy did not incur any extra cost by using any additional samples or sequencing data information and advantageously identified rare variants which can be important for practical applications of animal breeding.
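Step (3) of the abstract above orders single-sample variants by probability value and controls them by FDR. The following is a minimal sketch of that idea, assuming the standard Benjamini-Hochberg procedure; the abstract does not say which FDR procedure the authors used, and the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one accept flag per p-value under Benjamini-Hochberg FDR control.

    Assumption: this is the textbook BH procedure, used here only to
    illustrate "order by probability, then control by FDR".
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Accept every candidate ranked at or below the cutoff.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep


# Hypothetical per-variant p-values (illustrative only, not data from the paper).
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.27]
flags = benjamini_hochberg(example_p_values, alpha=0.05)
```

In this sketch, candidates that fail the cutoff are simply flagged `False`; the re-examination described in step (4) would operate on exactly those entries.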
Affiliation(s)
- Shangqian Xie
  - Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA
- Gabrielle Becker
  - Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA
- Brenda M Murdoch
  - Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA.
4
Weinstein JY, Martí-Gómez C, Lipsh-Sokolik R, Hoch SY, Liebermann D, Nevo R, Weissman H, Petrovich-Kopitman E, Margulies D, Ivankov D, McCandlish DM, Fleishman SJ. Designed active-site library reveals thousands of functional GFP variants. Nat Commun 2023; 14:2890. [PMID: 37210560 PMCID: PMC10199939 DOI: 10.1038/s41467-023-38099-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 04/13/2023] [Indexed: 05/22/2023] Open
Abstract
Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins.
Affiliation(s)
- Carlos Martí-Gómez
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Rosalie Lipsh-Sokolik
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Shlomo Yakir Hoch
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Demian Liebermann
  - Department of Chemical and Biological Physics, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Reinat Nevo
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Haim Weissman
  - Department of Molecular Chemistry and Materials Science, Weizmann Institute of Science, Rehovot, 7610001, Israel
- David Margulies
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, 7610001, Israel
- Dmitry Ivankov
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
- David M McCandlish
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Sarel J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, 7610001, Israel.
5
Luiza Atella A, Fatima Grossi-de-Sá M, Alves-Ferreira M. Cotton promoters for controlled gene expression. Electron J Biotechn 2023. [DOI: 10.1016/j.ejbt.2022.12.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023] Open
6
Wu Z, Che Y, Dang C, Zhang M, Zhang X, Sun Y, Li X, Zhang T, Xia Y. Nanopore-based long-read metagenomics uncover the resistome intrusion by antibiotic resistant bacteria from treated wastewater in receiving water body. Water Research 2022; 226:119282. [PMID: 36332295 DOI: 10.1016/j.watres.2022.119282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 10/17/2022] [Accepted: 10/19/2022] [Indexed: 06/16/2023]
Abstract
Wastewater treatment plant (WWTP) effluent discharge can induce resistome enrichment in receiving water environments. However, because a robust method for identifying antibiotic-resistant bacteria (ARB) is generally lacking, the mechanism driving resistome accumulation in receiving environments is unclear. Here, we took advantage of the enhanced ARB recognition offered by nanopore long reads to distinguish indigenous ARBs from the accumulation of WWTP-borne ARBs in the receiving water body of a domestic WWTP. A bioinformatic framework (named ARGpore2: https://github.com/sustc-xylab/ARGpore2) was constructed and evaluated to facilitate the identification of antibiotic resistance genes (ARGs) and ARBs in nanopore reads. ARG identification by ARGpore2 showed precision and recall comparable to those of the commonly adopted BLASTP-based method, whereas the spectrum of ARBs was double that of the assembled Illumina dataset. In total, we identified 33 ARB genera carrying 65 ARG subtypes in the receiving seawater, at concentrations generally 10 times higher than in clean seawater. Notably, we report a primary resistome intrusion caused by the revival of residual microbes that survived disinfection treatment. These WWTP-borne ARBs, including several animal/human enteric pathogens, contributed up to 85% of the receiving water resistome. Plasmids and class 1 integrons were considered major vehicles facilitating the persistence and dissemination of ARGs. Moreover, our work demonstrated the importance of extensive carrier identification in determining the driving force of multifactor-coupled resistome booming in complicated environmental conditions, thereby paving the way for prioritizing effective ARG mitigation strategies.
Affiliation(s)
- Ziqi Wu
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China; Section of Microbiology, University of Copenhagen, Universitetsparken 15, 2100, Copenhagen, Denmark
- You Che
  - Environmental Microbiome Engineering and Biotechnology Laboratory, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR
- Chenyuan Dang
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Miao Zhang
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Xuyang Zhang
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Yuhong Sun
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Xiang Li
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China; State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China; Guangdong Provincial Key Laboratory of Soil and Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
- Tong Zhang
  - Environmental Microbiome Engineering and Biotechnology Laboratory, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR
- Yu Xia
  - School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China; State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China; Guangdong Provincial Key Laboratory of Soil and Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China.
7
Yim WC, Swain ML, Ma D, An H, Bird KA, Curdie DD, Wang S, Ham HD, Luzuriaga-Neira A, Kirkwood JS, Hur M, Solomon JKQ, Harper JF, Kosma DK, Alvarez-Ponce D, Cushman JC, Edger PP, Mason AS, Pires JC, Tang H, Zhang X. The final piece of the Triangle of U: Evolution of the tetraploid Brassica carinata genome. The Plant Cell 2022; 34:4143-4172. [PMID: 35961044 PMCID: PMC9614464 DOI: 10.1093/plcell/koac249] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 01/03/2022] [Accepted: 06/24/2022] [Indexed: 05/05/2023]
Abstract
Ethiopian mustard (Brassica carinata) is an ancient crop with remarkable stress resilience and a desirable seed fatty acid profile for biofuel uses. Brassica carinata is one of six Brassica species that share three major genomes from three diploid species (AA, BB, and CC) that spontaneously hybridized in a pairwise manner to form three allotetraploid species (AABB, AACC, and BBCC). Of the genomes of these species, that of B. carinata is the least understood. Here, we report a chromosome scale 1.31-Gbp genome assembly with 156.9-fold sequencing coverage for B. carinata, completing the reference genomes comprising the classic Triangle of U, a classical theory of the evolutionary relationships among these six species. Our assembly provides insights into the hybridization event that led to the current B. carinata genome and the genomic features that gave rise to the superior agronomic traits of B. carinata. Notably, we identified an expansion of transcription factor networks and agronomically important gene families. Completion of the Triangle of U comparative genomics platform has allowed us to examine the dynamics of polyploid evolution and the role of subgenome dominance in the domestication and continuing agronomic improvement of B. carinata and other Brassica species.
Affiliation(s)
- Dongna Ma
  - Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization, Fujian Agriculture and Forestry University, Fuzhou, China
- Hong An
  - Division of Biological Sciences, University of Missouri, Columbia, Missouri 65201, USA
- Kevin A Bird
  - Department of Horticulture, Michigan State University, East Lansing, Michigan 48824, USA
- David D Curdie
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
- Samuel Wang
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
- Hyun Don Ham
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
- Jay S Kirkwood
  - Metabolomics Core Facility, Institute for Integrative Genome Biology, University of California, Riverside, California 92521, USA
- Manhoi Hur
  - Metabolomics Core Facility, Institute for Integrative Genome Biology, University of California, Riverside, California 92521, USA
- Juan K Q Solomon
  - Department of Agriculture, Veterinary & Rangeland Sciences, University of Nevada, Reno, Nevada 89557, USA
- Jeffrey F Harper
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
- Dylan K Kosma
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
- John C Cushman
  - Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
- Patrick P Edger
  - Department of Horticulture, Michigan State University, East Lansing, Michigan 48824, USA
- Annaliese S Mason
  - Plant Breeding Department, INRES, The University of Bonn, Bonn 53115, Germany
- J Chris Pires
  - Division of Biological Sciences, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
- Haibao Tang
  - Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization, Fujian Agriculture and Forestry University, Fuzhou, China
- Xingtan Zhang
  - Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization, Fujian Agriculture and Forestry University, Fuzhou, China
8
Müller R, Nebel M. On the use of sequence-quality information in OTU clustering. PeerJ 2021; 9:e11717. [PMID: 34458017 PMCID: PMC8375510 DOI: 10.7717/peerj.11717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2020] [Accepted: 06/11/2021] [Indexed: 11/20/2022] Open
Abstract
Background High-throughput sequencing has become an essential technology in life science research. Despite continuous improvements in technology, the produced sequences are still not entirely accurate. Consequently, the sequences are usually equipped with error probabilities. The quality information is already employed to find better solutions to a number of bioinformatics problems (e.g. read mapping). Data processing pipelines benefit in particular (especially when incorporating the quality information early), since enhanced outcomes of one step can improve all subsequent ones. Preprocessing steps, thus, quite regularly consider the sequence quality to fix errors or discard low-quality data. Other steps, however, like clustering sequences into operational taxonomic units (OTUs), a common task in the analysis of microbial communities, are typically performed without making use of the available quality information. Results In this paper, we present quality-aware clustering methods inspired by quality-weighted alignments and model-based denoising, and explore their applicability to OTU clustering. We implemented the quality-aware methods in a revised version of our de novo clustering tool GeFaST and evaluated their clustering quality and performance on mock-community data sets. Quality-weighted alignments were able to improve the clustering quality of GeFaST by up to 10%. The examination of the model-supported methods provided a more diverse picture, hinting at a narrower applicability, but they were able to attain similar improvements. Considering the quality information enlarged both runtime and memory consumption, even though the increase of the former depended heavily on the applied method and clustering threshold. Conclusions The quality-aware methods expand the iterative, de novo clustering approach by new clustering and cluster refinement methods. 
Our results indicate that OTU clustering constitutes yet another analysis step benefiting from the integration of quality information. Beyond the shown potential, the quality-aware methods offer a range of opportunities for fine-tuning and further extensions.
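The abstract above notes that sequences are usually equipped with error probabilities. In FASTQ files these are typically stored as Phred quality scores, where a score Q corresponds to an error probability of 10^(-Q/10). The following is a minimal sketch of that conversion, assuming the common Phred+33 ASCII encoding; the function names and the example quality string are hypothetical, and this is not code from GeFaST.

```python
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality_string(qual: str, offset: int = 33) -> List[float]:
    """Decode an ASCII quality string into per-base error probabilities.

    Assumption: Phred+33 encoding (offset 33), as used by most current FASTQ files.
    """
    return [phred_to_error_prob(ord(ch) - offset) for ch in qual]


# Hypothetical quality string: 'I' is Q40, '5' is Q20, '+' is Q10.
error_probs = decode_quality_string("I5+")

# The sum of per-base error probabilities is the expected number of errors in the read.
expected_errors = sum(error_probs)
```

A quality-aware method can then weight each position by its error probability instead of treating every base as equally reliable.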
Affiliation(s)
- Robert Müller
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Markus Nebel
  - Faculty of Technology, Bielefeld University, Bielefeld, Germany
9
Boatwright JL, Yeh CT, Hu HC, Susanna A, Soltis DE, Soltis PS, Schnable PS, Barbazuk WB. Trajectories of Homoeolog-Specific Expression in Allotetraploid Tragopogon castellanus Populations of Independent Origins. Frontiers in Plant Science 2021; 12:679047. [PMID: 34249049 PMCID: PMC8261302 DOI: 10.3389/fpls.2021.679047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/10/2021] [Accepted: 05/20/2021] [Indexed: 06/13/2023]
Abstract
Polyploidization can have a significant ecological and evolutionary impact by providing substantially more genetic material that may result in novel phenotypes upon which selection may act. While the effects of polyploidization are broadly reviewed across the plant tree of life, the reproducibility of these effects within naturally occurring, independently formed polyploids is poorly characterized. The flowering plant genus Tragopogon (Asteraceae) offers a rare glimpse into the intricacies of repeated allopolyploid formation with both nascent (< 90 years old) and more ancient (mesopolyploids) formations. Neo- and mesopolyploids in Tragopogon have formed repeatedly and have extant diploid progenitors that facilitate the comparison of genome evolution after polyploidization across a broad span of evolutionary time. Here, we examine four independently formed lineages of the mesopolyploid Tragopogon castellanus for homoeolog expression changes and fractionation after polyploidization. We show that expression changes are remarkably similar among these independently formed polyploid populations with large convergence among expressed loci, moderate convergence among loci lost, and stochastic silencing. We further compare and contrast these results for T. castellanus with two nascent Tragopogon allopolyploids. While homoeolog expression bias was balanced in both nascent polyploids and T. castellanus, the degree of additive expression was significantly different, with the mesopolyploid populations demonstrating more non-additive expression. We suggest that gene dosage and expression noise minimization may play a prominent role in regulating gene expression patterns immediately after allopolyploidization as well as deeper into time, and these patterns are conserved across independent polyploid lineages.
Affiliation(s)
- J. Lucas Boatwright
  - Advanced Plant Technology Program, Clemson University, Clemson, SC, United States
- Cheng-Ting Yeh
  - Department of Agronomy, Iowa State University, Ames, IA, United States
- Heng-Cheng Hu
  - Department of Agronomy, Iowa State University, Ames, IA, United States
  - Covance Inc., Indianapolis, IN, United States
- Alfonso Susanna
  - Botanic Institute of Barcelona, Consejo Superior de Investigaciones Científicas, ICUB, Barcelona, Spain
- Douglas E. Soltis
  - Department of Biology, University of Florida, Gainesville, FL, United States
  - Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
  - Genetics Institute, University of Florida, Gainesville, FL, United States
  - Biodiversity Institute, University of Florida, Gainesville, FL, United States
- Pamela S. Soltis
  - Plant Molecular and Cellular Biology Program, University of Florida, Gainesville, FL, United States
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
  - Genetics Institute, University of Florida, Gainesville, FL, United States
  - Biodiversity Institute, University of Florida, Gainesville, FL, United States
- William B. Barbazuk
  - Department of Biology, University of Florida, Gainesville, FL, United States
10
Fischer C, Koblmüller S, Börger C, Michelitsch G, Trajanoski S, Schlötterer C, Guelly C, Thallinger GG, Sturmbauer C. Genome sequences of Tropheus moorii and Petrochromis trewavasae, two eco-morphologically divergent cichlid fishes endemic to Lake Tanganyika. Sci Rep 2021; 11:4309. [PMID: 33619328 PMCID: PMC7900123 DOI: 10.1038/s41598-021-81030-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/05/2020] [Accepted: 12/28/2020] [Indexed: 01/01/2023] Open
Abstract
With more than 1000 species, East African cichlid fishes represent the fastest and most species-rich vertebrate radiation known, providing an ideal model to tackle molecular mechanisms underlying recurrent adaptive diversification. We add high-quality genome reconstructions for two phylogenetic key species of a lineage that diverged about 3-9 million years ago (mya), representing the earliest split of the so-called modern haplochromines that seeded additional radiations such as those in Lake Malawi and Victoria. Along with the annotated genomes, we analysed discriminating genomic features of the study species, each representing an extreme trophic morphology, one being an algae browser and the other an algae grazer. The genomes of Tropheus moorii (TM) and Petrochromis trewavasae (PT) comprise 911 and 918 Mbp with 40,300 and 39,600 predicted genes, respectively. Our DNA sequence data are based on 5 and 6 individuals of TM and PT, respectively, and the transcriptomic sequences on one individual per species and sex. Concerning variation, on average we observed 1 variant per 220 bp (interspecific), and 1 variant per 2540 bp (PT vs PT)/1561 bp (TM vs TM) (intraspecific). GO enrichment analysis of gene regions affected by variants revealed several candidates which may influence phenotype modifications related to facial and jaw morphology, such as genes belonging to the Hedgehog pathway (SHH, SMO, WNT9A) and the BMP and GLI families.
Affiliation(s)
- C Fischer
  - Institute of Biology, University of Graz, Graz, Austria
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
- S Koblmüller
  - Institute of Biology, University of Graz, Graz, Austria
- C Börger
  - Institute of Biology, University of Graz, Graz, Austria
- G Michelitsch
  - Center for Medical Research, Medical University of Graz, Graz, Austria
- S Trajanoski
  - Center for Medical Research, Medical University of Graz, Graz, Austria
- C Schlötterer
  - Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- C Guelly
  - Center for Medical Research, Medical University of Graz, Graz, Austria
- G G Thallinger
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria.
  - BioTechMed-Graz, Graz, Austria.
- C Sturmbauer
  - Institute of Biology, University of Graz, Graz, Austria.
  - BioTechMed-Graz, Graz, Austria.
11
Frith MC. How sequence alignment scores correspond to probability models. Bioinformatics 2019; 36:408-415. [PMID: 31329241 PMCID: PMC9883716 DOI: 10.1093/bioinformatics/btz576] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/27/2019] [Revised: 05/31/2019] [Accepted: 07/17/2019] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to each kind of data, calculate the reliability of alignment parts and measure sequence similarity integrated over possible alignments. RESULTS: This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a 'temperature' parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias toward either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. This clarifies the statistical basis of sequence alignment. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
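The idea of turning alignment scores into probabilities through a partition function with a temperature can be illustrated with a minimal sketch. It is not the paper's model: it assumes global alignment, a simple match/mismatch score and a linear gap score, and the function name and default parameters are hypothetical. It sums exp(score / T) over all alignment paths, counting paths that differ only in the order of adjacent gaps separately.

```python
import math


def partition_function(a, b, match=1.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """Sum of exp(score / T) over all global alignment paths of a and b.

    Assumed scoring (illustrative only): match/mismatch per aligned pair,
    a linear score per gap column.
    """
    t = temperature
    gap_weight = math.exp(gap / t)
    z = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    z[0][0] = 1.0  # the empty alignment has score 0, hence weight 1
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                total += z[i - 1][j - 1] * math.exp(s / t)  # aligned pair
            if i > 0:
                total += z[i - 1][j] * gap_weight  # letter of a against a gap
            if j > 0:
                total += z[i][j - 1] * gap_weight  # letter of b against a gap
            z[i][j] = total
    return z[len(a)][len(b)]


if __name__ == "__main__":
    for t in (2.0, 1.0, 0.1):
        z = partition_function("ACGT", "ACGT", temperature=t)
        print(t, t * math.log(z))
```

As the temperature is lowered, T · ln Z approaches the score of the single best alignment, which is why ordinary maximum-score alignment can be read as the low-temperature limit of the probabilistic view.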
|
12
|
Liu T, Wang X, Wang G, Jia S, Liu G, Shan G, Chi S, Zhang J, Yu Y, Xue T, Yu J. Evolution of Complex Thallus Alga: Genome Sequencing of Saccharina japonica. Front Genet 2019; 10:378. [PMID: 31118944 PMCID: PMC6507550 DOI: 10.3389/fgene.2019.00378] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/12/2018] [Accepted: 04/09/2019] [Indexed: 01/15/2023] Open
Abstract
Saccharina, one of the most important brown algae (Phaeophyceae) with a multicellular thallus, has a remarkable evolutionary history and accounts for most of the economic marine aquaculture production worldwide. Here, we present the 580.5 million base pair genome sequence of Saccharina japonica, whose current assembly contains 35,725 protein-coding genes. In a comparative analysis with Ectocarpus siliculosus, the integrated virus sequence suggested the genome evolutionary footprints, which derived from their co-ancestry and experienced genomic arrangements. Furthermore, gene expansion was found to be an important strategy for functional evolution, especially with regard to extracellular components, stress-related genes, and vanadium-dependent haloperoxidases, and we propose the hypothesis that gene duplication events were the main driving force in the evolution from multicellular filamentous algae to thallus algae. The sequenced Saccharina genome paves the way for further molecular studies and is useful for genome-assisted breeding of S. japonica and other related algae species.
Affiliation(s)
- Tao Liu
  - College of Marine Life Science, Ocean University of China, Qingdao, China
  - College of Life Sciences, Yantai University, Yantai, China
- Xumin Wang
  - College of Life Sciences, Yantai University, Yantai, China
- Guoliang Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Shangang Jia
  - College of Grassland Science and Technology, China Agricultural University, Beijing, China
- Guiming Liu
  - Beijing Agro-Biotechnology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- Guangle Shan
  - University of Chinese Academy of Sciences, Beijing, China
- Shan Chi
  - College of Marine Life Science, Ocean University of China, Qingdao, China
  - Qingdao Haida Blue Tek Biotechnology Co., Ltd, Qingdao, China
- Jing Zhang
  - College of Biological Engineering, Qilu University of Technology, Shandong Academy of Sciences, Jinan, China
- Yahui Yu
  - College of Marine Life Science, Ocean University of China, Qingdao, China
- Ting Xue
  - The Public Service Platform for Industrialization Development Technology of Marine Biological Medicine and Product of State Oceanic Administration, College of Life Sciences, Fujian Normal University, Fuzhou, China
- Jun Yu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
|
13
|
Díaz-Sánchez S, Hernández-Jarguín A, Torina A, de Mera IGF, Blanda V, Caracappa S, Gortazar C, de la Fuente J. Characterization of the bacterial microbiota in wild-caught Ixodes ventalloi. Ticks Tick Borne Dis 2018; 10:336-343. [PMID: 30482513 DOI: 10.1016/j.ttbdis.2018.11.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 06/25/2018] [Revised: 10/10/2018] [Accepted: 11/15/2018] [Indexed: 11/24/2022]
Abstract
Exploring the microbial diversity of ticks is crucial to understand geographical dispersion and pathogen transmission. Tick microbes participate in many biological processes implicated in the acquisition, maintenance, and transmission of pathogens, and actively promote host phenotypic changes and adaptation to new environments. The microbial community of Ixodes ventalloi remains unexplored. In this study, the bacterial microbiota of wild-caught I. ventalloi was characterized using shotgun-metagenomic sequencing in samples from unfed adults collected during December 2013-January 2014 in two locations in Sicily, Italy. The microbiota identified in I. ventalloi was mainly composed of symbiotic, commensal, and environmental bacteria. Interestingly, we identified the genera Anaplasma and Borrelia as members of the microbiota of I. ventalloi. These results advance our knowledge of I. ventalloi microbiota composition, with potential implications in tick-host adaptation, geographic expansion, and vector competence.
Affiliation(s)
- Sandra Díaz-Sánchez
  - SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005, Ciudad Real, Spain
- Angélica Hernández-Jarguín
  - SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005, Ciudad Real, Spain
- Alessandra Torina
  - Istituto Zooprofilattico Sperimentale della Sicilia, Via G. Marinuzzi no3, 90129, Palermo, Italy
- Isabel G Fernández de Mera
  - SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005, Ciudad Real, Spain
- Valeria Blanda
  - Istituto Zooprofilattico Sperimentale della Sicilia, Via G. Marinuzzi no3, 90129, Palermo, Italy
- Santo Caracappa
  - Istituto Zooprofilattico Sperimentale della Sicilia, Via G. Marinuzzi no3, 90129, Palermo, Italy
- Christian Gortazar
  - SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005, Ciudad Real, Spain
- José de la Fuente
  - SaBio, Instituto de Investigación en Recursos Cinegéticos IREC-CSIC-UCLM-JCCM, Ronda de Toledo s/n, 13005, Ciudad Real, Spain
  - Department of Veterinary Pathobiology, Center for Veterinary Health Sciences, Oklahoma State University, Stillwater, OK, 74078, USA
|
14
|
Frith MC, Shrestha AMS. A Simplified Description of Child Tables for Sequence Similarity Search. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:2067-2073. [PMID: 29994365 DOI: 10.1109/tcbb.2018.2796064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
Finding related nucleotide or protein sequences is a fundamental, diverse, and incompletely-solved problem in bioinformatics. It is often tackled by seed-and-extend methods, which first find "seed" matches of diverse types, such as spaced seeds, subset seeds, or minimizers. Seeds are usually found using an index of the reference sequence(s), which stores seed positions in a suffix array or related data structure. A child table is a fundamental way to achieve fast lookup in an index, but previous descriptions have been overly complex. This paper aims to provide a more accessible description of child tables, and demonstrate their generality: they apply equally to all the above-mentioned seed types and more. We also show that child tables can be used without LCP (longest common prefix) tables, reducing the memory requirement.
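The index lookup that child tables are meant to speed up can be sketched as follows. This is not a child table: it is a plain suffix array queried by binary search, shown only to make "storing seed positions in a suffix array" concrete. It assumes exact-match seeds, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction, fine for illustration but not for genome-scale input.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seed(text, suffix_array, seed):
    """Return all positions where seed occurs exactly, via two binary searches."""
    m = len(seed)

    # First suffix whose length-m prefix is >= seed.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < seed:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > seed.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= seed:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    reference = "GATTACATTA"
    sa = build_suffix_array(reference)
    print(find_seed(reference, sa, "TTA"))
```

Each lookup here costs on the order of log n string comparisons; a child table replaces the binary search with a direct descent through the suffix-array intervals.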
|
15
|
Abbas-Aghababazadeh F, Li Q, Fridley BL. Comparison of normalization approaches for gene expression studies completed with high-throughput sequencing. PLoS One 2018; 13:e0206312. [PMID: 30379879 PMCID: PMC6209231 DOI: 10.1371/journal.pone.0206312] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/13/2018] [Accepted: 10/10/2018] [Indexed: 01/07/2023] Open
Abstract
Normalization of RNA-Seq data has proven essential to ensure accurate inferences and replication of findings. Hence, various normalization methods have been proposed for various technical artifacts that can be present in high-throughput sequencing transcriptomic studies. In this study, we set out to compare the widely used library size normalization methods (UQ, TMM, and RLE) and across sample normalization methods (SVA, RUV, and PCA) for RNA-Seq data using publicly available data from The Cancer Genome Atlas (TCGA) cervical cancer study. Additionally, an extensive simulation study was completed to compare the performance of the across sample normalization methods in estimating technical artifacts. Lastly, we investigated the effect of reduction in degrees of freedom in the normalized data and their impact on downstream differential expression analysis results. Based on this study, the TMM and RLE library size normalization methods give similar results for the CESC dataset. In addition, the results from the simulated datasets show that the SVA ("BE") method outperforms the other methods (SVA "Leek", PCA) by correctly estimating the number of latent artifacts. Moreover, ignoring the loss of degrees of freedom due to normalization results in inflated type I error rates. We recommend not only adjusting for library size differences but also assessing known and unknown technical artifacts in the data and, if needed, completing across sample normalization. In addition, we suggest that one includes the known and estimated latent artifacts in the design matrix to correctly account for the loss in degrees of freedom, as opposed to completing the analysis on the post-processed normalized data.
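As an illustration of one of the library size methods named above, the following minimal sketch computes RLE (median-of-ratios) size factors. It assumes a small in-memory count matrix with genes as rows and samples as columns; the function name and example data are hypothetical, and production analyses would normally rely on an established package instead.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix.

    counts: list of rows, one per gene, each a list of per-sample counts.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero. At least one gene must have
    non-zero counts in every sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if all(c > 0 for c in gene):
            # Log of the geometric mean of this gene across samples.
            log_geo_mean = sum(math.log(c) for c in gene) / n_samples
            for j, c in enumerate(gene):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor of a sample = median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


if __name__ == "__main__":
    example = [[10, 20], [100, 200], [5, 10], [0, 3]]
    factors = rle_size_factors(example)
    normalized = [[c / f for c, f in zip(gene, factors)] for gene in example]
    print(factors)
    print(normalized)
```

Dividing each sample's counts by its size factor puts the samples on a common scale; in the toy example the second sample is sequenced twice as deeply, so its factor is twice that of the first.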
Affiliation(s)
- Qian Li
  - Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL, United States of America
  - Health Informatics Institute, University of South Florida, Tampa, FL, United States of America
- Brooke L. Fridley
  - Department of Biostatistics & Bioinformatics, Moffitt Cancer Center, Tampa, FL, United States of America
|
16
|
A Robust Methodology for Assessing Differential Homeolog Contributions to the Transcriptomes of Allopolyploids. Genetics 2018; 210:883-894. [PMID: 30213855 DOI: 10.1534/genetics.118.301564] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 05/01/2018] [Accepted: 09/07/2018] [Indexed: 12/18/2022] Open
Abstract
Polyploidy has played a pivotal and recurring role in angiosperm evolution. Allotetraploids arise from hybridization between species and possess duplicated gene copies (homeologs) that serve redundant roles immediately after polyploidization. Although polyploidization is a major contributor to plant evolution, it remains poorly understood. We describe an analytical approach for assessing homeolog-specific expression that begins with de novo assembly of parental transcriptomes and effectively (i) reduces redundancy in de novo assemblies, (ii) identifies putative orthologs, (iii) isolates common regions between orthologs, and (iv) assesses homeolog-specific expression using a robust Bayesian Poisson-Gamma model to account for sequence bias when mapping polyploid reads back to parental references. Using this novel methodology, we examine differential homeolog contributions to the transcriptome in the recently formed allopolyploids Tragopogon mirus and T. miscellus (Compositae). Notably, we assess a larger Tragopogon gene set than previous studies of this system. Using carefully identified orthologous regions and filtering biased orthologs, we find in both allopolyploids largely balanced expression with no strong parental bias. These new methods can be used to examine homeolog expression in any tetrapolyploid system without requiring a reference genome.
|
17
|
Suzuki A, Suzuki M, Mizushima-Sugano J, Frith MC, Makalowski W, Kohno T, Sugano S, Tsuchihara K, Suzuki Y. Sequencing and phasing cancer mutations in lung cancers using a long-read portable sequencer. DNA Res 2018; 24:585-596. [PMID: 29117310 PMCID: PMC5726485 DOI: 10.1093/dnares/dsx027] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 03/05/2017] [Accepted: 05/29/2017] [Indexed: 01/18/2023] Open
Abstract
Here, we employed cDNA amplicon sequencing using a long-read portable sequencer, MinION, to characterize various types of mutations in cancer-related genes, namely, EGFR, KRAS, NRAS and NF1. For homozygous SNVs, the precision and recall rates were 87.5% and 91.3%, respectively. For previously reported hotspot mutations, the precision and recall rates reached 100%. The precise junctions of EML4-ALK, CCDC6-RET and five other gene fusions were also detected. Taking advantage of long-read sequencing, we conducted phasing of EGFR mutations and elucidated the mutational allelic backgrounds of anti-tumor drug-sensitive and resistant mutations, which could provide useful information for selecting therapeutic approaches. In the H1975 cells, 72% of the reads harbored both L858R and T790M mutations, and 22% of the reads harbored neither mutation. To ensure that the clinical requirements can be met in potentially low cancer cell populations, we further conducted a serial dilution analysis of the template for EGFR mutations. Several percent of the mutant alleles could be detected depending on the yield and quality of the sequencing data. Finally, we characterized the mutation genotypes in eight clinical samples. This method could be a convenient long-read sequencing-based analytical approach and thus may change the current approaches used for cancer genome sequencing.
Affiliation(s)
- Ayako Suzuki
  - Division of Translational Genomics, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Mizuto Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Junko Mizushima-Sugano
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
  - Department of Chemistry and Life Science, Kogakuin University, Nishi-Shinjuku, Shinjuku-Ku, Tokyo, Japan
- Martin C Frith
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
  - Computational Biology Research Center, The National Institute for Advanced Industrial Science and Technology, Aomi, Koto-Ku, Tokyo, Japan
- Wojciech Makalowski
  - Institute of Bioinformatics, Faculty of Medicine, University of Muenster, Munster, Germany
- Takashi Kohno
  - Division of Genome Biology, National Cancer Center Research Institute, Tsukiji, Chuo-Ku, Tokyo, Japan
- Sumio Sugano
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
- Katsuya Tsuchihara
  - Division of Translational Genomics, Exploratory Oncology Research and Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, Japan
|
18
|
Integrated metatranscriptomics and metaproteomics for the characterization of bacterial microbiota in unfed Ixodes ricinus. Ticks Tick Borne Dis 2018; 9:1241-1251. [DOI: 10.1016/j.ttbdis.2018.04.020] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 01/18/2018] [Revised: 04/28/2018] [Accepted: 04/29/2018] [Indexed: 12/12/2022]
|
19
|
Lin HN, Hsu WL. Kart: a divide-and-conquer algorithm for NGS read alignment. Bioinformatics 2018; 33:2281-2287. [PMID: 28379292 PMCID: PMC5860120 DOI: 10.1093/bioinformatics/btx189] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 11/08/2016] [Accepted: 04/05/2017] [Indexed: 02/02/2023] Open
Abstract
Motivation: Next-generation sequencing (NGS) provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Due to the huge amount of data, NGS applications require very fast and accurate alignment algorithms. Most existing algorithms for read mapping adopt a seed-and-extend strategy, which is sequential in nature and takes much longer on longer reads. Results: We develop a divide-and-conquer algorithm, called Kart, which can process long reads as fast as short reads by dividing a read into small fragments that can be aligned independently. Our experimental results indicate that the average size of fragments requiring the more time-consuming gapped alignment is around 20 bp regardless of the original read length. Furthermore, it can tolerate much higher error rates. The experiments show that Kart spends much less time on longer reads than other aligners and still produces reliable alignments even when the error rate is as high as 15%. Availability and Implementation: Kart is available at https://github.com/hsinnan75/Kart/ . Contact: hsu@iis.sinica.edu.tw. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hsin-Nan Lin
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Wen-Lian Hsu
  - Institute of Information Science, Academia Sinica, Taipei, Taiwan
|
20
|
Gan RC, Chen TW, Wu TH, Huang PJ, Lee CC, Yeh YM, Chiu CH, Huang HD, Tang P. PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms. BMC Bioinformatics 2016; 17:513. [PMID: 28155708 PMCID: PMC5260104 DOI: 10.1186/s12859-016-1366-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 02/23/2023] Open
Abstract
Background: Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Results: Here, we propose a new analysis strategy and quantification methods for quantifying expression level which not only generate a virtual reference from sequencing data, but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against NCBI NR databases to find potential homolog sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For better understanding of the biological meaning of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential function by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours.
Conclusions: In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers the opportunity to quantify and compare transcriptome profiles through a homolog based virtual transcriptome reference. By using the homolog based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw.
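The quantification values mentioned in the abstract build on the standard RPKM and TPM definitions, which the following minimal sketch spells out. It shows the textbook formulas only, not PARRoT's estimated variants (eRPKM, eTPM); the function names and the example counts and transcript lengths are hypothetical.

```python
def rpkm(read_counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(read_counts)
    return [
        count * 1e9 / (total_reads * length)
        for count, length in zip(read_counts, lengths_bp)
    ]


def tpm(read_counts, lengths_bp):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [count / length for count, length in zip(read_counts, lengths_bp)]
    rate_sum = sum(rates)
    return [rate * 1e6 / rate_sum for rate in rates]


if __name__ == "__main__":
    counts = [100, 200, 700]      # reads assigned to each virtual transcript
    lengths = [1000, 2000, 1000]  # transcript lengths in base pairs
    print(rpkm(counts, lengths))
    print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which makes proportions directly comparable between datasets quantified against the same reference; RPKM values do not share that property.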
Affiliation(s)
- Ruei-Chi Gan
  - Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan
  - Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Ting-Wen Chen
  - Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Timothy H Wu
  - Institute of Biomedical Informatics, National Yang-Ming University, Taipei City, Taiwan
- Po-Jung Huang
  - Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Chi-Ching Lee
  - Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Yuan-Ming Yeh
  - Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
- Cheng-Hsun Chiu
  - Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Hsien-Da Huang
  - Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsin-Chu, 300, Taiwan
- Petrus Tang
  - Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan
  - Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
  - Molecular Regulation & Bioinformatics Laboratory, Chang Gung University, Taoyuan, Taiwan
|
21
|
Wattam AR, Davis JJ, Assaf R, Boisvert S, Brettin T, Bun C, Conrad N, Dietrich EM, Disz T, Gabbard JL, Gerdes S, Henry CS, Kenyon RW, Machi D, Mao C, Nordberg EK, Olsen GJ, Murphy-Olson DE, Olson R, Overbeek R, Parrello B, Pusch GD, Shukla M, Vonstein V, Warren A, Xia F, Yoo H, Stevens RL. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res 2016; 45:D535-D542. [PMID: 27899627 PMCID: PMC5210524 DOI: 10.1093/nar/gkw1017] [Citation(s) in RCA: 1079] [Impact Index Per Article: 134.9] [Received: 09/02/2016] [Revised: 10/14/2016] [Accepted: 11/09/2016] [Indexed: 12/14/2022] Open
Abstract
The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by ‘virtual integration’ to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.
Affiliation(s)
- Alice R Wattam
  - Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
- James J Davis
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
- Rida Assaf
  - Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
- Thomas Brettin
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
- Christopher Bun
  - Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
- Neal Conrad
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Emily M Dietrich
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
- Terry Disz
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
- Joseph L Gabbard
  - Grado Department of Industrial & Systems Engineering, Virginia Tech, Blacksburg, VA 24060, USA
- Svetlana Gerdes
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
- Christopher S Henry
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Ronald W Kenyon
  - Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
- Dustin Machi
  - Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
- Chunhong Mao
  - Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
- Eric K Nordberg
  - Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
- Gary J Olsen
  - Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Daniel E Murphy-Olson
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
- Robert Olson
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Ross Overbeek
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
- Bruce Parrello
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
- Gordon D Pusch
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL 60527, USA
- Maulik Shukla
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
- Andrew Warren
  - Biocomplexity Institute, Virginia Tech University, Blacksburg, VA 24060, USA
- Fangfang Xia
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA
- Hyunseung Yoo
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
- Rick L Stevens
  - Computation Institute, University of Chicago, Chicago, IL 60637, USA
  - Computing, Environment and Life Sciences, Argonne National Laboratory, Argonne, IL 60439, USA
  - Department of Computer Science, University of Chicago, Chicago, IL 60637, USA
|
22
|
Schmidt K, Mwaigwisya S, Crossman LC, Doumith M, Munroe D, Pires C, Khan AM, Woodford N, Saunders NJ, Wain J, O'Grady J, Livermore DM. Identification of bacterial pathogens and antimicrobial resistance directly from clinical urines by nanopore-based metagenomic sequencing. J Antimicrob Chemother 2016; 72:104-114. [PMID: 27667325 DOI: 10.1093/jac/dkw397] [Citation(s) in RCA: 208] [Impact Index Per Article: 26.0] [Received: 05/09/2016] [Revised: 08/09/2016] [Accepted: 08/21/2016] [Indexed: 12/18/2022] Open
Abstract
OBJECTIVES: The introduction of metagenomic sequencing to diagnostic microbiology has been hampered by slowness, cost and complexity. We explored whether MinION nanopore sequencing could accelerate diagnosis and resistance profiling, using complicated urinary tract infections as an exemplar. METHODS: Bacterial DNA was enriched from clinical urines (n = 10) and from healthy urines 'spiked' with multiresistant Escherichia coli (n = 5), then sequenced by MinION. Sequences were analysed using external databases and bioinformatic pipelines or, ultimately, using integrated real-time analysis applications. Results were compared with Illumina data and resistance phenotypes. RESULTS: MinION correctly identified pathogens without culture and, among 55 acquired resistance genes detected in the cultivated bacteria by Illumina sequencing, 51 were found by MinION sequencing directly from the urines, with three of the four failures in an early run with low genome coverage. Resistance-conferring mutations and allelic variants were not reliably identified. CONCLUSIONS: MinION sequencing comprehensively identified pathogens and acquired resistance genes from urine in a timeframe similar to PCR (4 h from sample to result). Bioinformatic pipeline optimization is needed to better detect resistances conferred by point mutations. Metagenomic-sequencing-based diagnosis will enable clinicians to adjust antimicrobial therapy before the second dose of a typical (i.e. every 8 h) antibiotic.
Collapse
Affiliation(s)
- K Schmidt
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - S Mwaigwisya
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - L C Crossman
- SequenceAnalysis.co.uk, Norwich Research Park, Norwich, UK
| | - M Doumith
- AMRHAI Reference Unit, National Infection Service, Public Health England, London, UK
| | - D Munroe
- Microbiology Department, Norfolk and Norwich University Hospital, Norwich, UK
| | - C Pires
- Brunel University London, Uxbridge, UK
| | - A M Khan
- Brunel University London, Uxbridge, UK
| | - N Woodford
- AMRHAI Reference Unit, National Infection Service, Public Health England, London, UK
| | | | - J Wain
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - J O'Grady
- Norwich Medical School, University of East Anglia, Norwich, UK
| | - D M Livermore
- Norwich Medical School, University of East Anglia, Norwich, UK.,AMRHAI Reference Unit, National Infection Service, Public Health England, London, UK
| |
|
23
|
Buffering of Genetic Regulatory Networks in Drosophila melanogaster. Genetics 2016; 203:1177-90. [PMID: 27194752 DOI: 10.1534/genetics.116.188797] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2016] [Accepted: 05/17/2016] [Indexed: 01/01/2023] Open
Abstract
Regulatory variation in gene expression can be described by cis- and trans-genetic components. Here we used RNA-seq data from a population panel of Drosophila melanogaster test crosses to compare allelic imbalance (AI) in female head tissue between mated and virgin flies, an environmental change known to affect transcription. Indeed, 3048 exons (1610 genes) are differentially expressed in this study. A Bayesian model for AI, with an intersection test, controls type I error. There are ∼200 genes with AI exclusively in mated or virgin flies, indicating an environmental component of expression regulation. On average 34% of genes within a cross and 54% of all genes show evidence for genetic regulation of transcription. Nearly all differentially regulated genes are affected in cis, with an average of 63% of expression variation explained by the cis-effects. Trans-effects explain 8% of the variance in AI on average and the interaction between cis and trans explains an average of 11% of the total variance in AI. In both environments cis- and trans-effects are compensatory in their overall effect, with a negative association between cis- and trans-effects in 85% of the exons examined. We hypothesize that the gene expression level perturbed by cis-regulatory mutations is compensated through trans-regulatory mechanisms, e.g., trans and cis by trans-factors buffering cis-mutations. In addition, when AI is detected in both environments, cis-mated, cis-virgin, and trans-mated-trans-virgin estimates are highly concordant with 99% of all exons positively correlated with a median correlation of 0.83 for cis and 0.95 for trans. We conclude that the gene regulatory networks (GRNs) are robust and that trans-buffering explains robustness.
|
24
|
Roy Chowdhury P, DeMaere M, Chapman T, Worden P, Charles IG, Darling AE, Djordjevic SP. Comparative genomic analysis of toxin-negative strains of Clostridium difficile from humans and animals with symptoms of gastrointestinal disease. BMC Microbiol 2016; 16:41. [PMID: 26971047 PMCID: PMC4789261 DOI: 10.1186/s12866-016-0653-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2015] [Accepted: 03/02/2016] [Indexed: 12/13/2022] Open
Abstract
Background Clostridium difficile infections (CDI) are a significant health problem to humans and food animals. Clostridial toxins ToxA and ToxB encoded by genes tcdA and tcdB are located on a pathogenicity locus known as the PaLoc and are the major virulence factors of C. difficile. While toxin-negative strains of C. difficile are often isolated from faeces of animals and patients suffering from CDI, they are not considered to play a role in disease. Toxin-negative strains of C. difficile have been used successfully to treat recurring CDI but their propensity to acquire the PaLoc via lateral gene transfer and express clinically relevant levels of toxins has reinforced the need to characterise them genetically. In addition, further studies that examine the pathogenic potential of toxin-negative strains of C. difficile and the frequency by which toxin-negative strains may acquire the PaLoc are needed. Results We undertook a comparative genomic analysis of five Australian toxin-negative isolates of C. difficile that lack tcdA, tcdB and both binary toxin genes cdtA and cdtB that were recovered from humans and farm animals with symptoms of gastrointestinal disease. Our analyses show that the five C. difficile isolates cluster closely with virulent toxigenic strains of C. difficile belonging to the same sequence type (ST) and have virulence gene profiles akin to those in toxigenic strains. Furthermore, phage acquisition appears to have played a key role in the evolution of C. difficile. Conclusions Our results are consistent with the C. difficile global population structure comprising six clades each containing both toxin-positive and toxin-negative strains. Our data also suggests that toxin-negative strains of C. difficile encode a repertoire of putative virulence factors that are similar to those found in toxigenic strains of C. difficile, raising the possibility that acquisition of PaLoc by toxin-negative strains poses a threat to human health. 
Studies in appropriate animal models are needed to examine the pathogenic potential of toxin-negative strains of C. difficile and to determine the frequency by which toxin-negative strains may acquire the PaLoc.
Affiliation(s)
- Piklu Roy Chowdhury
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia.,NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, PMB 8, Camden, NSW, 2570, Australia.
| | - Matthew DeMaere
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
| | - Toni Chapman
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, PMB 8, Camden, NSW, 2570, Australia
| | - Paul Worden
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
| | - Ian G Charles
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia.,Institute of Food Research, Norwich Research Park, Colney, Norwich, NR4 7UA, UK
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
| | - Steven P Djordjevic
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia.
| |
|
25
|
Wei S, Williams Z. Rapid Short-Read Sequencing and Aneuploidy Detection Using MinION Nanopore Technology. Genetics 2016; 202:37-44. [PMID: 26500254 PMCID: PMC4701100 DOI: 10.1534/genetics.115.182311] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2015] [Accepted: 10/20/2015] [Indexed: 12/30/2022] Open
Abstract
MinION is a memory stick-sized nanopore-based sequencer designed primarily for single-molecule sequencing of long DNA fragments (>6 kb). We developed a library preparation and data-analysis method to enable rapid real-time sequencing of short DNA fragments (<1 kb) that resulted in the sequencing of 500 reads in 3 min and 40,000-80,000 reads in 2-4 hr at a rate of 30 nt/sec. We then demonstrated the clinical applicability of this approach by performing successful aneuploidy detection in prenatal and miscarriage samples with sequencing in <4 hr. This method broadens the application of nanopore-based single-molecule sequencing and makes it a promising and versatile tool for rapid clinical and research applications.
Affiliation(s)
- Shan Wei
- Department of Obstetrics and Gynecology and Women's Health, Albert Einstein College of Medicine. Bronx, New York 10461
| | - Zev Williams
- Department of Obstetrics and Gynecology and Women's Health, Albert Einstein College of Medicine. Bronx, New York 10461
| |
|
26
|
Torreno O, Trelles O. Breaking the computational barriers of pairwise genome comparison. BMC Bioinformatics 2015; 16:250. [PMID: 26260162 PMCID: PMC4531504 DOI: 10.1186/s12859-015-0679-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2015] [Accepted: 07/20/2015] [Indexed: 11/25/2022] Open
Abstract
Background Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets by retrofitting existing algorithms or by creating new applications represents a major challenge for the bioinformatics community. Results We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences that span from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high quality results with a linear-time response and controlled memory consumption, being comparable or faster than the current state-of-the-art methods. Conclusions We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. 
Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively.
Affiliation(s)
- Oscar Torreno
- Advanced Computing Technologies Unit, RISC Software GmbH, Softwarepark 35, Hagenberg, 4232, Austria
| | - Oswaldo Trelles
- Computer Architecture Department, University of Malaga, Bulevar Luis Pasteur 35, Malaga, 29071, Spain.
| |
|
27
|
Frith MC, Kawaguchi R. Split-alignment of genomes finds orthologies more accurately. Genome Biol 2015; 16:106. [PMID: 25994148 PMCID: PMC4464727 DOI: 10.1186/s13059-015-0670-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2015] [Accepted: 05/08/2015] [Indexed: 04/29/2023] Open
Abstract
We present a new pair-wise genome alignment method, based on a simple concept of finding an optimal set of local alignments. It gains accuracy by not masking repeats, and by using a statistical model to quantify the (un)ambiguity of each alignment part. Compared to previous animal genome alignments, it aligns thousands of locations differently and with much higher similarity, strongly suggesting that the previous alignments are non-orthologous. The previous methods suffer from an overly-strong assumption of long un-rearranged blocks. The new alignments should help find interesting and unusual features, such as fast-evolving elements and micro-rearrangements, which are confounded by alignment errors.
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
| | - Risa Kawaguchi
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan.,Department of Computational Biology, Faculty of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan.
| |
|
28
|
Sosa OA, Gifford SM, Repeta DJ, DeLong EF. High molecular weight dissolved organic matter enrichment selects for methylotrophs in dilution to extinction cultures. ISME JOURNAL 2015; 9:2725-39. [PMID: 25978545 PMCID: PMC4817625 DOI: 10.1038/ismej.2015.68] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/17/2014] [Revised: 03/04/2015] [Accepted: 03/18/2015] [Indexed: 02/06/2023]
Abstract
The role of bacterioplankton in the cycling of marine dissolved organic matter (DOM) is central to the carbon and energy balance in the ocean, yet there are few model organisms available to investigate the genes, metabolic pathways, and biochemical mechanisms involved in the degradation of this globally important carbon pool. To obtain microbial isolates capable of degrading semi-labile DOM for growth, we conducted dilution to extinction cultivation experiments using seawater enriched with high molecular weight (HMW) DOM. In total, 93 isolates were obtained. Amendments using HMW DOM to increase the dissolved organic carbon concentration 4x (280 μM) or 10x (700 μM) the ocean surface water concentrations yielded positive growth in 4–6% of replicate dilutions, whereas <1% scored positive for growth in non-DOM-amended controls. The majority (71%) of isolates displayed a distinct increase in cell yields when grown in increasing concentrations of HMW DOM. Whole-genome sequencing was used to screen the culture collection for purity and to determine the phylogenetic identity of the isolates. Eleven percent of the isolates belonged to the gammaproteobacteria including Alteromonadales (the SAR92 clade) and Vibrio. Surprisingly, 85% of isolates belonged to the methylotrophic OM43 clade of betaproteobacteria, bacteria thought to metabolically specialize in degrading C1 compounds. Growth of these isolates on methanol confirmed their methylotrophic phenotype. Our results indicate that dilution to extinction cultivation enriched with natural sources of organic substrates has a potential to reveal the previously unsuspected relationships between naturally occurring organic nutrients and the microorganisms that consume them.
Affiliation(s)
- Oscar A Sosa
- Center for Microbial Oceanography: Research and Education, University of Hawaii, Honolulu, HI, USA.,Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.,Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
| | - Scott M Gifford
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Daniel J Repeta
- Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
| | - Edward F DeLong
- Center for Microbial Oceanography: Research and Education, University of Hawaii, Honolulu, HI, USA.,Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
|
29
|
Bu D, Nan X, Wang F, Loor J, Wang J. Identification and characterization of microRNA sequences from bovine mammary epithelial cells. J Dairy Sci 2015; 98:1696-705. [DOI: 10.3168/jds.2014-8217] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2014] [Accepted: 11/22/2014] [Indexed: 11/19/2022]
|
30
|
Jain M, Fiddes IT, Miga KH, Olsen HE, Paten B, Akeson M. Improved data analysis for the MinION nanopore sequencer. Nat Methods 2015; 12:351-6. [PMID: 25686389 PMCID: PMC4907500 DOI: 10.1038/nmeth.3290] [Citation(s) in RCA: 377] [Impact Index Per Article: 41.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2014] [Accepted: 01/20/2015] [Indexed: 12/31/2022]
Abstract
The Oxford Nanopore MinION sequences individual DNA molecules using an array of pores that read nucleotide identities based on ionic current steps. We evaluated and optimized MinION performance using M13 genomic dsDNA. Using expectation-maximization (EM) we obtained robust maximum likelihood (ML) estimates for read insertion, deletion and substitution error rates (4.9%, 7.8%, and 5.1% respectively). We found that 99% of high-quality ‘2D’ MinION reads mapped to reference at a mean identity of 85%. We present a MinION-tailored tool for single nucleotide variant (SNV) detection that uses ML parameter estimates and marginalization over many possible read alignments to achieve precision and recall of up to 99%. By pairing our high-confidence alignment strategy with long MinION reads, we resolved the copy number for a cancer/testis gene family (CT47) within an unresolved region of human chromosome Xq24.
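The insertion, deletion and substitution rates quoted in this abstract are per-column fractions of a read-to-reference alignment. As a rough illustration only, the sketch below counts those column types directly from two gapped alignment rows. It is a naive count, not the expectation-maximization procedure the authors describe, and the function name and the toy alignment are hypothetical.

```python
def alignment_error_rates(ref_aln: str, read_aln: str) -> dict:
    """Return the fraction of alignment columns of each type.

    Both arguments are rows of one pairwise alignment, padded with '-'
    so that they have equal length.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("alignment rows must have equal length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == "-" and read_base == "-":
            continue  # column is a gap in both rows; carries no information
        if ref_base == "-":
            counts["insertion"] += 1      # base present only in the read
        elif read_base == "-":
            counts["deletion"] += 1       # base present only in the reference
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("alignment has no informative columns")
    return {kind: n / total for kind, n in counts.items()}


# Toy alignment (hypothetical): one substitution, one insertion, one deletion.
rates = alignment_error_rates("ACGT-ACGTA", "ACCTTAC-TA")
print(rates)
```

Under this column-based convention the "match" fraction plays the role of alignment identity; other definitions of identity (for example, excluding gap columns) would give different numbers.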
Affiliation(s)
- Miten Jain
- [1] UC Santa Cruz Genomics Institute, Santa Cruz, California, USA. [2] Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
| | - Ian T Fiddes
- [1] UC Santa Cruz Genomics Institute, Santa Cruz, California, USA. [2] Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
| | - Karen H Miga
- [1] UC Santa Cruz Genomics Institute, Santa Cruz, California, USA. [2] Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
| | - Hugh E Olsen
- [1] UC Santa Cruz Genomics Institute, Santa Cruz, California, USA. [2] Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
| | - Benedict Paten
- [1] UC Santa Cruz Genomics Institute, Santa Cruz, California, USA. [2] Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
| | - Mark Akeson
- [1] UC Santa Cruz Genomics Institute, Santa Cruz, California, USA. [2] Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
| |
|
31
|
Zhao H, Chen J, Liu J, Han B. Transcriptome analysis reveals the oxidative stress response in Saccharomyces cerevisiae. RSC Adv 2015. [DOI: 10.1039/c4ra14600j] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
A global regulatory network involving the response to the oxidation stress in Saccharomyces cerevisiae was revealed in this study.
Affiliation(s)
- Hongwei Zhao
- Beijing Laboratory for Food Quality and Safety
- College of Food Science and Nutritional Engineering
- China Agricultural University
- Beijing
- China
| | - Jingyu Chen
- Beijing Laboratory for Food Quality and Safety
- College of Food Science and Nutritional Engineering
- China Agricultural University
- Beijing
- China
| | - Jingjing Liu
- Beijing Laboratory for Food Quality and Safety
- College of Food Science and Nutritional Engineering
- China Agricultural University
- Beijing
- China
| | - Beizhong Han
- Beijing Laboratory for Food Quality and Safety
- College of Food Science and Nutritional Engineering
- China Agricultural University
- Beijing
- China
| |
|
32
|
Bustin SA. The reproducibility of biomedical research: Sleepers awake! BIOMOLECULAR DETECTION AND QUANTIFICATION 2014; 2:35-42. [PMID: 27896142 PMCID: PMC5121206 DOI: 10.1016/j.bdq.2015.01.002] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 12/15/2014] [Revised: 01/08/2015] [Accepted: 01/12/2015] [Indexed: 01/03/2023]
Abstract
There is increasing concern about the reliability of biomedical research, with recent articles suggesting that up to 85% of research funding is wasted. This article argues that an important reason for this is the inappropriate use of molecular techniques, particularly in the field of RNA biomarkers, coupled with a tendency to exaggerate the importance of research findings.
Affiliation(s)
- Stephen A. Bustin
- Faculty of Medical Science, Postgraduate Medical Institute, Anglia Ruskin University, Chelmsford CM1 1SQ, UK
| |
|
33
|
León-Novelo LG, McIntyre LM, Fear JM, Graze RM. A flexible Bayesian method for detecting allelic imbalance in RNA-seq data. BMC Genomics 2014; 15:920. [PMID: 25339465 PMCID: PMC4230747 DOI: 10.1186/1471-2164-15-920] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2014] [Accepted: 10/09/2014] [Indexed: 01/01/2023] Open
Abstract
Background One method of identifying cis regulatory differences is to analyze allele-specific expression (ASE) and identify cases of allelic imbalance (AI). RNA-seq is the most common way to measure ASE and a binomial test is often applied to determine statistical significance of AI. This implicitly assumes that there is no bias in estimation of AI. However, bias has been found to result from multiple factors including: genome ambiguity, reference quality, the mapping algorithm, and biases in the sequencing process. Two alternative approaches have been developed to handle bias: adjusting for bias using a statistical model and filtering regions of the genome suspected of harboring bias. Existing statistical models which account for bias rely on information from DNA controls, which can be cost prohibitive for large intraspecific studies. In contrast, data filtering is inexpensive and straightforward, but necessarily involves sacrificing a portion of the data. Results Here we propose a flexible Bayesian model for analysis of AI, which accounts for bias and can be implemented without DNA controls. In lieu of DNA controls, this Poisson-Gamma (PG) model uses an estimate of bias from simulations. The proposed model always has a lower type I error rate compared to the binomial test. Consistent with prior studies, bias dramatically affects the type I error rate. All of the tested models are sensitive to misspecification of bias. The closer the estimate of bias is to the true underlying bias, the lower the type I error rate. Correct estimates of bias result in a level alpha test. Conclusions To improve the assessment of AI, some forms of systematic error (e.g., map bias) can be identified using simulation. The resulting estimates of bias can be used to correct for bias in the PG model, without data filtering. Other sources of bias (e.g., unidentified variant calls) can be easily captured by DNA controls, but are missed by common filtering approaches. 
Consequently, as variant identification improves, the need for DNA controls will be reduced. Filtering does not significantly improve performance and is not recommended, as information is sacrificed without a measurable gain. The PG model developed here performs well when bias is known, or slightly misspecified. The model is flexible and can accommodate differences in experimental design and bias estimation.
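The abstract notes that the usual binomial test for allelic imbalance implicitly assumes an unbiased null proportion of 0.5. The sketch below is an exact two-sided binomial test in plain Python, run once with the unbiased null and once with a shifted null, to show how much the conclusion can depend on that assumption. It is not the Poisson-Gamma model proposed in the paper; the read counts and the 0.55 bias value are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k under the null proportion p0.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical exon: 60 of 100 reads carry allele A.
reads_a, reads_total = 60, 100
p_unbiased = binomial_two_sided_p(reads_a, reads_total, p0=0.5)
# Same counts, but assuming mapping favours allele A (0.55 is an assumption).
p_biased = binomial_two_sided_p(reads_a, reads_total, p0=0.55)
print(f"null 0.50: p = {p_unbiased:.4f}")
print(f"null 0.55: p = {p_biased:.4f}")
```

With the shifted null the same counts give a larger p-value, which is the sense in which an unmodelled bias can inflate the type I error rate of the plain binomial test.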
Affiliation(s)
| | | | | | - Rita M Graze
- Department of Biological Sciences, Auburn University, 101 Rouse Life Science Building, 36849 Auburn, AL, USA.
| |
|
34
|
Chong LC, Albuquerque MA, Harding NJ, Caloian C, Chan-Seng-Yue M, de Borja R, Fraser M, Denroche RE, Beck TA, van der Kwast T, Bristow RG, McPherson JD, Boutros PC. SeqControl: process control for DNA sequencing. Nat Methods 2014; 11:1071-5. [PMID: 25173705 DOI: 10.1038/nmeth.3094] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2014] [Accepted: 07/27/2014] [Indexed: 12/15/2022]
Abstract
As high-throughput sequencing continues to increase in speed and throughput, routine clinical and industrial application draws closer. These 'production' settings will require enhanced quality monitoring and quality control to optimize output and reduce costs. We developed SeqControl, a framework for predicting sequencing quality and coverage using a set of 15 metrics describing overall coverage, coverage distribution, basewise coverage and basewise quality. Using whole-genome sequences of 27 prostate cancers and 26 normal references, we derived multivariate models that predict sequencing quality and depth. SeqControl robustly predicted how much sequencing was required to reach a given coverage depth (area under the curve (AUC) = 0.993), accurately classified clinically relevant formalin-fixed, paraffin-embedded samples, and made predictions from as little as one-eighth of a sequencing lane (AUC = 0.967). These techniques can be immediately incorporated into existing sequencing pipelines to monitor data quality in real time. SeqControl is available at http://labs.oicr.on.ca/Boutros-lab/software/SeqControl/.
Affiliation(s)
- Lauren C Chong
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Marco A Albuquerque
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Nicholas J Harding
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Cristian Caloian
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Michelle Chan-Seng-Yue
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Richard de Borja
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Michael Fraser
- Department of Pathology, University Health Network, Toronto, Ontario, Canada
| | - Robert E Denroche
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Timothy A Beck
- Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | | | - Robert G Bristow
- [1] Ontario Cancer Institute, University Health Network, Toronto, Ontario, Canada. [2] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - John D McPherson
- [1] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. [2] Genomics Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Paul C Boutros
- [1] Informatics & Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario, Canada. [2] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. [3] Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
| |
|
35
|
Kerpedjiev P, Frellsen J, Lindgreen S, Krogh A. Adaptable probabilistic mapping of short reads using position specific scoring matrices. BMC Bioinformatics 2014; 15:100. [PMID: 24717095 PMCID: PMC4021105 DOI: 10.1186/1471-2105-15-100] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2014] [Accepted: 03/28/2014] [Indexed: 11/10/2022] Open
Abstract
Background Modern DNA sequencing methods produce vast amounts of data that often requires mapping to a reference genome. Most existing programs use the number of mismatches between the read and the genome as a measure of quality. This approach is without a statistical foundation and can for some data types result in many wrongly mapped reads. Here we present a probabilistic mapping method based on position-specific scoring matrices, which can take into account not only the quality scores of the reads but also user-specified models of evolution and data-specific biases. Results We show how evolution, data-specific biases, and sequencing errors are naturally dealt with probabilistically. Our method achieves better results than Bowtie and BWA on simulated and real ancient and PAR-CLIP reads, as well as on simulated reads from the AT rich organism P. falciparum, when modeling the biases of these data. For simulated Illumina reads, the method has consistently higher sensitivity for both single-end and paired-end data. We also show that our probabilistic approach can limit the problem of random matches from short reads of contamination and that it improves the mapping of real reads from one organism (D. melanogaster) to a related genome (D. simulans). Conclusion The presented work is an implementation of a novel approach to short read mapping where quality scores, prior mismatch probabilities and mapping qualities are handled in a statistically sound manner. The resulting implementation provides not only a tool for biologists working with low quality and/or biased sequencing data but also a demonstration of the feasibility of using a probability based alignment method on real and simulated data sets.
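The contrast drawn in this abstract, between counting mismatches and scoring a read probabilistically from its quality scores, can be illustrated with a small sketch. The code below converts Phred scores to error probabilities with the standard relation p = 10^(-Q/10) and scores each ungapped placement of a read by log-likelihood. It is a simplified stand-in, not the position-specific scoring matrix method of the paper; all function names, the uniform split of error mass over the three other bases, and the toy sequences are assumptions.

```python
import math


def phred_to_error_prob(q: int) -> float:
    """Standard Phred relation, capped at 0.75 (a uniformly random base call)."""
    return min(10 ** (-q / 10), 0.75)


def read_log_likelihood(read: str, quals: list, ref_window: str) -> float:
    """Log-likelihood of the read given an ungapped reference window.

    Assumption: a miscalled base is equally likely to be any of the
    three other nucleotides.
    """
    if not (len(read) == len(quals) == len(ref_window)):
        raise ValueError("read, quals and ref_window must have equal length")
    total = 0.0
    for base, q, ref_base in zip(read, quals, ref_window):
        err = phred_to_error_prob(q)
        total += math.log(1 - err) if base == ref_base else math.log(err / 3)
    return total


def best_position(read: str, quals: list, reference: str) -> int:
    """Return the 0-based start of the highest-scoring ungapped placement."""
    n = len(read)
    if n == 0 or n > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    scored = [
        (read_log_likelihood(read, quals, reference[i:i + n]), i)
        for i in range(len(reference) - n + 1)
    ]
    return max(scored)[1]


# Toy example (hypothetical sequences): the read matches exactly at offset 4.
reference = "ACGTACGTTAGC"
print(best_position("ACGTT", [30] * 5, reference))
```

Unlike a plain mismatch count, this score penalizes a mismatch at a high-quality base far more than one at a low-quality base, so two placements with the same number of mismatches can be ranked differently.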
Affiliation(s)
| | | | | | - Anders Krogh
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark.
| |
|
36
|
Saito Y, Tsuji J, Mituyama T. Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions. Nucleic Acids Res 2014; 42:e45. [PMID: 24423865 PMCID: PMC3973284 DOI: 10.1093/nar/gkt1373] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
Analysis of bisulfite sequencing data usually requires two tasks: to call methylated cytosines (mCs) in a sample, and to detect differentially methylated regions (DMRs) between paired samples. Although numerous tools have been proposed for mC calling, methods for DMR detection have been largely limited. Here, we present Bisulfighter, a new software package for detecting mCs and DMRs from bisulfite sequencing data. Bisulfighter combines the LAST alignment tool for mC calling, and a novel framework for DMR detection based on hidden Markov models (HMMs). Unlike previous attempts that depend on empirical parameters, Bisulfighter can use the expectation-maximization algorithm for HMMs to adjust parameters for each data set. We conduct extensive experiments in which accuracy of mC calling and DMR detection is evaluated on simulated data with various mC contexts, read qualities, sequencing depths and DMR lengths, as well as on real data from a wide range of biological processes. We demonstrate that Bisulfighter consistently achieves better accuracy than other published tools, providing greater sensitivity for mCs with fewer false positives, more precise estimates of mC levels, more exact locations of DMRs and better agreement of DMRs with gene expression and DNase I hypersensitivity. The source code is available at http://epigenome.cbrc.jp/bisulfighter.
Affiliation(s)
- Yutaka Saito
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan, Japan Science and Technology Agency, CREST, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan and Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, MA 01655, USA
37
Hong C, Clement NL, Clement S, Hammoud SS, Carrell DT, Cairns BR, Snell Q, Clement MJ, Johnson WE. Probabilistic alignment leads to improved accuracy and read coverage for bisulfite sequencing data. BMC Bioinformatics 2013; 14:337. [PMID: 24261665 PMCID: PMC3924334 DOI: 10.1186/1471-2105-14-337] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/07/2013] [Accepted: 11/19/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND DNA methylation has been linked to many important biological phenomena. Researchers have recently begun to sequence bisulfite treated DNA to determine its pattern of methylation. However, sequencing reads from bisulfite-converted DNA can vary significantly from the reference genome because of incomplete bisulfite conversion, genome variation, sequencing errors, and poor quality bases. Therefore, it is often difficult to align reads to the correct locations in the reference genome. Furthermore, bisulfite sequencing experiments have the additional complexity of having to estimate the DNA methylation levels within the sample. RESULTS Here, we present a highly accurate probabilistic algorithm, which is an extension of the Genomic Next-generation Universal MAPper to accommodate bisulfite sequencing data (GNUMAP-bs), that addresses the computational problems associated with aligning bisulfite sequencing data to a reference genome. GNUMAP-bs integrates uncertainty from read and mapping qualities to help resolve the difference between poor quality bases and the ambiguity inherent in bisulfite conversion. We tested GNUMAP-bs and other commonly-used bisulfite alignment methods using both simulated and real bisulfite reads and found that GNUMAP-bs and other dynamic programming methods were more accurate than the more heuristic methods. CONCLUSIONS The GNUMAP-bs aligner is a highly accurate alignment approach for processing the data from bisulfite sequencing experiments. The GNUMAP-bs algorithm is freely available for download at: http://dna.cs.byu.edu/gnumap. The software runs on multiple threads and multiple processors to increase the alignment speed.
Affiliation(s)
- Changjin Hong
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
- Nathan L Clement
- Department of Computer Science, University of Texas, Austin, TX, USA
- Spencer Clement
- Department of Computer Science, Brigham Young University, Provo, UT, USA
- Saher Sue Hammoud
- IVF and Andrology Laboratories, Departments of Surgery, Obstetrics and Gynecology, and Physiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Department of Oncological Sciences, Huntsman Cancer Institute, Salt Lake City, UT, USA
- Douglas T Carrell
- IVF and Andrology Laboratories, Departments of Surgery, Obstetrics and Gynecology, and Physiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Bradley R Cairns
- Department of Oncological Sciences, Huntsman Cancer Institute, Salt Lake City, UT, USA
- Quinn Snell
- Department of Computer Science, Brigham Young University, Provo, UT, USA
- Mark J Clement
- Department of Computer Science, Brigham Young University, Provo, UT, USA
- William Evan Johnson
- Division of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA
38
Dalton JE, Fear JM, Knott S, Baker BS, McIntyre LM, Arbeitman MN. Male-specific Fruitless isoforms have different regulatory roles conferred by distinct zinc finger DNA binding domains. BMC Genomics 2013; 14:659. [PMID: 24074028 PMCID: PMC3852243 DOI: 10.1186/1471-2164-14-659] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 06/20/2013] [Accepted: 09/20/2013] [Indexed: 11/25/2022] Open
Abstract
Background Drosophila melanogaster adult males perform an elaborate courtship ritual to entice females to mate. fruitless (fru), a gene that is one of the key regulators of male courtship behavior, encodes multiple male-specific isoforms (FruM). These isoforms vary in their carboxy-terminal zinc finger domains, which are predicted to facilitate DNA binding. Results By over-expressing individual FruM isoforms in fru-expressing neurons in either males or females and assaying the global transcriptional response by RNA-sequencing, we show that three FruM isoforms have different regulatory activities that depend on the sex of the fly. We identified several sets of genes regulated downstream of FruM isoforms, including many annotated with neuronal functions. By determining the binding sites of individual FruM isoforms using SELEX we demonstrate that the distinct zinc finger domain of each FruM isoforms confers different DNA binding specificities. A genome-wide search for these binding site sequences finds that the gene sets identified as induced by over-expression of FruM isoforms in males are enriched for genes that contain the binding sites. An analysis of the chromosomal distribution of genes downstream of FruM shows that those that are induced and repressed in males are highly enriched and depleted on the X chromosome, respectively. Conclusions This study elucidates the different regulatory and DNA binding activities of three FruM isoforms on a genome-wide scale and identifies genes regulated by these isoforms. These results add to our understanding of sex chromosome biology and further support the hypothesis that in some cell-types genes with male-biased expression are enriched on the X chromosome.
Affiliation(s)
- Justin E Dalton
- Biomedical Sciences Department and Program in Neuroscience, Florida State University, College of Medicine, Tallahassee, FL 32303, USA.
39
Mahmud MP, Wiedenhoeft J, Schliep A. Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees. Bioinformatics 2013; 28:i325-i332. [PMID: 22962448 PMCID: PMC3436807 DOI: 10.1093/bioinformatics/bts380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022] Open
Abstract
Motivation: Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics. Results: For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we realize running times that match the state-of-the-art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, performs more accurately than many popular read mappers over a wide range of structural variants. Availability and implementation: TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net. Contact: pavelm@cs.rutgers.edu Supplementary information: Supplementary data are available at Bioinformatics online.
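As a rough illustration of the embedding idea in this abstract, the Python sketch below builds trinucleotide (q = 3) frequency vectors and compares them by L1 distance. It is not the TreQ implementation. It compares two sequences directly instead of querying a cache-oblivious kd-tree, and the function names are hypothetical. The bound it checks is the classic q-gram lemma for unit-cost edit distance (L1 distance is at most 2q times the edit distance), which is an assumption standing in for the affine-gap bound the authors describe.

```python
from collections import Counter


def qgram_profile(seq: str, q: int = 3) -> Counter:
    """Count every overlapping q-gram in seq (q = 3 gives trinucleotides)."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def l1_distance(a: Counter, b: Counter) -> int:
    """L1 (Manhattan) distance between two sparse frequency vectors."""
    # Counter returns 0 for q-grams that are missing from one profile.
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))


# Toy example: the read differs from the reference window by one substitution.
read = "ACGTACGT"
window = "ACGTTCGT"
q = 3
distance = l1_distance(qgram_profile(read, q), qgram_profile(window, q))

# q-gram lemma (unit costs): one edit changes at most q q-grams in each string,
# so distance / (2 * q) is a lower bound on the edit distance.
lower_bound = distance / (2 * q)
print(distance, lower_bound)
```

A mapper built on this idea can discard any candidate window whose lower bound already exceeds the allowed edit distance, and run a full alignment only on the remaining windows.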
Affiliation(s)
- Md Pavel Mahmud
- Department of Computer Science, Rutgers University, New Jersey, USA.
40
Umemura M, Koyama Y, Takeda I, Hagiwara H, Ikegami T, Koike H, Machida M. Fine de novo sequencing of a fungal genome using only SOLiD short read data: verification on Aspergillus oryzae RIB40. PLoS One 2013; 8:e63673. [PMID: 23667655 PMCID: PMC3646829 DOI: 10.1371/journal.pone.0063673] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 11/16/2012] [Accepted: 04/05/2013] [Indexed: 11/18/2022] Open
Abstract
The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated from NGS platforms, such as SOLiD and Illumina, are quite useful for mapping analysis. However, the SOLiD read data with lengths of <60 bp have been considered to be too short for de novo genome sequencing. Here, to investigate whether de novo sequencing of fungal genomes is possible using only SOLiD short read sequence data, we performed de novo assembly of the Aspergillus oryzae RIB40 genome using only SOLiD read data of 50 bp generated from mate-paired libraries with 2.8- or 1.9-kb insert sizes. The assembled scaffolds showed an N50 value of 1.6 Mb, a 22-fold increase than those obtained using only SOLiD short read in other published reports. In addition, almost 99% of the reference genome was accurately aligned by the assembled scaffold fragments in long lengths. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies due to their potential medicinal, agricultural, and cosmetic properties, were also highly reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological study of fungi. We also investigated the effect of filtering low quality data, library insert size, and k-mer size on the assembly performance, and recommend for the assembly use of mild filtered read data where the N50 was not so degraded and the library has an insert size of ∼2.0 kb, and k-mer size 33.
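The N50 statistic quoted in this abstract can be computed from a list of scaffold lengths. The sketch below uses the common definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly). The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of scaffold or contig lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest scaffold downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches the total")


# Toy assembly: total 290, half 145; 80 + 70 = 150 crosses the halfway point.
scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(scaffolds))  # 70
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why the abstract also reports how much of the reference genome the scaffolds align to.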
Affiliation(s)
- Myco Umemura
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Hokkaido, Japan
- Yoshinori Koyama
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Itaru Takeda
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Koganei, Tokyo, Japan
- Hiroko Hagiwara
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Tsutomu Ikegami
- Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan
- Hideaki Koike
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
- Masayuki Machida
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Hokkaido, Japan
- Department of Biotechnology and Life Science, Tokyo University of Agriculture and Technology, Koganei, Tokyo, Japan
41
Shrestha AMS, Frith MC. An approximate Bayesian approach for mapping paired-end DNA reads to a reference genome. Bioinformatics 2013; 29:965-72. [PMID: 23413433 PMCID: PMC3624798 DOI: 10.1093/bioinformatics/btt073] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022]
Abstract
Summary: Many high-throughput sequencing experiments produce paired DNA reads. Paired-end DNA reads provide extra positional information that is useful in reliable mapping of short reads to a reference genome, as well as in downstream analyses of structural variations. Given the importance of paired-end alignments, it is surprising that there have been no previous publications focusing on this topic. In this article, we present a new probabilistic framework to predict the alignment of paired-end reads to a reference genome. Using both simulated and real data, we compare the performance of our method with six other read-mapping tools that provide a paired-end option. We show that our method provides a good combination of accuracy, error rate and computation time, especially in more challenging and practical cases, such as when the reference genome is incomplete or unavailable for the sample, or when there are large variations between the reference genome and the source of the reads. An open-source implementation of our method is available as part of Last, a multi-purpose alignment program freely available at http://last.cbrc.jp. Contact: martin@cbrc.jp Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anish Man Singh Shrestha
- Computational Biology Research Center, National Institute for Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo, Japan
42
Improved base-calling and quality scores for 454 sequencing based on a Hurdle Poisson model. BMC Bioinformatics 2012; 13:303. [PMID: 23151247 PMCID: PMC3534400 DOI: 10.1186/1471-2105-13-303] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/06/2012] [Accepted: 11/01/2012] [Indexed: 11/25/2022] Open
Abstract
Background 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform. Results We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed to useful quality scores. Conclusions Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative towards possible insertion and deletion errors, while maintaining a base-calling accuracy that is better than the current one. Given the generality of the framework, HPCall has the potential to also adapt to other homopolymer-sensitive sequencing technologies.
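For readers unfamiliar with hurdle models, the sketch below gives the textbook hurdle Poisson probability mass function: one probability covers a count of zero, and a zero-truncated Poisson covers positive counts. This is only the generic distribution. The weighting scheme and noise predictors that HPCall uses are not reproduced, and the parameter names and values are assumptions chosen for illustration.

```python
import math


def hurdle_poisson_pmf(k: int, p_zero: float, lam: float) -> float:
    """P(X = k) for a hurdle Poisson with zero probability p_zero and rate lam."""
    if k < 0:
        return 0.0
    if k == 0:
        return p_zero
    # Ordinary Poisson mass at k, rescaled so the positive counts sum to 1 - p_zero.
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))


# Illustrative parameters: 30% chance of length zero, rate 2 for positive lengths.
probs = [hurdle_poisson_pmf(k, p_zero=0.3, lam=2.0) for k in range(6)]
best_length = max(range(6), key=lambda k: probs[k])
print(best_length, [round(p, 3) for p in probs])
```

In a base-calling setting, the per-length probabilities are what make the quality score informative: comparing the mass just below and just above the called length indicates whether a deletion or an insertion error is the likelier mistake.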
43
Umemura M, Koike H, Yamane N, Koyama Y, Satou Y, Kikuzato I, Teruya M, Tsukahara M, Imada Y, Wachi Y, Miwa Y, Yano S, Tamano K, Kawarabayasi Y, Fujimori KE, Machida M, Hirano T. Comparative genome analysis between Aspergillus oryzae strains reveals close relationship between sites of mutation localization and regions of highly divergent genes among Aspergillus species. DNA Res 2012; 19:375-82. [PMID: 22912434 PMCID: PMC3473370 DOI: 10.1093/dnares/dss019] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022] Open
Abstract
Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome.
Affiliation(s)
- Myco Umemura
- Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Higashi-Nijo 17-2-1, Tsukisamu, Sapporo, Hokkaido 062-8517, Japan
44
Frith MC, Mori R, Asai K. A mostly traditional approach improves alignment of bisulfite-converted DNA. Nucleic Acids Res 2012; 40:e100. [PMID: 22457070 PMCID: PMC3401460 DOI: 10.1093/nar/gks275] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Indexed: 01/15/2023] Open
Abstract
Cytosines in genomic DNA are sometimes methylated. This affects many biological processes and diseases. The standard way of measuring methylation is to use bisulfite, which converts unmethylated cytosines to thymines, then sequence the DNA and compare it to a reference genome sequence. We describe a method for the critical step of aligning the DNA reads to the correct genomic locations. Our method builds on classic alignment techniques, including likelihood-ratio scores and spaced seeds. In a realistic benchmark, our method has a better combination of sensitivity, specificity and speed than nine other high-throughput bisulfite aligners. This study enables more accurate and rational analysis of DNA methylation. It also illustrates how to adapt general-purpose alignment methods to a special case with distorted base patterns: this should be informative for other special cases such as ancient DNA and AT-rich genomes.
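The conversion described in this abstract can be mimicked in a few lines. The sketch below simulates bisulfite treatment of a forward-strand sequence (unmethylated C reads as T, methylated C stays C) and then makes a naive per-position call by comparing an already aligned read with the reference. It is a toy under strong assumptions: no sequencing errors, no reverse strand, and the alignment is given. It is not the alignment method of the paper, and the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment: unmethylated C becomes T, methylated C is kept."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For each reference C, report True if the aligned read kept C, False if it shows T."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True       # protected from conversion, so methylated
        elif read_base == "T":
            calls[i] = False      # converted, so unmethylated
        # Any other base is a mismatch and gives no methylation evidence.
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated={1})
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # {1: True, 4: False, 5: False}
```

The example also shows why alignment is the hard step: after conversion the read no longer matches the reference at unmethylated cytosines, so an aligner has to tolerate C-to-T differences without treating them as ordinary mismatches.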
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center, National Institute for Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan.
45
Peláez P, Trejo MS, Iñiguez LP, Estrada-Navarrete G, Covarrubias AA, Reyes JL, Sanchez F. Identification and characterization of microRNAs in Phaseolus vulgaris by high-throughput sequencing. BMC Genomics 2012; 13:83. [PMID: 22394504 PMCID: PMC3359237 DOI: 10.1186/1471-2164-13-83] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Received: 10/14/2011] [Accepted: 03/06/2012] [Indexed: 12/16/2022] Open
Abstract
Background MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression. MiRNAs play essential roles in almost all plant biological processes. Currently, few miRNAs have been identified in the model food legume Phaseolus vulgaris (common bean). Recent advances in next generation sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used Illumina's sequencing by synthesis (SBS) technology to identify and characterize the miRNA population of Phaseolus vulgaris. Results Small RNA libraries were generated from roots, flowers, leaves, and seedlings of P. vulgaris. Based on similarity to previously reported plant miRNAs, 114 miRNAs belonging to 33 conserved miRNA families were identified. Stem-loop precursors and target gene sequences for several conserved common bean miRNAs were determined from publicly available databases. Less conserved miRNA families and species-specific common bean miRNA isoforms were also characterized. Moreover, novel miRNAs based on the small RNAs were found and their potential precursors were predicted. In addition, new target candidates for novel and conserved miRNAs were proposed. Finally, we studied organ-specific miRNA family expression levels through miRNA read frequencies. Conclusions This work represents the first massive-scale RNA sequencing study performed in Phaseolus vulgaris to identify and characterize its miRNA population. It significantly increases the number of miRNAs, precursors, and targets identified in this agronomically important species. The miRNA expression analysis provides a foundation for understanding common bean miRNA organ-specific expression patterns. The present study offers an expanded picture of P. vulgaris miRNAs in relation to those of other legumes.
Affiliation(s)
- Pablo Peláez
- Departamento de Biología Molecular de Plantas, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
46
Graze RM, Novelo LL, Amin V, Fear JM, Casella G, Nuzhdin SV, McIntyre LM. Allelic imbalance in Drosophila hybrid heads: exons, isoforms, and evolution. Mol Biol Evol 2012; 29:1521-32. [PMID: 22319150 DOI: 10.1093/molbev/msr318] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 12/12/2022] Open
Abstract
Unraveling how regulatory divergence contributes to species differences and adaptation requires identifying functional variants from among millions of genetic differences. Analysis of allelic imbalance (AI) reveals functional genetic differences in cis regulation and has demonstrated differences in cis regulation within and between species. Regulatory mechanisms are often highly conserved, yet differences between species in gene expression are extensive. What evolutionary forces explain widespread divergence in cis regulation? AI was assessed in Drosophila melanogaster-Drosophila simulans hybrid female heads using RNA-seq technology. Mapping bias was virtually eliminated by using genotype-specific references. Allele representation in DNA sequencing was used as a prior in a novel Bayesian model for the estimation of AI in RNA. Cis regulatory divergence was common in the organs and tissues of the head with 41% of genes analyzed showing significant AI. Using existing population genomic data, the relationship between AI and patterns of sequence evolution was examined. Evidence of positive selection was found in 30% of cis regulatory divergent genes. Genes involved in defense, RNAi/RISC complex genes, and those that are sex regulated are enriched among adaptively evolving cis regulatory divergent genes. For genes in these groups, adaptive evolution may play a role in regulatory divergence between species. However, there is no evidence that adaptive evolution drives most of the cis regulatory divergence that is observed. The majority of genes showed patterns consistent with stabilizing selection and neutral evolutionary processes.
Affiliation(s)
- R M Graze
- Department of Molecular Genetics and Microbiology, University of Florida, USA
47
Abstract
The recent development of next-generation sequencing (NGS) technologies allowed various authors to imagine, test, and validate new approaches for TE analysis, in their nature, type, activity, or quantity. In this chapter, we describe briefly the technologies used, then the various approaches and methods used already, and finally some potential new methods. In contrast to the more molecular chapters of the book, the approaches described here are purely bioinformatics, and have a set of NGS data as a starting point. Moreover, as these analyses are quite recent in the field, most of them were only performed once, and we cannot be sure that they could be reused in other species or context than the original one. However, there are a lot of interesting approaches and results that NGS can provide in the TE field.
Affiliation(s)
- Cristian Chaparro
- UMR LGDP, CNRS/UPVD, Université de Perpignan Via Domitia, Perpignan Cedex, France
48
Yang Y, Graze RM, Walts BM, Lopez CM, Baker HV, Wayne ML, Nuzhdin SV, McIntyre LM. Partitioning transcript variation in Drosophila: abundance, isoforms, and alleles. G3 (Bethesda) 2011; 1:427-36. [PMID: 22384353 PMCID: PMC3276160 DOI: 10.1534/g3.111.000596] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/15/2011] [Accepted: 09/11/2011] [Indexed: 12/25/2022]
Abstract
Multilevel analysis of transcription is facilitated by a new array design that includes modules for assessment of differential expression, isoform usage, and allelic imbalance in Drosophila. The ∼2.5 million feature chip incorporates a large number of controls, and it contains 18,769 3' expression probe sets and 61,919 exon probe sets with probe sequences from Drosophila melanogaster and 60,118 SNP probe sets focused on Drosophila simulans. An experiment in D. simulans identified genes differentially expressed between males and females (34% in the 3' expression module; 32% in the exon module). These proportions are consistent with previous reports, and there was good agreement (κ = 0.63) between the modules. Alternative isoform usage between the sexes was identified for 164 genes. The SNP module was verified with resequencing data. Concordance between resequencing and the chip design was greater than 99%. The design also proved apt in separating alleles based upon hybridization intensity. Concordance between the highest hybridization signals and the expected alleles in the genotype was greater than 96%. Intriguingly, allelic imbalance was detected for 37% of 6579 probe sets examined that contained heterozygous SNP loci. The large number of probes and multiple probe sets per gene in the 3' expression and exon modules allows the array to be used in D. melanogaster and in closely related species. The SNP module can be used for allele specific expression and genotyping of D. simulans.
Affiliation(s)
- Yajie Yang
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Rita M. Graze
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Brandon M. Walts
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Cecilia M. Lopez
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Henry V. Baker
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Marta L. Wayne
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Zoology, University of Florida, Gainesville, FL, 32611-8525
- Sergey V. Nuzhdin
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089-2910
- Lauren M. McIntyre
- Genetics Institute, University of Florida, Gainesville, FL 32610-3610
- Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32610-0266
- Department of Statistics, University of Florida, Gainesville, FL 32611-8545
49
Hamada M, Wijaya E, Frith MC, Asai K. Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection. Bioinformatics 2011; 27:3085-92. [PMID: 21976422 DOI: 10.1093/bioinformatics/btr537] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 01/20/2023]
Abstract
MOTIVATION Recent studies have revealed the importance of considering quality scores of reads generated by next-generation sequence (NGS) platforms in various downstream analyses. It is also known that probabilistic alignments based on marginal probabilities (e.g. aligned-column and/or gap probabilities) provide more accurate alignment than conventional maximum score-based alignment. There exists, however, no study about probabilistic alignment that considers quality scores explicitly, although the method is expected to be useful in SNP/indel callers and bisulfite mapping, because accurate estimation of aligned columns or gaps is important in those analyses. RESULTS In this study, we propose methods of probabilistic alignment that consider quality scores of (one of) the sequences as well as a usual score matrix. The method is based on posterior decoding techniques in which various marginal probabilities are computed from a probabilistic model of alignments with quality scores, and can arbitrarily trade-off sensitivity and positive predictive value (PPV) of prediction (aligned columns and gaps). The method is directly applicable to read mapping (alignment) toward accurate detection of SNPs and indels. Several computational experiments indicated that probabilistic alignments can estimate aligned columns and gaps accurately, compared with other mapping algorithms e.g. SHRiMP2, Stampy, BWA and Novoalign. The study also suggested that our approach yields favorable precision for SNP/indel calling.
Affiliation(s)
- Michiaki Hamada
- Graduate School of Frontier Sciences, University of Tokyo, Kashiwa 277-8562, Japan.
50
Kazlauskas D, Venclovas C. Computational analysis of DNA replicases in double-stranded DNA viruses: relationship with the genome size. Nucleic Acids Res 2011; 39:8291-305. [PMID: 21742758 PMCID: PMC3201878 DOI: 10.1093/nar/gkr564] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022] Open
Abstract
Genome duplication in free-living cellular organisms is performed by DNA replicases that always include a DNA polymerase, a DNA sliding clamp and a clamp loader. What are the evolutionary solutions for DNA replicases associated with smaller genomes? Are there some general principles? To address these questions we analyzed DNA replicases of double-stranded (ds) DNA viruses. In the process we discovered highly divergent B-family DNA polymerases in phiKZ-like phages and remote sliding clamp homologs in Ascoviridae family and Ma-LMM01 phage. The analysis revealed a clear dependency between DNA replicase components and the viral genome size. As the genome size increases, viruses universally encode their own DNA polymerases and frequently have homologs of DNA sliding clamps, which sometimes are accompanied by clamp loader subunits. This pattern is highly non-random. The absence of sliding clamps in large viral genomes usually coincides with the presence of atypical polymerases. Meanwhile, sliding clamp homologs, not accompanied by clamp loaders, have an elevated positive electrostatic potential, characteristic of non-ring viral processivity factors that bind the DNA directly. Unexpectedly, we found that similar electrostatic properties are shared by the eukaryotic 9-1-1 clamp subunits, Hus1 and, to a lesser extent, Rad9, also suggesting the possibility of direct DNA binding.
Affiliation(s)
- Darius Kazlauskas
- Institute of Biotechnology, Vilnius University, Graičiūno 8, LT-02241 Vilnius, Lithuania