Reference Citation Analysis

Searched Name: Kirill Kryukov

Total Articles: 31 (from Reference Citation Analysis)
Article PDFs: 19
Cited by > 0: 25

Number	Citation Analysis
1	Functional genomic analysis of the isolated potential probiotic Lactobacillus delbrueckii subsp. indicus TY-11 and its comparison with other Lactobacillus delbrueckii strains. Microbiol Spectr 2024:e0347023. [PMID: 38771133 DOI: 10.1128/spectrum.03470-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 04/10/2024] [Indexed: 05/22/2024]
Abstract: Probiotics are living microorganisms that exert a variety of beneficial effects on human health. However, they can also cause infections, produce toxins within the body, and transfer antibiotic resistance genes to other microorganisms in the digestive tract, which necessitates a comprehensive safety assessment. This study aimed to conduct functional genomic analysis and relevant biochemical tests to uncover the probiotic potential of Lactobacillus delbrueckii subsp. indicus TY-11 isolated from native yogurt in Bangladesh. We also performed transmission electron microscopy (TEM) analysis, a comparative genomic study, and phylogenetic tree construction with 332 core genes from 262 genomes. The strain TY-11 was identified as Lactobacillus delbrueckii subsp. indicus. Its genome (1,916,674 bp) contained 1911 CDS, and no genes for either antibiotic resistance or toxic metabolites were identified. It carried genes for the degradation of toxic metabolites, treatment of lactose intolerance, toll-like receptor 2-dependent innate immune response, heat and cold shock, bile salt tolerance, and acidic pH tolerance. Genes were annotated for inhibiting pathogenic bacteria through inhibitory substances [bacteriocins: Helveticin-J (331 bp) and Enterolysin-A (275 bp), hydrogen peroxide, and acid], blockage of adhesion sites, and competition for nutrients. The genes involved in its metabolic pathways were found to be suitable for digesting otherwise indigestible nutrients in the human gut. The TY-11 genome possessed an additional 37 core genes of subsp. indicus that were absent from the core genome of the most popular subspecies, subsp. bulgaricus. In phenotypic testing, the isolate TY-11 demonstrated high antagonistic activity (inhibition zone of 21.33 ± 1.53 mm) against Escherichia coli ATCC 8739 and was not sensitive to any of the 10 tested antibiotics. This was the first study to explore the molecular basis of the probiotic roles, including antimicrobial activities and antibiotic sensitivity, of a representative strain (TY-11) of Lactobacillus delbrueckii subsp. indicus.
IMPORTANCE: This study aimed to conduct functional genomic analysis to uncover the probiotic potential of Lactobacillus delbrueckii subsp. indicus TY-11 isolated from native yogurt in Bangladesh. We also performed transmission electron microscopy (TEM) analysis, a comparative genomic study, and phylogenetic tree construction with 332 core genes from 262 genomes. In our current investigation, we revealed a number of common and unique strengths of the probiotic Lactobacillus delbrueckii subsp. indicus TY-11 that are likely to be important in explaining its intestinal residence and probiotic roles. This is the first study to explore the molecular basis of the intestinal residence and probiotic roles, including antimicrobial activities and antibiotic sensitivity, of a representative strain (TY-11) of Lactobacillus delbrueckii subsp. indicus.
Key Words: TEM, draft genome sequencing, the strain TY-11
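The comparative-genomics step in this abstract can be written as set operations. Core genes shared by every genome of a subspecies are a set intersection. Indicus core genes missing from the bulgaricus core are a set difference. The sketch below is minimal and assumes each genome has already been reduced to a set of orthologous gene identifiers. The genome labels and gene IDs are hypothetical placeholders. The strict "present in every genome" rule is also an assumption, because the abstract does not state which core-gene threshold was used.

```python
from functools import reduce

def core_genes(genomes: dict[str, set[str]]) -> set[str]:
    """Genes present in every genome of the group (strict core, an assumption)."""
    return reduce(set.intersection, genomes.values())

# Hypothetical orthologous gene sets; real inputs would come from annotation/orthology.
indicus = {
    "indicus_1": {"g1", "g2", "g3", "g4", "g5"},
    "indicus_2": {"g1", "g2", "g3", "g4", "g6"},
}
bulgaricus = {
    "bulgaricus_1": {"g1", "g2", "g6"},
    "bulgaricus_2": {"g1", "g2", "g7"},
}

core_indicus = core_genes(indicus)
core_bulgaricus = core_genes(bulgaricus)
# Core genes of subsp. indicus that are absent from the subsp. bulgaricus core.
extra_in_indicus = core_indicus - core_bulgaricus
print(sorted(extra_in_indicus))  # ['g3', 'g4']
```

Relaxing the rule, for example to "present in at least 95% of genomes", would replace the intersection with a per-gene count.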
2	SARS-CoV-2 HaploGraph: visualization of SARS-CoV-2 haplotype spread in Japan. Genes Genet Syst 2023;98:221-237. [PMID: 37839865 DOI: 10.1266/ggs.23-00085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2023]
Abstract: Since the early phase of the coronavirus disease 2019 (COVID-19) pandemic, a number of research institutes have been sequencing and sharing high-quality severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes to trace the route of infection in Japan. To provide insight into the spread of COVID-19, we developed a web platform named SARS-CoV-2 HaploGraph to visualize the emergence timing and geographical transmission of SARS-CoV-2 haplotypes. Using data from the GISAID EpiCoV database as of June 4, 2022, we created a haplotype naming system by determining the ancestral haplotype for each epidemic wave and showed prefecture- or region-specific haplotypes in each of four waves in Japan. The SARS-CoV-2 HaploGraph allows for interactive tracking of virus evolution and of the geographical prevalence of haplotypes, and aids in developing effective public health control strategies during the global pandemic. The code and the data used for this study are publicly available at: https://github.com/ktym/covid19/.
Key Words: COVID-19; SARS-CoV-2; genomic surveillance; haplotype; web visualization
MESH Headings: Humans; SARS-CoV-2/genetics; COVID-19/epidemiology; COVID-19/genetics; Haplotypes; Japan/epidemiology; Pandemics; Genome, Viral
3	Nanopore Sequencing Data Analysis of 16S rRNA Genes Using the GenomeSync-GSTK System. Methods Mol Biol 2023;2632:215-226. [PMID: 36781731 DOI: 10.1007/978-1-0716-2996-3_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/27/2023]
Abstract: With the development of nanopore sequencing technology, long DNA sequence reads can now be determined rapidly from various samples. This protocol introduces the GenomeSync-GSTK system for bacterial species identification in a given sample, using nanopore sequencing data of 16S rRNA genes as an example. GenomeSync is a collection of genome sequences designed to provide easy, on-demand access to the genomic data of species. GSTK (Genome Search Toolkit) is a set of scripts for managing local homology searches using genomes obtained from the GenomeSync database. With this protocol, nanopore sequencing data from metagenomes and amplicons can be analyzed efficiently. We also note the possibility of reanalysis as nanopore sequencing technology develops and genome sequencing data accumulate.
Key Words: GenomeSync; Meta 16S rRNA analysis; NAF; Nanopore sequencing; minimap2
MESH Headings: Sequence Analysis, DNA/methods; RNA, Ribosomal, 16S/genetics; Genes, rRNA; Nanopore Sequencing; Genomics; High-Throughput Nucleotide Sequencing/methods; Nanopores
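GSTK's own scripts are not reproduced here. The sketch below only illustrates the general idea of read-level species identification from minimap2 alignments. It reads minimap2's PAF output, in which column 1 is the query (read) name, column 6 is the target name and column 10 is the number of residue matches. It keeps one best hit per read and converts the hits into relative species abundance. Two parts are assumptions and not necessarily GSTK's behavior: picking the hit with the most residue matches, and the reference-to-species lookup table (`species_of`). The PAF records themselves are made up for illustration.

```python
from collections import Counter

def best_hits(paf_lines):
    """Map each read to the target with the most residue matches (PAF column 10)."""
    best = {}  # read name -> (residue matches, target name)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # PAF has 12 mandatory columns; skip malformed lines
        read, target = fields[0], fields[5]
        matches = int(fields[9])
        if read not in best or matches > best[read][0]:
            best[read] = (matches, target)
    return {read: target for read, (_, target) in best.items()}

def species_abundance(paf_lines, target_to_species):
    """Relative abundance of species among reads that have at least one hit."""
    counts = Counter(target_to_species.get(t, "unassigned")
                     for t in best_hits(paf_lines).values())
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()} if total else {}

# Hypothetical PAF records (tab-separated) and a hypothetical reference-to-species table.
paf = [
    "read1\t1500\t0\t1480\t+\tref_A\t1550\t10\t1490\t1400\t1480\t60",
    "read1\t1500\t0\t1480\t+\tref_B\t1550\t10\t1490\t1200\t1480\t0",
    "read2\t1450\t5\t1440\t-\tref_B\t1550\t20\t1455\t1350\t1435\t60",
    "read3\t1490\t0\t1490\t+\tref_A\t1550\t0\t1490\t1420\t1490\t60",
]
species_of = {"ref_A": "Species X", "ref_B": "Species Y"}
abundance = species_abundance(paf, species_of)
print({sp: round(f, 3) for sp, f in abundance.items()})  # {'Species X': 0.667, 'Species Y': 0.333}
```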
4	A circulating subset of iNKT cells mediates antitumor and antiviral immunity. Sci Immunol 2022;7:eabj8760. [DOI: 10.1126/sciimmunol.abj8760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract: Invariant natural killer T (iNKT) cells are a group of innate-like T lymphocytes that recognize lipid antigens. They are thought to be tissue resident and important for systemic and local immune regulation. To investigate the heterogeneity of iNKT cells, we recharacterized iNKT cells in the thymus and peripheral tissues. iNKT cells in the thymus were divided into three subpopulations by the expression of the natural killer cell receptor CD244 and the chemokine receptor CXCR6 and designated as C0 (CD244⁻CXCR6⁻), C1 (CD244⁻CXCR6⁺), or C2 (CD244⁺CXCR6⁺) iNKT cells. The development and maturation of C2 iNKT cells from C0 iNKT cells strictly depended on IL-15 produced by thymic epithelial cells. C2 iNKT cells expressed high levels of IFN-γ and granzymes and exhibited more NK cell–like features, whereas C1 iNKT cells showed more T cell–like characteristics. C2 iNKT cells were influenced by the microbiome and aging and suppressed the expression of the autoimmune regulator AIRE in the thymus. In peripheral tissues, C2 iNKT cells were circulating cells, distinct from conventional tissue-resident C1 iNKT cells. Functionally, C2 iNKT cells protected mice from the metastasis of melanoma cells by enhancing antitumor immunity and promoted antiviral immune responses against influenza virus infection. Furthermore, we identified human CD244⁺CXCR6⁺ iNKT cells with high cytotoxic properties as a counterpart of mouse C2 iNKT cells. Thus, this study reveals a circulating subset of iNKT cells with NK cell–like properties distinct from conventional tissue-resident iNKT cells.
5	Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format. PATTERNS 2022;3:100562. [PMID: 35818472 PMCID: PMC9259476 DOI: 10.1016/j.patter.2022.100562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
6	Helicobacter pylori genomes reveal Paleolithic human migration to the east end of Asia. iScience 2022;25:104477. [PMID: 35720267 PMCID: PMC9204748 DOI: 10.1016/j.isci.2022.104477] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/26/2021] [Revised: 09/29/2021] [Accepted: 04/28/2022] [Indexed: 11/25/2022]
Abstract: Because the virulent bacterium Helicobacter pylori evolved in parallel with its human host, it can serve as a marker for tracing human migration. We found H. pylori strains indigenous to Okinawa, the southernmost islands of the Japanese Archipelago, and defined them as hspOkinawa and hpRyukyu. Genome data of the strains revealed that hspOkinawa diverged from other East Asian strains about 20,000 years ago and that hpRyukyu diverged about 45,000 years ago. The closest relatives of hpRyukyu were found in Afghanistan, Punjab, and Nepal, suggesting that this strain originated in Central Asia and traveled across the Eurasian continent during the Paleolithic era. The divergence date of hpRyukyu corresponds with human fossil records in Okinawa. Human DNA analyses have not settled whether descendants of the Paleolithic migrants remain in the modern Japanese population, but this study reveals that a bacterium of Paleolithic origin remains in the stomachs of present-day Japanese people.
Key Words: Bacteriology; Evolutionary history; Gastroenterology; Medical Microbiology; Microbial genetics; Phylogeny
Grants: R01 DK062813 NIDDK NIH HHS
7	MinION, a portable long-read sequencer, enables rapid vaginal microbiota analysis in a clinical setting. BMC Med Genomics 2022;15:68. [PMID: 35337329 PMCID: PMC8953062 DOI: 10.1186/s12920-022-01218-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/14/2021] [Accepted: 02/14/2022] [Indexed: 01/13/2023]
Abstract: Background: It has been suggested that the local microbiota in the reproductive organs is relevant to women's health and may also affect pregnancy outcomes. Analysis of partial 16S ribosomal RNA (rRNA) gene sequences generated by short-read sequencers has been used to identify vaginal and endometrial microbiota. However, results take a long time to obtain, which makes this approach unsuitable for rapid bacterial identification from small specimens in a clinical context. Methods: We developed a simple workflow using the nanopore sequencer MinION that allows high-resolution and rapid differentiation of vaginal microbiota. Vaginal samples collected from 18 participants were subjected to DNA extraction and full-length 16S rRNA gene sequencing with MinION. Results: Principal coordinate analysis showed no differences in bacterial composition regardless of the sample collection method. The analysis of vaginal microbiota could be completed in a total of approximately four hours, allowing same-day results. Taxonomic profiling by MinION sequencing revealed relatively low diversity of the vaginal bacterial community, identifying the prevailing Lactobacillus species and several causative agents of bacterial vaginosis. Conclusions: Full-length 16S rRNA gene sequencing analysis with MinION provides a rapid means for identifying vaginal bacteria with higher resolution. Species-level profiling of human vaginal microbiota by MinION sequencing can allow the analysis of associations with conditions such as genital infections, endometritis, and threatened miscarriage. Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-022-01218-8.
Key Words: 16S rRNA; Bacterial vaginosis; Long-read sequencer; MinION; Nanopore
8	Sequence Compression Benchmark (SCB) database-A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences. Gigascience 2021;9:5867695. [PMID: 32627830 PMCID: PMC7336184 DOI: 10.1093/gigascience/giaa072] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/04/2020] [Revised: 06/01/2020] [Accepted: 06/15/2020] [Indexed: 01/22/2023]
Abstract: Background: Nearly all molecular sequence databases currently use gzip for data compression. The ongoing rapid accumulation of stored data calls for a more efficient compression tool. Numerous compressors exist, both specialized and general-purpose, but choosing among them was difficult because no comprehensive analysis of their comparative advantages for sequence compression was available. Findings: We systematically benchmarked 430 settings of 48 compressors (including 29 specialized sequence compressors and 19 general-purpose compressors) on representative FASTA-formatted datasets of DNA, RNA, and protein sequences. Each compressor was evaluated on 17 performance measures, including compression strength, as well as time and memory required for compression and decompression. We used 27 test datasets, including individual genomes of various sizes, DNA and RNA datasets, and standard protein datasets. We summarized the results as the Sequence Compression Benchmark database (SCB database, http://kirr.dyndns.org/sequence-compression-benchmark/), which allows custom visualizations to be built for selected subsets of benchmark results. Conclusion: We found that modern compressors offer a large improvement in compactness and speed compared to gzip. Our benchmark allows compressors and their settings to be compared using a variety of performance measures, offering the opportunity to select the optimal compressor on the basis of the data type and usage scenario specific to a particular application.
Key Words: DNA; RNA; benchmark; compression; database; genome; protein; sequence
MESH Headings: Algorithms; Computational Biology/methods; Data Compression/methods; Databases, Nucleic Acid; Genomics/methods; Humans; Models, Theoretical; Mutation; Neoplasms/genetics; Sequence Analysis, DNA/methods; Software
Grants: Japan Society for the Promotion of Science Scientific Research on Innovative Areas; Takeda Science Foundation
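Three of the benchmark's measures can be illustrated with Python's standard-library compressors: compression strength, compression time and decompression time. In that library, zlib implements DEFLATE (the algorithm behind gzip), bz2 is bzip2, and lzma writes xz-format output. This is a minimal sketch, not the SCB methodology or its data. It uses each library's default settings and a synthetic random-sequence FASTA. It also omits memory measurement and the specialized sequence compressors that the benchmark includes.

```python
import bz2
import lzma
import random
import time
import zlib

# (compress, decompress) pairs, each called with its library's default settings.
CODECS = {
    "zlib (DEFLATE, as in gzip)": (zlib.compress, zlib.decompress),
    "bz2 (bzip2)": (bz2.compress, bz2.decompress),
    "lzma (xz format)": (lzma.compress, lzma.decompress),
}

def benchmark(data: bytes) -> dict[str, dict[str, float]]:
    """Compression ratio and wall-clock times per codec, with a lossless round-trip check."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise RuntimeError(f"{name} did not round-trip losslessly")
        results[name] = {
            "ratio": len(data) / len(packed),  # higher = stronger compression
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results

# Synthetic FASTA with 60-column lines (placeholder data, not an SCB test dataset).
random.seed(0)
seq = "".join(random.choices("ACGT", k=200_000))
lines = [seq[i:i + 60] for i in range(0, len(seq), 60)]
fasta = (">synthetic_chr\n" + "\n".join(lines) + "\n").encode("ascii")

for name, r in benchmark(fasta).items():
    print(f"{name}: ratio {r['ratio']:.2f}, "
          f"compress {r['compress_s']:.3f} s, decompress {r['decompress_s']:.3f} s")
```

Random sequence has close to 2 bits of entropy per base, so none of these byte-oriented codecs can do much better than about 4:1 on it. Real genomes contain repeats, and the compressors rank differently on them.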
9	Diagnosis of pleural empyema/parapneumonic effusion by next-generation sequencing. Infect Dis (Lond) 2021;53:450-459. [PMID: 33689538 DOI: 10.1080/23744235.2021.1892178] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 10/22/2022]
Abstract: BACKGROUND: Although a microbiological diagnosis of pleural infection is clinically important, it is often complicated by prior antibiotic treatment and/or difficulties with culturing some bacterial species. Therefore, we aimed to identify probable causative bacteria in pleural empyema/parapneumonic effusions by combining 16S ribosomal RNA (rRNA) gene amplification and next-generation sequencing (NGS). METHODS: Pleural fluids were collected from 19 patients with infectious effusions and nine patients with non-infectious malignant effusions. We analysed DNA extracted from the pleural fluid supernatant by NGS using the Genome Search Toolkit and GenomeSync database, either directly or after PCR amplification of the 16S rRNA gene. Infectious and non-infectious effusions were distinguished by semi-quantitative PCR of the 16S rRNA gene. RESULTS: Only 8 (42%) effusions were culture-positive. However, NGS of the 16S rRNA gene amplicon identified 14 anaerobes and 7 aerobes/facultative anaerobes in all patients, including Streptococcus sp. (n = 6), Fusobacterium sp. (n = 5), Porphyromonas sp. (n = 5), and Prevotella sp. (n = 4), accounting for >10% of the total genomes. The culture and NGS results were discordant for 3 of the 8 patients, all of whom had previously been treated with antibiotics. Total (2^ΔCT value in semi-quantitative PCR of the 16S rRNA gene) and specific (total bacterial load multiplied by the proportion of primary bacteria in NGS) bacterial loads could efficiently distinguish empyema/parapneumonic effusion from non-infectious effusion. CONCLUSION: Combining NGS with semi-quantitative PCR can facilitate the diagnosis of pleural empyema/parapneumonic effusion and its causal bacteria.
Key Words: Pleural empyema; malignant pleural effusion; next-generation sequencing; parapneumonic effusion
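The two load measures in the RESULTS section are simple arithmetic. Total load is 2^ΔCT from semi-quantitative PCR. Specific load is total load times the NGS read proportion of the primary bacterium. The sketch below is minimal and rests on assumptions. The abstract does not say which reference ΔCT is measured against, so ΔCT = CT(reference) - CT(sample) is an assumption, and the reference CT is hypothetical. The example CT values and proportion are also hypothetical, not patient data.

```python
def total_bacterial_load(ct_sample: float, ct_reference: float) -> float:
    """Relative total load 2**dCT, ASSUMING dCT = CT_reference - CT_sample.

    A sample that crosses the threshold earlier (lower CT) has more template,
    so it gets a larger dCT and a larger load.
    """
    delta_ct = ct_reference - ct_sample
    return 2 ** delta_ct

def specific_bacterial_load(total_load: float, primary_fraction: float) -> float:
    """Total load multiplied by the NGS read proportion of the primary bacterium."""
    if not 0.0 <= primary_fraction <= 1.0:
        raise ValueError("primary_fraction must be a proportion between 0 and 1")
    return total_load * primary_fraction

# Hypothetical values: dCT = 10 cycles, primary bacterium = 45% of NGS reads.
total = total_bacterial_load(ct_sample=22.0, ct_reference=32.0)
specific = specific_bacterial_load(total, primary_fraction=0.45)
print(total, specific)  # 1024.0 460.8
```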
10	Rapid profiling of drug-resistant bacteria using DNA-binding dyes and a nanopore-based DNA sequencer. Sci Rep 2021;11:3436. [PMID: 33564026 PMCID: PMC7873225 DOI: 10.1038/s41598-021-82903-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/24/2020] [Accepted: 01/27/2021] [Indexed: 11/30/2022]
Abstract: The spread of drug-resistant bacteria is a serious problem worldwide. We therefore designed a new sequence-based protocol that can quickly and simultaneously identify the bacterial composition of clinical samples and their drug-resistance profiles. We used propidium monoazide (PMA), which prevents DNA amplification from dead bacteria, and subjected the original and antibiotic-treated samples to 16S rRNA metagenome sequencing. We tested our protocol on bacterial mixtures and observed that, when samples were treated with antibiotics, sequencing reads derived from drug-resistant bacteria increased significantly compared with those from drug-sensitive bacteria. Our protocol is scalable and will be useful for quickly profiling drug-resistant bacteria.
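The protocol's readout compares each taxon's share of reads in antibiotic-treated and untreated PMA samples. Because PMA blocks amplification from bacteria killed by the drug, resistant taxa gain relative abundance after treatment. The sketch below is minimal. The taxon names and read counts are hypothetical. A fixed fold-change cutoff (`min_fold`, also hypothetical) stands in for the statistical test the study used to call an increase significant.

```python
def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts into fractions of the sample's reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}

def flag_resistant(untreated: dict[str, int], treated: dict[str, int],
                   min_fold: float = 2.0) -> dict[str, float]:
    """Taxa whose relative abundance rises at least min_fold-fold after treatment."""
    before = relative_abundance(untreated)
    after = relative_abundance(treated)
    flagged = {}
    for taxon, share_before in before.items():
        if share_before == 0:
            continue  # fold change undefined for taxa absent before treatment
        fold = after.get(taxon, 0.0) / share_before
        if fold >= min_fold:
            flagged[taxon] = fold
    return flagged

# Hypothetical 16S read counts from PMA-treated samples.
untreated = {"taxon_A": 400, "taxon_B": 300, "taxon_C": 300}
treated = {"taxon_A": 50, "taxon_B": 900, "taxon_C": 50}
resistant = flag_resistant(untreated, treated)
print({t: round(f, 2) for t, f in resistant.items()})  # {'taxon_B': 3.0}
```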
11	Full-length 16S rRNA gene amplicon analysis of human gut microbiota using MinION™ nanopore sequencing confers species-level resolution. BMC Microbiol 2021;21:35. [PMID: 33499799 PMCID: PMC7836573 DOI: 10.1186/s12866-021-02094-5] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Received: 05/19/2020] [Accepted: 01/18/2021] [Indexed: 12/13/2022]
Abstract: Background: Species-level genetic characterization of complex bacterial communities has important clinical applications in both diagnosis and treatment. Amplicon sequencing of the 16S ribosomal RNA (rRNA) gene has proven to be a powerful strategy for the taxonomic classification of bacteria. This study aims to improve the method for full-length 16S rRNA gene analysis using the nanopore long-read sequencer MinION™. We compared it to the conventional short-read sequencing method in both a mock bacterial community and human fecal samples. Results: We modified our existing protocol for full-length 16S rRNA gene amplicon sequencing by MinION™. A new strategy for library construction with an optimized primer set overcame PCR-associated bias and enabled taxonomic classification across a broad range of bacterial species. We compared the performance of full-length and short-read 16S rRNA gene amplicon sequencing for the characterization of human gut microbiota with a complex bacterial composition. The relative abundance of dominant bacterial genera was highly similar between full-length and short-read sequencing. At the species level, MinION™ long-read sequencing had better resolution for discriminating between members of particular taxa such as Bifidobacterium, allowing an accurate representation of the sample bacterial composition. Conclusions: Our present microbiome study, comparing the discriminatory power of full-length and short-read sequencing, clearly illustrated the analytical advantage of sequencing the full-length 16S rRNA gene. Supplementary Information: The online version contains supplementary material available at 10.1186/s12866-021-02094-5.
Key Words: 16S rRNA; Gut microbiota; MinION™; Nanopore sequencing
12	Diverse mosquito-specific flaviviruses in the Bolivian Amazon basin. J Gen Virol 2021;102. [PMID: 33416463 DOI: 10.1099/jgv.0.001518] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023]
Abstract: The genus Flavivirus includes a range of mosquito-specific viruses in addition to well-known, medically important arboviruses. Isolation and comprehensive genomic analyses of viruses in mosquitoes collected in Bolivia resulted in the identification of three novel flavivirus species. Psorophora flavivirus (PSFV) was isolated from Psorophora albigenu. The coding sequence of the PSFV polyprotein shares 60% identity with that of the Aedes-associated lineage II insect-specific flavivirus (ISF) Marisma virus. Isolated PSFV replicates in both Aedes albopictus- and Aedes aegypti-derived cells, but not in mammalian Vero or BHK-21 cell lines. Two other flaviviruses, Ochlerotatus scapularis flavivirus (OSFV) and Mansonia flavivirus (MAFV), which were identified from Ochlerotatus scapularis and Mansonia titillans, respectively, group with the classical lineage I ISFs. The protein coding sequences of these viruses share only 60% and 40% identity with the most closely related known lineage I ISFs, including Xishuangbanna aedes flavivirus and Sabethes flavivirus, respectively. Phylogenetic analysis suggests that MAFV is clearly distinct from the currently known groups of Culicinae-associated lineage I ISFs. Interestingly, the predicted amino acid sequence of the MAFV capsid protein is approximately twice as long as that of any other known flavivirus. Our results indicate that flaviviruses with distinct features can be found at the edge of the Bolivian Amazon basin, at sites that are also home to dense populations of human-biting mosquitoes.
Key Words: Amazon; Bolivia; Flavivirus; insect-specific flavivirus; mosquito
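Percent identity figures such as the 60% and 40% above depend on how alignment columns are counted, and the abstract does not specify this. The sketch below is minimal. It scores two sequences that have already been aligned and assumes that columns containing a gap are excluded from the denominator, which is one common convention among several. The aligned peptide fragments are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Identity over aligned columns where neither sequence has a gap (one convention)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared

# Hypothetical aligned fragments: 7 gap-free columns, 5 identical.
pid = percent_identity("MKV-LTAG", "MRVSLSAG")
print(f"{pid:.1f}%")  # 71.4%
```

Dividing by the full alignment length instead, so that gaps count as mismatches, would give 62.5% for the same pair. This is why identity values are only comparable when they use the same convention.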
13	Comprehensive genomic analysis reveals dynamic evolution of endogenous retroviruses that code for retroviral-like protein domains. Mob DNA 2020;11:29. [PMID: 32963593 PMCID: PMC7499964 DOI: 10.1186/s13100-020-00224-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/10/2019] [Accepted: 09/09/2020] [Indexed: 12/13/2022]
Abstract: Background: Endogenous retroviruses (ERVs) are remnants of ancient retroviral infections of mammalian germline cells. A large proportion of ERVs lose their open reading frames (ORFs), while others retain them and become exapted by the host species. However, it remains unclear what proportion of ERVs possess ORFs (ERV-ORFs), become transcribed, and serve as candidates for co-opted genes. Results: We investigated the characteristics of 176,401 ERV-ORFs containing retroviral-like protein domains (gag, pro, pol, and env) in 19 mammalian genomes. The fractions of ERVs possessing ORFs were small overall (~0.15%), although they varied depending on domain type and species. The observed divergence of ERV-ORFs from their consensus sequences showed bimodal distributions, suggesting that a large proportion of ERV-ORFs inserted themselves into mammalian genomes either recently or anciently. In contrast, very few ERVs lacking ORFs were found to exhibit similar divergence patterns. To identify candidates for ERV-derived genes, we estimated the ratio of non-synonymous to synonymous substitution rates (dN/dS) for ERV-ORFs in human and non-human mammalian pairs and found that approximately 42% of the ERV-ORFs showed dN/dS < 1. Further, using functional genomics data including transcriptome sequencing, we determined that approximately 9.7% of these selected ERV-ORFs exhibited transcriptional potential. Conclusions: These results suggest that purifying selection operates on a certain portion of ERV-ORFs, some of which may correspond to uncharacterized functional genes hidden within mammalian genomes. Together, our analyses suggest that more ERV-ORFs may have been co-opted in a host-species-specific manner than is currently known, and that these are likely to have contributed to mammalian evolution and diversification.
Key Words: Co-option; Divergence pattern; Endogenous retrovirus; Evolution; Open reading frame; Retroviral-like protein domain; de novo gene
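The candidate-gene filter in this abstract keeps ERV-ORFs with dN/dS < 1, which is the signature of purifying selection. The sketch below is minimal and covers only that filtering step. It assumes dN and dS have already been estimated for each human/non-human ORF pair; the estimation method itself is not shown. The ORF names and rates are hypothetical. Pairs with dS = 0 are skipped because the ratio is undefined there, and the abstract does not say how such pairs were handled.

```python
from dataclasses import dataclass

@dataclass
class OrfPair:
    name: str
    dn: float  # non-synonymous substitutions per non-synonymous site (precomputed)
    ds: float  # synonymous substitutions per synonymous site (precomputed)

def purifying_candidates(pairs: list[OrfPair]) -> list[str]:
    """Names of ORF pairs with dN/dS < 1; pairs with dS == 0 are skipped."""
    return [p.name for p in pairs if p.ds > 0 and p.dn / p.ds < 1]

# Hypothetical precomputed rates for four ERV-ORF pairs.
pairs = [
    OrfPair("erv_orf_1", dn=0.02, ds=0.10),  # ratio 0.2 -> kept
    OrfPair("erv_orf_2", dn=0.30, ds=0.25),  # ratio 1.2 -> dropped
    OrfPair("erv_orf_3", dn=0.05, ds=0.0),   # ratio undefined -> skipped
    OrfPair("erv_orf_4", dn=0.04, ds=0.08),  # ratio 0.5 -> kept
]
selected = purifying_candidates(pairs)
fraction = len(selected) / len(pairs)  # share of all input pairs
print(selected, f"{fraction:.0%}")  # ['erv_orf_1', 'erv_orf_4'] 50%
```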
14	Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences. Bioinformatics 2020;35:3826-3828. [PMID: 30799504 PMCID: PMC6761962 DOI: 10.1093/bioinformatics/btz144] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 12/19/2018] [Revised: 02/13/2019] [Accepted: 02/22/2019] [Indexed: 11/13/2022]
Abstract: Summary: DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF), a new file format for lossless reference-free compression of FASTA- and FASTQ-formatted nucleotide sequences. NAF's compression ratio is comparable to that of the best DNA compressors, while providing dramatically faster decompression. We compared our format with the DNA compressors DELIMINATE and MFCompress, and with the general-purpose compressors gzip, bzip2, xz, brotli, and zstd. Availability and implementation: The NAF compressor and decompressor, as well as the format specification, are available at https://github.com/KirillKryukov/naf. The format specification is in the public domain. The compressor and decompressor are open source under the zlib/libpng license, free for nearly any use. Supplementary information: Supplementary data are available at Bioinformatics online.
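One reason nucleotide-specific formats can outperform gzip is that DNA needs far fewer than 8 bits per base. The sketch below is not NAF's encoding; the actual format is defined in the specification at the repository linked above. It is a generic, hypothetical illustration. It packs an uppercase A/C/G/T-only sequence into 2 bits per base and restores it losslessly. Real formats must also handle N, IUPAC ambiguity codes, case masks and headers; this sketch rejects or ignores all of them.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq: str) -> bytes:
    """Pack an uppercase ACGT-only sequence into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            if base not in CODE:
                raise ValueError(f"unsupported base {base!r}; sketch handles A/C/G/T only")
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; length is needed because the last byte may be partly unused."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (2 * j)) & 0b11])
    return "".join(bases[:length])

seq = "GATTACACGT"
packed = pack(seq)
assert unpack(packed, len(seq)) == seq  # lossless round trip
print(len(seq), "bases ->", len(packed), "bytes")  # 10 bases -> 3 bytes
```

Fixed 2-bit packing alone gives about a 4:1 reduction over one byte per base. Compressors that model repeats and context can go further.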
15	Major pathologic response of EGFR mutated non-small cell lung cancer (NSCLC) on 1-3 G TKI. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.15_suppl.e21528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract: e21528 Background: Despite impressive clinical efficacy in terms of response rate, PFS, and sometimes OS, EGFR TKIs rarely cure patients, as all patients inevitably develop resistance. Small fractions of residual cells appear to be a reservoir of further clones with acquired resistance. At present, the morphological characteristics of the “persister” population are not well defined. Methods: We screened a hospital database of >200 pts with EGFR-mutant NSCLC for patients who underwent cytoreductive surgery during treatment with any TKI and before radiographic disease progression. We obtained 18 pairs of pretreatment biopsy and surgical specimens. 7/18 were male; median age was 60.5 years (45-79). 15/18 had ex19del and 3/18 had L858R. 12/3/2/1 received gefitinib/erlotinib/afatinib/osimertinib before surgery. Median time from treatment initiation to surgery was 8.9 months (0.7-24.3). None of the pts had evidence of an increase in any dimension of tumor lesions before surgery. According to RECIST, 6/18 pts had SD and 12/18 had PR. 6/18 pts had complete cytoreduction, 4/18 partial cytoreduction, and 8/18 metastasectomy. All pathologic responses (MPR) were graded according to Hellmann MD et al., 2014. Results: Pathologic response with >50% residual tumor: 2/18; 10-50%: 7/18; <10%: 4/18; pCR: 3/18. Median observation time was 15.4 months (8.4-37.2+). Disease progression after surgery was registered in 6/18 pts. No significant correlations among PFS, MPR, RR, TKI generation, and time to surgery were seen. Conclusions: In our series, the MPR rate (<10% viable tumor cells according to Hellmann MD et al., 2014) for NSCLC treated with various TKIs was 38%, which is higher than in other reports (e.g., 10.2% in CTONG 1103). Further characterization of the “persister” tumor cells may help to determine mechanisms of intrinsic resistance and direct more efficient antitumor activity.
16	Usefulness of next-generation DNA sequencing for the diagnosis of urinary tract infection. Drug Discov Ther 2020;14:42-49. [PMID: 32101813 DOI: 10.5582/ddt.2020.01000] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022] Abstract Acute urinary tract infection (UTI) is a highly common clinical condition. Although bacterial culture is the gold standard diagnostic test, false negative results may be possible, leading to the pathogen being unidentified. In recent years, bacterial DNA sequencing analysis has garnered much attention, but clinical studies are rare in Japan. In this study, we assessed the usefulness of next-generation DNA sequencing (NGS) analysis for acute UTI patients. We thus performed an observational, retrospective case series study. Urine and blood samples were collected from ten acute UTI patients, of whom four had also been diagnosed with urosepsis. Seven variable regions of bacterial 16S rRNA genes were amplified by PCR and then sequenced by IonPGM. The identified bacterial species were compared with those identified using the culture tests and the clinical parameters were analyzed. As a result, the NGS method effectively identified predominant culture-positive bacteria in urine samples. The urine NGS also detected several culture-negative species, which have been reported to be potentially pathogenic. Out of four urosepsis cases, three were pathogen-positive in blood NGS results, while two were pathogen-negative in blood culture. In one sepsis case, although blood culture was negative for Escherichia coli, this species was detected by blood NGS. For non-sepsis cases, however, blood NGS, as well as blood culture, was less effective in detecting bacterial signals. 
In conclusion, NGS is potentially useful for identifying pathogenic bacteria in urine from acute UTI patients but is less applicable in patients who do not meet clinical criteria for sepsis. Key Words: 16S rRNA amplicon sequencing analysis; next-generation DNA sequencing; urinary tract infection.
17	Rapid sequencing-based diagnosis of infectious bacterial species from meningitis patients in Zambia. Clin Transl Immunology 2019;8:e01087. [PMID: 31709051 PMCID: PMC6831930 DOI: 10.1002/cti2.1087] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/08/2019] [Revised: 10/05/2019] [Accepted: 10/10/2019] [Indexed: 12/13/2022] Abstract OBJECTIVES We have developed a portable system for the rapid determination of bacterial composition for the diagnosis of infectious diseases. Our system comprises a nanopore technology-based sequencer, MinION, and two laptop computers. To examine the accuracy and time efficiency of our system, we performed a proof-of-concept study detecting the causative bacteria in 11 meningitis patients in Zambia. METHODS We extracted DNA from the cerebrospinal fluid sample of each patient and amplified the 16S rRNA gene regions. A sequencing library was prepared, and the sequenced reads were simultaneously processed to determine bacterial composition using the minimap2 software and a set of representative prokaryote genomes. RESULTS The sequencing results for four of the six culture-positive samples were consistent with those of conventional culture-based methods. The dominant bacterial species in each of these samples was identified from the sequencing data within only 3 min. Although major bacterial species were also detected in the other two culture-positive samples and in five culture-negative samples, their presence could not be confirmed. Overall, although a short sequencing run yielded only a small number of reads, the major bacterial species did not change with prolonged sequencing. In addition, the processing time strongly correlated with the number of sequencing reads used for the analysis.
CONCLUSION Our results suggest that time-effective analysis could be achieved by determining the number of sequencing reads required for the rapid diagnosis of infectious bacterial species depending on the complexity of bacterial species in a sample. Key Words: meningitis; meta 16S rRNA sequencing; microbiome; nanopore sequencing; rapid diagnosis. Grants: Japan Agency for Medical Research and Development; Takeda Science Foundation; Ministry of Education, Culture, Sports, Science and Technology.
18	Identification of a distinct lineage of aviadenovirus from crane feces. Virus Genes 2019;55:815-824. [PMID: 31549291 DOI: 10.1007/s11262-019-01703-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/12/2019] [Revised: 08/29/2019] [Accepted: 09/03/2019] [Indexed: 12/14/2022] Abstract Viruses are believed to be ubiquitous; however, the diversity of viruses is largely unknown because previous research has been biased toward pathogenic viruses. Deep sequencing is a promising and unbiased approach to detecting viruses in animal-derived materials. Although cranes are known to be infected by several viruses, such as influenza A viruses, previous studies targeted only a limited range of virus species, and thus the viruses that infect cranes have not been extensively studied. In this study, we collected crane fecal samples on the Izumi plain in Japan, an overwintering site for cranes, and performed metagenomic shotgun sequencing analyses. We detected aviadenovirus-like sequences in the fecal samples and tentatively named the discovered virus crane-associated adenovirus 1 (CrAdV-1). We determined that our sequence accounted for approximately three-fourths of the estimated CrAdV-1 genome size (33,245 bp). The GC content of the CrAdV-1 genome is 34.1%, considerably lower than that of other aviadenoviruses. Phylogenetic analyses revealed that CrAdV-1 clusters with members of the genus Aviadenovirus but is distantly related to the previously identified aviadenoviruses. The protein sequence divergence between the DNA polymerase of CrAdV-1 and those of other aviadenoviruses is 45.2-46.8%. Based on these results and the species demarcation criteria for the family Adenoviridae, we propose that CrAdV-1 be classified as a new species in the genus Aviadenovirus.
The results of this study contribute to a deeper understanding of the diversity and evolution of viruses and provide additional information on viruses that infect cranes, which might help protect endangered crane species. Key Words: Adenovirus; Aviadenovirus; Crane; Feces; Metagenomics. MESH Headings: Adenoviridae Infections/genetics; Adenoviridae Infections/virology; Animals; Aviadenovirus/genetics; Aviadenovirus/isolation & purification; Bird Diseases/genetics; Bird Diseases/virology; Birds/genetics; Birds/virology; Feces/virology; High-Throughput Nucleotide Sequencing; Influenza A virus/genetics; Influenza A virus/pathogenicity; Japan; Phylogeny.
19	Real-time diagnostic analysis of MinION™-based metagenomic sequencing in clinical microbiology evaluation: a case report. JA Clin Rep 2019;5:24. [PMID: 32025980 PMCID: PMC6967274 DOI: 10.1186/s40981-019-0244-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/18/2019] [Accepted: 03/08/2019] [Indexed: 11/10/2022]
20	Rapid bacterial identification by direct PCR amplification of 16S rRNA genes using the MinION™ nanopore sequencer. FEBS Open Bio 2019;9:548-557. [PMID: 30868063 PMCID: PMC6396348 DOI: 10.1002/2211-5463.12590] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 10/03/2018] [Revised: 11/27/2018] [Accepted: 12/27/2018] [Indexed: 12/15/2022] Abstract Rapid identification of bacterial pathogens is crucial for appropriate and adequate antibiotic treatment, which significantly improves patient outcomes. 16S ribosomal RNA (rRNA) gene amplicon sequencing has proven to be a powerful strategy for diagnosing bacterial infections. We have recently established a sequencing method and bioinformatics pipeline for 16S rRNA gene analysis utilizing the Oxford Nanopore Technologies MinION™ sequencer. In combination with our taxonomy annotation analysis pipeline, the system enabled the molecular detection of bacterial DNA in a reasonable time frame for diagnostic purposes. However, purification of bacterial DNA from specimens remains a rate-limiting step in the workflow. To further accelerate the process of sample preparation, we adopted a direct PCR strategy that amplifies 16S rRNA genes from bacterial cell suspensions without DNA purification. Our results indicate that differences in cell wall morphology significantly affect direct PCR efficiency and sequencing data. Notably, mechanical cell disruption preceding direct PCR was indispensable for obtaining an accurate representation of the specimen bacterial composition. Furthermore, 16S rRNA gene analysis of mock polymicrobial samples indicated that primer sequence optimization is required to avoid preferential detection of particular taxa and to cover a broad range of bacterial species.
This study establishes a relatively simple workflow for rapid bacterial identification via MinION™ sequencing, which reduces the turnaround time from sample to result, and provides a reliable method that may be applicable to clinical settings. Key Words: 16S rRNA; MinION; bacterial identification; direct PCR; nanopore sequencer.
21	Detection of pathogenic bacteria in the blood from sepsis patients using 16S rRNA gene amplicon sequencing analysis. PLoS One 2018;13:e0202049. [PMID: 30110400 PMCID: PMC6093674 DOI: 10.1371/journal.pone.0202049] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 05/07/2018] [Accepted: 07/26/2018] [Indexed: 02/01/2023] Abstract Prompt identification of causative pathogenic bacteria is imperative for the treatment of patients with infectious diseases, including sepsis and pneumonia. However, current culture-based methodologies have several drawbacks, including being limited to culturable bacterial species. To circumvent these problems, we attempted to detect bacterial DNA in blood using next-generation DNA sequencing (NGS) technology. We conducted metagenomic and 16S ribosomal RNA (rRNA) gene amplicon sequencing of DNA extracted from bacteria-spiked blood using an Ion Personal Genome Machine. NGS data were analyzed using our in-house pipeline Genome Search Toolkit and database GenomeSync. Metagenomic sequencing analysis successfully detected the three gram-positive and three gram-negative bacteria spiked into the blood, although a significant portion of the reads were non-bacterial, despite the separation of human blood cells by low-speed centrifugation prior to DNA extraction. Sequencing analysis of seven variable regions of the 16S rRNA gene amplicon also successfully detected all six bacteria spiked into the blood. The 16S rRNA gene amplicon methodology was verified using DNA from the blood of six patients with sepsis and four healthy volunteers, and potential pathogenic bacteria in the blood were identified at the species level. These findings suggest that our system could become a practical diagnostic platform in the future.
MESH Headings: Bacteremia/diagnosis; Bacteremia/microbiology; Bacteria/genetics; Bacteria/isolation & purification; High-Throughput Nucleotide Sequencing; Humans; Metagenome; Metagenomics/methods; Nucleic Acid Amplification Techniques; RNA, Bacterial; RNA, Ribosomal, 16S; Real-Time Polymerase Chain Reaction; Sequence Analysis, DNA. Grants: The Japan Agency for Medical Research and Development (AMED).
22	The efficacy and further functional advantages of random-base molecular barcodes for absolute and digital quantification of nucleic acid molecules. Sci Rep 2017;7:13576. [PMID: 29051542 PMCID: PMC5648891 DOI: 10.1038/s41598-017-13529-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/27/2017] [Accepted: 09/25/2017] [Indexed: 01/18/2023] Abstract Accurate quantification of biomolecules in system-wide measurements is in high demand, especially for systems with limited sample amounts such as single cells. For this reason, digital quantification of nucleic acid molecules using molecular barcodes has been developed, making, e.g., transcriptome analysis highly reproducible and quantitative. This counting scheme has been shown to work with sequence-restricted barcodes, and non-sequence-restricted (random-base) barcodes, which may provide a much higher dynamic range at significantly lower cost, have been widely used. However, the efficacy of random-base barcodes is significantly affected by base changes due to amplification and/or sequencing errors and has not been investigated experimentally or quantitatively. Here, we show experimentally that random-base barcodes enable absolute and digital quantification of DNA molecules with a high dynamic range (from one to more than 10⁴, potentially up to 10¹⁵ molecules), conditional on our barcode design and variety, a certain range of sequencing depths, and computational analyses. Moreover, we quantitatively show further functional advantages of the molecular barcodes: they enable one to find contaminants and misidentifications of target sequences. Our scheme may be used generally to confirm that digital quantification works on each platform.
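The abstract notes that amplification and sequencing errors change barcode bases, which inflates a naive count of distinct barcodes. The sketch below is not the authors' computational analysis; it only illustrates, under an assumed rule, one simple way such errors can be absorbed: barcodes are visited from most to least abundant, and a barcode within an assumed Hamming distance of one from an already accepted barcode is treated as an error-derived copy. The names `hamming` and `count_molecules` are hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def count_molecules(barcodes: list[str], max_dist: int = 1) -> int:
    """Estimate the number of distinct molecules from observed barcodes.

    Assumed rule (illustrative only): visit barcodes from most to least
    abundant; a barcode within `max_dist` mismatches of an already accepted
    (hence at least as abundant) barcode of the same length is treated as
    an error-derived copy and is not counted as a new molecule.
    """
    counts = Counter(barcodes)
    accepted: list[str] = []
    # Descending read count, then alphabetical order for determinism.
    for bc, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if any(len(a) == len(bc) and hamming(a, bc) <= max_dist
               for a in accepted):
            continue
        accepted.append(bc)
    return len(accepted)
```

For example, ten reads of `AAAA`, one of `AAAT`, five of `CCGG` and one of `CCGA` give two molecules with `max_dist=1` but four with `max_dist=0`. The trade-off is that two genuine molecules whose barcodes happen to differ by a single base are also merged, which is one reason the usable dynamic range depends on barcode length and diversity.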
23	Cbfβ2 deficiency preserves Langerhans cell precursors by lack of selective TGFβ receptor signaling. J Exp Med 2017;214:2933-2946. [PMID: 28814567 PMCID: PMC5626404 DOI: 10.1084/jem.20170729] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 04/21/2017] [Revised: 06/18/2017] [Accepted: 07/14/2017] [Indexed: 12/23/2022] Abstract Tenno et al. show that loss of Cbfβ2, one of two RNA splice variants of the Cbfb gene, results in the persistence of embryonic Langerhans cell precursors in the adult epidermis through selective loss of BMP7-BMPR1A signaling with intact TGFβR1 signaling. The mouse Langerhans cell (LC) network is established through the differentiation of embryonic LC precursors. BMP7 and TGFβ1 initiate cellular signaling that is essential for inducing LC differentiation and for preserving LCs in a quiescent state, respectively. Here we show that loss of Cbfβ2 results in long-term persistence of embryonic LC precursors after their developmental arrest at the transition into the EpCAM⁺ stage. This phenotype is caused by selective loss of the BMP7-mediated signaling essential for LC differentiation, whereas TGFβR signaling remains intact, maintaining the cells in a quiescent state. Transgenic Cbfβ2 expression at the neonatal stage, but not at the adult stage, restored differentiation of Cbfβ2-deficient LC precursors. Loss of developmental potential in skin-resident precursor cells was accompanied by diminished BMP7-BMPR1A signaling. Collectively, our results reveal an essential requirement for the Cbfβ2 variant in LC differentiation and provide novel insight into how the establishment and homeostasis of the LC network are regulated.
24	Human Contamination in Public Genome Assemblies. PLoS One 2016;11:e0162424. [PMID: 27611326 PMCID: PMC5017631 DOI: 10.1371/journal.pone.0162424] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 04/28/2016] [Accepted: 07/31/2016] [Indexed: 01/29/2023] Abstract Contamination in a genome assembly can lead to wrong or confusing results when that genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study, we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that, with high confidence, originate from contaminating human DNA. The majority of the contaminating human sequences had been present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes be revised to remove contaminated sequences, and that new assemblies be thoroughly checked for the presence of human DNA before they are submitted to public databases. MESH Headings: Animals; Computational Biology/methods; Computational Biology/standards; DNA Contamination; Genome; Genome, Human; Genomics/methods; Genomics/standards; High-Throughput Nucleotide Sequencing; Humans; Mammals; Phylogeny; Sequence Analysis, DNA/standards. Grants: Ministry of Health, Labour and Welfare.
25	A partial nuclear genome of the Jomons who lived 3000 years ago in Fukushima, Japan. J Hum Genet 2016;62:213-221. [PMID: 27581845 PMCID: PMC5285490 DOI: 10.1038/jhg.2016.110] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 05/16/2016] [Revised: 07/22/2016] [Accepted: 07/26/2016] [Indexed: 12/11/2022] Abstract The Jomon period of the Japanese Archipelago, characterized by cord-marked ‘jomon’ pottery, has yielded abundant human skeletal remains. However, the genetic origins of the Jomon people and their relationships with modern populations have not been clarified. We determined a total of 115 million base pairs of nuclear genome sequence from two Jomon individuals (one male and one female) from the Sanganji Shell Mound (dated to 3000 years before present) carrying the Jomon-characteristic mitochondrial DNA haplogroup N9b, and compared these nuclear genome sequences with those of worldwide populations. We found that the Jomon population lineage is best considered to have diverged before the diversification of present-day East Eurasian populations, with no evidence of gene flow events between the Jomon and other continental populations. This suggests that the Sanganji Jomon people descended from an early phase of population dispersals in East Asia. We also estimated that modern mainland Japanese inherited <20% of the Jomon peoples' genomes. Our findings, based on the first analysis of Jomon nuclear genome sequence data, firmly demonstrate that modern mainland Japanese resulted from genetic admixture of the indigenous Jomon people and later migrants.
26	Lineage-specific conserved noncoding sequences of plant genomes: their possible role in nucleosome positioning. Genome Biol Evol 2014;6:2527-42. [PMID: 25364802 PMCID: PMC4202324 DOI: 10.1093/gbe/evu188] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Accepted: 08/26/2014] [Indexed: 01/01/2023] Abstract Many studies of conserved noncoding sequences (CNSs) have found that CNSs are significantly enriched in regulatory sequence elements. We conducted a whole-genome analysis of plant CNSs to identify lineage-specific CNSs in eudicots, monocots, angiosperms, and vascular plants, based on the premise that lineage-specific CNSs define lineage-specific characters and functions in groups of organisms. We identified 27 eudicot, 204 monocot, 6,536 grass, 19 angiosperm, and 2 vascular plant lineage-specific CNSs (ranging in length from 16 to 1,517 bp) that presumably originated in their respective common ancestors. A stronger constraint was observed on CNSs located in untranslated regions. The CNSs were often flanked by genes involved in transcription regulation. A drop in A+T content was observed near CNS borders, and CNS regions showed a higher nucleosome occupancy probability. These CNSs are candidate regulatory elements that are expected to define lineage-specific features of various plant groups. Key Words: conserved noncoding sequence; cns; eudicots; monocots; angiosperms; plants. MESH Headings: Base Sequence; Conserved Sequence; Genome, Plant; Molecular Sequence Data; Nucleosomes/genetics; Nucleosomes/metabolism; Plants/classification; Plants/genetics; Plants/metabolism; RNA, Plant/chemistry; RNA, Plant/genetics; RNA, Untranslated/chemistry; RNA, Untranslated/genetics; Sequence Alignment; Species Specificity; Untranslated Regions.
27	A new database (GCD) on genome composition for eukaryote and prokaryote genome sequences and their initial analyses. Genome Biol Evol 2012;4:501-12. [PMID: 22417913 PMCID: PMC3342873 DOI: 10.1093/gbe/evs026] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Abstract Eukaryote genomes contain many noncoding regions and are quite complex. To understand these complexities, we constructed a database, the Genome Composition Database, of whole-genome composition statistics for 101 eukaryote genomes as well as more than 1,000 prokaryote genomes. The frequencies of all possible oligonucleotides of length one to ten were counted for each genome, and these observed values were compared with expected values computed from the observed frequencies of oligonucleotides of length 1-4. Deviations from expected values were much larger for eukaryotes than for prokaryotes, except for fungal genomes. Mammalian genomes showed the largest deviation among animals. The results of the comparison are available online at http://esper.lab.nig.ac.jp/genome-composition-database/.
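The abstract does not spell out how expected values are derived from the length 1-4 frequencies. A common choice, assumed here, is a Markov model of order m-1 fitted to the observed m-mer counts: the expected count of a word is the product of the counts of its overlapping m-mers divided by the product of the counts of the (m-1)-mers where neighbouring m-mers overlap (for m = 1, the product of base frequencies times the number of start positions). This is a minimal sketch under that assumption, counting one strand only, and not the GCD implementation; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word: str, seq: str, m: int) -> float:
    """Expected count of `word` under an (m-1)th-order Markov model
    fitted to seq (an assumed model, not necessarily the GCD one)."""
    n = len(word)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < len(word)")
    if m == 1:
        # Zero-order model: product of base frequencies times the
        # number of positions at which a word of length n can start.
        bases = count_kmers(seq, 1)
        total = sum(bases.values())
        prob = 1.0
        for b in word:
            prob *= bases[b] / total
        return prob * (len(seq) - n + 1)
    mers = count_kmers(seq, m)
    shorter = count_kmers(seq, m - 1)
    num = 1.0
    for i in range(n - m + 1):        # overlapping m-mers of the word
        num *= mers[word[i:i + m]]
    den = 1.0
    for i in range(1, n - m + 1):     # (m-1)-mers shared by neighbouring m-mers
        den *= shorter[word[i:i + m - 1]]
    return num / den if den else 0.0


def obs_exp_ratios(seq: str, k: int, m: int) -> dict:
    """Observed/expected ratio for every ACGT k-mer with nonzero expectation."""
    observed = count_kmers(seq, k)
    ratios = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        exp = expected_count(word, seq, m)
        if exp > 0:
            ratios[word] = observed[word] / exp
    return ratios
```

For `seq = "ACGTACGTAC"`, the first-order expectation for `ACG` is 3 × 2 / 3 = 2, which matches its two observed occurrences (ratio 1.0). A genome-scale version would count each k once and reuse the tables rather than recounting per word, as this sketch does for brevity.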
28	MISHIMA--a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data. BMC Bioinformatics 2010;11:142. [PMID: 20298584 PMCID: PMC2848238 DOI: 10.1186/1471-2105-11-142] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/17/2009] [Accepted: 03/18/2010] [Indexed: 11/10/2022] Abstract Background Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. Results We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences. Conclusions MISHIMA provides improved performance compared with commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours on a single PC.
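The anchor-and-split idea described above can be illustrated with a simplified sketch. Here the "rare oligonucleotides shared by all sequences" are assumed to be k-mers that occur exactly once in every sequence, and anchors are chained greedily so that they appear in the same order, without overlap, in all sequences. The abstract does not describe MISHIMA's actual search algorithm, which may well differ; the external aligner that would process each block is not shown, and all function names are hypothetical.

```python
from collections import Counter


def unique_kmers(seq: str, k: int) -> dict[str, int]:
    """Map each k-mer occurring exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def colinear_anchors(seqs: list[str], k: int) -> list[list[int]]:
    """Greedily chain k-mers unique in every sequence so that the chosen
    anchors appear in the same order, without overlap, in all sequences."""
    maps = [unique_kmers(s, k) for s in seqs]
    shared = set(maps[0]).intersection(*maps[1:])
    chain: list[list[int]] = []
    last = [-k] * len(seqs)
    for word in sorted(shared, key=lambda w: maps[0][w]):
        pos = [m[word] for m in maps]
        if all(p >= prev + k for p, prev in zip(pos, last)):
            chain.append(pos)
            last = pos
    return chain


def split_into_blocks(seqs: list[str], k: int) -> list[list[str]]:
    """Cut every sequence at the anchor positions. Block j holds the j-th
    fragment of each sequence and could be aligned independently."""
    blocks: list[list[str]] = []
    starts = [0] * len(seqs)
    for pos in colinear_anchors(seqs, k):
        block = [s[a:b] for s, a, b in zip(seqs, starts, pos)]
        if any(block):  # only the first block can be empty (anchor at 0)
            blocks.append(block)
        starts = pos
    blocks.append([s[a:] for s, a in zip(seqs, starts)])
    return blocks
```

For `"AAAACCCCGGGG"` and `"AAAATCCCCGGGG"` with k = 4, the chained anchors `AAAA`, `CCCC` and `GGGG` yield the blocks `["AAAA", "AAAAT"]`, `["CCCC", "CCCC"]` and `["GGGG", "GGGG"]`; concatenating each column restores the original sequences. Greedy chaining is the simplest choice here; it is not guaranteed to find the longest colinear chain.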
29	Estimation of bacterial species phylogeny through oligonucleotide frequency distances. Genomics 2009;93:525-33. [PMID: 19442633 DOI: 10.1016/j.ygeno.2009.01.009] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 10/04/2008] [Revised: 01/30/2009] [Accepted: 01/30/2009] [Indexed: 10/21/2022] Abstract Classification of bacteria is mainly based on sequence comparisons of certain homologous genes such as 16S rRNA. Recently, there have been attempts to classify bacteria using oligonucleotide frequency patterns of nonhomologous sequences. However, the evolutionary significance of oligonucleotides longer than tetranucleotides has not been well studied. We performed phylogenetic analysis using Euclidean distances calculated from di- to deca-nucleotide frequencies in bacterial genomes, and compared these oligonucleotide frequency-based tree topologies with those for the 16S rRNA gene and for seven concatenated genes. When oligonucleotide frequency-based trees were constructed for bacterial species with similar GC content, their topologies at the genus and family levels were congruent with those based on homologous genes. Our results suggest that oligonucleotide frequency is useful not only for classification of bacteria but also for estimation of phylogenetic relationships among closely related species.
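The distance step can be sketched as follows, assuming that each genome is represented by the relative frequencies of its k-mers (k from 2 to 10 in the study) and that k-mers containing characters other than A, C, G and T are ignored; both choices are assumptions, not details given in the abstract. Frequencies are kept in a sparse dictionary because absent k-mers contribute zero to both vectors, which keeps k = 10 (about a million possible words) manageable. Tree construction from the resulting matrix (for example, neighbor joining) is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping ACGT-only k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    valid = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
    total = sum(valid.values())
    return {w: c / total for w, c in valid.items()} if total else {}


def euclidean(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two sparse frequency vectors."""
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in keys))


def distance_matrix(seqs: list[str], k: int) -> list[list[float]]:
    """Pairwise k-mer frequency distances, e.g. as input to a tree method."""
    profiles = [kmer_frequencies(s, k) for s in seqs]
    return [[euclidean(a, b) for b in profiles] for a in profiles]
```

With k = 1, `"AACC"` and `"ACGT"` have profiles {A: 0.5, C: 0.5} and {A, C, G, T: 0.25 each}, so their distance is √(4 × 0.25²) = 0.5.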
30	The evolutionary study of small RNA-directed gene silencing pathways by investigating RNase III enzymes. Gene 2009;435:1-8. [PMID: 19393176 DOI: 10.1016/j.gene.2008.12.022] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/23/2007] [Revised: 11/29/2008] [Accepted: 12/23/2008] [Indexed: 11/28/2022] Abstract RNA-mediated gene silencing pathways are evolutionarily conserved processes. They highlight a fundamental role of short RNAs in eukaryotic gene regulation and antiviral defense. Three distinct small RNA-directed silencing pathways have recently been observed: the destruction of mRNA via siRNA, the inhibition of mRNA translation via miRNA, and epigenetic gene silencing via siRNA. It has also been found that, in these pathways, members of the ribonuclease III family play important roles in diverse RNA maturation and decay. Here we investigated the evolution of RNase III nucleases, with Dicer as a representative, to further clarify the evolutionary relationship among these three gene silencing pathways. By using genomic sequences as the subject in homolog searches, we were able to detect, in unannotated genomic regions, possible candidates for three functional domains and genes of Dicer and Drosha. Moreover, we found that prokaryotes, including eubacteria and archaea, completely lack the PAZ domain of Dicer. These results show the taxonomy-dependent evolution of the RNA-mediated gene silencing pathways.
31	The evolutionary relationship between gene duplication and alternative splicing. Gene 2008;427:19-31. [PMID: 18835337 DOI: 10.1016/j.gene.2008.09.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 07/29/2008] [Revised: 09/03/2008] [Accepted: 09/03/2008] [Indexed: 10/21/2022] Abstract Gene duplication and alternative splicing (AS) are the two major evolutionary mechanisms that can bring about functional variation by increasing gene diversification. The purpose of this research is to understand the evolutionary relationship between these two mechanisms using available data resources. We found the proportion of AS loci and the average number of AS isoforms per locus to be larger in duplicated genes than in singleton genes. However, we also found that small gene families have a larger proportion of AS loci and a larger average number of AS isoforms per locus than large gene families. These results suggest that gene duplication allows more alternative splicing events to occur on newly duplicated copies than on singletons, probably because of reduced functional constraint on the duplicates. The smaller average number of AS isoforms in larger gene families can be explained by a decreased chance that a new alternative splicing event creates a useful new function.