1
Banović Đeri B, Nešić S, Vićić I, Samardžić J, Nikolić D. Benchmarking of five NGS mapping tools for the reference alignment of bacterial outer membrane vesicles-associated small RNAs. Front Microbiol 2024; 15:1401985. [PMID: 39101033 PMCID: PMC11294920 DOI: 10.3389/fmicb.2024.1401985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Accepted: 07/01/2024] [Indexed: 08/06/2024] Open
Abstract
Advances in small RNAs (sRNAs)-related studies have posed a challenge for NGS-related bioinformatics, especially regarding the correct mapping of sRNAs. Depending on the algorithms and scoring matrices on which they are based, aligners are influenced by the characteristics of the dataset and the reference genome. These influences have been studied mainly in eukaryotes and to some extent in prokaryotes. However, in bacteria, the selection of aligners depending on sRNA-seq data associated with outer membrane vesicles (OMVs) and the features of the corresponding bacterial reference genome has not yet been investigated. We selected five aligners: BBmap, Bowtie2, BWA, Minimap2 and Segemehl, known for their generally good performance, to test them in mapping OMV-associated sRNAs from Aliivibrio fischeri to the bacterial reference genome. Significant differences in the performance of the five aligners were observed, resulting in differential recognition of OMV-associated sRNA biotypes in A. fischeri. Our results suggest that aligner(s) should not be arbitrarily selected for this task, which is often done, as this can be detrimental to the biological interpretation of NGS analysis results. Since each aligner has specific advantages and disadvantages, these need to be considered depending on the characteristics of the input OMV sRNAs dataset and the corresponding bacterial reference genome to improve the detection of existing, biologically important OMV sRNAs. Until we learn more about these dependencies, we recommend using at least two, preferably three, aligners that have good metrics for the given dataset/bacterial reference genome. The overlapping results should be considered trustworthy, yet their differences should not be dismissed lightly, but treated carefully in order not to overlook any biologically important OMV sRNA. This can be achieved by applying the intersect-then-combine approach. For the mapping of OMV-associated sRNAs of A. fischeri to the reference genome organized into two circular chromosomes and one circular plasmid, containing copies of sequences with rRNA- and tRNA-related features and no copies of sequences with protein-encoding features, if the aligners are used with their default parameters, we advise avoiding Segemehl, and recommend using the intersect-then-combine approach with BBmap, BWA and Minimap2 to improve the potential for discovery of biologically important OMV-associated sRNAs.
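The intersect-then-combine idea can be illustrated with a small sketch. The abstract does not spell out the procedure, so the code below is only one plausible reading, assuming each aligner's output has already been reduced to a set of detected sRNA feature identifiers: features reported by every aligner form the consensus set, and the remaining features are kept for review rather than discarded. The function name and the feature IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def intersect_then_combine(calls: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split detected features into a consensus set and a review set.

    calls maps an aligner name to the set of feature IDs it reported.
    Returns (features found by every aligner, features found by only some).
    """
    if not calls:
        return set(), set()
    per_aligner = list(calls.values())
    consensus = set.intersection(*per_aligner)  # reported by all aligners
    combined = set.union(*per_aligner)          # reported by at least one
    return consensus, combined - consensus


# Hypothetical feature IDs, for illustration only.
calls = {
    "BBmap": {"sRNA_1", "sRNA_2", "sRNA_3"},
    "BWA": {"sRNA_1", "sRNA_2", "sRNA_4"},
    "Minimap2": {"sRNA_1", "sRNA_2", "sRNA_3"},
}
consensus, to_review = intersect_then_combine(calls)
print(sorted(consensus))  # ['sRNA_1', 'sRNA_2']
print(sorted(to_review))  # ['sRNA_3', 'sRNA_4']
```

Keeping the non-overlapping features in a separate set, instead of dropping them, reflects the abstract's advice that differences between aligners should not be dismissed lightly.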
Affiliation(s)
- Bojana Banović Đeri
- Group for Plant Molecular Biology, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Belgrade, Serbia
- Sofija Nešić
- Group for Plant Molecular Biology, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Belgrade, Serbia
- Ivan Vićić
- Department of Food Hygiene and Technology, Faculty of Veterinary Medicine, University of Belgrade, Belgrade, Serbia
- Jelena Samardžić
- Group for Plant Molecular Biology, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Belgrade, Serbia
- Dragana Nikolić
- Group for Plant Molecular Biology, Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Belgrade, Serbia
2
Reinert T, do Rego FO, Silva MCE, Rodrigues AM, Koyama FC, Gonçalves AC, Pauletto MM, de Carvalho Oliveira LJ, de Resende CAA, Landeiro LCG, Barrios CH, Mano MS, Dienstmann R. The somatic mutation profile of estrogen receptor-positive HER2-negative metastatic breast cancer in Brazilian patients. Front Oncol 2024; 14:1372947. [PMID: 38952553 PMCID: PMC11215150 DOI: 10.3389/fonc.2024.1372947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 05/27/2024] [Indexed: 07/03/2024] Open
Abstract
Background: Breast cancer is the leading cause of cancer death among women worldwide. Studies about the genomic landscape of metastatic breast cancer (MBC) have predominantly originated from developed nations. There are still limited data on the molecular epidemiology of MBC in low- and middle-income countries. This study aims to evaluate the prevalence of mutations in the PI3K-AKT pathway and other actionable drivers in estrogen receptor (ER)+/HER2- MBC among Brazilian patients treated at a large institution representative of the nation's demographic diversity.
Methods: We conducted a retrospective observational study using laboratory data (OC Precision Medicine). Our study included tumor samples from patients with ER+/HER2- MBC who underwent routine tumor testing from 2020 to 2023 and originated from several Brazilian centers within the Oncoclinicas network. Two distinct next-generation sequencing (NGS) assays were used: GS Focus (23 genes, covering PIK3CA, AKT1, ESR1, ERBB2, BRCA1, BRCA2, PALB2, TP53, but not PTEN) or GS 180 (180 genes, including PTEN, tumor mutation burden [TMB] and microsatellite instability [MSI]).
Results: Evaluation of tumor samples from 328 patients was undertaken, mostly (75.6%) with GS Focus. Of these, 69% were primary tumors, while 31% were metastatic lesions. The prevalence of mutations in the PI3K-AKT pathway was 39.3% (95% confidence interval, 33% to 43%), distributed as 37.5% in PIK3CA and 1.8% in AKT1. Stratification by age revealed a higher incidence of mutations in this pathway among patients over 50 (44.5% vs 29.1%, p=0.01). Among the PIK3CA mutations, 78% were canonical (included in the alpelisib companion diagnostic non-NGS test), while the remaining 22% were characterized as non-canonical mutations (identifiable only by NGS test). ESR1 mutations were detected in 6.1%, exhibiting a higher frequency in metastatic samples (15.1% vs 1.3%, p=0.003). Additionally, mutations in BRCA1, BRCA2, or PALB2 were identified in 3.9% of cases, while mutations in ERBB2 were found in 2.1%. No PTEN mutations were detected, nor were TMB high or MSI cases.
Conclusion: We describe the genomic landscape of Brazilian patients with ER+/HER2- MBC, in which the somatic mutation profile is comparable to what is described in the literature globally. These data are important for developing precision medicine strategies in this scenario, as well as for health systems management and research initiatives.
Affiliation(s)
- Tomás Reinert
- Oncoclínicas & Co, São Paulo, Brazil
- Grupo Brasileiro de Estudos em Câncer de Mama (GBECAM), Porto Alegre, Brazil
- Rodrigo Dienstmann
- Oncoclínicas & Co, São Paulo, Brazil
- University of Vic – Central University of Catalonia, Vic, Spain
3
Siemers M, Lippegaus A, Papenfort K. ChimericFragments: computation, analysis and visualization of global RNA networks. NAR Genom Bioinform 2024; 6:lqae035. [PMID: 38633425 PMCID: PMC11023125 DOI: 10.1093/nargab/lqae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 03/08/2024] [Accepted: 03/28/2024] [Indexed: 04/19/2024] Open
Abstract
RNA-RNA interactions are a key feature of post-transcriptional gene regulation in all domains of life. While ever more experimental protocols are being developed to study RNA duplex formation on a genome-wide scale, computational methods for the analysis and interpretation of the underlying data are lagging behind. Here, we present ChimericFragments, an analysis framework for RNA-seq experiments that produce chimeric RNA molecules. ChimericFragments implements a novel statistical method based on the complementarity of the base-pairing RNAs around their ligation site and provides an interactive graph-based visualization for data exploration and interpretation. ChimericFragments detects true RNA-RNA interactions with high precision and is compatible with several widely used experimental procedures such as RIL-seq, LIGR-seq or CLASH. We further demonstrate that ChimericFragments enables the systematic detection of novel RNA regulators and RNA-target pairs with crucial roles in microbial physiology and virulence. ChimericFragments is written in Julia and available at: https://github.com/maltesie/ChimericFragments.
Affiliation(s)
- Malte Siemers
- Friedrich Schiller University, Institute of Microbiology, 07745 Jena, Germany
- Microverse Cluster, Friedrich Schiller University Jena, 07743 Jena, Germany
- Anne Lippegaus
- Friedrich Schiller University, Institute of Microbiology, 07745 Jena, Germany
- Kai Papenfort
- Friedrich Schiller University, Institute of Microbiology, 07745 Jena, Germany
- Microverse Cluster, Friedrich Schiller University Jena, 07743 Jena, Germany
4
Mackinnon AC, Chandrashekar DS, Suster DI. Molecular pathology as basis for timely cancer diagnosis and therapy. Virchows Arch 2024; 484:155-168. [PMID: 38012424 DOI: 10.1007/s00428-023-03707-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/16/2023] [Accepted: 11/08/2023] [Indexed: 11/29/2023]
Abstract
Precision and personalized therapeutics have witnessed significant advancements in technology, revolutionizing the capabilities of laboratories to generate vast amounts of genetic data. Coupled with computational resources for analysis and interpretation, and integrated with various other types of data, including genomic data, electronic medical health (EMH) data, and clinical knowledge, these advancements support optimized health decisions. Among these technologies, next-generation sequencing (NGS) stands out as a transformative tool in the field of cancer treatment, playing a crucial role in precision oncology. NGS-based workflows are employed across a range of applications, including gene panels, exome sequencing, and whole-genome sequencing, supporting comprehensive analysis of the entire cancer genome, including mutations, copy number variations, gene expression profiles, and epigenetic modifications. By utilizing the power of NGS, these workflows contribute to enhancing our understanding of disease mechanisms, diagnosis confirmation, identifying therapeutic targets, and guiding personalized treatment decisions. This manuscript explores the diverse applications of NGS in cancer treatment, highlighting its significance in guiding diagnosis and treatment decisions, identifying therapeutic targets, monitoring disease progression, and improving patient outcomes.
Affiliation(s)
- A Craig Mackinnon
- Department of Pathology, University of Alabama at Birmingham, 619 19th Street South, Birmingham, AL, 35249, USA
- David I Suster
- Department of Pathology, Rutgers University New Jersey Medical School, 150 Bergen Street, Newark, NJ, 07103, USA
5
Wang XY, Xu YM, Lau ATY. Proteogenomics in Cancer: Then and Now. J Proteome Res 2023; 22:3103-3122. [PMID: 37725793 DOI: 10.1021/acs.jproteome.3c00196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/21/2023]
Abstract
For years, sequencing technologies and mass spectrometry have developed in isolation, each with its own unique culture and expertise. These two technologies are crucial for inspecting complementary aspects of the molecular phenotype across the central dogma. Integrative multiomics strives to bridge the analysis gap among different fields to build a more comprehensive picture of the mechanisms of life events and diseases. Proteogenomics is one integrated multiomics field. In this review, we mainly summarize and discuss three aspects: the workflow of proteogenomics, proteogenomics applications in cancer research, and the SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis of proteogenomics in cancer research. In conclusion, proteogenomics has a promising future as it clarifies the functional consequences of many unannotated genomic abnormalities or noncanonical variants and identifies driver genes and novel therapeutic targets across cancers, which would substantially accelerate the development of precision oncology.
Affiliation(s)
- Xiu-Yun Wang
- Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
- Yan-Ming Xu
- Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
- Andy T Y Lau
- Laboratory of Cancer Biology and Epigenetics, Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, People's Republic of China
6
Magdy Mohamed Abdelaziz Barakat S, Sallehuddin R, Yuhaniz SS, R. Khairuddin RF, Mahmood Y. Genome assembly composition of the String "ACGT" array: a review of data structure accuracy and performance challenges. PeerJ Comput Sci 2023; 9:e1180. [PMID: 37547391 PMCID: PMC10403225 DOI: 10.7717/peerj-cs.1180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 04/27/2023] [Indexed: 08/08/2023]
Abstract
Background: The development of sequencing technology increases the number of genomes being sequenced. However, obtaining a quality genome sequence remains a challenge in genome assembly, which assembles a massive number of short strings (reads) in the presence of repetitive sequences (repeats). Computer algorithms for genome assembly construct the entire genome from reads using two approaches. The de novo approach concatenates the reads based on the exact match between their suffixes and prefixes (overlapping). The reference-guided approach orders the reads based on their offsets in a well-known reference genome (read alignment). The presence of repeats increases the technical ambiguity, leaving the algorithm unable to distinguish the reads, which results in misassembly and reduces assembly accuracy. In addition, the massive number of reads poses a major performance challenge for assembly.
Method: Repeat identification methods were introduced to address misassembly by identifying repetitive sequences beforehand and creating a repeat knowledge base that reduces ambiguity during the assembly process, thus enhancing the accuracy of the assembled genome. Hybridization between assembly approaches also resulted in a lower degree of misassembly with the aid of the reference genome. Assembly performance is optimized through data structure indexing and parallelization. This article's primary aim and contribution is to support researchers through an extensive review that eases their search for genome assembly studies. The study also highlights the most recent developments and limitations in genome assembly accuracy and performance optimization.
Results: Our findings show the limitations of the available repeat identification methods, which can detect only repeats of specific lengths and may not perform well when various types of repeats are present in a genome. We also found that most hybrid assembly approaches, whether starting with de novo or reference-guided assembly, have some limitations in handling repetitive sequences, as doing so is more computationally costly and time intensive. Although the hybrid approach was found to outperform individual assembly approaches, optimizing its performance remains a challenge. Also, the use of parallelization in overlapping and read alignment for genome assembly is yet to be fully implemented in the hybrid assembly approach.
Conclusion: We suggest combining multiple repeat identification methods to enhance the accuracy of identifying the repeats as an initial step to the hybrid assembly approach and combining genome indexing with parallelization for better optimization of its performance.
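The exact suffix-prefix overlap that the de novo approach relies on can be sketched as follows. This is a minimal illustration, not an assembler taken from the reviewed literature; the function names, the `min_length` threshold, and the example reads are assumptions.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` that exactly matches a prefix of `right`.

    Returns 0 when no overlap of at least `min_length` bases exists.
    """
    longest = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the maximal overlap.
    for size in range(longest, min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_length: int = 3) -> Optional[str]:
    """Concatenate two reads across their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_length)
    return left + right[size:] if size else None


# Hypothetical reads, for illustration only.
print(overlap_length("ACGTAC", "TACGGA"))  # 3
print(merge_reads("ACGTAC", "TACGGA"))     # ACGTACGGA
print(merge_reads("ACGTAC", "GGGTTT"))     # None
```

The `min_length` threshold hints at why repeats are troublesome: a short repeated motif can produce an exact overlap between reads that come from different parts of the genome, so short overlaps are usually not trusted.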
Affiliation(s)
- Roselina Sallehuddin
- Computer Science, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Skudai, Johor, Malaysia
- Siti Sophiayati Yuhaniz
- Advanced Informatics Department, Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Kuala Lumpur, Kuala Lumpur, Malaysia
- Yasir Mahmood
- Faculty of Information Technology, The University of Lahore, Lahore, Lahore, Pakistan
7
Pezzini FF, Ferrari G, Forrest LL, Hart ML, Nishii K, Kidner CA. Target capture and genome skimming for plant diversity studies. Applications in Plant Sciences 2023; 11:e11537. [PMID: 37601316 PMCID: PMC10439825 DOI: 10.1002/aps3.11537] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/17/2022] [Revised: 06/16/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023]
Abstract
Recent technological advances in long-read high-throughput sequencing and assembly methods have facilitated the generation of annotated chromosome-scale whole-genome sequence data for evolutionary studies; however, generating such data can still be difficult for many plant species. For example, obtaining high-molecular-weight DNA is typically impossible for samples in historical herbarium collections, which often have degraded DNA. The need to fast-freeze newly collected living samples to conserve high-quality DNA can be complicated when plants are only found in remote areas. Therefore, short-read reduced-genome representations, such as target capture and genome skimming, remain important for evolutionary studies. Here, we review the pros and cons of each technique for non-model plant taxa. We provide guidance related to logistics, budget, the genomic resources previously available for the target clade, and the nature of the study. Furthermore, we assess the available bioinformatic analyses, detailing best practices and pitfalls, and suggest pathways to combine newly generated data with legacy data. Finally, we explore the possible downstream analyses allowed by the type of data generated using each technique. We provide a practical guide to help researchers make the best-informed choice regarding reduced genome representation for evolutionary studies of non-model plants in cases where whole-genome sequencing remains impractical.
Affiliation(s)
- Giada Ferrari
- Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
- Kanae Nishii
- Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
- Catherine A. Kidner
- Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
- School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
8
Bazant W, Blevins AS, Crouch K, Beiting DP. Improved eukaryotic detection compatible with large-scale automated analysis of metagenomes. Microbiome 2023; 11:72. [PMID: 37032329 PMCID: PMC10084625 DOI: 10.1186/s40168-023-01505-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 02/24/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND: Eukaryotes such as fungi and protists frequently accompany bacteria and archaea in microbial communities. Unfortunately, their presence is difficult to study with "shotgun" metagenomic sequencing since prokaryotic signals dominate in most environments. Recent methods for eukaryotic detection use eukaryote-specific marker genes, but they do not incorporate strategies to handle the presence of eukaryotes that are not represented in the reference marker gene set, and they are not compatible with web-based tools for downstream analysis.
RESULTS: Here, we present CORRAL (for Clustering Of Related Reference ALignments), a tool for the identification of eukaryotes in shotgun metagenomic data based on alignments to eukaryote-specific marker genes and Markov clustering. Using a combination of simulated datasets, mock community standards, and large publicly available human microbiome studies, we demonstrate that our method is not only sensitive and accurate but is also capable of inferring the presence of eukaryotes not included in the marker gene reference, such as novel strains. Finally, we deploy CORRAL on our MicrobiomeDB.org resource, producing an atlas of eukaryotes present in various environments of the human body and linking their presence to study covariates.
CONCLUSIONS: CORRAL allows eukaryotic detection to be automated and carried out at scale. Implementation of CORRAL in MicrobiomeDB.org creates a running atlas of microbial eukaryotes in metagenomic studies. Since our approach is independent of the reference used, it may be applicable to other contexts where shotgun metagenomic reads are matched against redundant but non-exhaustive databases, such as the identification of bacterial virulence genes or taxonomic classification of viral reads.
Affiliation(s)
- Wojtek Bazant
- Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Ann S Blevins
- Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Kathryn Crouch
- Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, UK
- Daniel P Beiting
- Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
9
Zhang Y, Zhang C, Huo W, Wang X, Zhang M, Palmer K, Chen M. An expectation-maximization algorithm for estimating proportions of deletions among bacterial populations with application to study antibiotic resistance gene transfer in Enterococcus faecalis. Marine Life Science & Technology 2023; 5:28-43. [PMID: 36744155 PMCID: PMC9888353 DOI: 10.1007/s42995-022-00144-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/22/2022] [Accepted: 08/25/2022] [Indexed: 06/18/2023]
Abstract
The emergence of antibiotic resistance in bacteria limits the availability of antibiotic choices for treatment and infection control, thereby representing a major threat to human health. The de novo mutation of bacterial genomes is an essential mechanism by which bacteria acquire antibiotic resistance. Previously, deletion mutations within bacterial immune systems, ranging from dozens to thousands of base pairs (bps) in length, have been associated with the spread of antibiotic resistance. Most current methods for evaluating genomic structural variations (SVs) have concentrated on detecting them, rather than estimating the proportions of populations that carry distinct SVs. A better understanding of the distribution of mutations and subpopulations dynamics in bacterial populations is needed to appreciate antibiotic resistance evolution and movement of resistance genes through populations. Here, we propose a statistical model to estimate the proportions of genomic deletions in a mixed population based on Expectation-Maximization (EM) algorithms and next-generation sequencing (NGS) data. The method integrates both insert size and split-read mapping information to iteratively update estimated distributions. The proposed method was evaluated with three simulations that demonstrated the production of accurate estimations. The proposed method was then applied to investigate the horizontal transfers of antibiotic resistance genes in concert with changes in the CRISPR-Cas system of E. faecalis. Supplementary Information The online version contains supplementary material available at 10.1007/s42995-022-00144-z.
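The general shape of an Expectation-Maximization update for mixture proportions can be sketched as below. This is not the authors' model: it assumes, purely for illustration, that the likelihood of each read under each candidate variant (for example, derived from insert size or split-read evidence) has already been computed, and it estimates only the mixing proportions. The function name and the toy likelihoods are hypothetical.

```python
from typing import List, Sequence


def em_mixture_proportions(
    likelihoods: Sequence[Sequence[float]],
    iterations: int = 200,
    tol: float = 1e-9,
) -> List[float]:
    """Estimate mixing proportions given fixed per-read likelihoods.

    likelihoods[i][k] is the (assumed precomputed) likelihood of read i
    under variant k. Returns one proportion per variant, summing to 1.
    """
    n_components = len(likelihoods[0])
    props = [1.0 / n_components] * n_components  # uniform starting point
    for _ in range(iterations):
        totals = [0.0] * n_components
        # E-step: responsibility of each variant for each read.
        for row in likelihoods:
            weighted = [p * lik for p, lik in zip(props, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read carries no information under the current estimate
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        used = sum(totals)
        if used == 0.0:
            break
        # M-step: new proportions are the average responsibilities.
        new_props = [t / used for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_props, props)) < tol
        props = new_props
        if converged:
            break
    return props


# Toy data: three reads support only variant 0, one read supports only variant 1.
toy = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
estimate = em_mixture_proportions(toy)
print([round(p, 3) for p in estimate])  # [0.75, 0.25]
```

With unambiguous reads the estimate reduces to simple counting; the iterative updates matter when a read is compatible with more than one variant and its weight has to be shared between them.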
Affiliation(s)
- Yu Zhang
- School of Mathematical Sciences, Ocean University of China, Qingdao, 266000 China
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080 USA
- Cong Zhang
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080 USA
- Wenwen Huo
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080 USA
- Xinlei Wang
- Department of Statistical Science, Southern Methodist University, Dallas, TX 75205 USA
- Michael Zhang
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080 USA
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing, 100084 China
- Kelli Palmer
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080 USA
- Min Chen
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080 USA
- Department of Population and Data Sciences, UT Southwestern Medical Center, Dallas, TX 75390 USA
10
Yu D, Xin L, Qing X, Hao Z, Yong W, Jiangjiang Z, Yaqiu L. Key circRNAs from goat: discovery, integrated regulatory network and their putative roles in the differentiation of intramuscular adipocytes. BMC Genomics 2023; 24:51. [PMID: 36707755 PMCID: PMC9883971 DOI: 10.1186/s12864-023-09141-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 01/17/2023] [Indexed: 01/29/2023] Open
Abstract
BACKGROUND: The process of preadipocyte differentiation into mature adipocytes involves multiple cellular and signal transduction pathways. Recently, a series of noncoding RNAs (ncRNAs), including circular RNAs (circRNAs), were shown to play important roles in regulating the differentiation of adipocytes.
RESULT: In this study, we aimed to identify the potential circRNAs in the early and late stages of goat intramuscular adipocyte differentiation, using bioinformatics methods to predict their biological functions and map the circRNA-miRNA interaction network. Over 104 million clean reads in goat intramuscular preadipocytes and adipocytes were mapped, of which 16 circRNAs were differentially expressed (DE-circRNAs). Furthermore, we used real-time fluorescent quantitative PCR (qRT-PCR) to measure the expression levels of 8 randomly chosen circRNAs among the DE-circRNAs, and the result verified the accuracy of the RNA-seq data. From the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of the DE-circRNAs, two circRNAs, circ_0005870 and circ_0000946, were found in the Focal adhesion and PI3K-Akt signaling pathways. We then drew the circRNA-miRNA interaction network and obtained the miRNAs that possibly interact with circ_0005870 and circ_0000946. Using the TargetScan, miRTarBase and miR-TCDS online databases, we further obtained the mRNAs that may interact with the miRNAs, and generated the final circRNA-miRNA-mRNA interaction network. Combined with the subsequent GO (Gene Ontology) and KEGG enrichment analysis, we obtained 5 key mRNAs related to adipocyte differentiation in our interaction network, which are FOXO3 (forkhead box O3), PPP2CA (protein phosphatase 2 catalytic subunit alpha), EEIF4E (eukaryotic translation initiation factor 4), CDK6 (cyclin dependent kinase 6) and ACVR1 (activin A receptor type 1).
CONCLUSIONS: By using Illumina HiSeq and online databases, we generated the final circRNA-miRNA-mRNA interaction network that has valuable functions in adipocyte differentiation. Our work serves as a valuable genomic resource for in-depth exploration of the molecular mechanism by which the ncRNA interaction network regulates adipocyte differentiation.
Affiliation(s)
- Du Yu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Ministry of Education, Southwest Minzu University, Chengdu, China
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Exploitation of Sichuan Province, Southwest Minzu University, Chengdu, China
- College of Animal and Veterinary Sciences, Southwest Minzu University, Chengdu, China
- Li Xin
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Ministry of Education, Southwest Minzu University, Chengdu, China
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Exploitation of Sichuan Province, Southwest Minzu University, Chengdu, China
- College of Animal and Veterinary Sciences, Southwest Minzu University, Chengdu, China
- Xu Qing
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Ministry of Education, Southwest Minzu University, Chengdu, China
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Exploitation of Sichuan Province, Southwest Minzu University, Chengdu, China
- College of Animal and Veterinary Sciences, Southwest Minzu University, Chengdu, China
- Zhang Hao
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Ministry of Education, Southwest Minzu University, Chengdu, China
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Exploitation of Sichuan Province, Southwest Minzu University, Chengdu, China
- College of Animal and Veterinary Sciences, Southwest Minzu University, Chengdu, China
- Wang Yong
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Ministry of Education, Southwest Minzu University, Chengdu, China
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Exploitation of Sichuan Province, Southwest Minzu University, Chengdu, China
- College of Animal and Veterinary Sciences, Southwest Minzu University, Chengdu, China
- Zhu Jiangjiang
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Ministry of Education, Southwest Minzu University, Chengdu, China
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Exploitation of Sichuan Province, Southwest Minzu University, Chengdu, China
- Lin Yaqiu
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Utilization, Ministry of Education, Southwest Minzu University, Chengdu, China
- Key Laboratory of Qinghai-Tibetan Plateau Animal Genetic Resource Reservation and Exploitation of Sichuan Province, Southwest Minzu University, Chengdu, China
- College of Animal and Veterinary Sciences, Southwest Minzu University, Chengdu, China
11
|
Yuan Y, Gao F, Chang Y, Zhao Q, He X. Advances of mRNA vaccine in tumor: a maze of opportunities and challenges. Biomark Res 2023; 11:6. [PMID: 36650562 PMCID: PMC9845107 DOI: 10.1186/s40364-023-00449-w] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 11/03/2022] [Accepted: 01/10/2023] [Indexed: 01/19/2023]
Abstract
High-frequency mutations in tumor genomes could be exploited as an asset for developing tumor vaccines. In recent years, with tremendous breakthroughs in genomics and intelligent algorithms and deeper insight into tumor immunology, it has become possible to rapidly target genomic alterations in tumor cells and rationally select vaccine targets. Among a variety of candidate vaccine platforms, the early application of mRNA was limited by instability, low efficiency, and excessive immunogenicity until the successful development of mRNA vaccines against SARS-CoV-2 broke through the technical bottleneck in vaccine preparation, allowing tumor mRNA vaccines to be prepared rapidly and economically with good stability and efficiency. In this review, we systematically summarize the classification and characteristics of tumor antigens, the general process and methods for screening neoantigens, the strategies of vaccine preparation, and advances in clinical trials, and we present the main challenges in current mRNA tumor vaccine development.
Affiliation(s)
- Yuan Yuan
  - Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, China
  - Department of Gastroenterology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Fan Gao
  - Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, China
  - Department of Gastroenterology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ying Chang
  - Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, China
  - Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, China
- Qiu Zhao
  - Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, China
  - Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, China
- Xingxing He
  - Department of Gastroenterology, Zhongnan Hospital of Wuhan University, Wuhan, China
  - Department of Gastroenterology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei Clinical Center and Key Laboratory of Intestinal and Colorectal Diseases, Wuhan, China
|
12
|
Kong J, Fan C, Liao X, Chen A, Yang S, Zhao L, Li H. Accurate detection of Escherichia coli O157:H7 and Salmonella enterica serovar typhimurium based on the combination of next-generation sequencing and droplet digital PCR. Lebensm Wiss Technol 2022. [DOI: 10.1016/j.lwt.2022.113913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]
|
13
|
Váradi A, Kaszab E, Kardos G, Prépost E, Szarka K, Laczkó L. Rapid genotyping of targeted viral samples using Illumina short-read sequencing data. PLoS One 2022; 17:e0274414. [PMID: 36112576 PMCID: PMC9481040 DOI: 10.1371/journal.pone.0274414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/30/2022] [Indexed: 11/19/2022]
Abstract
The most important information about microorganisms might be their accurate genome sequence. Using current next-generation sequencing methods, sequencing data can be generated at an unprecedented pace. However, we still lack tools for the automated and accurate reference-based genotyping of viral sequencing reads. This paper presents our pipeline designed to reconstruct the dominant consensus genome of viral samples and analyze their within-host variability. We benchmarked our approach on numerous datasets and showed that the consensus genome of samples could be obtained reliably without further manual data curation. Our pipeline can be a valuable tool for the rapid identification of viral samples. The pipeline is publicly available on the project’s GitHub page (https://github.com/laczkol/QVG).
Affiliation(s)
- Alex Váradi
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Department of Laboratory Medicine, University of Pécs, Pécs, Hungary
- Eszter Kaszab
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Veterinary Medical Research Institute, Budapest, Hungary
- Gábor Kardos
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Eszter Prépost
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Krisztina Szarka
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Levente Laczkó
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
  - ELKH-DE Conservation Biology Research Group, Debrecen, Hungary
|
14
|
The human "contaminome": bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci Rep 2022; 12:9863. [PMID: 35701436 PMCID: PMC9198055 DOI: 10.1038/s41598-022-13269-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 01/11/2022] [Accepted: 05/18/2022] [Indexed: 01/11/2023]
Abstract
The unmapped readspace of whole genome sequencing data tends to be large but is often ignored. We posit that it contains valuable signals of both human infection and contamination. Using unmapped and poorly aligned reads from whole genome sequences (WGS) of over 1000 families and nearly 5000 individuals, we present insights into the common viral, bacterial, and computational contamination that plagues whole genome sequencing studies. We present several notable results: (1) in addition to known contaminants such as Epstein-Barr virus and phiX, sequences from whole blood and lymphocyte cell lines contain many other contaminants, likely originating from storage, prep, and sequencing pipelines; (2) the sequencing plate and the biological source of a sample strongly influence its contamination profile; and (3) Y-chromosome fragments not on the human reference genome commonly mismap to bacterial reference genomes. Both experiment-derived and computational contamination are prominent in next-generation sequencing data. Such contamination can compromise results from WGS as well as metagenomics studies, and standard protocols for identifying and removing contamination should be developed to ensure the fidelity of sequencing-based studies.
|
15
|
Diallo I, Ho J, Lalaouna D, Massé E, Provost P. RNA Sequencing Unveils Very Small RNAs With Potential Regulatory Functions in Bacteria. Front Mol Biosci 2022; 9:914991. [PMID: 35720117 PMCID: PMC9203972 DOI: 10.3389/fmolb.2022.914991] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/07/2022] [Accepted: 05/02/2022] [Indexed: 12/21/2022]
Abstract
RNA sequencing (RNA-seq) is the gold standard for the discovery of small non-coding RNAs. Following a long-standing approach, reads shorter than 16 nucleotides (nt) are removed from the small RNA sequencing libraries or datasets. The serendipitous discovery of a eukaryotic 12-nt-long RNA species capable of modulating the microRNA from which it derives prompted us to challenge this dogma and, by expanding the window of RNA sizes down to 8 nt, to confirm the existence of functional very small RNAs (vsRNAs <16 nt). Here we report the detailed profiling of vsRNAs in Escherichia coli, E. coli-derived outer membrane vesicles (OMVs) and five other bacterial strains (Pseudomonas aeruginosa PA7, P. aeruginosa PAO1, Salmonella enterica serovar Typhimurium 14028S, Legionella pneumophila JR32 Philadelphia-1 and Staphylococcus aureus HG001). vsRNAs of 8–15 nt in length [RNAs (8–15 nt)] were found to be more abundant than RNAs of 16–30 nt in length [RNAs (16–30 nt)]. vsRNA biotypes were distinct and varied within and across bacterial species and accounted for one third of reads identified in the 8–30 nt window. The tRNA-derived fragments (tRFs) emerged as a major biotype among the vsRNAs, notably Ile-tRF and Ala-tRF, and were selectively loaded in OMVs. tRF-derived vsRNAs appear to be thermodynamically stable, with at least two G-C base pairs and a stem-loop structure. The analyzed tRF-derived vsRNAs are predicted to target several human host mRNAs with diverse functions. Bacterial vsRNAs and OMV-derived vsRNAs could be novel players likely modulating the intricate relationship between pathogens and their hosts.
Affiliation(s)
- Idrissa Diallo
  - CHU de Québec Research Center/CHUL Pavilion, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec City, QC, Canada
- Jeffrey Ho
  - CHU de Québec Research Center/CHUL Pavilion, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec City, QC, Canada
- David Lalaouna
  - CRCHUS, RNA Group, Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
- Eric Massé
  - CRCHUS, RNA Group, Department of Biochemistry and Functional Genomics, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, QC, Canada
- Patrick Provost
  - CHU de Québec Research Center/CHUL Pavilion, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec City, QC, Canada
  - *Correspondence: Patrick Provost
|
16
|
Chang TC, Xu K, Cheng Z, Wu G. Somatic and Germline Variant Calling from Next-Generation Sequencing Data. Adv Exp Med Biol 2022; 1361:37-54. [DOI: 10.1007/978-3-030-91836-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
17
|
AIM in Medical Informatics. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_32] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
18
|
Bogaerts B, Winand R, Van Braekel J, Hoffman S, Roosens NHC, De Keersmaecker SCJ, Marchal K, Vanneste K. Evaluation of WGS performance for bacterial pathogen characterization with the Illumina technology optimized for time-critical situations. Microb Genom 2021; 7:000699. [PMID: 34739368 PMCID: PMC8743554 DOI: 10.1099/mgen.0.000699] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/07/2021] [Accepted: 09/30/2021] [Indexed: 12/29/2022]
Abstract
Whole genome sequencing (WGS) has become the reference standard for bacterial outbreak investigation and pathogen typing, providing a resolution unattainable with conventional molecular methods. Data generated with Illumina sequencers can, however, only be analysed after the sequencing run has finished, thereby losing valuable time during emergency situations. We evaluated both the effect of decreasing overall run time and a protocol to transfer and convert intermediary files generated by Illumina sequencers, enabling real-time data analysis for multiple samples that are part of the same ongoing sequencing run as soon as the forward reads have been sequenced. To facilitate implementation for laboratories operating under strict quality systems, extensive validation of several bioinformatics assays (16S rRNA species confirmation, gene detection against virulence factor and antimicrobial resistance databases, SNP-based antimicrobial resistance detection, serotype determination, and core genome multilocus sequence typing) for three bacterial pathogens (Mycobacterium tuberculosis, Neisseria meningitidis, and Shiga-toxin producing Escherichia coli) was performed by evaluating performance as a function of the two most critical sequencing parameters, i.e. read length and coverage. For the majority of evaluated bioinformatics assays, actionable results could be obtained between 14 and 22 h of sequencing, decreasing the overall sequencing-to-results time by more than half. This study aids in reducing the turn-around time of WGS analysis by facilitating a faster response in time-critical scenarios and provides recommendations for time-optimized WGS with respect to required read length and coverage to achieve a minimum level of performance for the considered bioinformatics assay(s), which can also be used to maximize the cost-effectiveness of routine surveillance sequencing when response time is not essential.
Affiliation(s)
- Bert Bogaerts
  - Transversal activities in Applied Genomics, Sciensano, Brussels (1050), Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent (9000), Belgium
- Raf Winand
  - Transversal activities in Applied Genomics, Sciensano, Brussels (1050), Belgium
- Julien Van Braekel
  - Transversal activities in Applied Genomics, Sciensano, Brussels (1050), Belgium
- Stefan Hoffman
  - Transversal activities in Applied Genomics, Sciensano, Brussels (1050), Belgium
- Nancy H. C. Roosens
  - Transversal activities in Applied Genomics, Sciensano, Brussels (1050), Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent (9000), Belgium
  - Department of Information Technology, IDLab, imec, Ghent University, Ghent (9000), Belgium
  - Department of Genetics, University of Pretoria, 0001 Pretoria, South Africa
- Kevin Vanneste
  - Transversal activities in Applied Genomics, Sciensano, Brussels (1050), Belgium
|
19
|
Kong NR, Chai L, Tenen DG, Bassal MA. A modified CUT&RUN protocol and analysis pipeline to identify transcription factor binding sites in human cell lines. STAR Protoc 2021; 2:100750. [PMID: 34458869 PMCID: PMC8379522 DOI: 10.1016/j.xpro.2021.100750] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/23/2022]
Abstract
CUT&RUN is a recently developed in situ chromatin profiling technique that enables high-resolution chromatin mapping and probing. Herein, we describe our adapted CUT&RUN protocol for transcription factors (TFs). Our protocol outlines all necessary steps for TF profiling, including the procedure to obtain proteinA-MNase, while also outlining the bioinformatic pipeline steps required to process, analyze, and identify novel binding sites and sequences. Due to the small number of cells required, this method will allow the elucidation of cell context-dependent functions of many TFs. For details on the use and execution of this protocol, please refer to Kong et al. (2021).
- CUT&RUN was recently developed for in situ chromatin mapping and probing
- Herein, we describe our modified CUT&RUN protocol to profile TF binding sites and motifs
- Modifications relate to nuclear TF targeting, rather than whole-cell histone targeting
- Bespoke bioinformatics pipeline simplifies analysis enabling binding site identification
Affiliation(s)
- Nikki Ruoxi Kong
  - Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Harvard Stem Cell Institute, Boston, MA 02115, USA
- Li Chai
  - Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Harvard Stem Cell Institute, Boston, MA 02115, USA
- Daniel Geoffrey Tenen
  - Harvard Stem Cell Institute, Boston, MA 02115, USA
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Mahmoud Adel Bassal
  - Harvard Stem Cell Institute, Boston, MA 02115, USA
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
  - Corresponding author
|
20
|
Freire R, Weisweiler M, Guerreiro R, Baig N, Hüttel B, Obeng-Hinneh E, Renner J, Hartje S, Muders K, Truberg B, Rosen A, Prigge V, Bruckmüller J, Lübeck J, Stich B. Chromosome-scale reference genome assembly of a diploid potato clone derived from an elite variety. G3 (Bethesda) 2021; 11:6371871. [PMID: 34534288 PMCID: PMC8664475 DOI: 10.1093/g3journal/jkab330] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/31/2021] [Accepted: 09/08/2021] [Indexed: 01/27/2023]
Abstract
Potato (Solanum tuberosum L.) is one of the most important crops with a worldwide production of 370 million metric tons. The objectives of this study were (1) to create a high-quality consensus sequence across the two haplotypes of a diploid clone derived from a tetraploid elite variety and assess the sequence divergence from the available potato genome assemblies, as well as among the two haplotypes; (2) to evaluate the new assembly’s usefulness for various genomic methods; and (3) to assess the performance of phasing in diploid and tetraploid clones, using linked-read sequencing technology. We used PacBio long reads coupled with 10x Genomics reads and proximity ligation scaffolding to create the dAg1_v1.0 reference genome sequence. With a final assembly size of 812 Mb, where 750 Mb are anchored to 12 chromosomes, our assembly is larger than other available potato reference sequences and high proportions of properly paired reads were observed for clones unrelated by pedigree to dAg1. Comparisons of the new dAg1_v1.0 sequence to other potato genome sequences point out the high divergence between the different potato varieties and illustrate the potential of using dAg1_v1.0 sequence in breeding applications.
Affiliation(s)
- Ruth Freire
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Marius Weisweiler
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Ricardo Guerreiro
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Nadia Baig
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Bruno Hüttel
  - Max Planck-Genome-centre Cologne, Max Planck Institute for Plant Breeding, Carl-von-Linne-Weg 10, 50829 Köln, Germany
- Evelyn Obeng-Hinneh
  - Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
- Juliane Renner
  - Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
- Stefanie Hartje
  - Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
- Katja Muders
  - Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
- Bernd Truberg
  - Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
- Arne Rosen
  - Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
- Vanessa Prigge
  - SaKa Pflanzenzucht GmbH & Co. KG, Zuchtstation Windeby, Eichenallee 9, 24340 Windeby, Germany
- Jens Lübeck
  - Solana Research GmbH, Eichenallee 9, 24340 Windeby, Germany
- Benjamin Stich
  - Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany
|
21
|
Rodionov DA, Rodionova IA, Rodionov VA, Arzamasov AA, Zhang K, Rubinstein GM, Tanwee TNN, Bing RG, Crosby JR, Nookaew I, Basen M, Brown SD, Wilson CM, Klingeman DM, Poole FL, Zhang Y, Kelly RM, Adams MWW. Transcriptional Regulation of Plant Biomass Degradation and Carbohydrate Utilization Genes in the Extreme Thermophile Caldicellulosiruptor bescii. mSystems 2021; 6:e0134520. [PMID: 34060910 PMCID: PMC8579813 DOI: 10.1128/msystems.01345-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/21/2020] [Accepted: 05/04/2021] [Indexed: 11/20/2022]
Abstract
Extremely thermophilic bacteria from the genus Caldicellulosiruptor can degrade polysaccharide components of plant cell walls and subsequently utilize the constituent mono- and oligosaccharides. Through metabolic engineering, ethanol and other industrially important end products can be produced. Previous experimental studies identified a variety of carbohydrate-active enzymes in the model species Caldicellulosiruptor saccharolyticus and Caldicellulosiruptor bescii, while prior transcriptomic experiments identified their putative carbohydrate uptake transporters. We investigated the mechanisms of transcriptional regulation of carbohydrate utilization genes using a comparative genomics approach applied to 14 Caldicellulosiruptor species. The reconstructed carbohydrate utilization regulatory network includes the predicted binding sites for 34 mostly local regulators and points to the regulatory mechanisms controlling expression of genes involved in degradation of plant biomass. The Rex and CggR regulons control the central glycolytic and primary redox reactions. The identified transcription factor binding sites and regulons were validated with transcriptomic and transcription start site experimental data for C. bescii grown on cellulose, cellobiose, glucose, xylan, and xylose. The XylR and XynR regulons control the xylan-induced transcriptional response of genes involved in degradation of xylan and xylose utilization. The reconstructed regulons informed the carbohydrate utilization reconstruction analysis and improved functional annotations of 51 transporters and 11 catabolic enzymes. Using gene deletion, we confirmed that the shared ATPase component MsmK is essential for growth on oligo- and polysaccharides but not for the utilization of monosaccharides. By elucidating the carbohydrate utilization framework in C. bescii, strategies for metabolic engineering can be pursued to optimize yields of bio-based fuels and chemicals from lignocellulose.
IMPORTANCE To develop functional metabolic engineering platforms for nonmodel microorganisms, a comprehensive understanding of the physiological and metabolic characteristics is critical. Caldicellulosiruptor bescii and other species in this genus have untapped potential for conversion of unpretreated plant biomass into industrial fuels and chemicals. The highly interactive and complex machinery used by C. bescii to acquire and process complex carbohydrates contained in lignocellulose was elucidated here to complement related efforts to develop a metabolic engineering platform with this bacterium. Guided by the findings here, a clearer picture of how C. bescii natively drives carbohydrate utilization is provided and strategies to engineer this bacterium for optimal conversion of lignocellulose to commercial products emerge.
Affiliation(s)
- Dmitry A. Rodionov
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California, USA
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Irina A. Rodionova
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, USA
- Vladimir A. Rodionov
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Aleksandr A. Arzamasov
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California, USA
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Ke Zhang
  - Department of Cell and Molecular Biology, College of the Environment and Life Sciences, University of Rhode Island, Kingston, Rhode Island, USA
- Gabriel M. Rubinstein
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, USA
- Tania N. N. Tanwee
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, USA
- Ryan G. Bing
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA
- James R. Crosby
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA
- Intawat Nookaew
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Mirko Basen
  - Mathematisch-Naturwissenschaftliche Fakultät, Institut für Biowissenschaften, Mikrobiologie, Universität Rostock, Rostock, Germany
- Steven D. Brown
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Charlotte M. Wilson
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
  - University of Otago, Dunedin, New Zealand
- Dawn M. Klingeman
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Farris L. Poole
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, USA
- Ying Zhang
  - Department of Cell and Molecular Biology, College of the Environment and Life Sciences, University of Rhode Island, Kingston, Rhode Island, USA
- Robert M. Kelly
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA
- Michael W. W. Adams
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia, USA
|
22
|
New evaluation methods of read mapping by 17 aligners on simulated and empirical NGS data: an updated comparison of DNA- and RNA-Seq data from Illumina and Ion Torrent technologies. Neural Comput Appl 2021; 33:15669-15692. [PMID: 34155424 PMCID: PMC8208613 DOI: 10.1007/s00521-021-06188-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2020] [Accepted: 06/02/2021] [Indexed: 12/13/2022]
Abstract
During the last 15 years, improved omics sequencing technologies have expanded the scale and resolution of various biological applications, generating high-throughput datasets that require carefully chosen software tools to be processed. Following these sequencing developments, bioinformatics researchers have been challenged to implement alignment algorithms for next-generation sequencing reads. However, the selection of aligners based on genome characteristics remains poorly studied, so our benchmarking study extended the state of the art by comparing 17 different aligners. The chosen tools were assessed on empirical human DNA- and RNA-Seq data, as well as on simulated datasets in human and mouse, evaluating a set of parameters not previously considered in this kind of benchmark. As expected, we found that each tool was the best in specific conditions. For Ion Torrent single-end RNA-Seq samples, the most suitable aligners were CLC and BWA-MEM, which reached the best results in terms of efficiency, accuracy, duplication rate, saturation profile and running time. For Illumina paired-end osteomyelitis transcriptomics data, the best-performing algorithm, together with the already cited CLC, was Novoalign, which excelled in accuracy and saturation analyses. Segemehl and DNASTAR performed the best on both DNA-Seq datasets, with Segemehl particularly suitable for exome data. In conclusion, our study could guide users in the selection of a suitable aligner based on genome and transcriptome characteristics. However, several other aspects that emerged from our work should be considered as alignment research evolves, such as the involvement of artificial intelligence to support cloud computing and mapping to multiple genomes.
|
23
|
Gombolay AL, Storici F. Mapping ribonucleotides embedded in genomic DNA to single-nucleotide resolution using Ribose-Map. Nat Protoc 2021; 16:3625-3638. [PMID: 34089018 DOI: 10.1038/s41596-021-00553-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/09/2018] [Accepted: 04/09/2021] [Indexed: 11/09/2022]
Abstract
The most common nonstandard nucleotides found in genomic DNA are ribonucleotides. Although ribonucleotides are frequently incorporated into DNA by replicative DNA polymerases, very little is known about the distribution and signatures of ribonucleotides incorporated into DNA. Recent advances in high-throughput ribonucleotide sequencing can capture the exact locations of ribonucleotides in genomic DNA. Ribose-Map is a user-friendly, standardized bioinformatics toolkit for the comprehensive analysis of ribonucleotide sequencing experiments. It allows researchers to map the locations of ribonucleotides in DNA to single-nucleotide resolution and identify biological signatures of ribonucleotide incorporation. In addition, it can be applied to data generated using any currently available high-throughput ribonucleotide sequencing technique, thus standardizing the analysis of ribonucleotide sequencing experiments and allowing direct comparisons of results. This protocol describes in detail how to use Ribose-Map to analyze ribonucleotide sequencing data, including preparing the reads for analysis, locating the genomic coordinates of ribonucleotides, exploring the genome-wide distribution of ribonucleotides, determining the nucleotide sequence context of ribonucleotides and identifying hotspots of ribonucleotide incorporation. Ribose-Map does not require background knowledge of ribonucleotide sequencing analysis and assumes only basic command-line skills. The protocol requires less than 3 h of computing time for most datasets and ~30 min of hands-on time. Ribose-Map is available at https://github.com/agombolay/ribose-map.
Affiliation(s)
- Alli L Gombolay
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Francesca Storici
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
|
24
|
Wood AW, Duda TF. Reticulate evolution in Conidae: Evidence of nuclear and mitochondrial introgression. Mol Phylogenet Evol 2021; 161:107182. [PMID: 33892099 DOI: 10.1016/j.ympev.2021.107182] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/24/2020] [Revised: 04/07/2021] [Accepted: 04/15/2021] [Indexed: 10/21/2022]
Abstract
Conidae is a hyperdiverse family of marine snails that has many hallmarks of adaptive radiation. Hybridization and introgression may contribute to such instances of rapid diversification by generating novel gene combinations that facilitate exploitation of distinct niches. Here we evaluated whether or not these mechanisms may have contributed to the evolutionary history of a subgenus of Conidae (Virroconus). Several observations hint at evidence of past introgression for members of this group, including incongruence between phylogenetic relationships inferred from mitochondrial gene sequences and morphology and widespread sympatry of many Virroconus species in the Indo-West Pacific. We generated and analyzed transcriptome data of Virroconus species to (i) infer a robust nuclear phylogeny, (ii) assess mitochondrial and nuclear gene tree discordance, and (iii) formally test for introgression of nuclear loci. We identified introgression of mitochondrial genomes and nuclear gene regions between ancestors of one pair of Virroconus species, and mitochondrial introgression between another pair. We also found evidence of adaptive introgression of conotoxin venom loci between a third pair of species. Together, our results demonstrate that hybridization and introgression impacted the evolutionary history of Virroconus and hence may have contributed to the adaptive radiation of Conidae.
Affiliation(s)
- Andrew W Wood
  - University of Michigan, Department of Ecology & Evolutionary Biology, 1105 North University Avenue, Biological Sciences Building, Ann Arbor, MI 48109-1085, USA.
- Thomas F Duda
  - University of Michigan, Department of Ecology & Evolutionary Biology, 1105 North University Avenue, Biological Sciences Building, Ann Arbor, MI 48109-1085, USA.
25. Wickland DP, Ren Y, Sinnwell JP, Reddy JS, Pottier C, Sarangi V, Carrasquillo MM, Ross OA, Younkin SG, Ertekin-Taner N, Rademakers R, Hudson ME, Mainzer LS, Biernacka JM, Asmann YW. Impact of variant-level batch effects on identification of genetic risk factors in large sequencing studies. PLoS One 2021; 16:e0249305. [PMID: 33861770 PMCID: PMC8051815 DOI: 10.1371/journal.pone.0249305] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/02/2020] [Accepted: 03/15/2021] [Indexed: 11/30/2022] [Open Access]
Abstract
Genetic studies have shifted to sequencing-based rare variants discovery after decades of success in identifying common disease variants by Genome-Wide Association Studies using Single Nucleotide Polymorphism chips. Sequencing-based studies require large sample sizes for statistical power and therefore often inadvertently introduce batch effects because samples are typically collected, processed, and sequenced at multiple centers. Conventionally, batch effects are first detected and visualized using Principal Components Analysis and then controlled by including batch covariates in the disease association models. For sequencing-based genetic studies, because all variants included in the association analyses have passed sequencing-related quality control measures, this conventional approach treats every variant as equal and ignores the substantial differences still remaining in variant qualities and characteristics such as genotype quality scores, alternative allele fractions (fraction of reads supporting alternative allele at a variant position) and sequencing depths. In the Alzheimer’s Disease Sequencing Project (ADSP) exome dataset of 9,904 cases and controls, we discovered hidden variant-level differences between sample batches of three sequencing centers and two exome capture kits. Although sequencing centers were included as a covariate in our association models, we observed differences at the variant level in genotype quality and alternative allele fraction between samples processed by different exome capture kits that significantly impacted both the confidence of variant detection and the identification of disease-associated variants. Furthermore, we found that a subset of top disease-risk variants came exclusively from samples processed by one exome capture kit that was more effective at capturing the alternative alleles compared to the other kit. 
Our findings highlight the importance of additional variant-level quality control for large sequencing-based genetic studies. More importantly, we demonstrate that automatically filtering out variants with batch differences may lead to false negatives if the batch discordances come largely from quality differences and if the batch-specific variants have better quality.
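The alternative allele fraction defined in the abstract above (reads supporting the alternative allele divided by all reads at the variant position) can be illustrated with a minimal Python sketch. The function names, the tuple layout, and the read counts are hypothetical and are not taken from the ADSP analysis; the sketch only shows how per-batch means of this fraction might be compared.

```python
from collections import defaultdict
from statistics import mean


def alt_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a variant position that support the alternative allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth


def mean_fraction_by_batch(calls):
    """Average the alternative allele fraction per batch.

    `calls` is an iterable of (batch_label, ref_reads, alt_reads) tuples;
    this layout is an assumption made for the example.
    """
    grouped = defaultdict(list)
    for batch, ref_reads, alt_reads in calls:
        grouped[batch].append(alt_allele_fraction(ref_reads, alt_reads))
    return {batch: mean(values) for batch, values in grouped.items()}


# Hypothetical heterozygous calls from two capture kits (made-up counts).
calls = [
    ("kit_A", 20, 20),
    ("kit_A", 30, 10),
    ("kit_B", 36, 4),
    ("kit_B", 32, 8),
]
print(mean_fraction_by_batch(calls))
```

A systematic gap between batches in such a summary would be one hint that a variant-level check is worth running before association testing.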
Affiliation(s)
- Daniel P. Wickland
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, Florida, United States of America
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Yingxue Ren
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, Florida, United States of America
- Jason P. Sinnwell
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, United States of America
- Joseph S. Reddy
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, Florida, United States of America
- Cyril Pottier
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, United States of America
- Vivekananda Sarangi
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, United States of America
- Owen A. Ross
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, United States of America
  - Department of Clinical Genomics, Mayo Clinic, Jacksonville, Florida, United States of America
- Steven G. Younkin
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, United States of America
- Nilüfer Ertekin-Taner
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, United States of America
  - Department of Neurology, Mayo Clinic, Jacksonville, Florida, United States of America
- Rosa Rademakers
  - Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, United States of America
- Matthew E. Hudson
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Carl R. Woese Institute for Genomic Biology, Carver Biotechnology Center and Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Liudmila Sergeevna Mainzer
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
  - Carl R. Woese Institute for Genomic Biology, Carver Biotechnology Center and Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Joanna M. Biernacka
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, United States of America
- Yan W. Asmann
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, Florida, United States of America
26. Diallo I, Ho J, Laffont B, Laugier J, Benmoussa A, Lambert M, Husseini Z, Soule G, Kozak R, Kobinger GP, Provost P. Altered microRNA Transcriptome in Cultured Human Liver Cells upon Infection with Ebola Virus. Int J Mol Sci 2021; 22:3792. [PMID: 33917562 PMCID: PMC8038836 DOI: 10.3390/ijms22073792] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/01/2021] [Revised: 03/27/2021] [Accepted: 03/30/2021] [Indexed: 02/07/2023] [Open Access]
Abstract
Ebola virus (EBOV) is a virulent pathogen, notorious for inducing life-threatening hemorrhagic fever, that has been responsible for several outbreaks in Africa and remains a public health threat. Yet, its pathogenesis is still not completely understood. Although there have been numerous studies on host transcriptional response to EBOV, with an emphasis on the clinical features, the impact of EBOV infection on post-transcriptional regulatory elements, such as microRNAs (miRNAs), remains largely unexplored. MiRNAs are involved in inflammation and immunity and are believed to be important modulators of the host response to viral infection. Here, we have used small RNA sequencing (sRNA-Seq), qPCR and functional analyses to obtain the first comparative miRNA transcriptome (miRNome) of a human liver cell line (Huh7) infected with one of the following three EBOV strains: Mayinga (responsible for the first Zaire outbreak in 1976), Makona (responsible for the West Africa outbreak in 2013–2016) and the epizootic Reston (presumably innocuous to humans). Our results highlight specific miRNA-based immunity pathways and substantial differences between the strains beyond their clinical manifestation and pathogenicity. These analyses shed new light into the molecular signature of liver cells upon EBOV infection and reveal new insights into miRNA-based virus attack and host defense strategy.
Affiliation(s)
- Idrissa Diallo
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Jeffrey Ho
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Benoit Laffont
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Jonathan Laugier
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Abderrahim Benmoussa
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Marine Lambert
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Zeinab Husseini
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
- Geoff Soule
  - Special Pathogens Program, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3B 3M9, Canada
- Robert Kozak
  - Special Pathogens Program, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3B 3M9, Canada
  - Division of Microbiology, Department of Laboratory Medicine & Molecular Diagnostics, Sunnybrook Health Sciences Centre, Toronto, ON M4N 3M5, Canada
- Gary P. Kobinger
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
  - Special Pathogens Program, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3B 3M9, Canada
  - Département de Microbiologie Médicale, Université du Manitoba, Winnipeg, MB R3E 0J9, Canada
- Patrick Provost
  - CHU de Québec Research Center, Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université Laval, Quebec, QC G1V 4G2, Canada
  - CHUQ Research Center/CHUL Pavilion, 2705 Blvd Laurier, Room T1-65, Quebec, QC G1V 4G2, Canada
  - Correspondence: Tel.: +1-418-525-4444 (ext. 48842)
27. Hynst J, Navrkalova V, Pal K, Pospisilova S. Bioinformatic strategies for the analysis of genomic aberrations detected by targeted NGS panels with clinical application. PeerJ 2021; 9:e10897. [PMID: 33850640 PMCID: PMC8019320 DOI: 10.7717/peerj.10897] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2020] [Accepted: 01/13/2021] [Indexed: 01/21/2023] [Open Access]
Abstract
Molecular profiling of tumor samples has acquired importance in cancer research, but currently also plays an important role in the clinical management of cancer patients. Rapid identification of genomic aberrations improves diagnosis, prognosis and effective therapy selection. This can be attributed mainly to the development of next-generation sequencing (NGS) methods, especially targeted DNA panels. Such panels enable a relatively inexpensive and rapid analysis of various aberrations with clinical impact specific to particular diagnoses. In this review, we discuss the experimental approaches and bioinformatic strategies available for the development of an NGS panel for a reliable analysis of selected biomarkers. Compliance with defined analytical steps is crucial to ensure accurate and reproducible results. In addition, a careful validation procedure has to be performed before the application of NGS targeted assays in routine clinical practice. With more focus on bioinformatics, we emphasize the need for thorough pipeline validation and management in relation to the particular experimental setting as an integral part of the NGS method establishment. A robust and reproducible bioinformatic analysis running on powerful machines is essential for proper detection of genomic variants in clinical settings since distinguishing between experimental noise and real biological variants is fundamental. This review summarizes state-of-the-art bioinformatic solutions for careful detection of the SNV/Indels and CNVs for targeted sequencing resulting in translation of sequencing data into clinically relevant information. Finally, we share our experience with the development of a custom targeted NGS panel for an integrated analysis of biomarkers in lymphoproliferative disorders.
Affiliation(s)
- Jakub Hynst
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Internal Medicine-Hematology and Oncology, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
  - Department of Medical Genetics and Genomics, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
- Veronika Navrkalova
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Internal Medicine-Hematology and Oncology, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
- Karol Pal
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Hematology, University Hospital Schleswig-Holstein, Kiel, Germany
- Sarka Pospisilova
  - Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
  - Department of Internal Medicine-Hematology and Oncology, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
  - Department of Medical Genetics and Genomics, Faculty of Medicine and University Hospital Brno, Masaryk University, Brno, Czech Republic
28. Nodehi HM, Tabatabaiefar MA, Sehhati M. Selection of Optimal Bioinformatic Tools and Proper Reference for Reducing the Alignment Error in Targeted Sequencing Data. JOURNAL OF MEDICAL SIGNALS & SENSORS 2021; 11:37-44. [PMID: 34026589 PMCID: PMC8043119 DOI: 10.4103/jmss.jmss_7_20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/20/2020] [Revised: 01/28/2020] [Accepted: 02/12/2020] [Indexed: 11/04/2022]
Abstract
Background: Careful design in the primary steps of a next-generation sequencing study is critical for obtaining successful results in downstream analysis.
Methods: In this study, a framework is proposed to evaluate and improve the sequence mapping in targeted regions of the reference genome. In this regard, simulated short reads were produced from the coding regions of the human genome and mapped to a Customized Target-Based Reference (CTBR) by recently introduced alignment tools. The short reads produced by different sequencing technologies were aligned to the standard genome and also to the CTBR, with and without well-defined mutation types, and the number of unmapped and misaligned reads and the runtime were measured for comparison.
Results: The results showed that mapping accuracy was significantly better for reads generated from Illumina HiSeq 2500 and aligned with Stampy against the CTBR than for the other evaluated pipelines. Using the CTBR for alignment significantly decreased the mapping error in comparison to other expanded or more limited references. When intentional mutations were introduced into the reads, Stampy showed the minimum error of 1.67% using the CTBR. However, the lowest errors obtained by Stampy using the whole genome and a single chromosome as references were 3.78% and 20%, respectively. Maximum and minimum misalignment errors were observed on chromosomes Y and 20, respectively.
Conclusion: Using the proposed framework in a clinical targeted sequencing study may therefore help predict the error and improve the performance of variant calling in the genomic regions targeted by the study.
Affiliation(s)
- Hannane Mohammadi Nodehi
  - Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Amin Tabatabaiefar
  - Department of Medical Genetics, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
  - Department of Bioinformatics, Medical Image and Signal Processing Research Center, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammadreza Sehhati
  - Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
29. Yik MHY, Lo YT, Lin X, Sun W, Chan TF, Shaw PC. Authentication of Hedyotis products by adaptor ligation-mediated PCR and metabarcoding. J Pharm Biomed Anal 2021; 196:113920. [PMID: 33549873 DOI: 10.1016/j.jpba.2021.113920] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/31/2020] [Revised: 01/13/2021] [Accepted: 01/19/2021] [Indexed: 01/25/2023]
Abstract
DNA barcoding is a widely used tool for species identification and authentication. However, it may not be applicable to highly processed herbal products due to severe DNA fragmentation. The emergence of DNA metabarcoding provides an alternative way to solve the problem. In this study, we are the first to combine the use of adaptor ligation-mediated PCR method and metabarcoding to reveal species identities in herbal products. As an illustration, we applied the method on three Hedyotis herbal products collected from China and Thailand. Results showed that H. diffusa and H. corymbosa were present in the products which were consistent with their label claims. Our study indicated that the adaptor ligation-mediated PCR with metabarcoding approach is useful for authentication of highly processed herbal products.
Affiliation(s)
- Mavis Hong-Yu Yik
  - Li Dak Sum Yip Yio Chin R & D Center for Chinese Medicine, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.
- Yat-Tung Lo
  - Li Dak Sum Yip Yio Chin R & D Center for Chinese Medicine, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
  - State Key Laboratory of Research on Bioactivities and Clinical Application of Medicinal Plants, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.
- Xiao Lin
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
  - Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.
- Wei Sun
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China.
- Ting-Fung Chan
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
  - Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.
- Pang-Chui Shaw
  - Li Dak Sum Yip Yio Chin R & D Center for Chinese Medicine, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
  - State Key Laboratory of Research on Bioactivities and Clinical Application of Medicinal Plants, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.
30. Hu H, Yuan Y, Bayer PE, Fernandez CT, Scheben A, Golicz AA, Edwards D. Legume Pangenome Construction Using an Iterative Mapping and Assembly Approach. Methods Mol Biol 2021; 2107:35-47. [PMID: 31893442 DOI: 10.1007/978-1-0716-0235-5_3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/08/2023]
Abstract
A pangenome is a collection of genomic sequences found in the entire species rather than a single individual. It allows for comprehensive, species-wide characterization of genetic variations and mining of variable genes which may play important roles in phenotypes of interest. Recent advances in sequencing technologies have facilitated draft genome sequence construction and have made pangenome constructions feasible. Here, we present a reference genome-based iterative mapping and assembly method to construct a pangenome for a legume species.
Affiliation(s)
- Haifei Hu
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Yuxuan Yuan
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Philipp E Bayer
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Cassandria T Fernandez
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Armin Scheben
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Agnieszka A Golicz
  - Plant Molecular Biology and Biotechnology Laboratory, Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, VIC, Australia
- David Edwards
  - School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia.
31. Bruno P, Calimeri F, Greco G. AIM in Medical Informatics. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_32-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
32. Resolving misalignment interference for NGS-based clinical diagnostics. Hum Genet 2020; 140:477-492. [PMID: 32915251 DOI: 10.1007/s00439-020-02216-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/15/2020] [Accepted: 07/31/2020] [Indexed: 01/18/2023]
Abstract
Next-generation sequencing (NGS) is an incredibly useful tool for genetic disease diagnosis. However, the most commonly used bioinformatics methods for analyzing sequence reads insufficiently discriminate genomic regions with extensive sequence identity, such as gene families and pseudogenes, complicating diagnostics. This problem has been recognized for specific genes, including many involved in human disease, and diagnostic labs must perform additional costly steps to guarantee accurate diagnosis in these cases. Here we report a new data analysis method based on the comparison of read depth between highly homologous regions to identify misalignment. Analyzing six clinically important genes (CYP21A2, GBA, HBA1/2, PMS2, and SMN1), each exhibiting misalignment issues related to homology, we show that our technique can correctly identify potential misalignment events and be used to make appropriate calls. Combined with long-range PCR and/or MLPA orthogonal testing, our clinical laboratory can improve variant calling with minimal additional cost. We propose an accurate and cost-efficient NGS testing procedure that will benefit disease diagnostics, carrier screening, and research-based population studies.
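The general idea of comparing read depth between highly homologous regions can be illustrated with a small Python sketch. This is not the published method: the expected 1:1 depth ratio, the 30% tolerance, the function name, and the depth values are all assumptions chosen purely for illustration.

```python
def flag_possible_misalignment(gene_depth: float,
                               homolog_depth: float,
                               expected_ratio: float = 1.0,
                               tolerance: float = 0.3) -> bool:
    """Flag a region pair whose depth ratio departs from the expected ratio.

    Returns True when the relative deviation of gene_depth / homolog_depth
    from expected_ratio exceeds `tolerance` (both defaults are assumptions).
    """
    if homolog_depth <= 0:
        raise ValueError("homolog depth must be positive")
    ratio = gene_depth / homolog_depth
    return abs(ratio - expected_ratio) / expected_ratio > tolerance


# Hypothetical mean depths for a gene and its homologous region.
print(flag_possible_misalignment(100, 98))   # balanced depths
print(flag_possible_misalignment(60, 140))   # reads may have shifted to the homolog
```

A flagged pair would only indicate that reads may have been assigned to the wrong copy; confirming it would still need orthogonal testing of the kind the abstract mentions.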
33. Arias-Giraldo LM, Muñoz M, Hernández C, Herrera G, Velásquez-Ortiz N, Cantillo-Barraza O, Urbano P, Cuervo A, Ramírez JD. Identification of blood-feeding sources in Panstrongylus, Psammolestes, Rhodnius and Triatoma using amplicon-based next-generation sequencing. Parasit Vectors 2020; 13:434. [PMID: 32867816 PMCID: PMC7457505 DOI: 10.1186/s13071-020-04310-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 10/08/2019] [Accepted: 08/24/2020] [Indexed: 01/21/2023] [Open Access]
Abstract
BACKGROUND: Triatomines are hematophagous insects that play an important role as vectors of Trypanosoma cruzi, the causative agent of Chagas disease. These insects have adapted to multiple blood-feeding sources that can affect relevant aspects of their life-cycle and interactions, thereby influencing parasitic transmission dynamics. We conducted a characterization of the feeding sources of individuals from the primary circulating triatomine genera in Colombia using amplicon-based next-generation sequencing (NGS).
METHODS: We used 42 triatomines collected in different departments of Colombia. DNA was extracted from the gut. The presence of T. cruzi was identified using real-time PCR, and discrete typing units (DTUs) were determined by conventional PCR. For blood-feeding source identification, PCR products of the vertebrate 12S rRNA gene were obtained and sequenced by next-generation sequencing (NGS). Blood-meal sources were inferred using blastn against a curated reference dataset containing the 12S rRNA sequences belonging to vertebrates with a distribution in South America that represent a potential feeding source for triatomine bugs. Mean and median comparison tests were performed to evaluate differences in triatomine blood-feeding sources, infection state, and geographical regions. Lastly, the inverse Simpson's diversity index was calculated.
RESULTS: The overall frequency of T. cruzi infection was 83.3%. TcI was found as the most predominant DTU (65.7%). A total of 67 feeding sources were detected from the analyses of approximately 7 million reads. The predominant feeding source found was Homo sapiens (76.8%), followed by birds (10.5%), artiodactyls (4.4%), and non-human primates (3.9%). There were differences among numerous feeding sources of triatomines of different species. The diversity of feeding sources also differed depending on the presence of T. cruzi.
CONCLUSIONS: To the best of our knowledge, this is the first study to employ amplicon-based NGS of the 12S rRNA gene to depict blood-feeding sources of multiple triatomine species collected in different regions of Colombia. Our findings report a striking read diversity that has not been reported previously. This is a powerful approach to unravel transmission dynamics at microgeographical levels.
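For readers unfamiliar with the inverse Simpson's diversity index mentioned in the methods, a minimal Python sketch follows. It uses the standard form 1 / Σ p_i², where p_i is the proportion of reads assigned to feeding source i; the abstract does not say which variant of the index was used, and the read counts below are invented for illustration.

```python
def inverse_simpson(counts):
    """Inverse Simpson's index, 1 / sum(p_i ** 2), from per-source read counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    # Sources with zero reads contribute nothing to the sum.
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


# Hypothetical read counts per feeding source for two insects.
single_host = [1000]                      # all reads from one source
four_even_hosts = [250, 250, 250, 250]    # reads split evenly across four sources
print(inverse_simpson(single_host))
print(inverse_simpson(four_even_hosts))
```

The index equals 1 when a single source accounts for every read and rises toward the number of sources as reads become more evenly spread.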
Affiliation(s)
- Luisa M Arias-Giraldo
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Marina Muñoz
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Carolina Hernández
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Giovanny Herrera
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Natalia Velásquez-Ortiz
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia
- Omar Cantillo-Barraza
  - Grupo de Biología y Control de Enfermedades Infecciosas, Universidad de Antioquia, Medellín, Colombia
- Plutarco Urbano
  - Grupo de Investigaciones Biológicas de la Orinoquia, Fundación Universitaria Internacional del Trópico Americano (Unitropico), Yopal, Colombia
- Andrés Cuervo
  - Secretaría Departamental de Salud de Arauca, Arauca, Colombia
- Juan David Ramírez
  - Grupo de Investigaciones Microbiológicas-UR (GIMUR), Departamento de Biología, Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá, Colombia.
34. Krasnov GS, Pushkova EN, Novakovskiy RO, Kudryavtseva LP, Rozhmina TA, Dvorianinova EM, Povkhova LV, Kudryavtseva AV, Dmitriev AA, Melnikova NV. High-Quality Genome Assembly of Fusarium oxysporum f. sp. lini. Front Genet 2020; 11:959. [PMID: 33193577 PMCID: PMC7481384 DOI: 10.3389/fgene.2020.00959] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/30/2019] [Accepted: 07/30/2020] [Indexed: 12/31/2022] [Open Access]
Abstract
In the present work, a highly pathogenic isolate of Fusarium oxysporum f. sp. lini, which is the most harmful pathogen affecting flax (Linum usitatissimum L.), was sequenced for the first time. To achieve a high-quality genome assembly, we used the combination of two sequencing platforms - Oxford Nanopore Technologies (MinION system) with long noisy reads and Illumina (HiSeq 2500 instrument) with short accurate reads. Given the quality of DNA is crucial for Nanopore sequencing, we developed the protocol for extraction of pure high-molecular-weight DNA from fungi. Sequencing of DNA extracted using this protocol allowed us to obtain about 85x genome coverage with long (N50 = 29 kb) MinION reads and 30x coverage with 2 × 250 bp HiSeq reads. Several tools were developed for genome assembly; however, they provide different results depending on genome complexity, sequencing data volume, read length and quality. We benchmarked the most requested assemblers (Canu, Flye, Shasta, wtdbg2, and MaSuRCA), Nanopore polishers (Medaka and Racon), and Illumina polishers (Pilon and POLCA) on our sequencing data. The assembly performed with Canu and polished with Medaka and POLCA was considered the most full and accurate. After further elimination of redundant contigs using Purge Haplotigs, we achieved a high-quality genome of F. oxysporum f. sp. lini with a total length of 59 Mb, N50 of 3.3 Mb, and 99.5% completeness according to BUSCO. We also obtained a complete circular mitochondrial genome with a length of 38.7 kb. The achieved assembly expands studies on F. oxysporum and plant-pathogen interaction in flax.
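The N50 values quoted in the abstract (29 kb for the MinION reads, 3.3 Mb for the final assembly) follow the usual definition: the length L such that sequences of length L or longer contain at least half of all bases. A minimal Python sketch of that definition is given below; the function name and the example lengths are hypothetical and unrelated to the F. oxysporum data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are visited from longest to shortest; the N50 is the length at
    which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths: total 20, so the half-way mark is 10.
# Sorted descending: 6, then 6 + 5 = 11 >= 10, giving an N50 of 5.
print(n50([2, 3, 4, 5, 6]))
```

Because it depends only on the length distribution, N50 describes contiguity and says nothing about correctness, which is why the abstract reports BUSCO completeness alongside it.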
Affiliation(s)
- George S. Krasnov
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Elena N. Pushkova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Roman O. Novakovskiy
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Tatiana A. Rozhmina
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Federal Research Center for Bast Fiber Crops, Torzhok, Russia
- Ekaterina M. Dvorianinova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Liubov V. Povkhova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Anna V. Kudryavtseva
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Alexey A. Dmitriev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Nataliya V. Melnikova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
|
35
|
Poterico JA, Mestanza O. Response to comment on "genetic variants and source of introduction of SARS-CoV-2 in South America". J Med Virol 2020; 93:25-27. [PMID: 32716059 DOI: 10.1002/jmv.26359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/15/2020] [Accepted: 07/24/2020] [Indexed: 12/31/2022]
Abstract
During a pandemic, science needs data to generate helpful evidence, and researchers assume this responsibility despite the risk of potential bias. This is the response to the comment made by Pedro Romero, who argued that our manuscript did not use reassembling and mapping strategies for corroborating mutations, and lacked bootstrap support in the phylogenetic analysis.
Affiliation(s)
- Julio A Poterico
  - Genetics Service, Instituto Nacional de Salud del Niño-San Borja (INSN-SB), Lima, Peru
- Orson Mestanza
  - Genetics Service, Instituto Nacional de Salud del Niño-San Borja (INSN-SB), Lima, Peru
|
36
|
Miossec MJ, Valenzuela SL, Pérez-Losada M, Johnson WE, Crandall KA, Castro-Nallar E. Evaluation of computational methods for human microbiome analysis using simulated data. PeerJ 2020; 8:e9688. [PMID: 32864214 PMCID: PMC7427543 DOI: 10.7717/peerj.9688] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/16/2019] [Accepted: 07/18/2020] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND: Our understanding of the composition, function, and health implications of human microbiota has been advanced by high-throughput sequencing and the development of new genomic analyses. However, trade-offs among alternative strategies for the acquisition and analysis of sequence data remain understudied. METHODS: We assessed eight popular taxonomic profiling pipelines (MetaPhlAn2, metaMix, PathoScope 2.0, Sigma, Kraken, ConStrains, Centrifuge, and Taxator-tk) against a battery of metagenomic datasets simulated from real data. The metagenomic datasets were modeled on 426 complete or permanent draft genomes stored in the Human Oral Microbiome Database and were designed to simulate various experimental conditions, both in the design of a putative experiment (read length, 75-1,000 bp; sequence depth, 100K-10M) and in metagenomic composition (number of species present: 10, 100, or 426; species distribution). The sensitivity and specificity of each of the pipelines under various scenarios were measured. We also estimated the relative root mean square error and average relative error to assess the abundance estimates produced by different methods. Additional datasets were generated for five of the pipelines to simulate the presence within a metagenome of an unreferenced species closely related to other referenced species. Additional datasets were also generated to measure computational time on datasets of ever-increasing sequencing depth (up to 6 × 10^7). RESULTS: Testing of eight pipelines against 144 simulated metagenomic datasets initially produced 1,104 discrete results. Pipelines using a marker gene strategy, MetaPhlAn2 and ConStrains, were overall less sensitive than other pipelines, with the notable exception of Taxator-tk. This lower sensitivity was largely offset by runtime, which was significantly lower than that of more sensitive pipelines that rely on whole-genome alignments, such as PathoScope 2.0. However, pipelines that used strategies to speed up alignment between genomic references and metagenomic reads, such as kmerization, were able to combine high sensitivity with low run time, as is the case with Kraken and Centrifuge. In all pipelines, species whose genomes were absent from the database mostly had their reads assigned to the most closely related species available. Our results therefore suggest that taxonomic profilers that use kmerization have largely superseded those that use gene markers, coupling low run times with high sensitivity and specificity. Taxonomic profilers using more time-consuming read reassignment, such as PathoScope 2.0, provided the most sensitive profiles under common metagenomic sequencing scenarios. All the results described and discussed in this paper can be visualized using the dedicated R Shiny application (https://github.com/microgenomics/HumanMicrobiomeAnalysis). All of our datasets, pipelines and results are made available through the GitHub repository for future benchmarking.
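The abstract above evaluates profilers by sensitivity, specificity, relative root mean square error, and average relative error. The following Python sketch shows one plausible way to compute these from a simulated truth set. The exact formulas used in the paper are not given here, so the definitions below (presence/absence counts per species, and relative error normalised by the true abundance) are assumptions, and all function names and example values are hypothetical.

```python
import math


def sensitivity_specificity(true_species, predicted_species, all_species):
    """Presence/absence sensitivity and specificity over a species universe."""
    truth = set(true_species)
    pred = set(predicted_species)
    universe = set(all_species)
    tp = len(truth & pred)             # present and reported
    fn = len(truth - pred)             # present but missed
    fp = len(pred - truth)             # reported but absent
    tn = len(universe - truth - pred)  # absent and not reported
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def relative_errors(true_abundance, estimated_abundance):
    """Relative RMSE and average relative error (assumed definitions).

    Each error is (estimate - truth) / truth, taken over species with a
    non-zero true abundance; species missing from the estimate count as 0.
    """
    rel = [
        (estimated_abundance.get(sp, 0.0) - t) / t
        for sp, t in true_abundance.items()
        if t > 0
    ]
    if not rel:
        raise ValueError("no species with non-zero true abundance")
    rrmse = math.sqrt(sum(r * r for r in rel) / len(rel))
    are = sum(abs(r) for r in rel) / len(rel)
    return rrmse, are


# Hypothetical toy example, not data from the study.
sens, spec = sensitivity_specificity(
    true_species=["A", "B", "C"],
    predicted_species=["A", "B", "D"],
    all_species=["A", "B", "C", "D", "E"],
)
rrmse, are = relative_errors({"A": 0.5, "B": 0.25}, {"A": 0.4, "B": 0.3})
print(sens, spec, rrmse, are)
```

Specificity only makes sense relative to a fixed universe of candidate species, which is why the sketch takes `all_species` as an explicit argument.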
Affiliation(s)
- Matthieu J. Miossec
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
- Sandro L. Valenzuela
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
- Marcos Pérez-Losada
  - Computational Biology Institute and Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
- W. Evan Johnson
  - Section of Computational Biomedicine, Department of Medicine, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
- Keith A. Crandall
  - Computational Biology Institute and Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
- Eduardo Castro-Nallar
  - Center for Bioinformatics and Integrative Biology, Facultad de Ciencias de la Vida, Universidad Andres Bello, Santiago, Chile
  - Computational Biology Institute and Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
|
37
|
Yang J, Wan W, Xie M, Mao J, Dong Z, Lu S, He J, Xie F, Liu G, Dai X, Chang Z, Zhao R, Zhang R, Wang S, Zhang Y, Zhang W, Wang W, Li X. Chromosome-level reference genome assembly and gene editing of the dead-leaf butterfly Kallima inachus. Mol Ecol Resour 2020; 20:1080-1092. [DOI: 10.1111/1755-0998.13185] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/22/2019] [Revised: 04/30/2020] [Accepted: 05/05/2020] [Indexed: 01/26/2023]
Affiliation(s)
- Jie Yang
  - School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
- Wenting Wan
  - School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Meng Xie
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - College of Life Sciences, Sichuan Agricultural University, Yaan, China
- Junlai Mao
  - School of Marine Science and Technology, Zhejiang Ocean University, Zhoushan, China
- Zhiwei Dong
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Sihan Lu
  - School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
- Jinwu He
  - School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Feiang Xie
  - School of Marine Science and Technology, Zhejiang Ocean University, Zhoushan, China
- Guichun Liu
  - School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Xuelei Dai
  - Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China
- Zhou Chang
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Ruoping Zhao
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Ru Zhang
  - School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
- Shuting Wang
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Yiming Zhang
  - Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China
- Wei Zhang
  - State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences and School of Life Sciences, Peking University, Beijing, China
- Wen Wang
  - School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Center for Excellence in Animal Evolution and Genetics, Kunming, China
- Xueyan Li
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
|
38
|
Lachmann A, Clarke DJB, Torre D, Xie Z, Ma'ayan A. Interoperable RNA-Seq analysis in the cloud. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194521. [PMID: 32156561 DOI: 10.1016/j.bbagrm.2020.194521] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/02/2019] [Revised: 03/01/2020] [Accepted: 03/01/2020] [Indexed: 12/25/2022]
Abstract
RNA-Sequencing (RNA-Seq) is currently the leading technology for genome-wide transcript quantification. Mapping the raw reads to transcript and gene level counts can be achieved by different aligners. Here we report an in-depth comparison of transcript quantification methods. Our goal is the specific use of cost-efficient RNA-Seq analysis for deployment in a cloud infrastructure composed of interacting microservices. The individual modules cover file transfer into the cloud and APIs to handle the cloud alignment jobs. We next demonstrate how newly generated RNA-Seq data can be placed in the context of thousands of previously published datasets in near real time. With in-depth benchmarks, we identify suitable gene count quantification methods to facilitate cost-effective, accurate, and cloud-based RNA-Seq analysis service. Pseudo-alignment algorithms such as kallisto and Salmon combine high read quality estimation with cost efficient runtime performance. HISAT2 is the fastest of the classical aligners with good alignment quality. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Alexander Lachmann
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Daniel J B Clarke
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Denis Torre
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
- Zhuorui Xie
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA
- Avi Ma'ayan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1603, New York, NY 10029, USA; Library of Integrated Network-based Cellular Signatures, Data Coordination and Integration Center (BD2K-LINCS DCIC), USA; Knowledge Management Center for Illuminating the Druggable Genome (KMC-IDG), USA
|
39
|
Hernandez-Lopez AA, Alberti C, Mattavelli M. Toward a Dynamic Threshold for Quality Score Distortion in Reference-Based Alignment. J Comput Biol 2020; 27:288-300. [PMID: 31891532 DOI: 10.1089/cmb.2019.0333] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022] Open
Abstract
The intrinsic high-entropy sequence metadata, known as quality scores, is largely the cause of the substantial size of sequence data files. Yet, there is no consensus on a viable reduction of the resolution of the quality score scale, arguably because of collateral side effects. In this article, we leverage the penalty functions of the HISAT2 aligner to rebin the quality score scale in such a way as to avoid any impact on sequence alignment, while also identifying a distortion threshold for "safe" quality score representation. We tested our findings on whole-genome and RNA-seq data, and contrasted the results with three methods for lossy compression of the quality scores.
Affiliation(s)
- Marco Mattavelli
  - École Polytechnique Fédérale de Lausanne, EPFL, Lausanne, Switzerland
|
40
|
Alzaid E, Allali AE. PostSV: A Post-Processing Approach for Filtering Structural Variations. Bioinform Biol Insights 2020; 14:1177932219892957. [PMID: 32009779 PMCID: PMC6974750 DOI: 10.1177/1177932219892957] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/03/2019] [Accepted: 11/09/2019] [Indexed: 11/25/2022] Open
Abstract
Genomic structural variations are significant causes of genome diversity and complex diseases. With advances in sequencing technologies, many algorithms have been designed to identify structural differences using next-generation sequencing (NGS) data. Due to repetitions in the human genome and the short reads produced by NGS, the discovery of structural variants (SVs) by state-of-the-art SV callers is not always accurate. To improve performance, multiple SV callers are often used to detect variants. However, most SV callers suffer from high false-positive rates, which diminishes the overall performance, especially in low-coverage genomes. In this article, we propose a post-processing classification-based algorithm that can be used to filter structural variation predictions produced by SV callers. Novel features are defined from putative SV predictions using reads at the local regions around the breakpoints. Several classifiers are employed to classify the candidate predictions and remove false positives. We test our classifier models on simulated and real genomes and show that the proposed approach improves the performance of state-of-the-art algorithms.
Affiliation(s)
- Eman Alzaid
  - Computer Science Department, King Saud University, Riyadh, Saudi Arabia
  - Department of Computer Science, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
- Achraf El Allali
  - Computer Science Department, King Saud University, Riyadh, Saudi Arabia
|
41
|
Wu X, Heffelfinger C, Zhao H, Dellaporta SL. Benchmarking variant identification tools for plant diversity discovery. BMC Genomics 2019; 20:701. [PMID: 31500583 PMCID: PMC6734213 DOI: 10.1186/s12864-019-6057-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/29/2019] [Accepted: 08/22/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The ability to accurately and comprehensively identify genomic variations is critical for plant studies utilizing high-throughput sequencing. Most bioinformatics tools for processing next-generation sequencing data were originally developed and tested in human studies, raising questions as to their efficacy for plant research. A detailed evaluation of the entire variant calling pipeline, including alignment, variant calling, variant filtering, and imputation, was performed on different programs using both simulated and real plant genomic datasets. RESULTS: A comparison of SOAP2, Bowtie2, and BWA-MEM found that BWA-MEM was consistently able to align the most reads with high accuracy, whereas Bowtie2 had the highest overall accuracy. Comparative results of GATK HaplotypeCaller versus SAMtools mpileup indicated that the choice of variant caller affected precision and recall differentially depending on the levels of diversity, sequence coverage, and genome complexity. A cross-reference experiment of S. lycopersicum and S. pennellii reference genomes revealed the inadequacy of a single reference genome for variant discovery that includes distantly related plant individuals. A machine-learning-based variant filtering strategy outperformed the traditional hard-cutoff strategy, resulting in a higher number of true positive variants and fewer false positive variants. A 2-step imputation method, which utilized a set of high-confidence SNPs as the reference panel, showed up to 60% higher accuracy than direct LD-based imputation. CONCLUSIONS: Programs in the variant discovery pipeline perform differently on plant genomic datasets. The choice of programs depends on the goal of the study and the available resources. This study provides important guidance for plant biologists utilizing next-generation sequencing data for diversity characterization and crop improvement.
Affiliation(s)
- Xing Wu
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, 06520-8104, USA
- Christopher Heffelfinger
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, 06520-8104, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06520-8034, USA
- Stephen L Dellaporta
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, 06520-8104, USA
|
42
|
Jung H, Winefield C, Bombarely A, Prentis P, Waterhouse P. Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes. Trends Plant Sci 2019; 24:700-724. [PMID: 31208890 DOI: 10.1016/j.tplants.2019.05.003] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 01/06/2019] [Revised: 05/01/2019] [Accepted: 05/10/2019] [Indexed: 05/16/2023]
Abstract
The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project.
Affiliation(s)
- Hyungtaek Jung
  - Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD 4001, Australia
- Christopher Winefield
  - Department of Wine, Food, and Molecular Biosciences, Lincoln University, 7647 Christchurch, New Zealand
- Aureliano Bombarely
  - Department of Bioscience, University of Milan, Milan 20133, Italy; School of Plants and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Peter Prentis
  - School of Earth, Environmental, and Biological Sciences, Queensland University of Technology, Brisbane, QLD, 4001, Australia
- Peter Waterhouse
  - Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD 4001, Australia; School of Biological Sciences, University of Sydney, Sydney, NSW 2006, Australia
|
43
|
Babarinde IA, Li Y, Hutchins AP. Computational Methods for Mapping, Assembly and Quantification for Coding and Non-coding Transcripts. Comput Struct Biotechnol J 2019; 17:628-637. [PMID: 31193391 PMCID: PMC6526290 DOI: 10.1016/j.csbj.2019.04.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/25/2019] [Revised: 04/24/2019] [Accepted: 04/29/2019] [Indexed: 12/17/2022] Open
Abstract
The measurement of gene expression has long provided significant insight into biological functions. The development of high-throughput short-read sequencing technology has revealed transcriptional complexity at an unprecedented scale, and informed almost all areas of biology. However, as researchers have sought to gather more insights from the data, these new technologies have also increased the computational analysis burden. In this review, we describe typical computational pipelines for RNA-Seq analysis and discuss their strengths and weaknesses for the assembly, quantification and analysis of coding and non-coding RNAs. We also discuss the assembly of transposable elements into transcripts, and the difficulty these repetitive elements pose. In summary, RNA-Seq is a powerful technology that is likely to remain a key asset in the biologist's toolkit.
Affiliation(s)
- Andrew P. Hutchins
  - Department of Biology, Southern University of Science and Technology, 1088 Xueyuan Lu, Shenzhen, China
|
44
|
Ode H, Kobayashi A, Matsuda M, Hachiya A, Imahashi M, Yokomaku Y, Iwatani Y. Identifying integration sites of the HIV-1 genome with intact and aberrant ends through deep sequencing. J Virol Methods 2019; 267:59-65. [PMID: 30857886 DOI: 10.1016/j.jviromet.2019.03.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/05/2018] [Revised: 02/05/2019] [Accepted: 03/08/2019] [Indexed: 02/05/2023]
Abstract
Paired-end deep sequencing is a powerful tool to investigate integration sites of the HIV-1 genome in infected cells. Integration sites of HIV-1 proviral DNA carrying intact LTR ends have been well documented. In contrast, integration sites of proviral DNA with aberrant ends, which emerge infrequently but can also induce replication-competent viruses, have not been extensively examined, in part, because of the lack of a suitable bioinformatics method for deep sequencing. Here, we report a novel bioinformatics protocol, named the VINSSRM, to search for integration sites of proviral DNA carrying intact and aberrant LTR ends using paired-end deep sequencing data. The protocol incorporates split-read mapping to assign viral and human genome parts within read sequences and overlapping paired-end read merging to construct long error-corrected sequences. The VINSSRM not only consistently detects integration sites similar to the conventional method but also provides information on additional integration sites, including those of proviral DNA with aberrant ends, which were mainly found in non-exonic regions of the human genome. Therefore, the VINSSRM may help us to understand HIV-1 integration, persistence of infected cells, and viral latency.
Affiliation(s)
- Hirotaka Ode
  - Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
- Ayumi Kobayashi
  - Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan; Program in Integrated Molecular Medicine, Nagoya University Graduate School of Medicine, Nagoya, Aichi, 466-8550, Japan
- Masakazu Matsuda
  - Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
- Atsuko Hachiya
  - Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
- Mayumi Imahashi
  - Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
- Yoshiyuki Yokomaku
  - Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan
- Yasumasa Iwatani
  - Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Aichi, 460-0001, Japan; Program in Integrated Molecular Medicine, Nagoya University Graduate School of Medicine, Nagoya, Aichi, 466-8550, Japan
|
45
|
Payá-Milans M, Olmstead JW, Nunez G, Rinehart TA, Staton M. Comprehensive evaluation of RNA-seq analysis pipelines in diploid and polyploid species. Gigascience 2018; 7:5168871. [PMID: 30418578 PMCID: PMC6275443 DOI: 10.1093/gigascience/giy132] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 04/19/2018] [Accepted: 10/21/2018] [Indexed: 11/12/2022] Open
Abstract
Background: The usual analysis of RNA sequencing (RNA-seq) reads is based on an existing reference genome and annotated gene models. However, when a reference for the sequenced species is not available, alternatives include using a reference genome from a related species or reconstructing transcript sequences with de novo assembly. In addition, researchers are faced with many options for RNA-seq data processing and limited information on how their decisions will impact the final outcome. Using both a diploid and polyploid species with a distant reference genome, we have tested the influence of different tools at various steps of a typical RNA-seq analysis workflow on the recovery of useful processed data available for downstream analysis. Findings: At the preprocessing step, we found error correction has a strong influence on de novo assembly but not on mapping results. After trimming, a greater percentage of reads could be used in downstream analysis by selecting gentle quality trimming performed with Skewer instead of strict quality trimming with Trimmomatic. This availability of reads correlated with size, quality, and completeness of de novo assemblies and with number of mapped reads. When selecting a reference genome from a related species to map reads, outcome was significantly improved when using mapping software tolerant of greater sequence divergence, such as Stampy or GSNAP. Conclusions: The selection of bioinformatic software tools for RNA-seq data analysis can maximize quality parameters on de novo assemblies and availability of reads in downstream analysis.
Affiliation(s)
- Miriam Payá-Milans
  - Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
- James W Olmstead
  - Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
- Gerardo Nunez
  - Horticultural Sciences Department, University of Florida, 2550 Hull Rd, PO Box 110690, Gainesville, FL, 32611, United States
- Timothy A Rinehart
  - Thad Cochran Southern Horticultural Laboratory, USDA-Agricultural Research Service, PO Box 287, Poplarville, MS, 39470, United States
  - Crop Production and Protection, USDA-Agricultural Research Service, 5601 Sunnyside Ave, Beltsville, MD, 20705, United States
- Margaret Staton
  - Department of Entomology and Plant Pathology, University of Tennessee, 370 PBB, 2505 EJ Chapman Blvd, Knoxville, TN, 37996, United States
|
46
|
Comparative evaluation of cDNA library construction approaches for RNA-Seq analysis from low RNA-content human specimens. J Microbiol Methods 2018; 154:55-62. [PMID: 30332617 DOI: 10.1016/j.mimet.2018.10.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/07/2018] [Revised: 10/12/2018] [Accepted: 10/13/2018] [Indexed: 02/08/2023]
Abstract
With the emergence of RNA sequencing technologies, metatranscriptomic studies are rapidly gaining attention as they simultaneously provide insight into gene expression profiles and therefore disease association pathways of microbial pathogens and their hosts. This approach, therefore, holds promise for applicability in infectious disease diagnostics. A challenge of this approach in the clinical setting is the low amount and quality of RNA, especially microbial RNA in most clinically-infected specimens. Here, we compared two commercially available stranded cDNA library preparation kits, the NuGEN Ovation SoLo RNA-Seq System and the Illumina TruSeq Stranded Total RNA, using RNA extracted from synovial and sonicate fluids from a subject with periprosthetic joint infection. The Ovation SoLo RNA-Seq System provided more useful transcriptomic data for the infecting bacterium, whereas the TruSeq Stranded Total RNA kit provided more useful human transcriptomic data.
|
47
|
|
48
|
Overview of Trends in the Application of Metagenomic Techniques in the Analysis of Human Enteric Viral Diversity in Africa's Environmental Regimes. Viruses 2018; 10:v10080429. [PMID: 30110939 PMCID: PMC6115975 DOI: 10.3390/v10080429] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/01/2018] [Revised: 08/03/2018] [Accepted: 08/10/2018] [Indexed: 12/19/2022] Open
Abstract
Interest has grown in metagenomics as an approach for identifying human viruses found in aquatic systems and studying their diversity, both for their role as waterborne pathogens and as water quality indicators. In the last few years, environmental viral metagenomics has grown significantly and has enabled extensive identification, diversity analysis and whole-genome sequencing of viruses in environmental and clinical samples. Prior to the arrival of metagenomics, traditional molecular procedures such as the polymerase chain reaction (PCR) and sequencing were mostly used to identify and classify enteric viral species in different environmental milieus. After the advent of metagenomics, more detailed reports have emerged about the important waterborne viruses identified in wastewater treatment plant effluents and surface water. This paper provides a review of methods that have been used for the concentration, detection and identification of viral species from different environmental matrices. The review also considers where metagenomics has been explored in different African countries, as well as the limitations and challenges facing the approach. Procedures including sample processing, experimental design, sequencing technology, and bioinformatics analysis are discussed. The review concludes by summarising the current thinking and practices in the field and lays bare key issues that those venturing into this field need to consider and address.
|
49
|
von Reumont BM. Studying Smaller and Neglected Organisms in Modern Evolutionary Venomics Implementing RNASeq (Transcriptomics)-A Critical Guide. Toxins (Basel) 2018; 10:toxins10070292. [PMID: 30012955 PMCID: PMC6070909 DOI: 10.3390/toxins10070292] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/25/2018] [Revised: 07/06/2018] [Accepted: 07/13/2018] [Indexed: 12/20/2022] Open
Abstract
Venoms are key evolutionary adaptations that species employ for defense, predation or competition. However, the processes and forces that drive the evolution of venoms and their toxin components remain in many aspects understudied. In particular, the venoms of many smaller, neglected (mostly invertebrate) organisms are not characterized in detail, especially with modern methods. For the majority of these taxa, even their biology is only vaguely known. Modern evolutionary venomics addresses the question of how venoms evolve by applying a plethora of -omics methods. These methods have recently become sensitive enough that the venoms of smaller, neglected organisms are now more readily accessible to comparative study. More knowledge about these taxa is essential to better understand venom evolution in general. The methodological core pillars of integrative evolutionary venomics are genomics, transcriptomics and proteomics, which are complemented by functional morphology and the field of protein synthesis and activity tests. This manuscript focuses on transcriptomics (or RNASeq) as one toolbox to describe venom evolution in smaller, neglected taxa. It provides a hands-on guide that discusses a generalized RNASeq workflow, which can be adapted to individual projects. For neglected and small taxa, generalized recommendations are difficult to give and conclusions need to be made individually from case to case. In the context of evolutionary venomics, this overview highlights critical points, but also the promise, of RNASeq analyses. Methodologically, these concern the impact of read processing, possible improvements by performing multiple and merged assemblies, and adequate quantification of expressed transcripts. Readers are guided to reappraise their hypotheses on venom evolution in smaller organisms and how robustly these are testable with the current transcriptomics toolbox.
The complementary approach that combines proteomics in particular, but also genomics, with transcriptomics is discussed as well. As recently shown, comparative proteomics is, for example, most important in preventing false positive identifications of possible toxin transcripts. Finally, future directions in transcriptomics, such as applying 3rd generation sequencing strategies to overcome the difficulties of short-read assemblies, are briefly addressed.
Affiliation(s)
- Björn Marcus von Reumont
- Justus Liebig University of Giessen, Institute for Insect Biotechnology, Heinrich Buff Ring 58, 35392 Giessen, Germany.
- Natural History Museum, Department of Life Sciences, Cromwell Rd, London SW7 5BD, UK.
|
50
|
Lee H, Lee KW, Lee T, Park D, Chung J, Lee C, Park WY, Son DS. Performance evaluation method for read mapping tool in clinical panel sequencing. Genes Genomics 2017; 40:189-197. [PMID: 29568413 PMCID: PMC5846869 DOI: 10.1007/s13258-017-0621-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/31/2017] [Accepted: 10/11/2017] [Indexed: 01/28/2023]
Abstract
With the rapid advancement in Next-Generation Sequencing (NGS) technology, clinical panel sequencing is being used increasingly in clinical studies and tests. However, the performance of tools used in NGS data analysis has not been comparatively evaluated for panel sequencing. This study aimed to evaluate the tools used in the alignment process, the first procedure in bioinformatics analysis, by comparing tools that have been widely used with ones that have been introduced recently. With the accumulated panel sequencing data, detected variant lists were cataloged and inserted into simulated reads produced from the reference genome (h19). The numbers of unmapped and misaligned reads, the mapping quality distribution, and the runtime were measured as standards for comparison. The most widely used tools, Bowtie2 and BWA-MEM, performed well, with AUCs of 0.9984 and 0.9970, respectively. Kart, which had a superior runtime and fewer misaligned reads, also achieved a similarly high AUC (0.9723). Such a method for selecting and optimizing tools appropriate for panel sequencing can be utilized in fields requiring error minimization, such as clinical application and liquid biopsy studies.
Affiliation(s)
- Hojun Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
- Ki-Wook Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06351 South Korea
- Taeseob Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
- Donghyun Park
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
- Jongsuk Chung
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon, 16419 South Korea
- Chung Lee
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, 06351 South Korea
- Woong-Yang Park
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea; Department of Molecular Cell Biology, Sungkyunkwan University School of Medicine, Suwon, 16419 South Korea; Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, 06351 South Korea
- Dae-Soon Son
- Samsung Genome Institute (SGI), Samsung Medical Center (SMC), Seoul, 06351 South Korea
|