Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Quinn EM, Cormican P, Kenny EM, Hill M, Anney R, Gill M, Corvin AP, Morris DW. Development of strategies for SNP detection in RNA-seq data: application to lymphoblastoid cell lines and evaluation using 1000 Genomes data. PLoS One 2013;8:e58815. [PMID: 23555596 PMCID: PMC3608647 DOI: 10.1371/journal.pone.0058815] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 10/23/2012] [Accepted: 02/07/2013] [Indexed: 11/24/2022] Open

Cited by Other Article(s)

Chi WY, Hu Y, Huang HC, Kuo HH, Lin SH, Kuo CTJ, Tao J, Fan D, Huang YM, Wu AA, Hung CF, Wu TC. Molecular targets and strategies in the development of nucleic acid cancer vaccines: from shared to personalized antigens. J Biomed Sci 2024;31:94. [PMID: 39379923 PMCID: PMC11463125 DOI: 10.1186/s12929-024-01082-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 09/01/2024] [Indexed: 10/10/2024] Open

Affiliation(s)

Wei-Yu Chi Physiology, Biophysics and Systems Biology Graduate Program, Weill Cornell Medicine, New York, NY, USA
Yingying Hu Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Hsin-Che Huang Tri-Institutional PhD Program in Chemical Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Hui-Hsuan Kuo Pharmacology PhD Program, Weill Cornell Medicine, New York, NY, USA
Shu-Hong Lin Department of Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; The University of Texas Graduate School of Biomedical Sciences at Houston and MD Anderson Cancer Center, Houston, TX, USA
Chun-Tien Jimmy Kuo Division of Pharmaceutics and Pharmacology, College of Pharmacy, The Ohio State University, Columbus, OH, USA
Julia Tao Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
Darrell Fan Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
Yi-Min Huang Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
Annie A Wu Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA
Chien-Fu Hung Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA; Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Obstetrics and Gynecology, Johns Hopkins School of Medicine, Baltimore, MD, USA
T-C Wu Department of Pathology, Johns Hopkins School of Medicine, 1550 Orleans St, CRB II Room 309, Baltimore, MD, 21287, USA; Department of Oncology, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Obstetrics and Gynecology, Johns Hopkins School of Medicine, Baltimore, MD, USA; Department of Molecular Microbiology and Immunology, Bloomberg School of Public Health, Johns Hopkins School of Medicine, Baltimore, MD, USA

Premanand A, Shanmuga Priya M, Reena Rajkumari B. Genetic variants in androgenetic alopecia: insights from scalp RNA sequencing data. Arch Dermatol Res 2024;316:590. [PMID: 39215850 DOI: 10.1007/s00403-024-03351-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 08/03/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024]

Vigorito E, Barton A, Pitzalis C, Lewis MJ, Wallace C. BBmix: a Bayesian beta-binomial mixture model for accurate genotyping from RNA-sequencing. Bioinformatics 2023;39:btad393. [PMID: 37338536 PMCID: PMC10318392 DOI: 10.1093/bioinformatics/btad393] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/14/2022] [Revised: 05/15/2023] [Accepted: 06/19/2023] [Indexed: 06/21/2023] Open

Nagi SC, Oruni A, Weetman D, Donnelly MJ. RNA-Seq-Pop: Exploiting the sequence in RNA sequencing-A Snakemake workflow reveals patterns of insecticide resistance in the malaria vector Anopheles gambiae. Mol Ecol Resour 2023;23:946-961. [PMID: 36695302 PMCID: PMC10568660 DOI: 10.1111/1755-0998.13759] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/03/2022] [Revised: 11/12/2022] [Accepted: 01/06/2023] [Indexed: 01/26/2023]

Abstract

We provide a reproducible and scalable Snakemake workflow, called RNA-Seq-Pop, which provides end-to-end analysis of RNA sequencing data sets. The workflow allows the user to perform quality control, perform differential expression analyses and call genomic variants. Additional options include the calculation of allele frequencies of variants of interest, summaries of genetic variation and population structure, and genome-wide selection scans, together with clear visualizations. RNA-Seq-Pop is applicable to any organism, and we demonstrate the utility of the workflow by investigating pyrethroid resistance in selected strains of the major malaria mosquito, Anopheles gambiae. The workflow provides additional modules specifically for An. gambiae, including estimating recent ancestry and determining the karyotype of common chromosomal inversions. The Busia laboratory colony used for selections was collected in Busia, Uganda, in November 2018. We performed a comparative analysis of three groups: a parental G24 Busia strain; its deltamethrin-selected G28 offspring; and the susceptible reference strain Kisumu. Measures of genetic diversity reveal patterns consistent with that of laboratory colonization and selection, with the parental Busia strain exhibiting the highest nucleotide diversity, followed by the selected Busia offspring, and finally, Kisumu. Differential expression and variant analyses reveal that the selected Busia colony exhibits a number of distinct mechanisms of pyrethroid resistance, including the Vgsc-995S target-site mutation, upregulation of SAP genes, P450s and a cluster of carboxylesterases. During deltamethrin selections, the 2La chromosomal inversion rose in frequency (from 33% to 86%), supporting a previous link with pyrethroid resistance. RNA-Seq-Pop is hosted at: github.com/sanjaynagi/rna-seq-pop. We anticipate that the workflow will provide a useful tool to facilitate reproducible, transcriptomic studies in An. gambiae and other taxa.

Huang P, Hameed R, Abbas M, Balooch S, Alharthi B, Du Y, Abbas A, Younas A, Du D. Integrated omic techniques and their genomic features for invasive weeds. Funct Integr Genomics 2023;23:44. [PMID: 36680630 DOI: 10.1007/s10142-023-00971-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/01/2023] [Accepted: 01/11/2023] [Indexed: 01/22/2023]

Long Q, Yuan Y, Li M. RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data. Front Genet 2022;13:865313. [PMID: 35846154 PMCID: PMC9279659 DOI: 10.3389/fgene.2022.865313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 05/17/2022] [Indexed: 11/13/2022] Open

Ma T, Li H, Zhang X. Discovering single-cell eQTLs from scRNA-seq data only. Gene 2022;829:146520. [PMID: 35452708 DOI: 10.1016/j.gene.2022.146520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Revised: 01/12/2022] [Accepted: 04/15/2022] [Indexed: 12/14/2022]

Whole genome re-sequencing reveals the genetic diversity and evolutionary patterns of Eucommia ulmoides. Mol Genet Genomics 2022;297:485-494. [PMID: 35146538 DOI: 10.1007/s00438-022-01864-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Accepted: 01/23/2022] [Indexed: 10/19/2022]

Abstract

Eucommia ulmoides (E. ulmoides) is a deciduous perennial tree belonging to the order Garryales, and is known as a "living fossil" plant, along with ginkgo (Ginkgo biloba), metasequoia (Metasequoia glyptostroboides) and dove tree (Davidia involucrata Baill). However, the genetic diversity and population structure of E. ulmoides remain unclear. In this study, we re-sequenced the genomes of 12 E. ulmoides accessions from different major climatic and geographic regions in China to elucidate the genetic diversity, population structure and evolutionary pattern. Integrating phylogenetic analysis, principal component analysis and population structure analysis based on a number of high-quality SNPs, we clustered the 12 E. ulmoides accessions into four different groups. This result is consistent with their geographical location, except for the samples from Shanghai and Hunan province. E. ulmoides accessions from Hunan province exhibited a closer genetic relationship with accessions from Shanghai than with those from other regions, which is also supported by the population structure analyses. Genetic diversity analysis further revealed that E. ulmoides samples from Shanghai and Hunan province had higher genetic diversity than those from other regions in this study. In addition, we treated the E. ulmoides materials from Shanghai and Hunan province as group A and the materials from other places as group B, and then analyzed the evolutionary pattern of E. ulmoides. The result showed significant differentiation (Fst = 0.1545) between group A and group B. Some candidate highly divergent genome regions were identified in group A by selective sweep analyses, and functional analysis of candidate genes in these regions showed that biological regulation processes could be correlated with Eu-rubber biosynthesis. Notably, nine genes were identified from selective sweep regions. They were involved in Eu-rubber biosynthesis and expressed in rubber-containing tissues. This study preliminarily explored the genetic diversity and evolutionary model of E. ulmoides, laying the foundation for the protection of germplasm resources and the future development and utilization of multipurpose germplasm resources.
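
The differentiation statistic quoted above (Fst between group A and group B) can be illustrated with a minimal Python sketch. The abstract does not say which estimator was used, so this sketch assumes the simplest textbook form, Wright's Fst = (HT - HS) / HT for two equally weighted groups, combined across SNPs as a ratio of sums. The function names and the toy allele frequencies are hypothetical; estimators used in practice usually also correct for sample size.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def fst_two_groups(freqs_a, freqs_b):
    """Wright's Fst over several biallelic SNPs for two equally weighted groups.

    freqs_a, freqs_b: per-SNP alternate-allele frequencies in each group.
    Combines sites as sum(HT - HS) / sum(HT), so monomorphic sites
    (HT = 0) do not cause a division by zero on their own.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both groups must be described at the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        h_s = (heterozygosity(pa) + heterozygosity(pb)) / 2.0  # mean within-group
        h_t = heterozygosity((pa + pb) / 2.0)                  # pooled
        numerator += h_t - h_s
        denominator += h_t
    if denominator == 0.0:
        raise ValueError("no polymorphic sites")
    return numerator / denominator


# Toy frequencies (hypothetical, not taken from the study).
group_a = [0.2, 0.5, 0.9]
group_b = [0.8, 0.5, 0.7]
print(round(fst_two_groups(group_a, group_b), 4))
```

A value of 0 means the two groups have identical allele frequencies at every site; values closer to 1 indicate stronger differentiation.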

Gundogdu P, Loucera C, Alamo-Alvarez I, Dopazo J, Nepomuceno I. Integrating pathway knowledge with deep neural networks to reduce the dimensionality in single-cell RNA-seq data. BioData Min 2022;15:1. [PMID: 34980200 PMCID: PMC8722116 DOI: 10.1186/s13040-021-00285-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/30/2021] [Accepted: 12/04/2021] [Indexed: 11/13/2022] Open

Abstract

Background

Single-cell RNA sequencing (scRNA-seq) data provide valuable insights into cellular heterogeneity, which is significantly improving the current knowledge of biology and human disease. One of the main applications of scRNA-seq data analysis is the identification of new cell types and cell states. Deep neural networks (DNNs) are among the best methods to address this problem. However, this performance comes at the cost of interpretability of the results. In this work we propose an intelligible pathway-driven neural network to correctly solve cell-type related problems at single-cell resolution while providing a biologically meaningful representation of the data.

Results

In this study, we explored deep neural networks constrained by several types of prior biological information, e.g. signaling pathway information, as a way to reduce the dimensionality of the scRNA-seq data. We tested the proposed biologically-based architectures on thousands of cells of human and mouse origin across a collection of public datasets in order to check the performance of the model. Specifically, we tested the architecture across different validation scenarios that try to mimic how unknown cell types are clustered by the DNN and how it correctly annotates cell types by querying a database in a retrieval problem. Moreover, our approach proved comparable to other less interpretable DNN approaches constrained by using protein-protein interactions gene regulation data. Finally, we show how the latent structure learned by the network could be used to visualize and to interpret the composition of human single cell datasets.

Conclusions

Here we demonstrate how the integration of pathways, which convey fundamental information on functional relationships between genes, with DNNs, which provide an excellent classification framework, results in an excellent alternative for learning a biologically meaningful representation of scRNA-seq data. In addition, the introduction of prior biological knowledge in the DNN reduces the size of the network architecture. Comparative results demonstrate a superior performance of this approach with respect to other similar approaches. As an additional advantage, the use of pathways within the DNN structure enables easy interpretability of the results by connecting features to cell functionalities by means of the pathway nodes, as demonstrated with an example with human melanoma tumor cells.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13040-021-00285-4.

He Y, Huang L, Tang Y, Yang Z, Han Z. Genome-wide Identification and Analysis of Splicing QTLs in Multiple Sclerosis by RNA-Seq Data. Front Genet 2021;12:769804. [PMID: 34868258 PMCID: PMC8633104 DOI: 10.3389/fgene.2021.769804] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/02/2021] [Accepted: 10/18/2021] [Indexed: 12/21/2022] Open

Subramaniam N, Nair R, Marsden PA. Epigenetic Regulation of the Vascular Endothelium by Angiogenic LncRNAs. Front Genet 2021;12:668313. [PMID: 34512715 PMCID: PMC8427604 DOI: 10.3389/fgene.2021.668313] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/16/2021] [Accepted: 05/17/2021] [Indexed: 12/15/2022] Open

Jehl F, Degalez F, Bernard M, Lecerf F, Lagoutte L, Désert C, Coulée M, Bouchez O, Leroux S, Abasht B, Tixier-Boichard M, Bed'hom B, Burlot T, Gourichon D, Bardou P, Acloque H, Foissac S, Djebali S, Giuffra E, Zerjal T, Pitel F, Klopp C, Lagarrigue S. RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species. Front Genet 2021;12:655707. [PMID: 34262593 PMCID: PMC8273700 DOI: 10.3389/fgene.2021.655707] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/19/2021] [Accepted: 06/01/2021] [Indexed: 12/19/2022] Open

Abstract

In addition to their common usage for studying gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq) and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call-rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq of 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population had a reliable GT (call rate ≥ 50%) and, among them, ∼340,000 had a MAF ≥ 10%. We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missenses and 590 stop-gained), (ii) study, on a large scale, cis-regulations of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) that can be analyzed for ASE, and with ∼29% of them that were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and to use them for different analyses, as in GTEx studies.
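
The two evaluation measures named in this abstract, precision of RNA-seq SNPs against DNA-seq and genotype concordance under a read-count threshold, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SNP sites are represented as (chromosome, position) tuples, genotypes as VCF-style strings, and the function names, the `min_reads` parameter and the toy data are all hypothetical rather than taken from the authors' pipeline.

```python
def precision(rna_sites, dna_sites):
    """Fraction of SNP sites called from RNA-seq that DNA-seq also reports."""
    if not rna_sites:
        raise ValueError("no RNA-seq SNPs to evaluate")
    return len(rna_sites & dna_sites) / len(rna_sites)


def genotype_concordance(rna_calls, dna_genotypes, min_reads=0):
    """Fraction of RNA-seq genotypes that match DNA-seq at shared sites.

    rna_calls:     {site: (genotype, read_count)}
    dna_genotypes: {site: genotype}
    min_reads:     RNA-seq genotypes supported by fewer reads are skipped.
    """
    compared = 0
    matching = 0
    for site, (genotype, reads) in rna_calls.items():
        if reads < min_reads or site not in dna_genotypes:
            continue
        compared += 1
        if genotype == dna_genotypes[site]:
            matching += 1
    if compared == 0:
        raise ValueError("no comparable genotypes")
    return matching / compared


# Toy data (hypothetical): one low-coverage RNA-seq genotype is wrong.
rna_calls = {
    ("1", 100): ("0/1", 12),
    ("1", 200): ("1/1", 3),
    ("2", 50): ("0/1", 20),
    ("2", 75): ("0/0", 8),
}
dna_genotypes = {
    ("1", 100): "0/1",
    ("1", 200): "0/1",
    ("2", 50): "0/1",
    ("2", 75): "0/0",
}
print(genotype_concordance(rna_calls, dna_genotypes))               # all genotypes
print(genotype_concordance(rna_calls, dna_genotypes, min_reads=5))  # filtered
```

In the toy data, raising `min_reads` drops the single discordant low-coverage genotype, which mirrors the idea of choosing thresholds that reach a target concordance.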

Affiliation(s)

Frédéric Jehl INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
Fabien Degalez INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
Maria Bernard INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France; INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
Frédéric Lecerf INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
Laetitia Lagoutte INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
Colette Désert INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
Manon Coulée INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
Olivier Bouchez INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
Sophie Leroux INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
Behnam Abasht Department of Animal and Food Sciences, University of Delaware, Newark, DE, United States
Michèle Tixier-Boichard INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
Bertrand Bed'hom INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
Thierry Burlot NOVOGEN, Mauguérand, Le Foeil, France
David Gourichon INRAE, PEAT UE, Nouzilly, France
Philippe Bardou INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
Hervé Acloque INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
Sylvain Foissac INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
Sarah Djebali INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
Elisabetta Giuffra INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
Tatiana Zerjal INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
Frédérique Pitel INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
Christophe Klopp INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
Sandrine Lagarrigue INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France

Da Broi MG, Plaça JR, Silva WAD, Ferriani RA, Navarro PA. Screening of Variants in the Transcript Profile of Eutopic Endometrium from Infertile Women with Endometriosis during the Implantation Window. Rev Bras Ginecol Obstet 2021;43:457-466. [PMID: 34318471 PMCID: PMC10411168 DOI: 10.1055/s-0041-1730287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 02/12/2021] [Indexed: 10/20/2022] Open

Abstract

OBJECTIVE

Abnormalities in the eutopic endometrium of women with endometriosis may be related to disease-associated infertility. Although previous RNA-sequencing analysis did not show differential expression in endometrial transcripts of endometriosis patients, other molecular alterations could impact protein synthesis and endometrial receptivity. Our aim was to screen for functional mutations in the transcripts of eutopic endometria of infertile women with endometriosis and controls during the implantation window.

METHODS

Data from RNA-Sequencing of endometrial biopsies collected during the implantation window from 17 patients (6 infertile women with endometriosis, 6 infertile controls, 5 fertile controls) were analyzed for variant discovery and identification of functional mutations. A targeted study of the alterations found was performed to interpret the data in the context of the disease.

RESULTS

None of the variants identified was common to other samples within the same group, and no mutation was repeated among patients with endometriosis, infertile controls, and fertile controls. In the endometriosis group, nine predicted deleterious mutations were identified, but only one had previously been associated with a clinical condition, one with no endometrial impact. When crossing the mutated genes with the descriptors endometriosis and/or endometrium, the gene CMKLR1 was associated either with inflammatory response in endometriosis or with endometrial processes for pregnancy establishment.

CONCLUSION

Although no pattern of mutation was found, we note the small sample size and the reliance on RNA-sequencing data. Considering the screening purpose of the study and the importance of the CMKLR1 gene in endometrial modulation, it could be a candidate gene for further, adequately powered studies evaluating mutations in eutopic endometria from endometriosis patients.

Youssefian L, Saeidian AH, Palizban F, Bagherieh A, Abdollahimajd F, Sotoudeh S, Mozafari N, Farahani RA, Mahmoudi H, Babashah S, Zabihi M, Zeinali S, Fortina P, Salas-Alanis JC, South AP, Vahidnezhad H, Uitto J. Whole-Transcriptome Analysis by RNA Sequencing for Genetic Diagnosis of Mendelian Skin Disorders in the Context of Consanguinity. Clin Chem 2021;67:876-888. [PMID: 33969388 DOI: 10.1093/clinchem/hvab042] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/15/2020] [Accepted: 02/11/2021] [Indexed: 02/07/2023]

Affiliation(s)

Leila Youssefian Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA; Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA; Genetics, Genomics and Cancer Biology PhD Program, Thomas Jefferson University, Philadelphia, PA, USA
Amir Hossein Saeidian Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA; Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA; Genetics, Genomics and Cancer Biology PhD Program, Thomas Jefferson University, Philadelphia, PA, USA
Fahimeh Palizban Laboratory of Complex Biological Systems and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Atefeh Bagherieh Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
Fahimeh Abdollahimajd Skin Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Soheila Sotoudeh Department of Dermatology, Children's Medical Center, Center of Excellence, Tehran University of Medical Sciences, Tehran, Iran
Nikoo Mozafari Skin Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Rahele A Farahani Division of Nephrology and Hypertension, Mayo Clinic, Rochester, Minnesota, United States of America
Hamidreza Mahmoudi Department of Dermatology, Razi Hospital, Tehran University of Medical Sciences, Tehran, Iran
Sadegh Babashah Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
Masoud Zabihi Kawsar Human Genetics Research Center, Tehran, Iran
Sirous Zeinali Kawsar Human Genetics Research Center, Tehran, Iran
Paolo Fortina Cancer Genomics and Bioinformatics, Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, USA; Department of Translation and Precision Medicine, Sapienza University, Rome, Italy
Julio C Salas-Alanis Dystrophic Epidermolysis Bullosa Research Association, Mexico
Andrew P South Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA
Hassan Vahidnezhad Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA; Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA
Jouni Uitto Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA; Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA

Integration of SNP Disease Association, eQTL, and Enrichment Analyses to Identify Risk SNPs and Susceptibility Genes in Chronic Obstructive Pulmonary Disease. Biomed Res Int 2020;2020:3854196. [PMID: 33457407 PMCID: PMC7785362 DOI: 10.1155/2020/3854196] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/27/2020] [Revised: 12/09/2020] [Accepted: 12/15/2020] [Indexed: 12/14/2022]

Quaglieri A, Flensburg C, Speed TP, Majewski IJ. Finding a suitable library size to call variants in RNA-Seq. BMC Bioinformatics 2020;21:553. [PMID: 33261552 PMCID: PMC7708150 DOI: 10.1186/s12859-020-03860-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/19/2020] [Accepted: 11/03/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

RNA sequencing allows the study of both gene expression changes and transcribed mutations, providing a highly effective way to gain insight into cancer biology. When planning the sequencing of a large cohort of samples, library size is a fundamental factor affecting both the overall cost and the quality of the results. Here we specifically address how overall library size influences the detection of somatic mutations in RNA-seq data in two acute myeloid leukaemia datasets.

RESULTS

We simulated shallower sequencing depths by downsampling 45 acute myeloid leukaemia samples (100 bp PE) that are part of the Leucegene project, which were originally sequenced at high depth. We compared the sensitivity of six methods of recovering validated mutations on the same samples. The methods compared are a combination of three popular callers (MuTect, VarScan, and VarDict) and two filtering strategies. We observed an incremental loss in sensitivity when simulating libraries of 80M, 50M, 40M, 30M and 20M fragments, with the largest loss detected with less than 30M fragments (below 90%, average loss of 7%). The sensitivity in recovering insertions and deletions varied markedly between callers, with VarDict showing the highest sensitivity (60%). Single nucleotide variant sensitivity is relatively consistent across methods, apart from MuTect, whose default filters need adjustment when using RNA-Seq. We also analysed 136 RNA-Seq samples from the TCGA-LAML cohort (50 bp PE) and assessed the change in sensitivity between the initial libraries (average 59M fragments) and after downsampling to 40M fragments. When considering single nucleotide variants in recurrently mutated myeloid genes we found a comparable performance, with a 6% average loss in sensitivity using 40M fragments.

CONCLUSIONS

Between 30M and 40M 100 bp PE reads are needed to recover 90-95% of the initial variants on recurrently mutated myeloid genes. To extend this result to another cancer type, an exploration of the characteristics of its mutations and gene expression patterns is suggested.
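
The comparison described in this abstract, sensitivity for validated mutations at progressively smaller library sizes, can be sketched as follows. This is a minimal illustration, assuming variants are represented by opaque identifiers and that calls at each simulated depth are already available as sets; the function names, the depth labels and the toy call sets are hypothetical and do not reproduce the study's data or callers.

```python
def sensitivity(validated, called):
    """Fraction of validated variants that the caller recovered."""
    if not validated:
        raise ValueError("no validated variants")
    return len(validated & called) / len(validated)


def sensitivity_loss(validated, calls_by_depth, reference_depth):
    """Drop in sensitivity at each depth relative to a reference depth."""
    reference = sensitivity(validated, calls_by_depth[reference_depth])
    return {
        depth: reference - sensitivity(validated, calls)
        for depth, calls in calls_by_depth.items()
    }


# Toy data (hypothetical): ten validated variants, fewer recovered at lower depth.
validated = {f"v{i}" for i in range(1, 11)}
calls_by_depth = {
    "80M": validated | {"false_positive_1"},   # all ten recovered
    "40M": validated - {"v10"},                # nine recovered
    "20M": validated - {"v8", "v9", "v10"},    # seven recovered
}
for depth, loss in sensitivity_loss(validated, calls_by_depth, "80M").items():
    print(depth, round(loss, 2))
```

False positives do not affect sensitivity, which is why the extra call at the deepest library leaves its value unchanged; assessing them would need a separate precision measure.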

Lam S, Zeidan J, Miglior F, Suárez-Vega A, Gómez-Redondo I, Fonseca PAS, Guan LL, Waters S, Cánovas A. Development and comparison of RNA-sequencing pipelines for more accurate SNP identification: practical example of functional SNP detection associated with feed efficiency in Nellore beef cattle. BMC Genomics 2020;21:703. [PMID: 33032519 PMCID: PMC7545862 DOI: 10.1186/s12864-020-07107-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/21/2020] [Accepted: 09/28/2020] [Indexed: 12/14/2022] Open

Abstract

Background

Optimization of an RNA-Sequencing (RNA-Seq) pipeline is critical to maximize power and accuracy to identify genetic variants, including SNPs, which may serve as genetic markers to select for feed efficiency, leading to economic benefits for beef production. This study used RNA-Seq data (GEO Accession ID: PRJEB7696 and PRJEB15314) from muscle and liver tissue, respectively, from 12 Nellore beef steers selected from 585 steers with residual feed intake measures (RFI; n = 6 low-RFI, n = 6 high-RFI). Three RNA-Seq pipelines were compared including multi-sample calling from i) non-merged samples; ii) merged samples by RFI group, iii) merged samples by RFI and tissue group. The RNA-Seq reads were aligned against the UMD3.1 bovine reference genome (release 94) assembly using STAR aligner. Variants were called using BCFtools and variant effect prediction (VeP) and functional annotation (ToppGene) analyses were performed.

Results

On average, total reads detected for Approach i), non-merged samples, were 18,362,086.3 for liver and 35,645,898.7 for muscle. For Approach ii), merging samples by RFI group, total reads detected for each merged group were 162,030,705, and for Approach iii), merging samples by RFI group and tissue, 324,061,410, the highest read depth of the three approaches. Additionally, Approach iii) yielded the highest read depth per variant (572.59 ± 3993.11) and encompassed the majority of localized positional genes detected by each approach. This suggests that Approach iii) optimized detection power, read depth, and accuracy of SNP calling, thereby increasing confidence in variant detection and reducing false positive detection. Approach iii) was then used to detect unique SNPs fixed within low- (12,145) and high-RFI (14,663) groups. Functional annotation of SNPs revealed positional candidate genes for each RFI group (2886 for low-RFI, 3075 for high-RFI), which were significantly (P < 0.05) associated with immune and metabolic pathways.

Conclusion

The most optimized RNA-Seq pipeline allowed for more accurate identification of SNPs, associated positional candidate genes, and significantly associated metabolic pathways in muscle and liver tissues, providing insight on the underlying genetic architecture of feed efficiency in beef cattle.

Sun YM, Chen YQ. Principles and innovative technologies for decrypting noncoding RNAs: from discovery and functional prediction to clinical application. J Hematol Oncol 2020;13:109. [PMID: 32778133 PMCID: PMC7416809 DOI: 10.1186/s13045-020-00945-8] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2020] [Accepted: 07/27/2020] [Indexed: 12/20/2022] Open

Serin Harmanci A, Harmanci AO, Zhou X. CaSpER identifies and visualizes CNV events by integrative analysis of single-cell or bulk RNA-sequencing data. Nat Commun 2020;11:89. [PMID: 31900397 PMCID: PMC6941987 DOI: 10.1038/s41467-019-13779-x] [Citation(s) in RCA: 86] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Accepted: 11/25/2019] [Indexed: 12/15/2022] Open

Dharshini SAP, Taguchi YH, Gromiha MM. Identifying suitable tools for variant detection and differential gene expression using RNA-seq data. Genomics 2019;112:2166-2172. [PMID: 31862361 DOI: 10.1016/j.ygeno.2019.12.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Revised: 11/25/2019] [Accepted: 12/16/2019] [Indexed: 12/27/2022]

Abstract

Neurodegenerative diseases are the most predominant brain disorders around the globe, and the affected populations are rapidly increasing. Recently, these diseases have been addressed using the data obtained from RNA-sequencing technology to reveal the changes in gene/transcript expression, effect of variants, and pathways involved in disease mechanisms. However, the observations mainly depend on the aligners/tools, and the performance of existing RNA-seq tools on the hg38 genome assembly has not yet been documented. In this study, we performed a systematic analysis of various spliced aligners, transcript assembling and variant calling tools based on both genomic assemblies (hg19/hg38) from hippocampus brain tissue. This helps to identify the best possible combination of tools for hg38 annotation. In order to evaluate the identified variants from various pipelines, we compared them with expression Quantitative Trait Loci (eQTL) and Genome-Wide Association Study (GWAS). In addition, the identified differentially expressed genes (DG) were compared with microarray studies. From our analysis of variant calling, the combination of GATK (Genome Analysis Tool-kit) and STAR (Spliced Transcripts Alignment to a Reference) protocol yields a larger number of GWAS/eQTL variants compared to SAMtools (Sequence Alignment Map). We also identified a higher number of non-coding variants in hg38 compared to hg19 due to enhanced annotation. In the case of various DG pipelines, we found that the Salmon-based hg38 transcriptomic quantification yields a higher number of reported DG compared to other genome-based quantification methods. This study revealed that a higher number of reads map to multiple locations of the genome with hg38 compared to hg19, and these spurious multi-mapped reads may affect the gene quantification techniques. We suggest that it is necessary to develop efficient algorithms that can handle the multi-mapped reads and improve the performance of genome-based alignment quantification.

Dharshini SAP, Taguchi YH, Gromiha MM. Investigating the energy crisis in Alzheimer disease using transcriptome study. Sci Rep 2019;9:18509. [PMID: 31811163 PMCID: PMC6898285 DOI: 10.1038/s41598-019-54782-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Accepted: 11/09/2019] [Indexed: 01/01/2023] Open

Liu F, Zhang Y, Zhang L, Li Z, Fang Q, Gao R, Zhang Z. Systematic comparative analysis of single-nucleotide variant detection methods from single-cell RNA sequencing data. Genome Biol 2019;20:242. [PMID: 31744515 PMCID: PMC6862814 DOI: 10.1186/s13059-019-1863-4] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2019] [Accepted: 10/23/2019] [Indexed: 12/30/2022] Open

Abstract

BACKGROUND

Systematic interrogation of single-nucleotide variants (SNVs) is one of the most promising approaches to delineate the cellular heterogeneity and phylogenetic relationships at the single-cell level. While SNV detection from abundant single-cell RNA sequencing (scRNA-seq) data is applicable and cost-effective in identifying expressed variants, inferring sub-clones, and deciphering genotype-phenotype linkages, there is a lack of computational methods specifically developed for SNV calling in scRNA-seq. Although variant callers for bulk RNA-seq have been sporadically used in scRNA-seq, the performances of different tools have not been assessed.

RESULTS

Here, we perform a systematic comparison of seven tools including SAMtools, the GATK pipeline, CTAT, FreeBayes, MuTect2, Strelka2, and VarScan2, using both simulation and scRNA-seq datasets, and identify multiple elements influencing their performance. While the specificities are generally high, with sensitivities exceeding 90% for most tools when calling homozygous SNVs in high-confident coding regions with sufficient read depths, such sensitivities dramatically decrease when calling SNVs with low read depths, low variant allele frequencies, or in specific genomic contexts. SAMtools shows the highest sensitivity in most cases especially with low supporting reads, despite the relatively low specificity in introns or high-identity regions. Strelka2 shows consistently good performance when sufficient supporting reads are provided, while FreeBayes shows good performance in the cases of high variant allele frequencies.
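
The dependence of sensitivity on read depth and variant allele frequency (VAF) described above can be illustrated with a minimal sketch. It is not taken from any of the benchmarked tools; the `Site` record, the function names, and the thresholds (`min_depth=10`, `min_vaf=0.2`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Read support at one candidate SNV position (hypothetical record)."""
    name: str
    ref_reads: int
    alt_reads: int


def variant_allele_frequency(site: Site) -> float:
    """Fraction of reads supporting the alternate allele (0.0 if no reads)."""
    depth = site.ref_reads + site.alt_reads
    return site.alt_reads / depth if depth else 0.0


def passes_filters(site: Site, min_depth: int = 10, min_vaf: float = 0.2) -> bool:
    """Keep a site only if it has enough reads and a high enough VAF.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    depth = site.ref_reads + site.alt_reads
    return depth >= min_depth and variant_allele_frequency(site) >= min_vaf


# Hypothetical sites: one well supported, one shallow, one with low VAF.
sites = [
    Site("high_depth_hom", ref_reads=0, alt_reads=40),
    Site("low_depth", ref_reads=1, alt_reads=2),
    Site("low_vaf", ref_reads=95, alt_reads=5),
]
kept = [s.name for s in sites if passes_filters(s)]
```

Under these assumed thresholds only the first site survives, which mirrors the abstract's point that true variants with few supporting reads or low allele frequency are the ones most easily missed.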

CONCLUSIONS

We recommend SAMtools, Strelka2, FreeBayes, or CTAT, depending on the specific conditions of usage. Our study provides the first benchmarking to evaluate the performances of different SNV detection tools for scRNA-seq data.

Pharmacogenes (PGx-genes): Current understanding and future directions. Gene 2019;718:144050. [DOI: 10.1016/j.gene.2019.144050] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 12/14/2022]

Adetunji MO, Lamont SJ, Abasht B, Schmidt CJ. Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data. PLoS One 2019;14:e0216838. [PMID: 31545812 PMCID: PMC6756534 DOI: 10.1371/journal.pone.0216838] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2019] [Accepted: 09/10/2019] [Indexed: 12/27/2022] Open

Islam R, Lai C. A Brief Overview of lncRNAs in Endothelial Dysfunction-Associated Diseases: From Discovery to Characterization. EPIGENOMES 2019;3:epigenomes3030020. [PMID: 34968230 PMCID: PMC8594677 DOI: 10.3390/epigenomes3030020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2019] [Revised: 09/06/2019] [Accepted: 09/07/2019] [Indexed: 11/16/2022] Open

Grant AD, Vail P, Padi M, Witkiewicz AK, Knudsen ES. Interrogating Mutant Allele Expression via Customized Reference Genomes to Define Influential Cancer Mutations. Sci Rep 2019;9:12766. [PMID: 31484939 PMCID: PMC6726654 DOI: 10.1038/s41598-019-48967-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2019] [Accepted: 08/12/2019] [Indexed: 11/16/2022] Open

Lee C, Kang EY, Gandal MJ, Eskin E, Geschwind DH. Profiling allele-specific gene expression in brains from individuals with autism spectrum disorder reveals preferential minor allele usage. Nat Neurosci 2019;22:1521-1532. [PMID: 31455884 PMCID: PMC6750256 DOI: 10.1038/s41593-019-0461-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2017] [Accepted: 07/09/2019] [Indexed: 12/21/2022]

Batcha AMN, Bamopoulos SA, Kerbs P, Kumar A, Jurinovic V, Rothenberg-Thurley M, Ksienzyk B, Philippou-Massier J, Krebs S, Blum H, Schneider S, Konstandin N, Bohlander SK, Heckman C, Kontro M, Hiddemann W, Spiekermann K, Braess J, Metzeler KH, Greif PA, Mansmann U, Herold T. Allelic Imbalance of Recurrently Mutated Genes in Acute Myeloid Leukaemia. Sci Rep 2019;9:11796. [PMID: 31409822 PMCID: PMC6692371 DOI: 10.1038/s41598-019-48167-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Accepted: 07/29/2019] [Indexed: 12/24/2022] Open

Affiliation(s)

Aarif M N Batcha Institute of Medical Data Processing, Biometrics and Epidemiology (IBE), Faculty of Medicine, LMU Munich, Munich, Germany; Data Integration for Future Medicine (DiFuture, www.difuture.de), LMU Munich, Munich, Germany
Stefanos A Bamopoulos Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
Paul Kerbs Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
Ashwini Kumar Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
Vindi Jurinovic Institute of Medical Data Processing, Biometrics and Epidemiology (IBE), Faculty of Medicine, LMU Munich, Munich, Germany; Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
Maja Rothenberg-Thurley Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
Bianka Ksienzyk Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
Julia Philippou-Massier Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, University of Munich, Munich, Germany
Stefan Krebs Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, University of Munich, Munich, Germany
Helmut Blum Laboratory for Functional Genome Analysis (LAFUGA), Gene Center, University of Munich, Munich, Germany
Stephanie Schneider Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; Institute of Human Genetics, University Hospital, LMU Munich, Munich, Germany
Nikola Konstandin Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany
Stefan K Bohlander Leukaemia and Blood Cancer Research Unit, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
Caroline Heckman Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
Mika Kontro Department of Haematology, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland
Wolfgang Hiddemann Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
Karsten Spiekermann Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
Jan Braess Department of Oncology and Hematology, Hospital Barmherzige Brüder, Regensburg, Germany
Klaus H Metzeler Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
Philipp A Greif Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
Ulrich Mansmann Institute of Medical Data Processing, Biometrics and Epidemiology (IBE), Faculty of Medicine, LMU Munich, Munich, Germany; Data Integration for Future Medicine (DiFuture, www.difuture.de), LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany
Tobias Herold Laboratory for Leukemia Diagnostics, Department of Medicine III, University Hospital, LMU Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Research Unit Apoptosis in Hematopoietic Stem Cells, Helmholtz Zentrum München, German Research Center for Environmental Health (HMGU), Munich, Germany

Sanchez de Groot N, Armaos A, Graña-Montes R, Alriquet M, Calloni G, Vabulas RM, Tartaglia GG. RNA structure drives interaction with proteins. Nat Commun 2019;10:3246. [PMID: 31324771 PMCID: PMC6642211 DOI: 10.1038/s41467-019-10923-5] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2018] [Accepted: 06/10/2019] [Indexed: 12/12/2022] Open

Brouard JS, Schenkel F, Marete A, Bissonnette N. The GATK joint genotyping workflow is appropriate for calling variants in RNA-seq experiments. J Anim Sci Biotechnol 2019;10:44. [PMID: 31249686 PMCID: PMC6587293 DOI: 10.1186/s40104-019-0359-0] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2018] [Accepted: 04/28/2019] [Indexed: 12/30/2022] Open

Abstract

The Genome Analysis Toolkit (GATK) is a popular set of programs for discovering and genotyping variants from next-generation sequencing data. The current GATK recommendation for RNA sequencing (RNA-seq) is to perform variant calling from individual samples, with the drawback that only variable positions are reported. Versions 3.0 and above of GATK offer the possibility of calling DNA variants on cohorts of samples using the HaplotypeCaller algorithm in Genomic Variant Call Format (GVCF) mode. Using this approach, variants are called individually on each sample, generating one GVCF file per sample that lists genotype likelihoods and their genome annotations. In a second step, variants are called from the GVCF files through a joint genotyping analysis. This strategy is more flexible and reduces computational challenges in comparison to the traditional joint discovery workflow. Using a GVCF workflow for mining SNP in RNA-seq data provides substantial advantages, including reporting homozygous genotypes for the reference allele as well as missing data. Taking advantage of RNA-seq data derived from primary macrophages isolated from 50 cows, the GATK joint genotyping method for calling variants on RNA-seq data was validated by comparing this approach to a so-called “per-sample” method. In addition, pair-wise comparisons of the two methods were performed to evaluate their respective sensitivity, precision and accuracy using DNA genotypes from a companion study including the same 50 cows genotyped using either genotyping-by-sequencing or with the Bovine SNP50 Beadchip (imputed to the Bovine high density). Results indicate that both approaches are very close in their capacity of detecting reference variants and that the joint genotyping method is more sensitive than the per-sample method. Given that the joint genotyping method is more flexible and technically easier, we recommend this approach for variant calling in RNA-seq experiments.
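
The comparison of RNA-seq calls against DNA genotypes described above comes down to counting agreements and disagreements between two call sets. The sketch below is a generic illustration, not the authors' evaluation code: variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples, and the example call sets are invented. Accuracy is left out because it also needs true negatives, which require a list of positions confirmed as non-variant.

```python
def compare_calls(called: set, truth: set) -> dict:
    """Compare a set of called variants with a truth set.

    Returns true positives, false positives, false negatives,
    sensitivity (recall) and precision.
    """
    tp = len(called & truth)   # called and present in truth
    fp = len(called - truth)   # called but absent from truth
    fn = len(truth - called)   # in truth but not called
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Hypothetical truth set (e.g. array genotypes) and two RNA-seq call sets.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C"),
         ("chr3", 15, "A", "C")}
joint_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
               ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C"),
               ("chr4", 77, "G", "T")}
per_sample_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
                    ("chr2", 40, "G", "A"), ("chr4", 77, "G", "T")}

joint_metrics = compare_calls(joint_calls, truth)
per_sample_metrics = compare_calls(per_sample_calls, truth)
```

With these invented inputs the joint call set recovers four of five truth variants and the per-sample set three of five, so the joint set scores higher on sensitivity. The numbers say nothing about the real study; they only show how the metrics are derived.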

Mutational landscape of the transcriptome offers putative targets for immunotherapy of myeloproliferative neoplasms. Blood 2019;134:199-210. [PMID: 31064751 DOI: 10.1182/blood.2019000519] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2018] [Accepted: 04/19/2019] [Indexed: 12/11/2022] Open

Toups MA, Rodrigues N, Perrin N, Kirkpatrick M. A reciprocal translocation radically reshapes sex-linked inheritance in the common frog. Mol Ecol 2019;28:1877-1889. [PMID: 30576024 DOI: 10.1111/mec.14990] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2018] [Revised: 12/04/2018] [Accepted: 12/04/2018] [Indexed: 12/22/2022]

Miao Z, Alvarez M, Pajukanta P, Ko A. ASElux: an ultra-fast and accurate allelic reads counter. Bioinformatics 2019;34:1313-1320. [PMID: 29186329 DOI: 10.1093/bioinformatics/btx762] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2017] [Accepted: 11/22/2017] [Indexed: 11/12/2022] Open

Abstract

Motivation

Mapping bias causes preferential alignment to the reference allele, forming a major obstacle in allele-specific expression (ASE) analysis. The existing methods, such as simulation and SNP-aware alignment, are either inaccurate or relatively slow. To count allelic reads for ASE analysis quickly and accurately, we developed a novel approach, ASElux, which utilizes the personal SNP information and counts allelic reads directly from unmapped RNA-sequence (RNA-seq) data. ASElux significantly reduces runtime by disregarding reads outside single nucleotide polymorphisms (SNPs) during the alignment.

Results

When compared to other tools on simulated and experimental data, ASElux achieves a higher accuracy on ASE estimation than non-SNP-aware aligners and requires a much shorter time than the benchmark SNP-aware aligner, GSNAP, with just a slight loss in performance. ASElux can process 40 million read-pairs from an RNA-sequence (RNA-seq) sample and count allelic reads within 10 min, which is comparable to directly counting the allelic reads from alignments based on other tools. Furthermore, processing an RNA-seq sample using ASElux in conjunction with a general aligner, such as STAR, is more accurate and still ∼4× faster than STAR + WASP, and ∼33× faster than the leading SNP-aware aligner, GSNAP, making ASElux ideal for ASE analysis of large-scale transcriptomic studies. We applied ASElux to 273 lung RNA-seq samples from GTEx and identified a splice-QTL rs11078928 in lung which explains the mechanism underlying an asthma GWAS SNP rs11078927. Thus, our analysis demonstrated ASE as a highly powerful complementary tool to cis-expression quantitative trait locus (eQTL) analysis.
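
Once allelic reads have been counted at a heterozygous SNP, a common downstream step is to test whether the reference and alternate counts depart from the expected ratio. The sketch below shows a generic exact binomial test for that purpose. It is not ASElux's algorithm, and the function name and the `p_ref` parameter are assumptions introduced here; `p_ref` could be set above 0.5 to allow for residual mapping bias toward the reference allele.

```python
from math import comb


def allelic_imbalance_p(ref_count: int, alt_count: int, p_ref: float = 0.5) -> float:
    """Two-sided exact binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely
    than the observed reference-read count under Binomial(n, p_ref).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    pmf = [comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_count]
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts at two SNPs.
balanced_p = allelic_imbalance_p(5, 5)    # equal support for both alleles
skewed_p = allelic_imbalance_p(10, 0)     # all reads carry the reference allele
```

A balanced site gives a p-value of 1, while ten reference reads and no alternate reads give 2/1024, which is the sort of contrast an ASE analysis looks for before multiple-testing correction.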

Availability and implementation

The software can be downloaded from https://github.com/abl0719/ASElux.

Contact

zmiao@ucla.edu or a5ko@ucla.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

Han Z, Xue W, Tao L, Lou Y, Qiu Y, Zhu F. Genome-wide identification and analysis of the eQTL lncRNAs in multiple sclerosis based on RNA-seq data. Brief Bioinform 2019;21:1023-1037. [PMID: 31323688 DOI: 10.1093/bib/bbz036] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2018] [Revised: 03/05/2019] [Accepted: 03/06/2019] [Indexed: 12/29/2022] Open

Abstract

The pathogenesis of multiple sclerosis (MS) is significantly regulated by long noncoding RNAs (lncRNAs), the expression of which is substantially influenced by a number of MS-associated risk single nucleotide polymorphisms (SNPs). It is thus hypothesized that the dysregulation of lncRNA induced by genomic variants may be one of the key molecular mechanisms for the pathology of MS. However, due to the lack of sufficient data on lncRNA expression and SNP genotypes of the same MS patients, such molecular mechanisms underlying the pathology of MS remain elusive. In this study, a bioinformatics strategy was applied to obtain lncRNA expression and SNP genotype data simultaneously from 142 samples (51 MS patients and 91 controls) based on RNA-seq data, and an expression quantitative trait loci (eQTL) analysis was conducted. In total, 2383 differentially expressed lncRNAs were identified as specifically expressing in brain-related tissues, and 517 of them were affected by SNPs. Then, the functional characterization, secondary structure changes and tissue and disease specificity of the cis-eQTL SNPs and lncRNA were assessed. The cis-eQTL SNPs were substantially and specifically enriched in neurological disease and intergenic region, and the secondary structure was altered in 17.6% of all lncRNAs in MS. Finally, the weighted gene coexpression network and gene set enrichment analyses were used to investigate how the influence of SNPs on lncRNAs contributed to the pathogenesis of MS. As a result, the regulation of lncRNAs by SNPs was found to mainly influence the antigen processing/presentation and mitogen-activated protein kinases (MAPK) signaling pathway in MS. These results revealed the effectiveness of the strategy proposed in this study and give insight into the mechanism (SNP-mediated modulation of lncRNAs) underlying the pathology of MS.
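
An eQTL analysis of the kind summarized above relates genotype at a SNP to expression of a nearby transcript. The sketch below is a generic additive model, assuming genotypes are coded as alternate-allele dosages (0, 1, 2) and fitted by ordinary least squares; it is not the authors' pipeline, and the function name and example values are hypothetical.

```python
def eqtl_fit(dosages: list, expression: list) -> tuple:
    """Least-squares fit of expression on genotype dosage.

    Returns (slope, intercept); the slope is the estimated change in
    expression per additional copy of the alternate allele.
    """
    if len(dosages) != len(expression) or not dosages:
        raise ValueError("need two equally sized, non-empty lists")
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept


# Hypothetical data: six individuals, expression rising with dosage.
slope, intercept = eqtl_fit([0, 1, 2, 0, 1, 2], [1.0, 3.0, 5.0, 1.0, 3.0, 5.0])
```

A real analysis would add covariates, a significance test for the slope, and multiple-testing correction across all SNP and transcript pairs; the sketch covers only the core effect-size estimate.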

A high-throughput SNP discovery strategy for RNA-seq data. BMC Genomics 2019;20:160. [PMID: 30813897 PMCID: PMC6391812 DOI: 10.1186/s12864-019-5533-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2018] [Accepted: 02/15/2019] [Indexed: 12/24/2022] Open

Abstract

Background

Single nucleotide polymorphisms (SNPs) have been applied as important molecular markers in genetics and breeding studies. The rapid advance of next generation sequencing (NGS) provides a high-throughput means of SNP discovery. However, SNP development is limited by the availability of reliable SNP discovery methods. In particular, the optimum assembler and SNP caller for accurate SNP prediction from next generation sequencing data are not known.

Results

Herein we performed SNP prediction based on RNA-seq data of peach and mandarin peel tissue under a comprehensive comparison of two paired-end read lengths (125 bp and 150 bp), five assemblers (Trinity, IDBA, oases, SOAPdenovo, Trans-abyss) and two SNP callers (GATK and GBS). The predicted SNPs were compared with the authentic SNPs identified via PCR amplification followed by gene cloning and sequencing procedures. A total of 40 and 240 authentic SNPs were present in five anthocyanin biosynthesis related genes in peach and in nine carotenogenic genes in mandarin, respectively. Putative SNPs predicted from the same RNA-seq data with different strategies led to quite divergent results. The rate of false positive SNPs was significantly lower when the paired-end read length was 150 bp compared with 125 bp. Trinity was superior to the other four assemblers and GATK was substantially superior to GBS due to a low rate of missing authentic SNPs. The combination of assembler Trinity, SNP caller GATK, and the paired-end read length 150 bp had the best performance in SNP discovery, with 100% accuracy in both the peach and mandarin cases. This strategy was applied to the characterization of SNPs in peach and mandarin transcriptomes.

Conclusions

Through comparison of authentic SNPs obtained by PCR cloning strategy and putative SNPs predicted from different combinations of five assemblers, two SNP callers, and two paired-end read lengths, we provided a reliable and efficient strategy, Trinity-GATK with 150 bp paired-end read length, for SNP discovery from RNA-seq data. This strategy discovered SNP at 100% accuracy in peach and mandarin cases and might be applicable to a wide range of plants and other organisms.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5533-4) contains supplementary material, which is available to authorized users.

Guo Y, Yu H, Samuels DC, Yue W, Ness S, Zhao YY. Single-nucleotide variants in human RNA: RNA editing and beyond. Brief Funct Genomics 2019;18:30-39. [PMID: 30312373 PMCID: PMC7962770 DOI: 10.1093/bfgp/ely032] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Revised: 08/21/2018] [Accepted: 09/06/2018] [Indexed: 12/12/2022] Open

Tessier L, Côté O, Bienzle D. Sequence variant analysis of RNA sequences in severe equine asthma. PeerJ 2018;6:e5759. [PMID: 30324028 PMCID: PMC6186407 DOI: 10.7717/peerj.5759] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2017] [Accepted: 09/15/2018] [Indexed: 12/13/2022] Open

Abstract

Background

Severe equine asthma is a chronic inflammatory disease of the lung in horses similar to low-Th2 late-onset asthma in humans. This study aimed to determine the utility of RNA-Seq to call gene sequence variants, and to identify sequence variants of potential relevance to the pathogenesis of asthma.

Methods

RNA-Seq data were generated from endobronchial biopsies collected from six asthmatic and seven non-asthmatic horses before and after challenge (26 samples total). Sequences were aligned to the equine genome with Spliced Transcripts Alignment to Reference software. Read preparation for sequence variant calling was performed with Picard tools and Genome Analysis Toolkit (GATK). Sequence variants were called and filtered using GATK and Ensembl Variant Effect Predictor (VEP) tools, and two RNA-Seq predicted sequence variants were investigated with both PCR and Sanger sequencing. Supplementary analysis of novel sequence variant selection with VEP was based on a score of <0.01 predicted with Sorting Intolerant from Tolerant software, missense nature, location within the protein coding sequence and presence in all asthmatic individuals. For select variants, effect on protein function was assessed with Polymorphism Phenotyping 2 and screening for non-acceptable polymorphism 2 software. Sequences were aligned and 3D protein structures predicted with Geneious software. Difference in allele frequency between the groups was assessed using a Pearson’s Chi-squared test with Yates’ continuity correction, and difference in genotype frequency was calculated using the Fisher’s exact test for count data.
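
The two frequency tests named above can be sketched for a 2×2 table using only the Python standard library. This is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the counts are invented, and the Fisher test here handles only 2×2 tables, so a three-level genotype table would first have to be collapsed (for example into carriers and non-carriers).

```python
from math import comb, erfc, sqrt


def yates_chi2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-squared test with Yates' correction for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    observed = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Invented allele counts: rows = asthmatic / non-asthmatic,
# columns = altered allele / reference allele.
chi2_stat, chi2_p = yates_chi2(8, 4, 3, 11)
fisher_p = fisher_exact_two_sided(8, 4, 3, 11)
```

With small groups such as these, the exact test is usually the safer choice, since the chi-squared approximation relies on reasonably large expected cell counts.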

Results

RNA-Seq variant calling and filtering correctly identified substitution variants in PACRG and RTTN. Sanger sequencing confirmed that the PACRG substitution was appropriately identified in all 26 samples while the RTTN substitution was identified correctly in 24 of 26 samples. These variants of uncertain significance had substitutions that were predicted to result in loss of function and to be non-neutral. Amino acid substitutions projected no change of hydrophobicity and isoelectric point in PACRG, and a change in both for RTTN. For PACRG, no difference in allele frequency between the two groups was detected but a higher proportion of asthmatic horses had the altered RTTN allele compared to non-asthmatic animals.

Discussion

RNA-Seq was sensitive and specific for calling gene sequence variants in this disease model. Even moderate coverage (<10–20 counts per million) yielded correct identification in 92% of samples, suggesting RNA-Seq may be suitable to detect sequence variants in low coverage samples. The impact of amino acid alterations in PACRG and RTTN proteins, and possible association of the sequence variants with asthma, is of uncertain significance, but their role in ciliary function may be of future interest.

Adetunji MO, Lamont SJ, Schmidt CJ. TransAtlasDB: an integrated database connecting expression data, metadata and variants. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018;2018:4904553. [PMID: 29688361 PMCID: PMC5824778 DOI: 10.1093/database/bay014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/11/2017] [Accepted: 01/19/2018] [Indexed: 12/21/2022]

Akila Parvathy Dharshini S, Taguchi YH, Michael Gromiha M. Exploring the selective vulnerability in Alzheimer disease using tissue specific variant analysis. Genomics 2018;111:936-949. [PMID: 29879491 DOI: 10.1016/j.ygeno.2018.05.024] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Revised: 05/03/2018] [Accepted: 05/30/2018] [Indexed: 02/08/2023]

Wolff A, Bayerlová M, Gaedcke J, Kube D, Beißbarth T. A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells. PLoS One 2018;13:e0197162. [PMID: 29768462 PMCID: PMC5955523 DOI: 10.1371/journal.pone.0197162] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2017] [Accepted: 04/27/2018] [Indexed: 12/17/2022] Open

Abstract

Background

Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or preprocessing, normalization and differential gene expression in case of microarray analysis, in order to give a global insight into pipeline performances.

Methods

Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data.

Results

The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat2's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish reached overall mapping rates of only 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88). The exception was Cufflinks, whose correlation results were substantially lower (ρ = 0.21–0.29 and 0.34–0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results.

Conclusion

In conclusion, the combination of the STAR aligner with HTSeq-Count, followed by the STAR aligner with RSEM and by Sailfish, generated the differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.

Piórkowska K, Żukowski K, Ropka-Molik K, Tyra M. Detection of genetic variants between different Polish Landrace and Puławska pigs by means of RNA-seq analysis. Anim Genet 2018;49:215-225. [PMID: 29635698 DOI: 10.1111/age.12654] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Accepted: 02/07/2018] [Indexed: 02/06/2023]

Malmberg MM, Pembleton LW, Baillie RC, Drayton MC, Sudheesh S, Kaur S, Shinozuka H, Verma P, Spangenberg GC, Daetwyler HD, Forster JW, Cogan NO. Genotyping-by-sequencing through transcriptomics: implementation in a range of crop species with varying reproductive habits and ploidy levels. Plant Biotechnol J 2018;16:877-889. [PMID: 28913899 PMCID: PMC5866951 DOI: 10.1111/pbi.12835] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/29/2017] [Revised: 08/03/2017] [Accepted: 09/08/2017] [Indexed: 05/09/2023]

Ghorbani A, Izadpanah K, Dietzgen RG. Gene expression and population polymorphism of maize Iranian mosaic virus in Zea mays, and intracellular localization and interactions of viral N, P, and M proteins in Nicotiana benthamiana. Virus Genes 2018;54:290-296. [PMID: 29450759 DOI: 10.1007/s11262-018-1540-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/08/2017] [Accepted: 02/06/2018] [Indexed: 10/18/2022]

Wang X, Chen Q, Wu Y, Lemmon ZH, Xu G, Huang C, Liang Y, Xu D, Li D, Doebley JF, Tian F. Genome-wide Analysis of Transcriptional Variability in a Large Maize-Teosinte Population. Mol Plant 2018;11:443-459. [PMID: 29275164 DOI: 10.1016/j.molp.2017.12.011] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 05/27/2017] [Revised: 10/21/2017] [Accepted: 12/11/2017] [Indexed: 05/18/2023]

Affiliation(s)

Xufeng Wang National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
Qiuyue Chen National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
Yaoyao Wu National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
Zachary H Lemmon Department of Genetics, University of Wisconsin, Madison, WI 53706, USA
Guanghui Xu National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
Cheng Huang National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
Yameng Liang National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
Dingyi Xu National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
Dan Li National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China
John F Doebley Department of Genetics, University of Wisconsin, Madison, WI 53706, USA
Feng Tian National Maize Improvement Center of China, Beijing Key Laboratory of Crop Genetic Improvement, Laboratory of Crop Heterosis and Utilization, Joint International Research Laboratory of Crop Molecular Breeding, China Agricultural University, Beijing 100193, China.

Technological Developments in lncRNA Biology. Adv Exp Med Biol 2018;1008:283-323. [PMID: 28815544 DOI: 10.1007/978-981-10-5203-3_10] [Citation(s) in RCA: 251] [Impact Index Per Article: 41.8] [Indexed: 02/07/2023]

van Son M, Tremoen NH, Gaustad AH, Myromslien FD, Våge DI, Stenseth EB, Zeremichael TT, Grindflek E. RNA sequencing reveals candidate genes and polymorphisms related to sperm DNA integrity in testis tissue from boars. BMC Vet Res 2017;13:362. [PMID: 29183316 PMCID: PMC5706377 DOI: 10.1186/s12917-017-1279-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 02/14/2017] [Accepted: 11/16/2017] [Indexed: 11/17/2022] Open

Abstract

Background

Sperm DNA is protected against fragmentation by a high degree of chromatin packaging. It has been demonstrated that proper chromatin packaging is important for boar fertility outcome. However, little is known about the molecular mechanisms underlying differences in sperm DNA fragmentation. Knowledge of sequence variation influencing this sperm parameter could be beneficial in selecting the best artificial insemination (AI) boars for commercial production. The aim of this study was to identify genes differentially expressed in testis tissue of Norwegian Landrace and Duroc boars, with high and low sperm DNA fragmentation index (DFI), using transcriptome sequencing.

Results

Altogether, 308 and 374 genes were found to display significant differences in expression level between high and low DFI in Landrace and Duroc boars, respectively. Of these genes, 71 were differentially expressed in both breeds. Gene ontology analysis revealed that significant terms in common for the two breeds included extracellular matrix, extracellular region and calcium ion binding. Moreover, different metabolic processes were enriched in Landrace and Duroc, whereas immune response terms were common in Landrace only. Variant detection identified putative polymorphisms in some of the differentially expressed genes. Validation showed that predicted high impact variants in RAMP2, GIMAP6 and three uncharacterized genes are particularly interesting for sperm DNA fragmentation in boars.

Conclusions

We identified differentially expressed genes between groups of boars with high and low sperm DFI, and functional annotation of these genes points towards important biochemical pathways. Moreover, variant detection identified putative polymorphisms in the differentially expressed genes. Our results provide valuable insights into the molecular network underlying DFI in pigs.

Electronic supplementary material

The online version of this article (10.1186/s12917-017-1279-x) contains supplementary material, which is available to authorized users.

Li Q, Wang X, Liu X, Liao Q, Sun J, He X, Yang T, Yin J, Jia J, Li X, Colotte M, Bonnet J. Long-Term Room Temperature Storage of Dry Ribonucleic Acid for Use in RNA-Seq Analysis. Biopreserv Biobank 2017;15:502-511. [PMID: 29022740 DOI: 10.1089/bio.2017.0024] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/17/2023] Open

Guo Y, Zhao S, Sheng Q, Samuels DC, Shyr Y. The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data. BMC Genomics 2017;18:690. [PMID: 28984205 PMCID: PMC5629567 DOI: 10.1186/s12864-017-4022-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022] Open

Audoux J, Salson M, Grosset CF, Beaumeunier S, Holder JM, Commes T, Philippe N. SimBA: A methodology and tools for evaluating the performance of RNA-Seq bioinformatic pipelines. BMC Bioinformatics 2017;18:428. [PMID: 28969586 PMCID: PMC5623974 DOI: 10.1186/s12859-017-1831-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/20/2017] [Accepted: 09/08/2017] [Indexed: 11/10/2022] Open

Abstract

Background

The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. In order to provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well informed choices.

Results

To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, to give an accurate evaluation of performance in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved.

Conclusion

Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of data-sets that would allow accurate comparison between benchmarks performed by different groups and the publication of more benchmarks based on this public corpus. SimBA software and data-set are available at http://cractools.gforge.inria.fr/softwares/simba/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-017-1831-5) contains supplementary material, which is available to authorized users.

Zhang Y, Li D, Han R, Wang Y, Li G, Liu X, Tian Y, Kang X, Li Z. Transcriptome analysis of the pectoral muscles of local chickens and commercial broilers using Ribo-Zero ribonucleic acid sequencing. PLoS One 2017;12:e0184115. [PMID: 28863190 PMCID: PMC5581173 DOI: 10.1371/journal.pone.0184115] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/12/2017] [Accepted: 08/20/2017] [Indexed: 12/02/2022] Open