1
Premanand A, Shanmuga Priya M, Reena Rajkumari B. Genetic variants in androgenetic alopecia: insights from scalp RNA sequencing data. Arch Dermatol Res 2024; 316:590. [PMID: 39215850 DOI: 10.1007/s00403-024-03351-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 08/03/2024] [Accepted: 08/20/2024] [Indexed: 09/04/2024]
Affiliation(s)
- A Premanand
  - Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- M Shanmuga Priya
  - Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
- B Reena Rajkumari
  - Department of Integrative Biology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
2
Xu Y, Yu F, Feng W, Wei J, Su S, Li J, Hua G, Li W, Tang Y. Genetic variation mining of the Chinese mitten crab (Eriocheir sinensis) based on transcriptome data from public databases. Brief Funct Genomics 2024:elae030. [PMID: 38984674 DOI: 10.1093/bfgp/elae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 06/02/2024] [Accepted: 06/25/2024] [Indexed: 07/11/2024]
Abstract
At present, public databases house an extensive repository of transcriptome data, with the volume continuing to grow at an accelerated pace. Utilizing these data effectively is a shared interest within the scientific community. In this study, we introduced a novel strategy that harnesses SNPs and InDels identified from transcriptome data, combined with sample metadata from databases, to effectively screen for molecular markers correlated with traits. We utilized 228 transcriptome datasets of Eriocheir sinensis from the NCBI database and employed the Genome Analysis Toolkit software to identify 96 388 SNPs and 20 645 InDels. Employing the genome-wide association study analysis, in conjunction with the gender information from databases, we identified 3456 sex-biased SNPs and 639 sex-biased InDels. The KOG and KEGG annotations of the sex-biased SNPs and InDels revealed that these genes were primarily involved in the metabolic processes of E. sinensis. Combined with SnpEff annotation and PCR experimental validation, a highly sex-biased SNP located in the Kelch domain containing 4 (Klhdc4) gene, CHR67-6415071, was found to alter the splicing sites of Klhdc4, generating two splice variants, Klhdc4_a and Klhdc4_b. Additionally, Klhdc4 exhibited robust expression across the ovaries, testes, and accessory glands. The sex-biased SNPs and InDels identified in this study are conducive to the development of unisexual cultivation methods for E. sinensis, and the alternative splicing event caused by the sex-biased SNP in Klhdc4 may serve as a potential mechanism for sex regulation in E. sinensis. The analysis strategy employed in this study represents a new direction for the rational exploitation and utilization of transcriptome data in public databases.
Affiliation(s)
- Yuanfeng Xu
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214128, China
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Fan Yu
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214128, China
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Wenrong Feng
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214128, China
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Jia Wei
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214128, China
- Shengyan Su
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214128, China
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Jianlin Li
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214128, China
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Guoan Hua
  - Jiangsu Haorun Biological Industry Group Co., Ltd, Taizhou 225309, China
- Wenjing Li
  - Jiangsu Haorun Biological Industry Group Co., Ltd, Taizhou 225309, China
- Yongkai Tang
  - Wuxi Fisheries College, Nanjing Agricultural University, Wuxi 214128, China
  - Key Laboratory of Freshwater Fisheries and Germplasm Resources Utilization, Ministry of Agriculture and Rural Affairs, Freshwater Fisheries Research Center, Chinese Academy of Fishery Sciences, Wuxi 214081, China
3
Razi A, Lo CC, Wang S, Leek JT, Hansen KD. Genotype prediction of 336,463 samples from public expression data. bioRxiv 2024:2023.10.21.562237. [PMID: 38559266 PMCID: PMC10979922 DOI: 10.1101/2023.10.21.562237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Tens of thousands of RNA-sequencing experiments comprising hundreds of thousands of individual samples have now been performed. These data represent a broad range of experimental conditions, sequencing technologies, and hypotheses under study. The Recount project has aggregated and uniformly processed hundreds of thousands of publicly available RNA-seq samples. Most of these samples only include RNA expression measurements; genotype data for these same samples would enable a wide range of analyses including variant prioritization, eQTL analysis, and studies of allele-specific expression. Here, we developed a statistical model based on the existing reference and alternative read counts from the RNA-seq experiments available through Recount3 to predict genotypes at autosomal biallelic loci in coding regions. We demonstrate the accuracy of our model using large-scale studies that measured both gene expression and genotype genome-wide. We show that our predictive model is highly accurate with 99.5% overall accuracy, 99.6% major allele accuracy, and 90.4% minor allele accuracy. Our model is robust to tissue and study effects, provided the coverage is high enough. We applied this model to genotype all the samples in Recount3 and provide the largest ready-to-use expression repository containing genotype information. We illustrate that the predicted genotype from RNA-seq data is sufficient to unravel the underlying population structure of samples in Recount3 using Principal Component Analysis.
Affiliation(s)
- Afrooz Razi
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine
- Christopher C. Lo
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
- Siruo Wang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
- Jeffrey T. Leek
  - Biostatistics Program, Division of Public Health Sciences, Fred Hutchinson Cancer Center
- Kasper D. Hansen
  - Department of Genetic Medicine, Johns Hopkins University School of Medicine
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
  - Department of Biomedical Engineering, Johns Hopkins University School of Medicine
4
Kizilirmak C, Monteleone E, García-Manteiga JM, Brambilla F, Agresti A, Bianchi ME, Zambrano S. Small transcriptional differences among cell clones lead to distinct NF-κB dynamics. iScience 2023; 26:108573. [PMID: 38144455 PMCID: PMC10746373 DOI: 10.1016/j.isci.2023.108573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/06/2023] [Accepted: 11/21/2023] [Indexed: 12/26/2023]
Abstract
Transcription factor dynamics is fundamental to determine the activation of accurate transcriptional programs and yet is heterogeneous at a single-cell level, even within homogeneous populations. We asked how such heterogeneity emerges for the nuclear factor κB (NF-κB). We found that clonal populations of immortalized fibroblasts derived from a single mouse embryo display robustly distinct NF-κB dynamics upon tumor necrosis factor α (TNF-α) stimulation including persistent, oscillatory, and weak activation, giving rise to differences in the transcription of its targets. By combining transcriptomics and simulations we show how less than two-fold differences in the expression levels of genes coding for key proteins of the signaling cascade and feedback system are predictive of the differences of the NF-κB dynamic response of the clones to TNF-α and IL-1β. We propose that small transcriptional differences in the regulatory circuit of a transcription factor can lead to distinct signaling dynamics in cells within homogeneous cell populations and among different cell types.
Affiliation(s)
- Cise Kizilirmak
  - School of Medicine, Vita-Salute San Raffaele University, 20132 Milan, Italy
  - Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Emanuele Monteleone
  - School of Medicine, Vita-Salute San Raffaele University, 20132 Milan, Italy
  - Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Francesca Brambilla
  - Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Alessandra Agresti
  - Division of Immunology, Transplantation and Infectious Diseases, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Marco E. Bianchi
  - School of Medicine, Vita-Salute San Raffaele University, 20132 Milan, Italy
  - Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
- Samuel Zambrano
  - School of Medicine, Vita-Salute San Raffaele University, 20132 Milan, Italy
  - Division of Genetics and Cell Biology, IRCCS San Raffaele Scientific Institute, 20132 Milan, Italy
5
Al-Aamri A, Kamarul Azman S, Daw Elbait G, Alsafar H, Henschel A. Critical assessment of on-premise approaches to scalable genome analysis. BMC Bioinformatics 2023; 24:354. [PMID: 37735350 PMCID: PMC10512525 DOI: 10.1186/s12859-023-05470-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Accepted: 09/08/2023] [Indexed: 09/23/2023]
Abstract
BACKGROUND Plummeting DNA sequencing cost in recent years has enabled genome sequencing projects to scale up by several orders of magnitude, which is transforming genomics into a highly data-intensive field of research. This development provides the much needed statistical power required for genotype-phenotype predictions in complex diseases. METHODS In order to efficiently leverage the wealth of information, we here assessed several genomic data science tools. The rationale to focus on on-premise installations is to cope with situations where data confidentiality and compliance regulations etc. rule out cloud based solutions. We established a comprehensive qualitative and quantitative comparison between BCFtools, SnpSift, Hail, GEMINI, and OpenCGA. The tools were compared in terms of data storage technology, query speed, scalability, annotation, data manipulation, visualization, data output representation, and availability. RESULTS Tools that leverage sophisticated data structures are noted as the most suitable for large-scale projects in varying degrees of scalability in comparison to flat-file manipulation (e.g., BCFtools, and SnpSift). Remarkably, for small to mid-size projects, even lightweight relational database. CONCLUSION The assessment criteria provide insights into the typical questions posed in scalable genomics and serve as guidance for the development of scalable computational infrastructure in genomics.
Affiliation(s)
- Amira Al-Aamri
  - Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Syafiq Kamarul Azman
  - Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Gihan Daw Elbait
  - Department of Biology, College of Arts and Sciences, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
  - Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Habiba Alsafar
  - Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
  - Department of Biomedical Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
- Andreas Henschel
  - Department of Electrical Engineering and Computer Science, College of Engineering, Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
  - Center for Biotechnology (BTC), Khalifa University, P.O. Box 127788, Abu Dhabi, United Arab Emirates
6
Vigorito E, Barton A, Pitzalis C, Lewis MJ, Wallace C. BBmix: a Bayesian beta-binomial mixture model for accurate genotyping from RNA-sequencing. Bioinformatics 2023; 39:btad393. [PMID: 37338536 PMCID: PMC10318392 DOI: 10.1093/bioinformatics/btad393] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/14/2022] [Revised: 05/15/2023] [Accepted: 06/19/2023] [Indexed: 06/21/2023]
Abstract
MOTIVATION While many pipelines have been developed for calling genotypes using RNA-sequencing (RNA-Seq) data, they all have adapted DNA genotype callers that do not model biases specific to RNA-Seq such as allele-specific expression (ASE). RESULTS Here, we present BBmix, a Bayesian beta-binomial mixture model that first learns the expected distribution of read counts for each genotype, and then deploys those learned parameters to call genotypes probabilistically. We benchmarked our model on a wide variety of datasets and showed that our method generally performed better than competitors, mainly due to an increase of up to 1.4% in the accuracy of heterozygous calls, which may have a big impact in reducing the false positive rate in applications sensitive to genotyping error such as ASE. Moreover, BBmix can be easily incorporated into standard pipelines for calling genotypes. We further show that parameters are generally transferable within datasets, such that a single learning run of less than 1 h is sufficient to call genotypes in a large number of samples. AVAILABILITY AND IMPLEMENTATION We implemented BBmix as an R package that is available for free under a GPL-2 licence at https://gitlab.com/evigorito/bbmix and https://cran.r-project.org/package=bbmix with accompanying pipeline at https://gitlab.com/evigorito/bbmix_pipeline.
Affiliation(s)
- Elena Vigorito
  - MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom
- Anne Barton
  - Division of Musculoskeletal and Dermatological Sciences, University of Manchester, Manchester M13 9PL, United Kingdom
- Costantino Pitzalis
  - Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Myles J Lewis
  - Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, United Kingdom
- Chris Wallace
  - MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, United Kingdom
  - Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge CB2 0AW, United Kingdom
7
Iqbal MA, Hadlich F, Reyer H, Oster M, Trakooljul N, Murani E, Perdomo‐Sabogal A, Wimmers K, Ponsuksili S. RNA-Seq-based discovery of genetic variants and allele-specific expression of two layer lines and broiler chicken. Evol Appl 2023; 16:1135-1153. [PMID: 37360029 PMCID: PMC10286233 DOI: 10.1111/eva.13557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 04/21/2023] [Accepted: 04/22/2023] [Indexed: 06/28/2023]
Abstract
Recent advances in the selective breeding of broilers and layers have made poultry production one of the fastest-growing industries. In this study, a transcriptome variant calling approach from RNA-seq data was used to determine population diversity between broilers and layers. In total, 200 individuals were analyzed from three different chicken populations: Lohmann Brown (LB, n = 90), Lohmann Selected Leghorn (LSL, n = 89), and Broiler (BR, n = 21). The raw RNA-sequencing reads were pre-processed, quality control checked, mapped to the reference genome, and made compatible with the Genome Analysis ToolKit for variant detection. Subsequently, pairwise fixation index (FST) analysis was performed between broilers and layers. Numerous candidate genes were identified that were associated with growth, development, metabolism, immunity, and other economically significant traits. Finally, allele-specific expression (ASE) analysis was performed in the gut mucosa of LB and LSL strains at 10, 16, 24, 30, and 60 weeks of age. At different ages, the two layer strains showed significantly different allele-specific expression in the gut mucosa, and changes in allelic imbalance were observed across the entire lifespan. Most ASE genes are involved in energy metabolism, including sirtuin signaling pathways, oxidative phosphorylation, and mitochondrial dysfunction. A high number of ASE genes were found during the peak of laying, which were particularly enriched in cholesterol biosynthesis. These findings indicate that genetic architecture, as well as biological processes driving particular demands related to metabolic and nutritional requirements during the laying period, shape allelic heterogeneity. These processes are considerably affected by breeding and management, whereby elucidating allele-specific gene regulation is an essential step towards deciphering the genotype to phenotype map or functional diversity between the chicken populations. Additionally, we observed that several genes showing significant allelic imbalance also colocalized with the top 1% of genes identified by the FST approach, suggesting a fixation of genes in cis-regulatory elements.
Affiliation(s)
- Frieder Hadlich
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Henry Reyer
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Michael Oster
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Nares Trakooljul
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Eduard Murani
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
- Klaus Wimmers
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
  - Faculty of Agricultural and Environmental Sciences, University Rostock, Rostock, Germany
- Siriluck Ponsuksili
  - Research Institute for Farm Animal Biology, Institute of Genome Biology, Dummerstorf, Germany
8
Alonso-Garrido M, Lozano M, Riffo-Campos AL, Font G, Vila-Donat P, Manyes L. Assessment of single-nucleotide variant discovery protocols in RNA-seq data from human cells exposed to mycotoxins. Toxicol Mech Methods 2023; 33:215-221. [PMID: 36016515 DOI: 10.1080/15376516.2022.2117673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Abstract
Food and feed contamination by the nonlegislated mycotoxins beauvericin (BEA) and enniatin B (ENB) is currently a worldwide health concern. The principal objective of this work is to assess some of the existing protocols for discovering single nucleotide variants (SNVs) in transcriptomic data obtained by RNA-seq from in vitro Jurkat cell samples individually exposed to BEA and ENB at three concentration levels (1.5, 3 and 5 µM). Moreover, previous transcriptomic results are compared with new findings obtained using a different protocol. The SNVs rs201003509 in BEA-exposed cells and rs36045790 in ENB-exposed cells were found in the differentially expressed genes at all doses compared to controls by means of the Genome Analysis Toolkit (GATK) Best Practices workflow. The complementary SNV-RNA-seq pipeline did not detect any SNV. Concerning gene expression, discrepant results were found for 1.5 µM BEA-exposed cells compared with previous findings. However, 354 overlapping differentially expressed genes (DEGs) were identified across the three ENB concentrations used, with 147 matches with respect to the 245 DEGs found in the previous results. In conclusion, the two SNV discovery protocols based on variant calling from RNA-seq used in this work gave very different results, and some SNVs found manually were not identified by either pipeline. Additionally, the new gene expression analysis reported comparable but non-identical DEGs to the previous transcriptomic results obtained from these RNA-seq data.
Affiliation(s)
- M Alonso-Garrido
  - Laboratory of Food Chemistry and Toxicology, Faculty of Pharmacy, University of Valencia, Burjassot, Spain
- M Lozano
  - Laboratory of Food Chemistry and Toxicology, Faculty of Pharmacy, University of Valencia, Burjassot, Spain
  - Epidemiology and Environmental Health Joint Research Unit, FISABIO - Universitat Jaume I - Universitat de València, València, Spain
- A L Riffo-Campos
  - Millennium Nucleus on Sociomedicine (SocioMed) and Vicerrectoría Académica, Universidad de La Frontera, Temuco, Chile
  - Department of Computer Science, ETSE, University of Valencia, Valencia, Spain
- G Font
  - Laboratory of Food Chemistry and Toxicology, Faculty of Pharmacy, University of Valencia, Burjassot, Spain
- P Vila-Donat
  - Laboratory of Food Chemistry and Toxicology, Faculty of Pharmacy, University of Valencia, Burjassot, Spain
- L Manyes
  - Laboratory of Food Chemistry and Toxicology, Faculty of Pharmacy, University of Valencia, Burjassot, Spain
9
Muñoz-Espinoza C, Meneses M, Hinrichsen P. Transcriptomic Approach for Global Distribution of SNP/Indel and Plant Genotyping. Methods Mol Biol 2023; 2638:147-164. [PMID: 36781640 DOI: 10.1007/978-1-0716-3024-2_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Single Nucleotide Polymorphisms (SNPs) are the most common structural variants found in any genome. They have been used for different genetic studies, from understanding the genetic structure of populations to developing breeding selection markers. In this chapter we present the use of transcriptomic data obtained from contrasting phenotypes for a target trait to search for SNPs and insertions/deletions (InDels). This approach has the advantage that the identified markers are in or close to differentially expressed genes, and so they have a higher chance of tagging the genes underlying the phenotypic expression of a particular trait.
Affiliation(s)
- Marco Meneses
  - Instituto de Investigaciones Agropecuarias, INIA La Platina, Santiago, Chile
- Patricio Hinrichsen
  - Instituto de Investigaciones Agropecuarias, INIA La Platina, Santiago, Chile
10
Huang J, Zhang G, Li Y, Lyu M, Zhang H, Zhang N, Chen R. Integrative genomic and transcriptomic analyses of a bud sport mutant 'Jinzao Wuhe' with the phenotype of large berries in grapevines. PeerJ 2023; 11:e14617. [PMID: 36620751 PMCID: PMC9817954 DOI: 10.7717/peerj.14617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 12/01/2022] [Indexed: 01/05/2023]
Abstract
Background Bud sport mutation occurs frequently in fruit plants and acts as an important approach for grapevine improvement and breeding. 'Jinzao Wuhe' is a bud sport of the elite cultivar 'Himord Seedless' with obviously enlarged organs and berries. To date, the molecular mechanisms underlying berry enlargement caused by bud sport in grapevines remain unclear. Methods Whole genome resequencing (WGRS) was performed for two pairs of bud sports and their maternal plants with similar phenotypes to identify SNPs, InDels and structural variations (SVs) as well as related genes. Furthermore, transcriptomic sequencing at different developmental stages and weighted gene co-expression network analysis (WGCNA) for 'Jinzao Wuhe' and its maternal plant 'Himord Seedless' were carried out to identify the differentially expressed genes (DEGs), which were subsequently analyzed for Gene Ontology (GO) and function annotation. Results In two pairs of enlarged berry bud sports, a total of 1,334 SNPs, 272 InDels and 74 SVs, corresponding to 1,022 target genes related to symbiotic microorganisms, cell death and other processes, were identified. Meanwhile, 1,149 DEGs associated with cell wall modification, stress response and cell killing, which might be responsible for the phenotypic variation, were also identified. As a result, 42 DEGs between 'Himord Seedless' and 'Jinzao Wuhe' harboring genetic variations were further investigated, including pectin esterase, cellulase A, cytochromes P450 (CYP), UDP-glycosyltransferase (UGT), zinc finger protein, auxin response factor (ARF), NAC transcription factor (TF), protein kinase, etc. These candidate genes offer important clues for a better understanding of the developmental regulation of berry enlargement in grapevine. Conclusion Our results provide candidate genes and valuable information for dissecting the underlying mechanisms of berry development and contribute to future improvement of grapevine cultivars.
Affiliation(s)
- Jianquan Huang
  - The Research Institute of Forestry and Pomology, Tianjin Academy of Agricultural Sciences, Tianjin, China
- Guan Zhang
  - Institute of Crop Germplasm and Biotechnology, Tianjin Academy of Agricultural Sciences, Tianjin, China
  - College of Biotechnology and Food Science, Tianjin University of Commerce, Tianjin, China
- Yanhao Li
  - The Research Institute of Forestry and Pomology, Tianjin Academy of Agricultural Sciences, Tianjin, China
  - College of Horticulture and Gardening, Tianjin Agricultural University, Tianjin, China
- Mingjie Lyu
  - Institute of Crop Germplasm and Biotechnology, Tianjin Academy of Agricultural Sciences, Tianjin, China
- He Zhang
  - The Research Institute of Forestry and Pomology, Tianjin Academy of Agricultural Sciences, Tianjin, China
- Na Zhang
  - The Research Institute of Forestry and Pomology, Tianjin Academy of Agricultural Sciences, Tianjin, China
- Rui Chen
  - Institute of Crop Germplasm and Biotechnology, Tianjin Academy of Agricultural Sciences, Tianjin, China
11
Spatial variation in gene expression of Tasmanian devil facial tumors despite minimal host transcriptomic response to infection. BMC Genomics 2021; 22:698. [PMID: 34579650 PMCID: PMC8477496 DOI: 10.1186/s12864-021-07994-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2021] [Accepted: 09/08/2021] [Indexed: 01/03/2023]
Abstract
BACKGROUND Transmissible cancers lie at the intersection of oncology and infectious disease, two traditionally divergent fields for which gene expression studies are particularly useful for identifying the molecular basis of phenotypic variation. In oncology, transcriptomics studies, which characterize the expression of thousands of genes, have identified processes leading to heterogeneity in cancer phenotypes and individual prognoses. More generally, transcriptomics studies of infectious diseases characterize interactions between host, pathogen, and environment to better predict population-level outcomes. Tasmanian devils have been impacted dramatically by a transmissible cancer (devil facial tumor disease; DFTD) that has led to widespread population declines. Despite initial predictions of extinction, populations have persisted at low levels, due in part to heterogeneity in host responses, particularly between sexes. However, the processes underlying this variation remain unknown. RESULTS We sequenced transcriptomes from healthy and DFTD-infected devils, as well as DFTD tumors, to characterize host responses to DFTD infection, identify differing host-tumor molecular interactions between sexes, and investigate the extent to which tumor gene expression varies among host populations. We found minimal variation in gene expression of devil lip tissues, either with respect to DFTD infection status or sex. However, 4088 genes were differentially expressed in tumors among our sampling localities. Pathways that were up- or downregulated in DFTD tumors relative to normal tissues exhibited the same patterns of expression with greater intensity in tumors from localities that experienced DFTD for longer. No mRNA sequence variants were associated with expression variation. CONCLUSIONS Expression variation among localities may reflect morphological differences in tumors that alter ratios of normal-to-tumor cells within biopsies. 
Phenotypic variation in tumors may arise from environmental variation or differences in host immune response that were undetectable in lip biopsies, potentially reflecting variation in host-tumor coevolutionary relationships among sites that differ in the time since DFTD arrival.
12
Thind AS, Monga I, Thakur PK, Kumari P, Dindhoria K, Krzak M, Ranson M, Ashford B. Demystifying emerging bulk RNA-Seq applications: the application and utility of bioinformatic methodology. Brief Bioinform 2021; 22:6330938. [PMID: 34329375 DOI: 10.1093/bib/bbab259] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/28/2021] [Revised: 06/14/2021] [Accepted: 06/18/2021] [Indexed: 12/13/2022]
Abstract
Significant innovations in next-generation sequencing techniques and bioinformatics tools have impacted our appreciation and understanding of RNA. Practical RNA sequencing (RNA-Seq) applications have evolved in conjunction with advances in sequencing technology and bioinformatic tools. In most projects, bulk RNA-Seq data are used to measure gene expression patterns, isoform expression, alternative splicing and single-nucleotide polymorphisms. However, RNA-Seq holds far more hidden biological information, including details of copy number alteration, microbial contamination, transposable elements, cell type (deconvolution) and the presence of neoantigens. Recent advanced bioinformatic algorithms have made it possible to retrieve this information from bulk RNA-Seq data, thus broadening its scope. The focus of this review is to survey the emerging bulk RNA-Seq-based analyses, emphasizing less familiar and underused applications. In doing so, we highlight the power of bulk RNA-Seq in providing biological insights.
Affiliation(s)
- Amarinder Singh Thind
  - University of Wollongong, Wollongong, Australia
  - Illawarra Health and Medical Research Institute, Wollongong, Australia
- Isha Monga
  - Columbia University, New York City, NY, USA
- Pallawi Kumari
  - Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Kiran Dindhoria
  - Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Marie Ranson
  - University of Wollongong, Wollongong, Australia
  - Illawarra Health and Medical Research Institute, Wollongong, Australia
- Bruce Ashford
  - University of Wollongong, Wollongong, Australia
  - Illawarra Health and Medical Research Institute, Wollongong, Australia

13
Jehl F, Degalez F, Bernard M, Lecerf F, Lagoutte L, Désert C, Coulée M, Bouchez O, Leroux S, Abasht B, Tixier-Boichard M, Bed'hom B, Burlot T, Gourichon D, Bardou P, Acloque H, Foissac S, Djebali S, Giuffra E, Zerjal T, Pitel F, Klopp C, Lagarrigue S. RNA-Seq Data for Reliable SNP Detection and Genotype Calling: Interest for Coding Variant Characterization and Cis-Regulation Analysis by Allele-Specific Expression in Livestock Species. Front Genet 2021; 12:655707. [PMID: 34262593 PMCID: PMC8273700 DOI: 10.3389/fgene.2021.655707] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/19/2021] [Accepted: 06/01/2021] [Indexed: 12/19/2022]
Abstract
In addition to their common usages to study gene expression, RNA-seq data accumulated over the last 10 years are a yet-unexploited resource of SNPs in numerous individuals from different populations. SNP detection by RNA-seq is particularly interesting for livestock species since whole genome sequencing is expensive and exome sequencing tools are unavailable. These SNPs detected in expressed regions can be used to characterize variants affecting protein functions, and to study cis-regulated genes by analyzing allele-specific expression (ASE) in the tissue of interest. However, gene expression can be highly variable, and filters for SNP detection using the popular GATK toolkit are not yet standardized, making SNP detection and genotype calling by RNA-seq a challenging endeavor. We compared SNP calling results using GATK suggested filters, on two chicken populations for which both RNA-seq and DNA-seq data were available for the same samples of the same tissue. We showed, in expressed regions, an RNA-seq precision of 91% (SNPs detected by RNA-seq and shared by DNA-seq) and we characterized the remaining 9% of SNPs. We then studied the genotype (GT) obtained by RNA-seq and the impact of two factors (GT call-rate and read number per GT) on the concordance of GT with DNA-seq; we proposed thresholds for them leading to a 95% concordance. Applying these thresholds to 767 multi-tissue RNA-seq samples from 382 birds of 11 chicken populations, we found 9.5 M SNPs in total, of which ∼550,000 SNPs per tissue and population had a reliable GT (call rate ≥ 50%), and among them ∼340,000 had a MAF ≥ 10%. We showed that such RNA-seq data from one tissue can be used to (i) detect SNPs with a strong predicted impact on proteins, despite their scarcity in each population (16,307 SIFT deleterious missenses and 590 stop-gained), (ii) study, on a large scale, cis-regulations of gene expression, with ∼81% of protein-coding and 68% of long non-coding genes (TPM ≥ 1) that can be analyzed for ASE, and with ∼29% of them that were cis-regulated, and (iii) analyze population genetics using such SNPs located in expressed regions. This work shows that RNA-seq data can be used with good confidence to detect SNPs and associated GT within various populations and to use them for different analyses, as in GTEx studies.
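The two SNP-level thresholds quoted in this abstract (call rate ≥ 50% for a reliable genotype, then MAF ≥ 10%) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the genotype encoding (alternate-allele count 0, 1 or 2, with `None` for a missing call) and all function names are assumptions made for illustration, and the read-number-per-genotype threshold is left out because the abstract does not state its value.

```python
from typing import Optional, Sequence

# Assumed encoding: alternate-allele count per diploid sample (0, 1 or 2);
# None marks a sample with no genotype call at this SNP.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples that have a genotype call at this SNP."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Minor allele frequency computed over called samples only."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per sample
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.5,  # "call rate >= 50%" from the abstract
    min_maf: float = 0.10,       # "MAF >= 10%" from the abstract
) -> bool:
    """True if the SNP meets both the call-rate and the MAF threshold."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )


if __name__ == "__main__":
    # Hypothetical genotype vectors for three SNPs.
    well_called_common = [0, 1, None, 2, None, 1]
    poorly_called = [0, None, None, None, None, 0]
    rare_variant = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    for name, snp in [
        ("well_called_common", well_called_common),
        ("poorly_called", poorly_called),
        ("rare_variant", rare_variant),
    ]:
        print(name, passes_filters(snp))
```

Computing MAF over called samples only, rather than over all samples, is a design choice of this sketch; a SNP with many missing calls is already handled by the separate call-rate check.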
Affiliation(s)
- Frédéric Jehl
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Fabien Degalez
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Maria Bernard
  - INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Colette Désert
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Manon Coulée
  - INRAE, INSTITUT AGRO, PEGASE UMR 1348, Saint-Gilles, France
- Olivier Bouchez
  - INRAE, US 1426, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Sophie Leroux
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Behnam Abasht
  - Department of Animal and Food Sciences, University of Delaware, Newark, DE, United States
- Bertrand Bed'hom
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Philippe Bardou
  - INRAE, SIGENAE, Genotoul Bioinfo MIAT, Castanet-Tolosan, France
- Hervé Acloque
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Sylvain Foissac
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Sarah Djebali
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
- Elisabetta Giuffra
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Tatiana Zerjal
  - INRAE, AgroParisTech, Université Paris-Saclay, GABI UMR 1313, Jouy-en-Josas, France
- Frédérique Pitel
  - INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France

14
Youssefian L, Saeidian AH, Palizban F, Bagherieh A, Abdollahimajd F, Sotoudeh S, Mozafari N, Farahani RA, Mahmoudi H, Babashah S, Zabihi M, Zeinali S, Fortina P, Salas-Alanis JC, South AP, Vahidnezhad H, Uitto J. Whole-Transcriptome Analysis by RNA Sequencing for Genetic Diagnosis of Mendelian Skin Disorders in the Context of Consanguinity. Clin Chem 2021; 67:876-888. [PMID: 33969388 DOI: 10.1093/clinchem/hvab042] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/15/2020] [Accepted: 02/11/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND Among the approximately 8000 Mendelian disorders, >1000 have cutaneous manifestations. In many of these conditions, the underlying mutated genes have been identified by DNA-based techniques which, however, can overlook certain types of mutations, such as exonic-synonymous and deep-intronic sequence variants. Whole-transcriptome sequencing by RNA sequencing (RNA-seq) can identify such mutations and provide information about their consequences. METHODS We analyzed the whole transcriptome of 40 families with different types of Mendelian skin disorders with extensive genetic heterogeneity. The RNA-seq data were examined for variant detection and prioritization, pathogenicity confirmation, RNA expression profiling, and genome-wide homozygosity mapping in the case of consanguineous families. Among the families examined, RNA-seq was able to provide information complementary to DNA-based analyses for exonic and intronic sequence variants with aberrant splicing. In addition, we tested the possibility of using RNA-seq as the first-tier strategy for unbiased genome-wide mutation screening without information from DNA analysis. RESULTS We found pathogenic mutations in 35 families (88%) with RNA-seq in combination with other next-generation sequencing methods, and we successfully prioritized variants and found the culprit genes. In addition, as a novel concept, we propose a pipeline that increases the yield of variant calling from RNA-seq by concurrent use of genome and transcriptome references in parallel. CONCLUSIONS Our results suggest that "clinical RNA-seq" could serve as a primary approach for mutation detection in inherited diseases, particularly in consanguineous families, provided that tissues and cells expressing the relevant genes are available for analysis.
Affiliation(s)
- Leila Youssefian
  - Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA
  - Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA
  - Genetics, Genomics and Cancer Biology PhD Program, Thomas Jefferson University, Philadelphia, PA, USA
- Amir Hossein Saeidian
  - Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA
  - Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA
  - Genetics, Genomics and Cancer Biology PhD Program, Thomas Jefferson University, Philadelphia, PA, USA
- Fahimeh Palizban
  - Laboratory of Complex Biological Systems and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Atefeh Bagherieh
  - Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Soheila Sotoudeh
  - Department of Dermatology, Children's Medical Center, Center of Excellence, Tehran University of Medical Sciences, Tehran, Iran
- Nikoo Mozafari
  - Skin Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Rahele A Farahani
  - Division of Nephrology and Hypertension, Mayo Clinic, Rochester, Minnesota, United States of America
- Hamidreza Mahmoudi
  - Department of Dermatology, Razi Hospital, Tehran University of Medical Sciences, Tehran, Iran
- Sadegh Babashah
  - Department of Molecular Genetics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Paolo Fortina
  - Cancer Genomics and Bioinformatics, Department of Cancer Biology, Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA, USA
  - Department of Translation and Precision Medicine, Sapienza University, Rome, Italy
- Andrew P South
  - Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA
- Hassan Vahidnezhad
  - Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA
  - Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA
- Jouni Uitto
  - Jefferson Institute of Molecular Medicine, Thomas Jefferson University, Philadelphia, PA, USA
  - Department of Dermatology and Cutaneous Biology, Sidney Kimmel Medical College, Thomas Jefferson University, Philadelphia, PA, USA

15
Garrido-Rodriguez M, Lopez-Lopez D, Ortuno FM, Peña-Chilet M, Muñoz E, Calzado MA, Dopazo J. A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways. PLoS Comput Biol 2021; 17:e1008748. [PMID: 33571195 PMCID: PMC7904194 DOI: 10.1371/journal.pcbi.1008748] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/28/2020] [Revised: 02/24/2021] [Accepted: 01/30/2021] [Indexed: 12/13/2022]
Abstract
MIGNON is a workflow for the analysis of RNA-Seq experiments, which not only efficiently manages the estimation of gene expression levels from raw sequencing reads, but also calls genomic variants present in the transcripts analyzed. Moreover, this is the first workflow that provides a framework for the integration of transcriptomic and genomic data based on a mechanistic model of signaling pathway activities that allows a detailed biological interpretation of the results, including a comprehensive functional profiling of cell activity. MIGNON covers the whole process, from reads to signaling circuit activity estimations, using state-of-the-art tools. It is easy to use and deployable in different computational environments, allowing an optimized use of the available resources.
Affiliation(s)
- Martín Garrido-Rodriguez
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Departamento de Biología Celular, Fisiología e Inmunología, Universidad de Córdoba, Córdoba, Spain
  - Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, Spain
  - Hospital Universitario Reina Sofía, Córdoba, Spain
- Daniel Lopez-Lopez
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Sevilla, Spain
- Francisco M. Ortuno
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Sevilla, Spain
- María Peña-Chilet
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Sevilla, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, Sevilla, Spain
- Eduardo Muñoz
  - Departamento de Biología Celular, Fisiología e Inmunología, Universidad de Córdoba, Córdoba, Spain
  - Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, Spain
  - Hospital Universitario Reina Sofía, Córdoba, Spain
- Marco A. Calzado
  - Departamento de Biología Celular, Fisiología e Inmunología, Universidad de Córdoba, Córdoba, Spain
  - Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Córdoba, Spain
  - Hospital Universitario Reina Sofía, Córdoba, Spain
- Joaquin Dopazo
  - Clinical Bioinformatics Area, Fundación Progreso y Salud (FPS), Hospital Virgen del Rocío, Sevilla, Spain
  - Computational Systems Medicine, Institute of Biomedicine of Seville (IBIS), Sevilla, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), FPS, Hospital Virgen del Rocío, Sevilla, Spain
  - FPS/ELIXIR-es, Hospital Virgen del Rocío, Sevilla, Spain

16
Chen W, Liu X. Proteoform Identification by Combining RNA-Seq and Top-Down Mass Spectrometry. J Proteome Res 2020; 20:261-269. [PMID: 33183009 DOI: 10.1021/acs.jproteome.0c00369] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
Abstract
In proteogenomic studies, genomic and transcriptomic variants are incorporated into customized protein databases for the identification of proteoforms, especially proteoforms with sample-specific variants. Most proteogenomic research has been focused on combining genomic or transcriptomic data with bottom-up mass spectrometry data. In the last decade, top-down mass spectrometry has attracted increasing attention because of its capacity to identify various proteoforms with alterations. However, top-down proteogenomics, in which genomic or transcriptomic data are combined with top-down mass spectrometry data, has not been widely adopted, and there is still a lack of software tools for top-down proteogenomic data analysis. In this paper, we introduce TopPG, a proteogenomic tool for generating proteoform sequence databases with genetic alterations and alternative splicing events. Experiments on top-down proteogenomic data of DLD-1 colorectal cancer cells showed that TopPG coupled with database search confidently identified proteoforms with sample-specific alterations.
Affiliation(s)
- Wenrong Chen
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
- Xiaowen Liu
  - Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States

17
Lam S, Zeidan J, Miglior F, Suárez-Vega A, Gómez-Redondo I, Fonseca PAS, Guan LL, Waters S, Cánovas A. Development and comparison of RNA-sequencing pipelines for more accurate SNP identification: practical example of functional SNP detection associated with feed efficiency in Nellore beef cattle. BMC Genomics 2020; 21:703. [PMID: 33032519 PMCID: PMC7545862 DOI: 10.1186/s12864-020-07107-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/21/2020] [Accepted: 09/28/2020] [Indexed: 12/14/2022]
Abstract
Background Optimization of an RNA-Sequencing (RNA-Seq) pipeline is critical to maximize power and accuracy to identify genetic variants, including SNPs, which may serve as genetic markers to select for feed efficiency, leading to economic benefits for beef production. This study used RNA-Seq data (GEO Accession ID: PRJEB7696 and PRJEB15314) from muscle and liver tissue, respectively, from 12 Nellore beef steers selected from 585 steers with residual feed intake measures (RFI; n = 6 low-RFI, n = 6 high-RFI). Three RNA-Seq pipelines were compared including multi-sample calling from i) non-merged samples; ii) merged samples by RFI group, iii) merged samples by RFI and tissue group. The RNA-Seq reads were aligned against the UMD3.1 bovine reference genome (release 94) assembly using STAR aligner. Variants were called using BCFtools and variant effect prediction (VeP) and functional annotation (ToppGene) analyses were performed. Results On average, total reads detected for Approach i) non-merged samples for liver and muscle, were 18,362,086.3 and 35,645,898.7, respectively. For Approach ii), merging samples by RFI group, total reads detected for each merged group was 162,030,705, and for Approach iii), merging samples by RFI group and tissues, was 324,061,410, revealing the highest read depth for Approach iii). Additionally, Approach iii) merging samples by RFI group and tissues, revealed the highest read depth per variant coverage (572.59 ± 3993.11) and encompassed the majority of localized positional genes detected by each approach. This suggests Approach iii) had optimized detection power, read depth, and accuracy of SNP calling, therefore increasing confidence of variant detection and reducing false positive detection. Approach iii) was then used to detect unique SNPs fixed within low- (12,145) and high-RFI (14,663) groups. 
Functional annotation of SNPs revealed positional candidate genes, for each RFI group (2886 for low-RFI, 3075 for high-RFI), which were significantly (P < 0.05) associated with immune and metabolic pathways. Conclusion The most optimized RNA-Seq pipeline allowed for more accurate identification of SNPs, associated positional candidate genes, and significantly associated metabolic pathways in muscle and liver tissues, providing insight on the underlying genetic architecture of feed efficiency in beef cattle.
Affiliation(s)
- S Lam
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road E, Guelph, Ontario, N1G2W1, Canada
- J Zeidan
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road E, Guelph, Ontario, N1G2W1, Canada
- F Miglior
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road E, Guelph, Ontario, N1G2W1, Canada
- A Suárez-Vega
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road E, Guelph, Ontario, N1G2W1, Canada
- I Gómez-Redondo
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road E, Guelph, Ontario, N1G2W1, Canada
  - Spanish National Institute for Agriculture and Food Research and Technology, Carretera de La Coruña, 28040, Madrid, Spain
- P A S Fonseca
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road E, Guelph, Ontario, N1G2W1, Canada
- L L Guan
  - Department of Agriculture, Food & Nutritional Science, University of Alberta, Edmonton, Alberta, T6H 2P5, Canada
- S Waters
  - Teagasc, Animal & Grassland Research and Innovation Centre, Grange, Dunsany, Co. Meath, C15 PW93, Ireland
- A Cánovas
  - Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, 50 Stone Road E, Guelph, Ontario, N1G2W1, Canada

18
Genome-Wide Development and Validation of Cost-Effective KASP Marker Assays for Genetic Dissection of Heat Stress Tolerance in Maize. Int J Mol Sci 2020; 21:ijms21197386. [PMID: 33036291 PMCID: PMC7582619 DOI: 10.3390/ijms21197386] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/15/2020] [Revised: 08/24/2020] [Accepted: 08/28/2020] [Indexed: 02/06/2023]
Abstract
Maize is the third most important cereal crop worldwide. However, its production is vulnerable to heat stress, which is expected to become increasingly severe in coming years. Germplasm resilient to heat stress has been identified, but its underlying genetic basis remains poorly understood. Genomic mapping technologies can fill the void, provided robust markers are available to tease apart the genotype-phenotype relationship. In the present investigation, we used data from an RNA-seq experiment to identify single nucleotide polymorphisms (SNPs) between two contrasting lines, LM11 and CML25, sensitive and tolerant to heat stress, respectively. The libraries for RNA-seq were made following heat stress treatment from three separate tissues/organs, comprising the top leaf, ovule, and pollen, all of which are highly vulnerable to damage by heat stress. Single nucleotide variant (SNV) calling combined the STAR mapper and GATK caller pipelines to identify highly accurate SNPs between the two lines. A total of 554,423, 410,698, and 596,868 SNVs were discovered between LM11 and CML25 after comparing the transcript sequence reads from the leaf, pollen, and ovule libraries, respectively. Hundreds of these SNPs were then selected to develop into genome-wide Kompetitive Allele-Specific PCR (KASP) markers, which were validated to be robust with a successful SNP conversion rate of 71%. Subsequently, these KASP markers were used to effectively genotype an F2 mapping population derived from a cross of LM11 and CML25. Being highly cost-effective, these KASP markers provide a reliable molecular marker toolkit not only to facilitate the genetic dissection of heat stress tolerance but also to accelerate the breeding of heat-resilient maize by marker-assisted selection (MAS).

19
Muñoz-Espinoza C, Di Genova A, Sánchez A, Correa J, Espinoza A, Meneses C, Maass A, Orellana A, Hinrichsen P. Identification of SNPs and InDels associated with berry size in table grapes integrating genetic and transcriptomic approaches. BMC Plant Biol 2020; 20:365. [PMID: 32746778 PMCID: PMC7397606 DOI: 10.1186/s12870-020-02564-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/18/2020] [Accepted: 07/21/2020] [Indexed: 05/08/2023]
Abstract
BACKGROUND Berry size is considered one of the main selection criteria in table grape breeding programs, due to consumer preferences. However, berry size is a complex quantitative trait under polygenic control, and its genetic determination is not yet fully understood. The aim of this work was to perform marker discovery using a transcriptomic approach, in order to identify and characterize SNP and InDel markers associated with berry size in table grapes. We used an integrative analysis based on RNA-Seq, SNP/InDel search and validation on table grape segregants and varieties with different genetic backgrounds. RESULTS Thirty SNPs and eight InDels were identified using a transcriptomic approach (RNA-Seq). These markers were selected from SNPs/InDels found among segregants from a Ruby x Sultanina population with contrasting phenotypes for berry size. The set of 38 SNP and InDel markers was distributed in eight chromosomes. Genotype-phenotype association analyses were performed using a set of 13 RxS segregants and 41 table grape varieties with different genetic backgrounds during three seasons. The results showed several degrees of association of these markers with berry size (10.2 to 30.7%), as well as with other berry-related traits such as length and width. The co-localization of SNP and/or InDel markers with previously reported QTLs and candidate genes associated with berry size was analysed. CONCLUSIONS We identified a set of informative and transferable SNP and InDel markers associated with berry size. Our results suggest the suitability of SNPs and InDels as candidate markers for berry weight in seedless table grape breeding. The identification of genomic regions associated with berry weight in chromosomes 8, 15 and 17 was achieved with supporting evidence derived from a transcriptome experiment focused on SNP/InDel search, as well as from a QTL-linkage mapping approach. New regions possibly associated with berry weight in chromosomes 3, 6, 9 and 14 were identified.
Affiliation(s)
- Claudia Muñoz-Espinoza
  - Instituto de Investigaciones Agropecuarias, INIA-La Platina, Santa Rosa 11610, Santiago, Chile
  - Centro de Biotecnología Vegetal, Universidad Andrés Bello, Av. República 330, 3rd floor, Santiago, Chile
- Alex Di Genova
  - Center for Mathematical Modeling (UMI2807-CNRS) and Department of Mathematical Engineering, Faculty of Mathematical and Physical Sciences, Universidad de Chile, Av. Blanco Encalada 2120, 7th floor, Santiago, Chile
- Alicia Sánchez
  - Instituto de Investigaciones Agropecuarias, INIA-La Platina, Santa Rosa 11610, Santiago, Chile
- José Correa
  - Instituto de Investigaciones Agropecuarias, INIA-La Platina, Santa Rosa 11610, Santiago, Chile
- Alonso Espinoza
  - Centro de Biotecnología Vegetal, Universidad Andrés Bello, Av. República 330, 3rd floor, Santiago, Chile
- Claudio Meneses
  - Centro de Biotecnología Vegetal, Universidad Andrés Bello, Av. República 330, 3rd floor, Santiago, Chile
  - Center for Genome Regulation, Av. Blanco Encalada 2085, 3rd floor, Santiago, Chile
- Alejandro Maass
  - Center for Mathematical Modeling (UMI2807-CNRS) and Department of Mathematical Engineering, Faculty of Mathematical and Physical Sciences, Universidad de Chile, Av. Blanco Encalada 2120, 7th floor, Santiago, Chile
  - Center for Genome Regulation, Av. Blanco Encalada 2085, 3rd floor, Santiago, Chile
- Ariel Orellana
  - Centro de Biotecnología Vegetal, Universidad Andrés Bello, Av. República 330, 3rd floor, Santiago, Chile
  - Center for Genome Regulation, Av. Blanco Encalada 2085, 3rd floor, Santiago, Chile
- Patricio Hinrichsen
  - Instituto de Investigaciones Agropecuarias, INIA-La Platina, Santa Rosa 11610, Santiago, Chile

20
Mellors T, Withers JB, Ameli A, Jones A, Wang M, Zhang L, Sanchez HN, Santolini M, Do Valle I, Sebek M, Cheng F, Pappas DA, Kremer JM, Curtis JR, Johnson KJ, Saleh A, Ghiassian SD, Akmaev VR. Clinical Validation of a Blood-Based Predictive Test for Stratification of Response to Tumor Necrosis Factor Inhibitor Therapies in Rheumatoid Arthritis Patients. Network and Systems Medicine 2020. [DOI: 10.1089/nsm.2020.0007] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/23/2022]
Affiliation(s)
- Asher Ameli
  - Scipher Medicine, Waltham, Massachusetts, USA
- Alex Jones
  - Scipher Medicine, Waltham, Massachusetts, USA
- Lixia Zhang
  - Scipher Medicine, Waltham, Massachusetts, USA
- Marc Santolini
  - Center for Research and Interdisciplinarity (CRI), University Paris Descartes, Paris, France
- Italo Do Valle
  - Center for Complex Network Research, Department of Physics, Northeastern University, Boston, Massachusetts, USA
- Michael Sebek
  - Center for Complex Network Research, Department of Physics, Northeastern University, Boston, Massachusetts, USA
- Feixiong Cheng
  - Center for Complex Network Research, Department of Physics, Northeastern University, Boston, Massachusetts, USA
- Dimitrios A. Pappas
  - Division of Rheumatology, College of Physicians and Surgeons, Columbia University, New York, New York, USA
  - CORRONA, LLC, Waltham, Massachusetts, USA
- Joel M. Kremer
  - CORRONA, LLC, Waltham, Massachusetts, USA
  - Albany Medical College, The Center for Rheumatology, Albany, New York, USA
- Jeffery R. Curtis
  - Department of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Alif Saleh
  - Scipher Medicine, Waltham, Massachusetts, USA

21
Zhang X, Jonassen I. RASflow: an RNA-Seq analysis workflow with Snakemake. BMC Bioinformatics 2020; 21:110. [PMID: 32183729 PMCID: PMC7079470 DOI: 10.1186/s12859-020-3433-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/25/2019] [Accepted: 02/26/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND With the cost of DNA sequencing decreasing, increasing amounts of RNA-Seq data are being generated giving novel insight into gene expression and regulation. Prior to analysis of gene expression, the RNA-Seq data has to be processed through a number of steps resulting in a quantification of expression of each gene/transcript in each of the analyzed samples. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data in data re-analysis. However, many of the existing workflows are limited to specific types of studies. We therefore aimed to develop a maximally general workflow, applicable to a wide range of data and analysis approaches and at the same time support research on both model and non-model organisms. Furthermore, we aimed to make the workflow usable also for users with limited programming skills. RESULTS Utilizing the workflow management system Snakemake and the package management system Conda, we have developed a modular, flexible and user-friendly RNA-Seq analysis workflow: RNA-Seq Analysis Snakemake Workflow (RASflow). Utilizing Snakemake and Conda alleviates challenges with library dependencies and version conflicts and also supports reproducibility. To be applicable for a wide variety of applications, RASflow supports the mapping of reads to both genomic and transcriptomic assemblies. RASflow has a broad range of potential users: it can be applied by researchers interested in any organism and since it requires no programming skills, it can be used by researchers with different backgrounds. The source code of RASflow is available on GitHub: https://github.com/zhxiaokang/RASflow. CONCLUSIONS RASflow is a simple and reliable RNA-Seq analysis workflow covering many use cases.
Affiliation(s)
- Xiaokang Zhang
  - Computational Biology Unit, Department of Informatics, University of Bergen, Thormohlens Gate 55, Bergen, 5009, Norway
- Inge Jonassen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Thormohlens Gate 55, Bergen, 5009, Norway