1
Sagwal V, Sihag P, Singh Y, Mehla S, Kapoor P, Balyan P, Kumar A, Mir RR, Dhankher OP, Kumar U. Development and characterization of nitrogen and phosphorus use efficiency responsive genic and miRNA derived SSR markers in wheat. Heredity (Edinb) 2022; 128:391-401. [PMID: 35132208 PMCID: PMC9177559 DOI: 10.1038/s41437-022-00506-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/25/2021] [Revised: 01/24/2022] [Accepted: 01/25/2022] [Indexed: 12/21/2022]
Abstract
Among all nutrients, nitrogen (N) and phosphorus (P) are the most limiting factors for wheat production and productivity worldwide. These macronutrients are applied directly to soil as fertilizers. However, only 30-40% of the applied fertilizer is utilized by crop plants, while the rest is lost through volatilization, leaching, and surface runoff. Therefore, to overcome N and P deficiency, it is necessary to improve their use efficiency. Marker-assisted selection (MAS) combined with traditional plant breeding is considered the best approach to improve the N and P use efficiency (N/PUE) of wheat varieties. In this study, we developed and evaluated a total of 98 simple sequence repeat (SSR) markers, including 66 microRNA-derived and 32 gene-specific SSRs, on a panel of 10 (N and P efficient/deficient) wheat genotypes. Of these, 35 SSRs were polymorphic and were used to study genetic diversity and population differentiation. Two SSRs, miR171a and miR167a, were identified as candidate markers able to discriminate contrasting genotypes for N and P use efficiency, respectively. Therefore, these two markers could be used as functional markers to characterize wheat germplasm for N and P use efficiency. Target genes of these miRNAs were highly associated with biological processes (24 GO terms), compared with molecular function and cellular component, and showed differential expression under various P-starvation conditions and abiotic stresses.
Affiliation(s)
- Vijeta Sagwal
  - Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, 125004, India
- Pooja Sihag
  - Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, 125004, India
- Yogita Singh
  - Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, 125004, India
- Sheetal Mehla
  - Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, 125004, India
- Prexha Kapoor
  - Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, 125004, India
- Priyanka Balyan
  - Department of Botany, Deva Nagri P.G. College, CCS University, Meerut, 250001, India
- Anuj Kumar
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, Pusa Campus, New Delhi, 110012, India
- Reyazul Rouf Mir
  - Division of Genetics and Plant Breeding, Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir (SKUAST-Kashmir), Srinagar, J&K, India
- Om Parkash Dhankher
  - Stockbridge School of Agriculture, University of Massachusetts, Amherst, MA, 01003, USA
- Upendra Kumar
  - Department of Molecular Biology, Biotechnology and Bioinformatics, College of Basic Sciences and Humanities, CCS Haryana Agricultural University, Hisar, 125004, India
2
Kupkova K, Mosquera JV, Smith JP, Stolarczyk M, Danehy TL, Lawson JT, Xue B, Stubbs JT, LeRoy N, Sheffield NC. GenomicDistributions: fast analysis of genomic intervals with Bioconductor. BMC Genomics 2022; 23:299. [PMID: 35413804 PMCID: PMC9003978 DOI: 10.1186/s12864-022-08467-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Accepted: 03/13/2022] [Indexed: 11/10/2022]
Abstract
Background: Epigenome analysis relies on defined sets of genomic regions output by widely used assays such as ChIP-seq and ATAC-seq. Statistical analysis and visualization of genomic region sets is essential to answer biological questions in gene regulation. As the epigenomics community continues generating data, there will be an increasing need for software tools that can efficiently deal with more abundant and larger genomic region sets. Here, we introduce GenomicDistributions, an R package for fast and easy summarization and visualization of genomic region data. Results: GenomicDistributions offers a broad selection of functions to calculate properties of genomic region sets, such as feature distances, genomic partition overlaps, and more. GenomicDistributions functions are meticulously optimized for best-in-class speed and generally outperform comparable functions in existing R packages. GenomicDistributions also offers plotting functions that produce editable ggplot objects. All GenomicDistributions functions follow a uniform naming scheme and can handle either single or multiple region set inputs. Conclusions: GenomicDistributions offers a fast and scalable tool for exploratory genomic region set analysis and visualization. GenomicDistributions excels in user-friendliness, flexibility of outputs, breadth of functions, and computational performance. GenomicDistributions is available from Bioconductor (https://bioconductor.org/packages/release/bioc/html/GenomicDistributions.html). Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08467-y.
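The "feature distances" mentioned in the abstract can be illustrated with a small sketch. This is not the GenomicDistributions API (the package is written in R); it is a hypothetical Python function, `nearest_feature_distance`, that assumes a single chromosome, a non-empty list of feature positions (for example, transcription start sites), and a sign convention of feature position minus region midpoint. All names and numbers are illustrative assumptions.

```python
from bisect import bisect_left


def nearest_feature_distance(region_mids, feature_positions):
    """Signed distance from each region midpoint to its nearest feature.

    Assumptions (not taken from the abstract): one chromosome only,
    feature_positions is non-empty, and distance = feature - midpoint.
    """
    feats = sorted(feature_positions)
    distances = []
    for mid in region_mids:
        # Index of the first feature at or to the right of the midpoint.
        i = bisect_left(feats, mid)
        # Only the neighbors on either side of the midpoint can be nearest.
        candidates = feats[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda f: abs(f - mid))
        distances.append(nearest - mid)
    return distances


# Hypothetical feature positions and region midpoints.
features = [1_000, 5_000]
mids = [500, 1_200, 4_900, 6_000]
print(nearest_feature_distance(mids, features))
```

Sorting the features once and using binary search keeps each lookup logarithmic in the number of features, which is one common way to make this kind of summary scale to large region sets.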
Affiliation(s)
- Kristyna Kupkova
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA
- Jose Verdezoto Mosquera
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA
- Jason P Smith
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA
- Michał Stolarczyk
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
- Tessa L Danehy
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
- John T Lawson
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, USA
- Bingjie Xue
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, USA
- John T Stubbs
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA
- Nathan LeRoy
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, USA
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, Charlottesville, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, USA
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, USA
3
Mordaunt CE, Mouat JS, Schmidt RJ, LaSalle JM. Comethyl: a network-based methylome approach to investigate the multivariate nature of health and disease. Brief Bioinform 2022; 23:bbab554. [PMID: 35037016 PMCID: PMC8921619 DOI: 10.1093/bib/bbab554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2021] [Revised: 11/15/2021] [Accepted: 12/04/2021] [Indexed: 11/14/2022]
Abstract
Health outcomes are frequently shaped by difficult-to-dissect interrelationships between biological, behavioral, social and environmental factors. DNA methylation patterns reflect such multivariate intersections, providing a rich source of novel biomarkers and insight into disease etiologies. Recent advances in whole-genome bisulfite sequencing enable investigation of DNA methylation over all genomic CpGs, but existing bioinformatic approaches lack accessible system-level tools. Here, we develop the R package Comethyl for weighted gene correlation network analysis of user-defined genomic regions; it generates modules of comethylated regions, which are then tested for correlations with multivariate sample traits. First, regions are defined by CpG genomic location or regulatory annotation and filtered based on CpG count, sequencing depth and variability. Next, correlation networks are used to find modules of interconnected nodes using methylation values within the selected regions. Each module containing multiple comethylated regions is reduced in complexity to a single eigennode value, which is then tested for correlations with experimental metadata. Comethyl can cover the noncoding regulatory regions of the genome, which are highly relevant to the interpretation of genome-wide association studies and to integration with other types of epigenomic data. We demonstrate the utility of Comethyl on a dataset of male cord blood samples from newborns later diagnosed with autism spectrum disorder (ASD) versus typical development. Comethyl successfully identified an ASD-associated module containing regions mapped to genes enriched for brain glial functions. Comethyl is expected to be useful in uncovering the multivariate nature of health disparities for a variety of common disorders. Comethyl is available at github.com/cemordaunt/comethyl with complete documentation and example analyses.
Affiliation(s)
- Charles E Mordaunt
  - Department of Medical Microbiology and Immunology, Genome Center, Perinatal Origins of Disparities Center, and MIND Institute, University of California, Davis, CA, USA
- Julia S Mouat
  - Department of Medical Microbiology and Immunology, Genome Center, Perinatal Origins of Disparities Center, and MIND Institute, University of California, Davis, CA, USA
- Rebecca J Schmidt
  - Department of Public Health Sciences, Perinatal Origins of Disparities Center, and MIND Institute, University of California, Davis, CA, USA
- Janine M LaSalle
  - Department of Medical Microbiology and Immunology, Genome Center, Perinatal Origins of Disparities Center, and MIND Institute, University of California, Davis, CA, USA
4
Luo L, Gribskov M, Wang S. Bibliometric review of ATAC-Seq and its application in gene expression. Brief Bioinform 2022; 23:6543486. [PMID: 35255493 PMCID: PMC9116206 DOI: 10.1093/bib/bbac061] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/17/2021] [Revised: 02/06/2022] [Accepted: 02/09/2022] [Indexed: 11/30/2022]
Abstract
With recent advances in high-throughput next-generation sequencing, it is possible to describe the regulation and expression of genes at multiple levels. An assay for transposase-accessible chromatin using sequencing (ATAC-seq), which uses Tn5 transposase to sequence protein-free binding regions of the genome, can be combined with chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) and ribonucleic acid sequencing (RNA-seq) to provide a detailed description of gene expression. Here, we reviewed the literature on ATAC-seq and described the characteristics of ATAC-seq publications. We then briefly introduced the principles of RNA-seq, ChIP-seq and ATAC-seq, focusing on the main features of the techniques. We built a phylogenetic tree from species that had been previously studied by using ATAC-seq. Studies of Mus musculus and Homo sapiens account for approximately 90% of the total ATAC-seq data, while other species are still in the process of accumulating data. We summarized the findings from human diseases and other species, illustrating the cutting-edge discoveries and the role of multi-omics data analysis in current research. Moreover, we collected and compared ATAC-seq analysis pipelines, which allowed biological researchers who lack programming skills to better analyze and explore ATAC-seq data. Through this review, it is clear that multi-omics analysis and single-cell sequencing technology will become the mainstream approach in future research.
Affiliation(s)
- Liheng Luo
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi, China, 710072
- Michael Gribskov
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Sufang Wang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi, China, 710072
5
Minegishi R, Gotoh O, Tanaka N, Maruyama R, Chang JT, Mori S. A method of sample-wise region-set enrichment analysis for DNA methylomics. Epigenomics 2021; 13:1081-1093. [PMID: 34241544 DOI: 10.2217/epi-2021-0065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Aim: Gene set analysis has commonly been used to interpret DNA methylome data. However, summarizing the DNA methylation level of a gene is challenging due to variability in the number, density and methylation levels of CpG sites, and the numerous intergenic CpGs. Instead, we propose to use region sets to annotate the DNA methylome. Methods: We developed single sample region-set enrichment analysis for DNA methylome (methyl-ssRSEA) to conduct sample-wise, region-set enrichment analysis. Results: Methyl-ssRSEA can handle both microarray- and sequencing-based platforms and reproducibly recover the known biology from the methylation profiles of peripheral blood cells and breast cancers. The performance was superior to existing tools for region-set analysis in discriminating blood cell types. Conclusion: Methyl-ssRSEA offers a novel way to functionally interpret the DNA methylome in the cell.
Affiliation(s)
- Ryu Minegishi
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Osamu Gotoh
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Norio Tanaka
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
- Reo Maruyama
  - Project for Cancer Epigenomics, Cancer Institute, Japanese Foundation for Cancer Research, Tokyo, Japan
- Jeffrey T Chang
  - Department of Integrative Biology & Pharmacology, University of Texas Health Science Center, Houston, TX 77030, USA
- Seiichi Mori
  - Project for Development of Innovative Research on Cancer Therapeutics, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo, Japan
6
Vlachavas EI, Bohn J, Ückert F, Nürnberg S. A Detailed Catalogue of Multi-Omics Methodologies for Identification of Putative Biomarkers and Causal Molecular Networks in Translational Cancer Research. Int J Mol Sci 2021; 22:2822. [PMID: 33802234 PMCID: PMC8000236 DOI: 10.3390/ijms22062822] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/31/2021] [Revised: 03/05/2021] [Accepted: 03/05/2021] [Indexed: 02/06/2023]
Abstract
Recent advances in sequencing and biotechnological methodologies have led to the generation of large volumes of molecular data of different omics layers, such as genomics, transcriptomics, proteomics and metabolomics. Integration of these data with clinical information provides new opportunities to discover how perturbations in biological processes lead to disease. Using data-driven approaches for the integration and interpretation of multi-omics data could stably identify links between structural and functional information and propose causal molecular networks with potential impact on cancer pathophysiology. This knowledge can then be used to improve disease diagnosis, prognosis, prevention, and therapy. This review will summarize and categorize the most current computational methodologies and tools for integration of distinct molecular layers in the context of translational cancer research and personalized therapy. Additionally, the bioinformatics tools Multi-Omics Factor Analysis (MOFA) and netDX will be tested using omics data from public cancer resources, to assess their overall robustness, provide reproducible workflows for gaining biological knowledge from multi-omics data, and to comprehensively understand the significantly perturbed biological entities in distinct cancer types. We show that the performed supervised and unsupervised analyses result in meaningful and novel findings.
Affiliation(s)
- Efstathios Iason Vlachavas
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jonas Bohn
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Frank Ückert
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
- Sylvia Nürnberg
  - Medical Informatics for Translational Oncology, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
  - Applied Medical Informatics, University Hospital Hamburg-Eppendorf, 20251 Hamburg, Germany
7
Feng J, Sheffield NC. IGD: high-performance search for large-scale genomic interval datasets. Bioinformatics 2020; 37:118-120. [PMID: 33367484 DOI: 10.1093/bioinformatics/btaa1062] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/11/2020] [Revised: 10/19/2020] [Accepted: 12/15/2020] [Indexed: 01/04/2023]
Abstract
Summary: Databases of large-scale genome projects now contain thousands of genomic interval datasets. These data are a critical resource for understanding the function of DNA. However, our ability to examine and integrate interval data of this scale is limited. Here, we introduce the integrated genome database (IGD), a method and tool for searching genome interval datasets more than three orders of magnitude faster than existing approaches, while using only one hundredth of the memory. IGD uses a novel linear binning method that allows us to scale analysis to billions of genomic regions. Availability: https://github.com/databio/IGD.
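The abstract names a linear binning method without giving details. The sketch below shows only the general idea of fixed-width binning for interval search and should not be read as IGD's actual data structure or file format. The bin width `BIN_SIZE`, the functions `build_index` and `query`, and the sample regions are all hypothetical; intervals are assumed to be half-open with `end > start`.

```python
from collections import defaultdict

BIN_SIZE = 16_384  # assumed bin width in base pairs, chosen for illustration


def build_index(intervals, bin_size=BIN_SIZE):
    """Map (chrom, bin number) to the intervals that touch that bin."""
    index = defaultdict(list)
    for chrom, start, end in intervals:
        # Half-open interval [start, end): the last covered base is end - 1.
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            index[(chrom, b)].append((start, end))
    return index


def query(index, chrom, start, end, bin_size=BIN_SIZE):
    """Return the sorted indexed intervals that overlap [start, end)."""
    hits = set()  # a set removes duplicates of intervals spanning several bins
    for b in range(start // bin_size, (end - 1) // bin_size + 1):
        for s, e in index.get((chrom, b), ()):
            if s < end and start < e:
                hits.add((s, e))
    return sorted(hits)


# Hypothetical region set.
regions = [("chr1", 100, 500), ("chr1", 20_000, 40_000), ("chr2", 100, 500)]
idx = build_index(regions)
print(query(idx, "chr1", 450, 21_000))
```

A query inspects only the bins it spans rather than every stored interval, which is the property that lets binning approaches scale to very large collections.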
Affiliation(s)
- Jianglin Feng
  - Center for Public Health Genomics, University of Virginia
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia
  - Department of Public Health Sciences, University of Virginia
  - Department of Biomedical Engineering, University of Virginia
  - Department of Biochemistry and Molecular Genetics, University of Virginia
8
COCOA: coordinate covariation analysis of epigenetic heterogeneity. Genome Biol 2020; 21:240. [PMID: 32894181 PMCID: PMC7487606 DOI: 10.1186/s13059-020-02139-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/11/2020] [Accepted: 08/07/2020] [Indexed: 12/20/2022]
Abstract
A key challenge in epigenetics is to determine the biological significance of epigenetic variation among individuals. We present Coordinate Covariation Analysis (COCOA), a computational framework that uses covariation of epigenetic signals across individuals and a database of region sets to annotate epigenetic heterogeneity. COCOA is the first such tool for DNA methylation data and can also analyze any epigenetic signal with genomic coordinates. We demonstrate COCOA’s utility by analyzing DNA methylation, ATAC-seq, and multi-omic data in supervised and unsupervised analyses, showing that COCOA provides new understanding of inter-sample epigenetic variation. COCOA is available on Bioconductor (http://bioconductor.org/packages/COCOA).