1
|
Maruzani R, Brierley L, Jorgensen A, Fowler A. Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection. BMC Genomics 2024; 25:827. [PMID: 39227777 PMCID: PMC11370058 DOI: 10.1186/s12864-024-10737-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 08/22/2024] [Indexed: 09/05/2024] Open
Abstract
BACKGROUND Circulating tumour DNA (ctDNA) is a subset of cell free DNA (cfDNA) released by tumour cells into the bloodstream. Circulating tumour DNA has shown great potential as a biomarker to inform treatment in cancer patients. Collecting ctDNA is minimally invasive and reflects the entire genetic makeup of a patient's cancer. ctDNA variants in NGS data can be difficult to distinguish from sequencing and PCR artefacts due to low abundance, particularly in the early stages of cancer. Unique Molecular Identifiers (UMIs) are short sequences ligated to the sequencing library before amplification. These sequences are useful for filtering out low frequency artefacts. The utility of ctDNA as a cancer biomarker depends on accurate detection of cancer variants. RESULTS In this study, we benchmarked six variant calling tools, including two UMI-aware callers for their ability to call ctDNA variants. The standard variant callers tested included Mutect2, bcftools, LoFreq and FreeBayes. The UMI-aware variant callers benchmarked were UMI-VarCal and UMIErrorCorrect. We used both datasets with known variants spiked in at low frequencies, and datasets containing ctDNA, and generated synthetic UMI sequences for these datasets. Variant callers displayed different preferences for sensitivity and specificity. Mutect2 showed high sensitivity, while returning more privately called variants than any other caller in data without synthetic UMIs - an indicator of false positive variant discovery. In data encoded with synthetic UMIs, UMI-VarCal detected fewer putative false positive variants than all other callers in synthetic datasets. Mutect2 showed a balance between high sensitivity and specificity in data encoded with synthetic UMIs. CONCLUSIONS Our results indicate UMI-aware variant callers have potential to improve sensitivity and specificity in calling low frequency ctDNA variants over standard variant calling tools. 
There is a growing need for further development of UMI-aware variant calling tools if effective early detection methods for cancer using ctDNA samples are to be realised.
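The abstract notes that UMIs help filter out low frequency artefacts. A minimal sketch of that idea follows, assuming reads at a single position have already been grouped by UMI. The function names, thresholds and toy reads are hypothetical, and this is not the algorithm used by UMI-VarCal, UMIErrorCorrect or any other caller named above.

```python
from collections import Counter, defaultdict


def umi_consensus(observations, min_family_size=2, min_agreement=0.6):
    """Collapse per-read bases into one consensus base per UMI family.

    observations: iterable of (umi, base) tuples for one genomic position.
    Returns {umi: consensus_base} for families passing both thresholds.
    """
    families = defaultdict(list)
    for umi, base in observations:
        families[umi].append(base)

    consensus = {}
    for umi, bases in families.items():
        if len(bases) < min_family_size:
            continue  # a singleton cannot separate a real base from an artefact
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[umi] = base
    return consensus


def allele_fraction(consensus, alt):
    """Fraction of consensus molecules carrying the alternate allele."""
    if not consensus:
        return 0.0
    return sum(1 for b in consensus.values() if b == alt) / len(consensus)


# Toy data (hypothetical): the reference base is A, the candidate variant is T.
reads = [
    ("AACGT", "A"), ("AACGT", "A"), ("AACGT", "T"),  # one discordant read, outvoted
    ("GGTCA", "T"), ("GGTCA", "T"),                  # consistent variant family
    ("TTAGC", "A"), ("TTAGC", "A"), ("TTAGC", "A"),
    ("CCGAT", "T"),                                  # singleton family, dropped
]
calls = umi_consensus(reads)
variant_fraction = allele_fraction(calls, "T")
```

In this toy case, four of nine raw reads show T, but only one of the three retained molecules does, because the discordant read and the singleton family no longer count as independent evidence.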
Affiliation(s)
- Rugare Maruzani
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
- Liam Brierley
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
  - MRC-University of Glasgow Centre for Virus Research, University of Glasgow, Garscube Campus, 464 Bearsden Road, Glasgow, G61 1QH, UK
- Andrea Jorgensen
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
- Anna Fowler
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
|
2
|
Trinh MDL, Visintainer D, Günther J, Østerberg JT, da Fonseca RR, Fondevilla S, Moog MW, Luo G, Nørrevang AF, Crocoll C, Nielsen PV, Jacobsen S, Wendt T, Bak S, López‐Marqués RL, Palmgren M. Site-directed genotype screening for elimination of antinutritional saponins in quinoa seeds identifies TSARL1 as a master controller of saponin biosynthesis selectively in seeds. Plant Biotechnology Journal 2024; 22:2216-2234. [PMID: 38572508 PMCID: PMC11258981 DOI: 10.1111/pbi.14340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 03/01/2024] [Accepted: 03/04/2024] [Indexed: 04/05/2024]
Abstract
Climate change may result in a drier climate and increased salinization, threatening agricultural productivity worldwide. Quinoa (Chenopodium quinoa) produces highly nutritious seeds and tolerates abiotic stresses such as drought and high salinity, making it a promising future food source. However, the presence of antinutritional saponins in their seeds is an undesirable trait. We mapped genes controlling seed saponin content to a genomic region that includes TSARL1. We isolated desired genetic variation in this gene by producing a large mutant library of a commercial quinoa cultivar and screening the library for specific nucleotide substitutions using droplet digital PCR. We were able to rapidly isolate two independent tsarl1 mutants, which retained saponins in the leaves and roots for defence, but saponins were undetectable in the seed coat. We further could show that TSARL1 specifically controls seed saponin biosynthesis in the committed step after 2,3-oxidosqualene. Our work provides new important knowledge on the function of TSARL1 and represents a breakthrough for quinoa breeding.
Affiliation(s)
- Mai Duy Luu Trinh
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Davide Visintainer
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Jan Günther
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Rute R. da Fonseca
  - Section for Biodiversity, Globe Institute, University of Copenhagen, København Ø, Denmark
- Max William Moog
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Guangbin Luo
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Anton F. Nørrevang
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Christoph Crocoll
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Philip V. Nielsen
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Søren Bak
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Michael Palmgren
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
|
3
|
Hemstrom W, Grummer JA, Luikart G, Christie MR. Next-generation data filtering in the genomics era. Nat Rev Genet 2024:10.1038/s41576-024-00738-6. [PMID: 38877133 DOI: 10.1038/s41576-024-00738-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2024] [Indexed: 06/16/2024]
Abstract
Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering - removing sequencing bases, reads, genetic variants and/or individuals from a dataset - to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy-Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima's D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).
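Two of the filters named in this abstract, minor allele frequency and missing data per locus, can be illustrated with a short sketch. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2) with `None` for a missing call, and at least one sample per locus. The thresholds and function names are hypothetical and are not recommendations taken from the review.

```python
def locus_stats(genotypes):
    """Return (missing_rate, minor_allele_frequency) for one locus.

    genotypes: non-empty list of alternate-allele counts (0, 1 or 2),
    with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    if not called:
        return missing_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return missing_rate, min(alt_freq, 1 - alt_freq)


def filter_loci(matrix, min_maf=0.05, max_missing=0.2):
    """Return indices of loci that pass both thresholds."""
    kept = []
    for idx, genotypes in enumerate(matrix):
        missing_rate, maf = locus_stats(genotypes)
        if missing_rate <= max_missing and maf >= min_maf:
            kept.append(idx)
    return kept


# Toy matrix (hypothetical): rows are loci, columns are individuals.
matrix = [
    [0, 1, 2, 0, 1],        # complete data, MAF 0.4: kept
    [0, 0, 0, 0, 0],        # monomorphic, MAF 0: dropped
    [1, None, None, 0, 2],  # 40% missing: dropped
    [0, 0, 0, 1, None],     # 20% missing, MAF 0.125: kept
]
kept = filter_loci(matrix)
```

Changing `min_maf` or `max_missing` changes which loci survive, which is the kind of threshold dependence the abstract says propagates into downstream statistics.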
Affiliation(s)
- William Hemstrom
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Jared A Grummer
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Gordon Luikart
  - Flathead Lake Biological Station, Wildlife Biology Program and Division of Biological Sciences, University of Montana, Missoula, MT, USA
- Mark R Christie
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, USA
|
4
|
Pandey A, Malik P, Kumar A, Kaur N, Saini DK, Gill RK, Kashyap S, Kaur S. Multi-GWAS reveals significant genomic regions for Mungbean yellow mosaic India virus resistance in urdbean (Vigna mungo (L.) across multiple environments. Plant Cell Reports 2024; 43:166. [PMID: 38862789 DOI: 10.1007/s00299-024-03257-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Accepted: 06/04/2024] [Indexed: 06/13/2024]
Abstract
KEY MESSAGE Unraveling genetic markers for MYMIV resistance in urdbean, with 8 high-confidence marker-trait associations identified across diverse environments, provides crucial insights for combating MYMIV disease, informing future breeding strategies. Globally, yellow mosaic disease (YMD) causes significant yield losses, reaching up to 100% in favorable environments within major urdbean cultivating regions. The introgression of genomic regions conferring resistance into urdbean cultivars is crucial for combating YMD, including resistance against mungbean yellow mosaic India virus (MYMIV). To uncover the genetic basis of MYMIV resistance, we conducted a genome-wide association study (GWAS) using three multi-locus models in 100 diverse urdbean genotypes cultivated across six individual and two combined environments. Leveraging 4538 high-quality single nucleotide polymorphism (SNP) markers, we identified 28 unique significant marker-trait associations (MTAs) for MYMIV resistance, with 8 MTAs considered of high confidence due to detection across multiple GWAS models and/or environments. Notably, 4 out of 28 MTAs were found in proximity to previously reported genomic regions associated with MYMIV resistance in urdbean and mungbean, strengthening our findings and indicating consistent genomic regions for MYMIV resistance. Among the eight highly significant MTAs, one localized on chromosome 6 adjacent to previously identified quantitative trait loci for MYMIV resistance, while the remaining seven were novel. These MTAs contain several genes implicated in disease resistance, including four common ones consistently found across all eight MTAs: receptor-like serine-threonine kinases, E3 ubiquitin-protein ligase, pentatricopeptide repeat, and ankyrin repeats. Previous studies have linked these genes to defense against viral infections across different crops, suggesting their potential for further basic research involving cloning and utilization in breeding programs. 
This study represents the first GWAS investigation aimed at identifying resistance against MYMIV in urdbean germplasm.
Affiliation(s)
- Abhishek Pandey
  - Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana, Punjab, 141004, India
- Palvi Malik
  - Gurdev Singh Khush Institute of Genetics, Plant Breeding and Biotechnology, Punjab Agricultural University, Ludhiana, 141004, India
- Ashok Kumar
  - Regional Research Station, Punjab Agricultural University, Gurdaspur, Punjab, 143521, India
- Navreet Kaur
  - Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana, Punjab, 141004, India
- Dinesh Kumar Saini
  - Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana, Punjab, 141004, India
- Ranjit Kaur Gill
  - Department of Plant Breeding and Genetics, Punjab Agricultural University, Ludhiana, Punjab, 141004, India
- Sunil Kashyap
  - Regional Research Station, Punjab Agricultural University, Gurdaspur, Punjab, 143521, India
- Satinder Kaur
  - School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, 141004, India
|
5
|
Sun Y, Zhao X, Fan X, Wang M, Li C, Liu Y, Wu P, Yan Q, Sun L. Assessing the impact of sequencing platforms and analytical pipelines on whole-exome sequencing. Front Genet 2024; 15:1334075. [PMID: 38818042 PMCID: PMC11137314 DOI: 10.3389/fgene.2024.1334075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 04/30/2024] [Indexed: 06/01/2024] Open
Affiliation(s)
- Yanping Sun
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Xiaochao Zhao
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Xue Fan
  - Clinical Research Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Miao Wang
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Chaoyang Li
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Yongfeng Liu
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Ping Wu
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Qin Yan
  - GeneMind Biosciences Company Limited, Shenzhen, China
- Lei Sun
  - GeneMind Biosciences Company Limited, Shenzhen, China
|
6
|
Gaston JM, Alm EJ, Zhang AN. Fast and accurate variant identification tool for sequencing-based studies. BMC Biol 2024; 22:90. [PMID: 38644496 PMCID: PMC11034086 DOI: 10.1186/s12915-024-01891-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 04/17/2024] [Indexed: 04/23/2024] Open
Abstract
BACKGROUND Accurate identification of genetic variants, such as point mutations and insertions/deletions (indels), is crucial for various genetic studies into epidemic tracking, population genetics, and disease diagnosis. Genetic studies into microbiomes often require processing numerous sequencing datasets, necessitating variant identifiers with high speed, accuracy, and robustness. RESULTS We present QuickVariants, a bioinformatics tool that effectively summarizes variant information from read alignments and identifies variants. When tested on diverse bacterial sequencing data, QuickVariants demonstrates a ninefold higher median speed than bcftools, a widely used variant identifier, with higher accuracy in identifying both point mutations and indels. This accuracy extends to variant identification in virus samples, including SARS-CoV-2, particularly with significantly fewer false negative indels than bcftools. The high accuracy of QuickVariants is further demonstrated by its detection of a greater number of Omicron-specific indels (5 versus 0) and point mutations (61 versus 48-54) than bcftools in sewage metagenomes predominated by Omicron variants. Much of the reduced accuracy of bcftools was attributable to its misinterpretation of indels, often producing false negative indels and false positive point mutations at the same locations. CONCLUSIONS We introduce QuickVariants, a fast, accurate, and robust bioinformatics tool designed for identifying genetic variants for microbial studies. QuickVariants is available at https://github.com/caozhichongchong/QuickVariants .
Affiliation(s)
- Eric J Alm
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, USA
  - Department of Biological Engineering, Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, USA
- An-Ni Zhang
  - Department of Biological Engineering, Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, USA
|
7
|
Jiménez-Madrigal JP, Till BJ, Gatica-Arias A. Genetic Diversity Assessment in Plants from Reduced Representation Sequencing Data. Methods Mol Biol 2024; 2787:107-122. [PMID: 38656485 DOI: 10.1007/978-1-0716-3778-4_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Genetic diversity refers to the variety of genetic traits within a population or a species. It is an essential aspect of both plant ecology and plant breeding because it contributes to the adaptability, survival, and resilience of populations in changing environments. This chapter outlines a pipeline for estimating genetic diversity statistics from reduced representation or whole genome sequencing data. The pipeline involves obtaining DNA sequence reads, mapping the corresponding reads to a reference genome, calling variants from the alignments, and generating an unbiased estimation of nucleotide diversity and divergence between populations. The pipeline is suitable for single-end Illumina reads and can be adjusted for paired-end reads. The resulting pipeline provides a comprehensive approach for aligning and analyzing sequencing data to estimate genetic diversity.
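The final step of the pipeline described above, estimating nucleotide diversity, can be sketched as the average number of pairwise differences per site among sampled sequences. The example assumes complete, equal-length aligned haplotypes with no missing calls, an assumption that reduced-representation data often violate. The function name and toy sequences are hypothetical, and this is not the chapter's own implementation.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site (pi) across aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")

    total_diff = 0
    for a, b in combinations(sequences, 2):
        # Count mismatching positions for this pair of sequences.
        total_diff += sum(1 for x, y in zip(a, b) if x != y)

    n_pairs = n * (n - 1) // 2
    return total_diff / (n_pairs * length)


# Toy alignment (hypothetical): four haplotypes, eight sites.
haplotypes = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(haplotypes)  # 6 differences / (6 pairs * 8 sites) = 0.125
```

Averaging over all pairs of distinct sequences, rather than over all ordered pairs including self-comparisons, is what makes this the standard unbiased form of the estimator.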
Affiliation(s)
- José P Jiménez-Madrigal
  - Instituto Tecnológico de Costa Rica, Escuela de Ciencias Naturales y Exactas, Alajuela, Costa Rica
- Bradley J Till
  - Veterinary Genetics Laboratory, University of California, Davis, CA, USA
- Andrés Gatica-Arias
  - Escuela de Biología, Universidad de Costa Rica, San José, Costa Rica
  - Capacity Building for Bioinformatics in Latin America (CABANA), San José, Costa Rica
|
8
|
Fibi-Smetana S, Inglis C, Schuster D, Eberle N, Granados-Soler JL, Liu W, Krohn S, Junghanss C, Nolte I, Taher L, Murua Escobar H. The TiHoCL panel for canine lymphoma: a feasibility study integrating functional genomics and network biology approaches for comparative oncology targeted NGS panel design. Front Vet Sci 2023; 10:1301536. [PMID: 38144469 PMCID: PMC10748409 DOI: 10.3389/fvets.2023.1301536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 11/20/2023] [Indexed: 12/26/2023] Open
Abstract
Targeted next-generation sequencing (NGS) enables the identification of genomic variants in cancer patients with high sensitivity at relatively low costs, and has thus opened the era to personalized human oncology. Veterinary medicine tends to adopt new technologies at a slower pace compared to human medicine due to lower funding, nonetheless it embraces technological advancements over time. Hence, it is reasonable to assume that targeted NGS will be incorporated into routine veterinary practice in the foreseeable future. Many animal diseases have well-researched human counterparts and hence, insights gained from the latter might, in principle, be harnessed to elucidate the former. Here, we present the TiHoCL targeted NGS panel as a proof of concept, exemplifying how functional genomics and network approaches can be effectively used to leverage the wealth of information available for human diseases in the development of targeted sequencing panels for veterinary medicine. Specifically, the TiHoCL targeted NGS panel is a molecular tool for characterizing and stratifying canine lymphoma (CL) patients designed based on human non-Hodgkin lymphoma (NHL) research outputs. While various single nucleotide polymorphisms (SNPs) have been associated with high risk of developing NHL, poor prognosis and resistance to treatment in NHL patients, little is known about the genetics of CL. Thus, the ~100 SNPs featured in the TiHoCL targeted NGS panel were selected using functional genomics and network approaches following a literature and database search that yielded ~500 SNPs associated with, in nearly all cases, human hematologic malignancies. The TiHoCL targeted NGS panel underwent technical validation and preliminary functional assessment by sequencing DNA samples isolated from blood of 29 lymphoma dogs using an Ion Torrent™ PGM System achieving good sequencing run metrics. 
Our design framework holds new possibilities for the design of similar molecular tools applied to other diseases for which limited knowledge is available and will improve drug target discovery and patient care.
Affiliation(s)
- Silvia Fibi-Smetana
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
- Camila Inglis
  - Small Animal Clinic, University of Veterinary Medicine Hannover Foundation, Hannover, Germany
  - Clinic for Hematology, Oncology and Palliative Care, Rostock University Medical Center, University of Rostock, Rostock, Germany
- Daniela Schuster
  - Division of Bioinformatics, Department of Biology, Friedrich-Alexander-University, Erlangen, Germany
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, University of Rostock, Rostock, Germany
- Nina Eberle
  - Small Animal Clinic, University of Veterinary Medicine Hannover Foundation, Hannover, Germany
- José Luis Granados-Soler
  - Small Animal Clinic, University of Veterinary Medicine Hannover Foundation, Hannover, Germany
  - UQVETS Small Animal Hospital, School of Veterinary Science, The University of Queensland, Gatton, QLD, Australia
- Wen Liu
  - Clinic for Hematology, Oncology and Palliative Care, Rostock University Medical Center, University of Rostock, Rostock, Germany
- Saskia Krohn
  - Clinic for Hematology, Oncology and Palliative Care, Rostock University Medical Center, University of Rostock, Rostock, Germany
- Christian Junghanss
  - Clinic for Hematology, Oncology and Palliative Care, Rostock University Medical Center, University of Rostock, Rostock, Germany
- Ingo Nolte
  - Small Animal Clinic, University of Veterinary Medicine Hannover Foundation, Hannover, Germany
- Leila Taher
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
  - Clinic for Hematology, Oncology and Palliative Care, Rostock University Medical Center, University of Rostock, Rostock, Germany
  - Division of Bioinformatics, Department of Biology, Friedrich-Alexander-University, Erlangen, Germany
  - Institute for Biostatistics and Informatics in Medicine and Ageing Research, Rostock University Medical Center, University of Rostock, Rostock, Germany
- Hugo Murua Escobar
  - Clinic for Hematology, Oncology and Palliative Care, Rostock University Medical Center, University of Rostock, Rostock, Germany
|
9
|
Wen X, Li J, Yang F, Zhang X, Li Y. Exploring the Effect of High-Energy Heavy Ion Beam on Rice Genome: Transposon Activation. Genes (Basel) 2023; 14:2178. [PMID: 38137000 PMCID: PMC10742395 DOI: 10.3390/genes14122178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 11/25/2023] [Accepted: 11/29/2023] [Indexed: 12/24/2023] Open
Abstract
High-energy heavy ion beams are a new type of physical mutagen that can produce a wide range of phenotypic variations. In order to understand the mechanism of high-energy heavy ion beams, we resequenced the whole genome of individual plants with obvious phenotypic variations in rice. The sequence alignment results revealed a large number of SNPs and InDels, as well as genetic variations related to grain type and heading date. The distribution of SNP and InDel on chromosomes is random, but they often occur in the up/downstream regions and the intergenic region. Mutagenesis can cause changes in transposons such as Dasheng, mPing, Osr13 and RIRE2, affecting the stability of the genome. This study obtained the major gene mutation types, discovered differentially active transposons, screened out gene variants related to phenotype, and explored the mechanism of high-energy heavy ion beam radiation on rice genes.
Affiliation(s)
- Xiaoting Wen
  - Key Laboratory of Soybean Molecular Design and Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China; (X.W.); (F.Y.); (X.Z.); (Y.L.)
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jingpeng Li
  - Key Laboratory of Soybean Molecular Design and Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China; (X.W.); (F.Y.); (X.Z.); (Y.L.)
  - Jilin Provincial Laboratory of Crop Germplasm Resources, Changchun 130299, China
- Fu Yang
  - Key Laboratory of Soybean Molecular Design and Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China; (X.W.); (F.Y.); (X.Z.); (Y.L.)
- Xin Zhang
  - Key Laboratory of Soybean Molecular Design and Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China; (X.W.); (F.Y.); (X.Z.); (Y.L.)
- Yiwei Li
  - Key Laboratory of Soybean Molecular Design and Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China; (X.W.); (F.Y.); (X.Z.); (Y.L.)
  - University of Chinese Academy of Sciences, Beijing 100049, China
|
10
|
Herrick N, Walsh S. ILIAD: a suite of automated Snakemake workflows for processing genomic data for downstream applications. BMC Bioinformatics 2023; 24:424. [PMID: 37940870 PMCID: PMC10633908 DOI: 10.1186/s12859-023-05548-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 10/27/2023] [Indexed: 11/10/2023] Open
Abstract
BACKGROUND Processing raw genomic data for downstream applications such as imputation, association studies, and modeling requires numerous third-party bioinformatics software tools. It is highly time-consuming and resource-intensive with computational demands and storage limitations that pose significant challenges that increase cost. The use of software tools independent of one another, in a disjointed stepwise fashion, increases the difficulty and sets forth higher error rates because of fragmented job executions in alignment, variant calling, and/or build conversion complications. As sequencing data availability grows, the ability for biologists to process it using stable, automated, and reproducible workflows is paramount as it significantly reduces the time to generate clean and reliable data. RESULTS The Iliad suite of genomic data workflows was developed to provide users with seamless file transitions from raw genomic data to a quality-controlled variant call format (VCF) file for downstream applications. Iliad benefits from the efficiency of the Snakemake best practices framework coupled with Singularity and Docker containers for repeatability, portability, and ease of installation. This feat is accomplished from the onset with download acquisitions of any raw data type (FASTQ, CRAM, IDAT) straight through to the generation of a clean merged data file that can combine any user-preferred datasets using robust programs such as BWA, Samtools, and BCFtools. Users can customize and direct their workflow with one straightforward configuration file. Iliad is compatible with Linux, MacOS, and Windows platforms and scalable from a local machine to a high-performance computing cluster. CONCLUSION Iliad offers automated workflows with optimized time and resource management that are comparable to other workflows available but generates analysis-ready VCF files from the most common datatypes using a single command. 
The storage footprint challenge of genomic data is overcome by utilizing temporary intermediate files before the final VCF is generated. This file is ready for use in imputation, genome-wide association study (GWAS) pipelines, high-throughput population genetics studies, select gene candidate studies, and more. Iliad was developed to be portable, compatible, scalable, robust, and repeatable with a simplistic setup, so biologists that are less familiar with programming can manage their own big data with this open-source suite of workflows.
Affiliation(s)
- Noah Herrick
  - Department of Biology, Indiana University Indianapolis, 723 W. Michigan Street, Indianapolis, IN, USA
- Susan Walsh
  - Department of Biology, Indiana University Indianapolis, 723 W. Michigan Street, Indianapolis, IN, USA
|
11
|
Hopper KR. Reduced-representation libraries in insect genetics. Current Opinion in Insect Science 2023; 59:101084. [PMID: 37442341 DOI: 10.1016/j.cois.2023.101084] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/18/2022] [Revised: 05/04/2023] [Accepted: 07/06/2023] [Indexed: 07/15/2023]
Abstract
Genotyping-by-sequencing of reduced-representation libraries has ushered in an era where genome-wide data can be gotten for any species. Here, I review research on this topic during the last two years, report meta-analysis of the results, and discuss analysis methods and issues. Scanning the literature from 2021 to 2022 identified 21 papers, the majority of which were on population differences, including local adaptation and migration, but several papers were on genetic maps and their use in assembly scaffolding or analysis of quantitative trait loci, on the origin of incursions of pest insects, or on infection rates of a pathogen in a disease vector. The research reviewed includes 33 species from 25 families and 11 orders. Meta-analysis showed that less than 16%, and most often, less than 1% of the genome was implicated in local adaptation and that the number of adaptive loci correlated with genetic divergence among populations.
Affiliation(s)
- Keith R Hopper
  - Beneficial Insect Introductions Research Unit, ARS, USDA, Newark, DE, United States
|
12
|
Pu T, Peddle A, Zhu J, Tejpar S, Verbandt S. Neoantigen identification: Technological advances and challenges. Methods Cell Biol 2023; 183:265-302. [PMID: 38548414 DOI: 10.1016/bs.mcb.2023.06.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Neoantigens have emerged as promising targets for cutting-edge immunotherapies, such as cancer vaccines and adoptive cell therapy. These neoantigens are unique to tumors and arise exclusively from somatic mutations or non-genomic aberrations in tumor proteins. They encompass a wide range of alterations, including genomic mutations, post-transcriptomic variants, and viral oncoproteins. With the advancements in technology, the identification of immunogenic neoantigens has seen rapid progress, raising new opportunities for enhancing their clinical significance. Prediction of neoantigens necessitates the acquisition of high-quality samples and sequencing data, followed by mutation calling. Subsequently, the pipeline involves integrating various tools that can predict the expression, processing, binding, and recognition potential of neoantigens. However, the continuous improvement of computational tools is constrained by the availability of datasets which contain validated immunogenic neoantigens. This review article aims to provide a comprehensive summary of the current knowledge as well as limitations in neoantigen prediction and validation. Additionally, it delves into the origin and biological role of neoantigens, offering a deeper understanding of their significance in the field of cancer immunotherapy. This article thus seeks to contribute to the ongoing efforts to harness neoantigens as powerful weapons in the fight against cancer.
Affiliation(s)
- Ting Pu
- Digestive Oncology Unit, KULeuven, Leuven, Belgium
- Jingjing Zhu
- de Duve Institute, Université catholique de Louvain, Brussels, Belgium
|
13
|
Barbosa CFC, Asunto JC, Koh RBL, Santos DMC, Zhang D, Cao EP, Galvez LC. Genome-Wide SNP and Indel Discovery in Abaca (Musa textilis Née) and among Other Musa spp. for Abaca Genetic Resources Management. Curr Issues Mol Biol 2023; 45:5776-5797. [PMID: 37504281 PMCID: PMC10377871 DOI: 10.3390/cimb45070365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 07/05/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023] Open
Abstract
Abaca (Musa textilis Née) is an economically important fiber crop in the Philippines. Its economic potential, however, is hampered by biotic and abiotic stresses, which are exacerbated by insufficient genomic resources for varietal identification vital for crop improvement. To address these gaps, this study aimed to discover genome-wide polymorphisms among abaca cultivars and other Musa species and analyze their potential as genetic marker resources. This was achieved through whole-genome Illumina resequencing of abaca cultivars and variant calling using BCFtools, followed by genetic diversity and phylogenetic analyses. A total of 20,590,381 high-quality single-nucleotide polymorphisms (SNPs) and DNA insertions/deletions (InDels) were mined across 16 abaca cultivars. Filtering based on linkage disequilibrium (LD) yielded 130,768 SNPs and 13,620 InDels, accounting for 0.396 ± 0.106 and 0.431 ± 0.111 of gene diversity across these cultivars. LD-pruned polymorphisms across abaca, M. troglodytarum, M. acuminata and M. balbisiana enabled genetic differentiation within abaca and across the four Musa spp. Phylogenetic analysis revealed that the registered varieties Abuab and Inosa have accumulated a significant number of mutations, warranting further studies linking mutations to their advantageous phenotypes. Overall, this study pioneered the production of marker resources in abaca based on genome-wide polymorphisms vital for varietal authentication and comparative genotyping with the better-studied Musa spp.
Affiliation(s)
- Cris Francis C Barbosa
- Philippine Fiber Industry Development Authority (PhilFIDA), PCAF Building, Department of Agriculture (DA) Compound, Quezon City 1101, Philippines
- Institute of Biology, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines
- Jayson C Asunto
- Philippine Fiber Industry Development Authority (PhilFIDA), PCAF Building, Department of Agriculture (DA) Compound, Quezon City 1101, Philippines
- Rhosener Bhea L Koh
- National Institute of Molecular Biology and Biotechnology, University of the Philippines Diliman, Quezon City 1101, Philippines
- Daisy May C Santos
- Institute of Biology, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines
- Dapeng Zhang
- Sustainable Perennial Crops Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD 20705, USA
- Ernelea P Cao
- Institute of Biology, College of Science, University of the Philippines Diliman, Quezon City 1101, Philippines
- Leny C Galvez
- Philippine Fiber Industry Development Authority (PhilFIDA), PCAF Building, Department of Agriculture (DA) Compound, Quezon City 1101, Philippines
|
14
|
Laufer VA, Glover TW, Wilson TE. Applications of advanced technologies for detecting genomic structural variation. Mutat Res Rev Mutat Res 2023; 792:108475. [PMID: 37931775 PMCID: PMC10792551 DOI: 10.1016/j.mrrev.2023.108475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 09/07/2023] [Accepted: 11/02/2023] [Indexed: 11/08/2023]
Abstract
Chromosomal structural variation (SV) encompasses a heterogeneous class of genetic variants that exerts strong influences on human health and disease. Despite their importance, many structural variants (SVs) have remained poorly characterized at even a basic level, a discrepancy predicated upon the technical limitations of prior genomic assays. However, recent advances in genomic technology can identify and localize SVs accurately, opening new questions regarding SV risk factors and their impacts in humans. Here, we first define and classify human SVs and their generative mechanisms, highlighting characteristics leveraged by various SV assays. We next examine the first-ever gapless assembly of the human genome and the technical process of assembling it, which required third-generation sequencing technologies to resolve structurally complex loci. The new portions of that "telomere-to-telomere" assembly and of subsequent pangenome assemblies highlight aspects of SV biology likely to develop in the near term. We consider the strengths and limitations of the most promising new SV technologies and when they or longstanding approaches are best suited to meeting salient goals in the study of human SV in population-scale genomics research, clinical, and public health contexts. It is a watershed time in our understanding of human SV when new approaches are expected to fundamentally change genomic applications.
Affiliation(s)
- Vincent A Laufer
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
- Thomas W Glover
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
- Thomas E Wilson
- Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA.
|
15
|
Samuels ME, Lapointe C, Halwas S, Worley AC. Genomic Sequence of Canadian Chenopodium berlandieri: A North American Wild Relative of Quinoa. Plants (Basel) 2023; 12:467. [PMID: 36771551 PMCID: PMC9920564 DOI: 10.3390/plants12030467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 01/06/2023] [Accepted: 01/11/2023] [Indexed: 06/18/2023]
Abstract
Chenopodium berlandieri (pitseed goosefoot) is a widespread native North American plant, which was cultivated and consumed by indigenous peoples prior to the arrival of European colonists. Chenopodium berlandieri is closely related to, and freely hybridizes with, the domesticated South American food crop C. quinoa. As such it is a potential source of wild germplasm for breeding with C. quinoa, for improved quinoa production in North America. The C. berlandieri genome sequence could also be a useful source of information for improving quinoa adaptation. To this end, we first optimized barcode markers in two chloroplast genes, rbcL and matK. Together these markers can distinguish C. berlandieri from the morphologically similar Eurasian invasive C. album (lamb's quarters). Second, we performed whole genome sequencing and preliminary assembly of a C. berlandieri accession collected in Manitoba, Canada. Our assembly, while fragmented, is consistent with the expected allotetraploid structure containing diploid Chenopodium sub-genomes A and B. The genome of our accession is highly homozygous, with only one variant site per 3,000-4,000 bases in non-repetitive sequences. This is consistent with predominant self-fertilization. As previously reported for the genome of a partly domesticated Mexican accession of C. berlandieri, our genome assembly is similar to that of C. quinoa. Somewhat unexpectedly, the genome of our accession had almost as many variant sites when compared to the Mexican C. berlandieri as when compared to C. quinoa. Despite the overall similarity of our genome sequence to that of C. quinoa, there are differences in genes known to be involved in the domestication or genetics of other food crops. In one example, our genome assembly appears to lack one functional copy of the SOS1 (salt overly sensitive 1) gene. SOS1 is involved in soil salinity tolerance, and by extension may be relevant to the adaptation of C. berlandieri to the wet climate of the Canadian region where it was collected. Our genome assembly will be a useful tool for the improved cultivation of quinoa in North America.
Affiliation(s)
- Mark E. Samuels
- Centre de Recherche du CHU Ste-Justine, Montréal, QC H3T 1C5, Canada
- Département de Biochimie, Université de Montréal, Montréal, QC H3T 1C5, Canada
- Département de Médecine, Université de Montréal, Montréal, QC H3T 1C5, Canada
- Cassandra Lapointe
- Centre de Recherche du CHU Ste-Justine, Montréal, QC H3T 1C5, Canada
- Département de Biochimie, Université de Montréal, Montréal, QC H3T 1C5, Canada
- Sara Halwas
- Department of Anthropology, University of Manitoba, Winnipeg, MB R3T 2M8, Canada
- Anne C. Worley
- Department of Biological Sciences, University of Manitoba, Winnipeg, MB R3T 2M8, Canada
|
16
|
Sá P, Santos D, Chiaia H, Leitão A, Cordeiro JM, Gama LT, Amaral AJ. Lost pigs of Angola: Whole genome sequencing reveals unique regions of selection with emphasis on metabolism and feed efficiency. Front Genet 2022; 13:1003069. [PMID: 36353101 PMCID: PMC9639768 DOI: 10.3389/fgene.2022.1003069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 09/20/2022] [Indexed: 11/26/2022] Open
Abstract
Angola, on the western coast of Africa, has been through dramatic social events that have led to the near-disappearance of native swine populations, and the recent introduction of European exotic breeds has also contributed to the erosion of this native swine repertoire. In an effort to investigate the genetic basis of native pigs in Angola (ANG), we have generated whole genomes from animals of a remote local pig population in Huambo province, which we have compared with 78 genomes of European and Asian pig breeds as well as European and Asian wild boars that are currently in the public domain. Analyses of population structure showed that ANG pigs grouped within the European cluster and were clearly separated from Asian pig breeds. Pairwise FST ranged from 0.14 to 0.26, with ANG pigs displaying lower levels of genetic differentiation from European breeds. Finally, we have identified candidate regions for selection using a complementary approach based on various methods. All results suggest that selection towards feed efficiency and metabolism has occurred. Moreover, all analyses identified the CDKAL1 gene, which is related to insulin and cholesterol metabolism, as a candidate gene overlapping signatures of selection unique to ANG pigs. This study presents the first assessment of the genetic relationship between ANG pigs and other world breeds and uncovers selection signatures that may indicate adaptation features unique to this important genetic resource.
Affiliation(s)
- Pedro Sá
- CIISA—Centro de Investigação Interdisciplinar em Sanidade Animal, Faculdade de Medicina Veterinária, Universidade de Lisboa, Lisboa, Portugal
- Laboratório Associado para a Ciência Animal e Veterinária (AL4AnimalS), Avenida da Universidade Técnica, Lisboa, Portugal
- Dulce Santos
- CIISA—Centro de Investigação Interdisciplinar em Sanidade Animal, Faculdade de Medicina Veterinária, Universidade de Lisboa, Lisboa, Portugal
- Laboratório Associado para a Ciência Animal e Veterinária (AL4AnimalS), Avenida da Universidade Técnica, Lisboa, Portugal
- Hermenegildo Chiaia
- Faculdade de Medicina Veterinária, Universidade José Eduardo dos Santos, Huambo, Angola
- Alexandre Leitão
- CIISA—Centro de Investigação Interdisciplinar em Sanidade Animal, Faculdade de Medicina Veterinária, Universidade de Lisboa, Lisboa, Portugal
- Laboratório Associado para a Ciência Animal e Veterinária (AL4AnimalS), Avenida da Universidade Técnica, Lisboa, Portugal
- José Moras Cordeiro
- Faculdade de Medicina Veterinária, Universidade José Eduardo dos Santos, Huambo, Angola
- Luís T. Gama
- CIISA—Centro de Investigação Interdisciplinar em Sanidade Animal, Faculdade de Medicina Veterinária, Universidade de Lisboa, Lisboa, Portugal
- Laboratório Associado para a Ciência Animal e Veterinária (AL4AnimalS), Avenida da Universidade Técnica, Lisboa, Portugal
- Andreia J. Amaral
- CIISA—Centro de Investigação Interdisciplinar em Sanidade Animal, Faculdade de Medicina Veterinária, Universidade de Lisboa, Lisboa, Portugal
- Laboratório Associado para a Ciência Animal e Veterinária (AL4AnimalS), Avenida da Universidade Técnica, Lisboa, Portugal
- *Correspondence: Andreia J. Amaral,
|