1
Oury N, Magalon H. Investigating the potential roles of intra-colonial genetic variability in Pocillopora corals using genomics. Sci Rep 2024; 14:6437. [PMID: 38499737 PMCID: PMC10948807 DOI: 10.1038/s41598-024-57136-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Accepted: 03/14/2024] [Indexed: 03/20/2024]
Abstract
Intra-colonial genetic variability (IGV), the presence of more than one genotype in a single colony, has been increasingly studied in scleractinians, revealing its high prevalence. Several studies hypothesised that IGV brings benefits, but few have investigated its roles from a genetic perspective. Here, using genomic data (SNPs), we investigated these potential benefits in populations of the coral Pocillopora acuta from Reunion Island (southwestern Indian Ocean). As the detection of IGV depends on sequencing and bioinformatics errors, we first explored the impact of the bioinformatics pipeline on its detection. Then, SNPs and genes variable within colonies were characterised. While most of the tested bioinformatics parameters did not significantly impact the detection of IGV, filtering on genotype depth of coverage strongly improved its detection by reducing genotyping errors. Mosaicism and chimerism, the two processes leading to IGV (the first through somatic mutations, the second through fusion of distinct organisms), were found in 7% and 12% of the colonies, respectively. Both processes led to several intra-colonial allelic differences, but most were non-coding or silent. However, 7% of the differences were non-silent and found in genes involved in a high diversity of biological processes, some of which were directly linked to responses to environmental stresses. IGV, therefore, appears as a source of genetic diversity and genetic plasticity, increasing the adaptive potential of colonies. Such benefits undoubtedly play an important role in the maintenance and the evolution of scleractinian populations and appear crucial for the future of coral reefs in the context of ongoing global changes.
Affiliation(s)
- Nicolas Oury
  - UMR ENTROPIE (Université de La Réunion, IRD, IFREMER, Université de Nouvelle-Calédonie, CNRS), Université de La Réunion, 97744, St Denis Cedex 09, La Réunion, France
  - Laboratoire Cogitamus, Paris, France
  - KAUST Red Sea Research Center and Marine Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia
- Hélène Magalon
  - UMR ENTROPIE (Université de La Réunion, IRD, IFREMER, Université de Nouvelle-Calédonie, CNRS), Université de La Réunion, 97744, St Denis Cedex 09, La Réunion, France
  - Laboratoire Cogitamus, Paris, France
  - Laboratoire d'Excellence CORAIL, Perpignan, France
2
Freitas FAO, Brito LF, Fanalli SL, Gonçales JL, da Silva BPM, Durval MC, Ciconello FN, de Oliveira CS, Nascimento LE, Gervásio IC, Gomes JD, Moreira GCM, Silva-Vignato B, Coutinho LL, de Almeida VV, Cesar ASM. Identification of eQTLs using different sets of single nucleotide polymorphisms associated with carcass and body composition traits in pigs. BMC Genomics 2024; 25:14. [PMID: 38166730 PMCID: PMC10759680 DOI: 10.1186/s12864-023-09863-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND: Mapping expression quantitative trait loci (eQTLs) in skeletal muscle tissue in pigs is crucial for understanding the relationship between genetic variation and phenotypic expression of carcass traits in meat animals. Therefore, the primary objective of this study was to evaluate the impact of different sets of single nucleotide polymorphisms (SNPs), including scenarios removing SNPs pruned for linkage disequilibrium (LD) and SNPs derived from SNP chip arrays and RNA-seq data from liver, brain, and skeletal muscle tissues, on the identification of eQTLs in the Longissimus lumborum tissue, associated with carcass and body composition traits in Large White pigs. The SNPs identified from muscle mRNA were combined with SNPs identified in the brain and liver tissue transcriptomes, as well as SNPs from the GGP Porcine 50 K SNP chip array. Cis- and trans-eQTLs were identified based on the skeletal muscle gene expression level, followed by functional genomic analyses and statistical associations with carcass and body composition traits in Large White pigs. RESULTS: The number of cis- and trans-eQTLs identified across different sets of SNPs (scenarios) ranged from 261 to 2,539 and from 29 to 13,721, respectively. Furthermore, 6,180 genes were modulated by eQTLs in at least one of the scenarios evaluated. The eQTLs identified were not significantly associated with carcass and body composition traits but were significantly enriched for many traits in the "Meat and Carcass" type QTL. The scenarios with the highest number of cis- (n = 304) and trans- (n = 5,993) modulated genes were the unpruned and LD-pruned SNP set scenarios identified from the muscle transcriptome. These genes include 84 transcription factor coding genes. CONCLUSIONS: After LD pruning, the set of SNPs identified based on the transcriptome of the skeletal muscle tissue of pigs resulted in the highest number of genes modulated by eQTLs. Most eQTLs are of the trans type and are associated with genes influencing complex traits in pigs, such as transcription factors and enhancers. Furthermore, the incorporation of SNPs from other genomic regions to the set of SNPs identified in the porcine skeletal muscle transcriptome contributed to the identification of eQTLs that had not been identified based on the porcine skeletal muscle transcriptome alone.
Affiliation(s)
- Felipe André Oliveira Freitas
  - Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Luiz F Brito
  - Department of Animal Sciences, Purdue University, West Lafayette, IN, 47907, USA
  - Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Simara Larissa Fanalli
  - Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Janaína Lustosa Gonçales
  - Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Mariah Castro Durval
  - Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Fernanda Nery Ciconello
  - Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Izally Carvalho Gervásio
  - Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Julia Dezen Gomes
  - Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Bárbara Silva-Vignato
  - Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
- Luiz Lehmann Coutinho
  - Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
- Vivian Vezzoni de Almeida
  - College of Veterinary Medicine and Animal Science, Federal University of Goiás, Goiânia, 74001-970, GO, Brazil
- Aline Silva Mello Cesar
  - Luiz de Queiroz College of Agriculture, University of São Paulo, Piracicaba, 13416-000, SP, Brazil
  - Faculty of Animal Science and Food Engineering, University of São Paulo, Pirassununga, 13635-900, SP, Brazil
3
Song M, Han C, Liu L, Li Q, Fan Y, Gao H, Zhang D, Ren Y, Qin F, Yang M. MIST: A microbial identification and source tracking system for next-generation sequencing data. IMETA 2023; 2:e146. [PMID: 38868214 PMCID: PMC10989743 DOI: 10.1002/imt2.146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 09/18/2023] [Accepted: 09/26/2023] [Indexed: 06/14/2024]
Abstract
The Professional Committee of Microbiology of the National Pharmacopoeia Commission organized the drafting of the Technical Guidelines for Microbial Whole Genome Sequencing (WGS), aiming to standardize the workflow and technical indicators of microbial WGS and ensure the accuracy of sequencing and identification. On the basis of the Guidelines, we developed an integrated microbial identification and source tracking (MIST) system to meet the needs of microbial identification and contamination investigation in food and drug quality control. MIST integrates three analysis pipelines: 16S/18S/internal transcribed spacer amplicon-based microbial identification, WGS-based microbial identification, and single-nucleotide polymorphism-based microbial source tracking. MIST can analyze sequence data in a variety of formats, such as FASTA, base call files, and FASTQ. It can be connected to a high-throughput sequencing instrument to acquire sequencing data directly. We also developed a publicly accessible web server for MIST (http://syj.i-sanger.cn).
Affiliation(s)
- Minghui Song
  - Shanghai Institute for Food and Drug Control, NMPA Key Laboratory for Testing Technology of Pharmaceutical Microbiology, Shanghai
- Chang Han
  - Shanghai Majorbio Bio-Pharm Technology Co., Ltd., Shanghai, China
- Linmeng Liu
  - Shanghai Majorbio Bio-Pharm Technology Co., Ltd., Shanghai, China
- Qiongqiong Li
  - Shanghai Institute for Food and Drug Control, NMPA Key Laboratory for Testing Technology of Pharmaceutical Microbiology, Shanghai
- Yiling Fan
  - Shanghai Institute for Food and Drug Control, NMPA Key Laboratory for Testing Technology of Pharmaceutical Microbiology, Shanghai
- Hao Gao
  - Shanghai Majorbio Bio-Pharm Technology Co., Ltd., Shanghai, China
- Dan Zhang
  - Shanghai Majorbio Bio-Pharm Technology Co., Ltd., Shanghai, China
- Yi Ren
  - Shanghai Majorbio Bio-Pharm Technology Co., Ltd., Shanghai, China
- Feng Qin
  - Shanghai Institute for Food and Drug Control, NMPA Key Laboratory for Testing Technology of Pharmaceutical Microbiology, Shanghai
- Meicheng Yang
  - Shanghai Institute for Food and Drug Control, NMPA Key Laboratory for Testing Technology of Pharmaceutical Microbiology, Shanghai
  - Shanghai food and drug packaging material control center, Shanghai, China
4
Potential Targeted Therapies in Ovarian Cancer. Pharmaceuticals (Basel) 2022; 15:ph15111324. [PMID: 36355495 PMCID: PMC9697427 DOI: 10.3390/ph15111324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/18/2022] [Accepted: 10/21/2022] [Indexed: 12/05/2022]
Abstract
Background: We aimed to identify somatic pathogenic and likely pathogenic mutations using next-generation sequencing (NGS). The mutational findings were held against clinically well-described data to identify potential targeted therapies in Danish patients diagnosed with high-grade serous ovarian cancer (HGSC). Methods: We characterized the mutational profile of 128 HGSC patients. Clinical data were obtained from the Danish Gynecological Database and tissue samples were collected through the Danish CancerBiobank. DNA was analyzed using NGS. Results: 47 (37%) patients were platinum-sensitive, 32 (25%) partially platinum-sensitive, 35 (27%) platinum-resistant, and three (2%) platinum-refractory, while 11 (9%) patients did not receive chemotherapy. Overall, 27 (21%) had known druggable targets. Twelve (26%) platinum-sensitive patients had druggable targets for PARP inhibitors: one for tyrosine kinase inhibitors and one for immunotherapy treatment. Eight (25%) partially platinum-sensitive patients had druggable targets: seven were eligible for PARP inhibitors and one was potentially eligible for alpelisib and hormone therapy. Seven (20%) platinum-resistant patients had druggable targets: six (86%) were potentially eligible for PARP inhibitors, one for immunotherapy, and one for erdafitinib. Conclusions: PARP inhibitors are the most frequent potential targeted therapy in HGSC. However, other targeted therapies remain relevant for investigation according to our mutational findings.
5
Bhat GR, Sethi I, Rah B, Kumar R, Afroze D. Innovative in Silico Approaches for Characterization of Genes and Proteins. Front Genet 2022; 13:865182. [PMID: 35664302 PMCID: PMC9159363 DOI: 10.3389/fgene.2022.865182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022]
Abstract
Bioinformatics is an amalgamation of biology, mathematics and computer science. It gathers molecular information from biology and applies informatic techniques to understand and organize those data in a useful manner. With the help of bioinformatics, experimental data are stored in several online databases, such as nucleotide databases, protein databases, GenBank and others. The data stored in these databases are used as a reference for experimental evaluation and validation. To date, several online tools have been developed to analyze genomic, transcriptomic, proteomic, epigenomic and metabolomic data. Some of them include Human Splicing Finder (HSF), Exonic Splicing Enhancer Mutation taster, and others. A number of SNPs are observed in non-coding, intronic regions and play a role in the regulation of genes, which may or may not directly affect protein expression. Many mutations are thought to influence the splicing mechanism by affecting existing splice sites or creating new ones. HSF was developed to predict the effect of a mutation (SNP) on splicing signals. The tool is therefore helpful in predicting the effect of mutations on splicing signals and can provide data for a better understanding of intronic mutations, which can then be validated experimentally. Additionally, rapid advances in proteomics have steered researchers to organize the study of protein structure, function, relationships, and dynamics in space and time. The effective integration of all of these technologies will eventually lead to next-generation systems biology, which will provide valuable biological insights for research, diagnostics, therapeutics and the development of personalized medicine.
Affiliation(s)
- Gh. Rasool Bhat
  - Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
- Itty Sethi
  - Institute of Human Genetics, University of Jammu, Jammu, India
- Bilal Rah
  - Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
- Rakesh Kumar
  - School of Biotechnology, Shri Mata Vaishno Devi University, Katra, India
- Dil Afroze
  - Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
  - *Correspondence: Dil Afroze,
6
Saleh D, Chen J, Leplé J, Leroy T, Truffaut L, Dencausse B, Lalanne C, Labadie K, Lesur I, Bert D, Lagane F, Morneau F, Aury J, Plomion C, Lascoux M, Kremer A. Genome-wide evolutionary response of European oaks during the Anthropocene. Evol Lett 2022; 6:4-20. [PMID: 35127134 PMCID: PMC8802238 DOI: 10.1002/evl3.269] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/25/2021] [Revised: 11/26/2021] [Accepted: 12/02/2021] [Indexed: 12/23/2022]
Abstract
The pace of tree microevolution during Anthropocene warming is largely unknown. We used a retrospective approach to monitor genomic changes in oak trees since the Little Ice Age (LIA). Allelic frequency changes were assessed from whole-genome pooled sequences for four age-structured cohorts of sessile oak (Quercus petraea) dating back to 1680, in each of three different oak forests in France. The genetic covariances of allelic frequency changes increased between successive time periods, highlighting genome-wide effects of linked selection. We found imprints of parallel linked selection in the three forests during the late LIA, and a shift of selection during more recent time periods of the Anthropocene. The changes in allelic covariances within and between forests mirrored the documented changes in the occurrence of extreme events (droughts and frosts) over the last 300 years. The genomic regions with the highest covariances were enriched in genes involved in plant responses to pathogens and abiotic stresses (temperature and drought). These responses are consistent with the reported sequence of frost (or drought) and disease damage ultimately leading to the oak dieback after extreme events. They provide support for adaptive evolution of long-lived species during recent climatic changes. Although we acknowledge that other sources (e.g., gene flow, generation overlap) may have contributed to temporal covariances of allelic frequency changes, the consistent and correlated response across the three forests lends support to the existence of a systematic driving force such as natural selection.
Affiliation(s)
- Dounia Saleh
  - UMR BIOGECO, INRAE, Université de Bordeaux, Cestas 33612, France
- Jun Chen
  - College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Thibault Leroy
  - Department of Botany and Biodiversity Research, University of Vienna, Vienna 1010, Austria
- Laura Truffaut
  - UMR BIOGECO, INRAE, Université de Bordeaux, Cestas 33612, France
- Céline Lalanne
  - UMR BIOGECO, INRAE, Université de Bordeaux, Cestas 33612, France
- Karine Labadie
  - Genoscope, Institut de Biologie François Jacob, Commissariat à l’énergie atomique (CEA), Université de Paris-Saclay, Evry 91057, France
- Didier Bert
  - UMR BIOGECO, INRAE, Université de Bordeaux, Cestas 33612, France
- François Morneau
  - Département Recherche Développement Innovation, Office National des Forêts, Boigny-Sur-Bionne 45760, France
  - Current address: Service de l'Information Statistique Forestière et Environnementale, Institut National de l'Information géographique et Forestière, Nogent-sur-Vernisson 45290, France
- Jean-Marc Aury
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry 91057, France
- Martin Lascoux
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala SE-75236, Sweden
- Antoine Kremer
  - UMR BIOGECO, INRAE, Université de Bordeaux, Cestas 33612, France
7
Naji MM, Utsunomiya YT, Sölkner J, Rosen BD, Mészáros G. Assessing Bos taurus introgression in the UOA Bos indicus assembly. Genet Sel Evol 2021; 53:96. [PMID: 34922445 PMCID: PMC8684283 DOI: 10.1186/s12711-021-00688-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2021] [Accepted: 11/29/2021] [Indexed: 12/30/2022]
Abstract
Background: Reference genomes are essential in the analysis of genomic data. As the cost of sequencing decreases, multiple reference genomes are being produced within species to alleviate problems such as low mapping accuracy and reference allele bias in variant calling that can be associated with the alignment of divergent samples to a single reference individual. The latest reference sequence adopted by the scientific community for the analysis of cattle data is ARS_UCD1.2, built from the DNA of a Hereford cow (Bos taurus taurus; B. taurus). A complementary genome assembly, UOA_Brahman_1, was recently built to represent the other cattle subspecies (Bos taurus indicus; B. indicus) from a Brahman cow haplotype to further support analysis of B. indicus data. In this study, we aligned the sequence data of 15 B. taurus and B. indicus breeds to each of these references. Results: The alignment of B. taurus individuals against UOA_Brahman_1 detected up to five million more single-nucleotide variants (SNVs) compared to that against ARS_UCD1.2. Similarly, the alignment of B. indicus individuals against ARS_UCD1.2 resulted in one and a half million more SNVs than that against UOA_Brahman_1. The number of SNVs with nearly fixed alternative alleles also increased in the cross-subspecies alignments. Interestingly, the alignment of B. taurus cattle against UOA_Brahman_1 revealed regions with a smaller than expected number of SNVs with nearly fixed alternative alleles. Since B. taurus introgression represents on average 10% of the genome of Brahman cattle, we suggest that these regions comprise taurine DNA as opposed to indicine DNA in the UOA_Brahman_1 reference genome. Principal component and admixture analyses using genotypes inferred from these regions support these taurine-introgressed loci. Overall, the flagged taurine segments represent 13.7% of the UOA_Brahman_1 assembly. The genes located within these segments were previously reported to be under positive selection in Brahman cattle, and include functional candidate genes implicated in feed efficiency, development and immunity. Conclusions: We report a list of taurine segments that are in the UOA_Brahman_1 assembly, which will be useful for the interpretation of interesting genomic features (e.g., signatures of selection, runs of homozygosity, increased mutation rate, etc.) that could appear in future re-sequencing analysis of indicine cattle. Supplementary Information: The online version contains supplementary material available at 10.1186/s12711-021-00688-1.
Affiliation(s)
- Maulana M Naji
  - University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
- Yuri T Utsunomiya
  - AgroPartners Consulting, R. Floriano Peixoto, 120 - Sala 43A - Centro, Araçatuba, SP, 16010-220, Brazil
  - Department of Production and Animal Health, School of Veterinary Medicine, São Paulo State University (Unesp), Araçatuba, São Paulo, Brazil
  - International Atomic Energy Agency (IAEA) Collaborating Centre on Animal Genomics and Bioinformatics, Araçatuba, São Paulo, Brazil
- Johann Sölkner
  - University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
- Benjamin D Rosen
  - Animal Genomics and Improvement Laboratory, USDA, ARS, Beltsville, MD, USA
- Gábor Mészáros
  - University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
8
Abstract
Minimizing false positives is a critical issue when variant calling as no method is without error. It is common practice to post-process a variant-call file (VCF) using hard filter criteria intended to discriminate true-positive (TP) from false-positive (FP) calls. These are applied on the simple principle that certain characteristics are disproportionately represented among the set of FP calls and that a user-chosen threshold can maximize the number detected. To provide guidance on this issue, this study empirically characterized all false SNP and indel calls made using real Illumina sequencing data from six disparate species and 166 variant-calling pipelines (the combination of 14 read aligners with up to 13 different variant callers, plus four ‘all-in-one’ pipelines). We did not seek to optimize filter thresholds but instead to draw attention to those filters of greatest efficacy and the pipelines to which they may most usefully be applied. In this respect, this study acts as a coda to our previous benchmarking evaluation of bacterial variant callers, and provides general recommendations for effective practice. The results suggest that, of the pipelines analysed in this study, the most straightforward way of minimizing false positives would simply be to use Snippy. We also find that a disproportionate number of false calls, irrespective of the variant-calling pipeline, are located in the vicinity of indels, and highlight this as an issue for future development.
Affiliation(s)
- Stephen J Bush
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
9
Zheng Q. New approaches to mutation rate fold change in Luria-Delbrück fluctuation experiments. Math Biosci 2021; 335:108572. [PMID: 33662405 DOI: 10.1016/j.mbs.2021.108572] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/14/2021] [Revised: 02/23/2021] [Accepted: 02/23/2021] [Indexed: 02/03/2023]
Abstract
For nearly eight decades the Luria-Delbrück protocol has remained the preferred method for experimentally determining microbial mutation rates. However, earnest development and rigorous application of statistical methods for mutation rate comparison using fluctuation assay data are a relatively recent phenomenon. While likelihood ratio tests tailored for the fluctuation protocol give investigators appropriate tools, researchers may sometimes prefer to view the comparison of two mutation rates through the lens of their ratio. The idea of using the bootstrap technique to construct intervals for mutation rate fold change was proposed nearly a decade ago, but it failed to gain traction, partly due to a failure to incorporate likelihood-based estimation. In addition to extending the bootstrap method, this paper proposes two new methods of constructing intervals for mutation rate fold change: a profile likelihood method and a Bayesian method utilizing Markov chain Monte Carlo. All three methods are assessed by large-scale simulations.
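The bootstrap idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract stresses likelihood-based estimation, whereas this example swaps in the simpler classical p0 estimator (m = -ln p0, where p0 is the fraction of cultures with no mutants). It also assumes both strains reach the same final cell count, so that the ratio of m values equals the mutation rate fold change. The function names and the example counts are hypothetical.

```python
import math
import random


def p0_estimate(mutant_counts):
    """Expected mutations per culture via the p0 method: m = -ln(p0).

    Returns None when no culture or every culture is mutant-free,
    because the estimate is then unusable for a ratio.
    """
    p0 = sum(1 for c in mutant_counts if c == 0) / len(mutant_counts)
    if p0 <= 0.0 or p0 >= 1.0:
        return None
    return -math.log(p0)


def fold_change(counts_a, counts_b):
    """Fold change m_B / m_A (assumes equal final cell counts)."""
    m_a, m_b = p0_estimate(counts_a), p0_estimate(counts_b)
    if m_a is None or m_b is None:
        return None
    return m_b / m_a


def bootstrap_interval(counts_a, counts_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fold change.

    Cultures are resampled with replacement within each strain;
    resamples with an unusable p0 estimate are skipped.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        res_a = rng.choices(counts_a, k=len(counts_a))
        res_b = rng.choices(counts_b, k=len(counts_b))
        ratio = fold_change(res_a, res_b)
        if ratio is not None:
            ratios.append(ratio)
    ratios.sort()
    lo = ratios[int((alpha / 2) * len(ratios))]
    hi = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lo, hi


# Hypothetical mutant counts from 20 parallel cultures per strain.
strain_a = [0] * 10 + [1, 2, 3, 1, 5, 2, 1, 8, 1, 4]  # p0 = 0.50
strain_b = [0] * 5 + [1] * 15                         # p0 = 0.25
print(fold_change(strain_a, strain_b))       # about 2 (ln 4 / ln 2)
print(bootstrap_interval(strain_a, strain_b))
```

The p0 estimator discards all information in the non-zero counts, which is one reason likelihood-based estimators are generally preferred for real fluctuation data.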
Affiliation(s)
- Qi Zheng
  - Department of Epidemiology and Biostatistics, Texas A&M School of Public Health, College Station, TX 77843, United States of America
10
Naji MM, Utsunomiya YT, Sölkner J, Rosen BD, Mészáros G. Investigation of ancestral alleles in the Bovinae subfamily. BMC Genomics 2021; 22:108. [PMID: 33557747 PMCID: PMC7871596 DOI: 10.1186/s12864-021-07412-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/25/2020] [Accepted: 01/27/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND: In evolutionary theory, divergence and speciation can arise from long periods of reproductive isolation, genetic mutation, selection and environmental adaptation. After divergence, alleles can either persist in their initial state (ancestral allele, AA), co-exist with, or be replaced by a mutated state (derived allele, DA). In this study, we aligned whole genome sequences of individuals from the Bovinae subfamily to the cattle reference genome (ARS.UCD-1.2) to define the ancestral alleles needed for studies of selection signatures. RESULTS: To accommodate the independent divergence of each lineage from the initial ancestral state, AA were defined based on alleles fixed in at least two of three groups (yak, bison and gayal-gaur-banteng), resulting in ~32.4 million variants. Using non-overlapping scanning windows of 10 Kb, we counted the AA observed within taurine and zebu cattle. We focused on the extremes: regions in the top 0.1% (high count) and regions without any occurrence of AA (null count). High count regions preserved gene functions from ancestral states that are still beneficial in the current condition, while null count regions were linked to mutated ones. For both cattle types, high count regions were associated with basal lipid metabolism, essential for surviving various environmental pressures. Mutated regions were associated with productive traits in taurine cattle (i.e. higher metabolism, cell development and behaviors) and with the immune response domain in zebu. CONCLUSIONS: Our findings suggest that the retention and loss of AA vary among regions and are species-specific, with possible overlap, depending on the selective pressures each species has experienced.
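The window scan described in this abstract (non-overlapping 10 Kb windows, top 0.1% as "high count", zero as "null count") can be sketched as follows. This is an illustrative sketch only: the function names, the 1-based coordinate convention and the tie-breaking rule are assumptions, not details taken from the study.

```python
WINDOW_SIZE = 10_000  # 10 Kb windows, as stated in the abstract


def count_per_window(positions, chrom_length, window_size=WINDOW_SIZE):
    """Count ancestral-allele sites in non-overlapping windows.

    positions: 1-based coordinates of sites carrying the ancestral allele.
    Returns one count per window along the chromosome.
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[(pos - 1) // window_size] += 1
    return counts


def extreme_windows(counts, top_fraction=0.001):
    """Return (high, null) window indices.

    high: the top `top_fraction` of windows by count (at least one window);
          ties are broken by window order, which is an arbitrary choice.
    null: windows with no ancestral allele at all.
    """
    n_top = max(1, round(len(counts) * top_fraction))
    ranked = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    high = sorted(ranked[:n_top])
    null = [i for i, c in enumerate(counts) if c == 0]
    return high, null


# Toy chromosome of 40 kb with five ancestral-allele sites.
counts = count_per_window([1, 5, 10_000, 10_001, 35_000], chrom_length=40_000)
print(counts)                   # [3, 1, 0, 1]
print(extreme_windows(counts))  # ([0], [2])
```

On a real genome the counts would be computed per chromosome and the 0.1% cutoff applied to the pooled windows; that pooling step is omitted here for brevity.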
Affiliation(s)
- Maulana M. Naji
  - University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
- Yuri T. Utsunomiya
  - São Paulo State University (Unesp), School of Veterinary Medicine, Department of Production and Animal Health, Araçatuba, São Paulo, Brazil
  - International Atomic Energy Agency (IAEA) Collaborating Centre on Animal Genomics and Bioinformatics, Araçatuba, São Paulo, Brazil
  - AgroPartners Consulting, R. Floriano Peixoto, 120-Sala 43A-Centro, Araçatuba, SP 16010-220, Brazil
  - Personal-PEC, R. Sebastiao Lima, 1336-Centro, Campo Grande, MS 79004-600, Brazil
- Johann Sölkner
  - University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
- Gábor Mészáros
  - University of Natural Resources and Life Sciences (BOKU), Vienna, Austria
11
Eschenbrenner CJ, Feurtey A, Stukenbrock EH. Population Genomics of Fungal Plant Pathogens and the Analyses of Rapidly Evolving Genome Compartments. Methods Mol Biol 2021; 2090:337-355. [PMID: 31975174 DOI: 10.1007/978-1-0716-0199-0_14] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022]
Abstract
Genome sequencing of fungal pathogens has documented extensive variation in genome structure and composition between species and, in many cases, between individuals of the same species. This type of genomic variation can be adaptive, allowing pathogens to rapidly evolve new virulence phenotypes. Analyses of genome-wide variation in fungal pathogen genomes rely on high-quality assemblies and methods to detect and quantify structural variation. Population genomic studies in fungi have addressed the underlying mechanisms whereby structural variation can be rapidly generated. Transposable elements, high mutation and recombination rates, as well as incorrect chromosome segregation during mitosis and meiosis, contribute to the extensive variation observed in many species. Here we summarize key findings in the field of fungal pathogen genomics and discuss methods to detect and characterize structural variants, including an alignment-based pipeline to study variation in population genomic data.
Affiliation(s)
- Christoph J Eschenbrenner
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
- Alice Feurtey
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
- Eva H Stukenbrock
  - Environmental Genomics, Christian-Albrechts University of Kiel, Kiel, Germany
  - Max Planck Institute for Evolutionary Biology, Plön, Germany
12
Sukumar S, Krishnan A, Banerjee S. An Overview of Bioinformatics Resources for SNP Analysis. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
13
Truelsen D, Pereira V, Phillips C, Morling N, Børsting C. Evaluation of a custom GeneRead™ massively parallel sequencing assay with 210 ancestry informative SNPs using the Ion S5™ and MiSeq platforms. Forensic Sci Int Genet 2020; 50:102411. [PMID: 33176271 DOI: 10.1016/j.fsigen.2020.102411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/03/2020] [Revised: 10/14/2020] [Accepted: 10/19/2020] [Indexed: 01/20/2023]
Abstract
A custom GeneRead DNAseq SNP panel with 210 markers was evaluated using the Ion S5 and MiSeq sequencing platforms. Sensitivity, PCR cycle number, and the use of half volume of reagents for target enrichment and library preparation were tested. Furthermore, genotype concordance between results obtained with the different sequencing platforms and with known profiles generated using other sequencing assays was analysed. The GeneRead DNAseq SNP assay gave reproducible results with an input of 200 pg DNA on both platforms. A total of 204 loci were successfully sequenced. Three loci failed completely in the PCR amplification, and three additional loci displayed frequent locus drop-outs due to low read depth or high heterozygote imbalance. Overall, the read depth across the loci was better balanced with the MiSeq, while the heterozygote balance was less variable with the Ion S5. Noise levels were low on both platforms (median < 0.2%). Two simple criteria for genotyping were applied: a minimum threshold of 45 reads and an acceptable heterozygote balance range of 0.3-3.0. Complete concordance between platforms was observed except for three genotypes in one of the poorly performing loci, rs1470637. This locus had relatively low read depths on both platforms, skewed heterozygote balance, and frequent locus drop-outs. There was also full genotype concordance between the results from the GeneRead assay and known profiles generated with the QIAseq and Ion AmpliSeq assays. The few discordant results were either due to locus drop-outs in the poorly performing loci or allele drop-outs in the QIAseq assay. Profiles with a minimum of 179 SNPs were obtained from four challenging case work samples (blood swabs, bone, or blood from a corpse). Overall, the GeneRead DNAseq assay showed considerable potential and could provide a reliable method for SNP genotyping in cases involving identification of individuals, prediction of phenotypic traits, and ancestry inference.
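The two genotyping criteria quoted in this abstract (at least 45 reads, heterozygote balance between 0.3 and 3.0) can be illustrated with a minimal sketch. It assumes that the read threshold applies to the summed reads of both alleles at a locus and that heterozygote balance is the ratio of reads for one allele to reads for the other; the abstract does not spell out either definition. The function name `accept_heterozygote` and its parameters are hypothetical, not part of the published assay.

```python
def accept_heterozygote(reads_a: int, reads_b: int,
                        min_reads: int = 45,
                        hb_range: tuple[float, float] = (0.3, 3.0)) -> bool:
    """Return True if a two-allele call passes both criteria.

    Assumptions (not confirmed by the abstract): the read threshold is
    applied to the total locus depth, and heterozygote balance (Hb) is
    reads_a / reads_b.
    """
    total = reads_a + reads_b
    # Reject low-depth loci, and loci with no reads for the second allele
    if total < min_reads or reads_b == 0:
        return False
    hb = reads_a / reads_b
    return hb_range[0] <= hb <= hb_range[1]


# Hypothetical read counts for four loci
for a, b in [(60, 40), (30, 10), (90, 10), (20, 60)]:
    print(a, b, accept_heterozygote(a, b))
```

Under these assumptions a locus with 30 and 10 reads fails on depth, and one with 90 and 10 reads fails on balance even though its depth is sufficient.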
Affiliation(s)
- Ditte Truelsen
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
- Vania Pereira
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
- Chris Phillips
  - Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
- Niels Morling
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
  - Department of Mathematical Sciences, Aalborg University, DK-9220 Aalborg East, Denmark
- Claus Børsting
  - Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, 2100 Copenhagen, Denmark
|
14
|
Yan R, Luo J, He X, Li S. Association between ABC family variants rs1800977, rs4149313, and rs1128503 and susceptibility to type 2 diabetes in a Chinese Han population. J Int Med Res 2020; 48:300060520941347. [PMID: 32762489 PMCID: PMC7557792 DOI: 10.1177/0300060520941347] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/03/2020] [Accepted: 06/19/2020] [Indexed: 12/31/2022] Open
Abstract
OBJECTIVE To investigate the association between three single nucleotide polymorphisms (SNPs) of the ATP-binding cassette (ABC) gene family and susceptibility to type 2 diabetes mellitus in a Chinese Han population. METHODS A total of 1086 type 2 diabetes patients and 1122 healthy controls were included in this retrospective study. Three genetic variants, rs1800977 and rs4149313 in ABCA1, and rs1128503 in ABCB1 were included in the study. Susceptibility to type 2 diabetes was evaluated under three genetic models. RESULTS A significant association between rs1800977 and type 2 diabetes was identified in three different genetic models (TT vs CC, odds ratio [OR] = 0.611 [95% confidence interval (CI), 0.469-0.798]; T vs C, OR = 0.841 [95% CI, 0.745-0.950]; and the recessive model, OR = 0.606 [95% CI, 0.474-0.774]). Additionally, a significant association between rs4149313 and type 2 diabetes was identified in three different genetic models (AA vs GG, OR = 0.467 [95% CI, 0.326-0.670]; A vs G, OR = 0.819 [95% CI, 0.717-0.935]; and the recessive model, OR = 0.478 [95% CI, 0.336-0.680]). CONCLUSION We found that SNPs rs1800977 and rs4149313 in ABCA1 are significantly associated with susceptibility to type 2 diabetes in a Chinese population, although this should be confirmed in a larger study.
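Odds ratios with 95% confidence intervals, as reported in this abstract, are commonly derived from a 2x2 table of cases and controls. The sketch below uses the standard log-based (Woolf) interval. The abstract does not state which method the authors used and gives no genotype counts, so the counts in the example are purely hypothetical and the function name `odds_ratio_ci` is illustrative.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: cases with the tested allele/genotype, b: cases without it,
    c: controls with it,                      d: controls without it.
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study
or_, lo, hi = odds_ratio_ci(10, 20, 30, 40)
print(f"OR = {or_:.3f} [95% CI, {lo:.3f}-{hi:.3f}]")
```

An interval that lies entirely below 1, like those quoted for rs1800977 and rs4149313, indicates that the tested allele or genotype is less frequent in cases than in controls.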
Affiliation(s)
- Ruicheng Yan
  - Department of Gastrointestinal Surgery, East Section of Renmin Hospital of Wuhan University, Wuhan, China
  - Department of Bariatric Surgery, Renmin Hospital of Wuhan University, Wuhan, China
- Jianfei Luo
  - Department of Bariatric Surgery, Renmin Hospital of Wuhan University, Wuhan, China
- Xiaobo He
  - Department of Gastrointestinal Surgery, East Section of Renmin Hospital of Wuhan University, Wuhan, China
- Shijun Li
  - Department of Bariatric Surgery, Renmin Hospital of Wuhan University, Wuhan, China
|
15
|
Devadasan MJ, Kumar DR, Vineeth MR, Choudhary A, Surya T, Niranjan SK, Verma A, Sivalingam J. Reduced representation approach for identification of genome-wide SNPs and their annotation for economically important traits in Indian Tharparkar cattle. 3 Biotech 2020; 10:309. [PMID: 32582506 DOI: 10.1007/s13205-020-02297-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/12/2020] [Accepted: 06/09/2020] [Indexed: 11/24/2022] Open
Abstract
The present study was carried out in Tharparkar cattle to identify genome-wide SNPs and microsatellites, and to annotate the identified high-quality SNPs for economically important traits: milk production, fertility, carcass, adaptability and immune response. A total of 146,011 SNPs were identified with respect to the Bos taurus reference genome, which are indicus specific, out of which 10,519 SNPs were found to be novel. Similarly, a total of 87,047 SNPs were identified with respect to the Bos indicus reference genome. After final annotation of the SNPs identified with respect to the Bos indicus reference genome, 2871 SNPs were found in 383 candidate genes related to milk production, fertility, carcass, immune response and adaptability traits. Following that, 2571 microsatellites were identified. The information mined from the data might be of importance for future breed improvement programs, conservation efforts and for enhancing the SNP density of existing bovine SNP chips.
Affiliation(s)
- D Ravi Kumar
  - ICAR-National Dairy Research Institute, Karnal, India
- M R Vineeth
  - ICAR-National Dairy Research Institute, Karnal, India
- T Surya
  - ICAR-National Dairy Research Institute, Karnal, India
- S K Niranjan
  - ICAR-National Bureau of Animal Genetic Resources, Karnal, India
- Archana Verma
  - ICAR-National Dairy Research Institute, Karnal, India
|
16
|
Challenges and opportunities for strain verification by whole-genome sequencing. Sci Rep 2020; 10:5873. [PMID: 32245992 PMCID: PMC7125075 DOI: 10.1038/s41598-020-62364-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/14/2019] [Accepted: 03/11/2020] [Indexed: 11/28/2022] Open
Abstract
Laboratory strains, cell lines, and other genetic materials change hands frequently in the life sciences. Despite evidence that such materials are subject to mix-ups, contamination, and accumulation of secondary mutations, verification of strains and samples is not an established part of many experimental workflows. With the plummeting cost of next generation technologies, it is conceivable that whole genome sequencing (WGS) could be applied to routine strain and sample verification in the future. To demonstrate the need for strain validation by WGS, we sequenced haploid yeast segregants derived from a popular commercial mutant collection and identified several unexpected mutations. We determined that available bioinformatics tools may be ill-suited for verification and highlight the importance of finishing reference genomes for commonly used laboratory strains.
|
17
|
Tahir M, Sardaraz M. A Fast and Scalable Workflow for SNPs Detection in Genome Sequences Using Hadoop Map-Reduce. Genes (Basel) 2020; 11:E166. [PMID: 32033366 PMCID: PMC7074349 DOI: 10.3390/genes11020166] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/20/2020] [Revised: 01/31/2020] [Accepted: 02/01/2020] [Indexed: 11/16/2022] Open
Abstract
Next generation sequencing (NGS) technologies produce a huge amount of biological data, which poses various issues such as high processing time and large memory requirements. This research focuses on the detection of single nucleotide polymorphisms (SNPs) in genome sequences. Currently, SNP detection algorithms face several issues, e.g., computational overhead cost, accuracy, and memory requirements. In this research, we propose a fast and scalable workflow that integrates the Bowtie aligner with the Hadoop-based Heap SNP caller to improve SNP detection in genome sequences. The proposed workflow is validated on benchmark datasets obtained from publicly available web portals, e.g., NCBI and DDBJ DRA. Extensive experiments have been performed, and the results obtained are compared with the Bowtie and BWA aligners in the alignment phase and with GATK, FaSD, SparkGA, Halvade, and Heap in the SNP calling phase. Analysis of the experimental results shows that the proposed workflow outperforms existing frameworks, e.g., GATK, FaSD, Heap integrated with BWA and Bowtie aligners, SparkGA, and Halvade. The proposed framework achieved a 22.46% higher F-score and a consistent accuracy of 99.80% on average. In comparison, a 0.21% higher mean accuracy is also achieved. Moreover, SNP mining has been performed to identify specific regions in genome sequences. All the frameworks are implemented with the default configuration of memory management. The observations show that all workflows have approximately the same memory requirement. In the future, it is intended to show the mined SNPs graphically for user-friendly interaction, and to analyze and optimize the memory requirements as well.
Affiliation(s)
- Muhammad Sardaraz
  - Department of Computer Science, COMSATS University Islamabad, Attock Campus 43600, Pakistan
|
18
|
Bush SJ, Foster D, Eyre DW, Clark EL, De Maio N, Shaw LP, Stoesser N, Peto TEA, Crook DW, Walker AS. Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines. Gigascience 2020; 9:giaa007. [PMID: 32025702 PMCID: PMC7002876 DOI: 10.1093/gigascience/giaa007] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 05/28/2019] [Revised: 12/02/2019] [Accepted: 01/15/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella. RESULTS We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis. CONCLUSIONS The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.
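Pipeline sensitivity and precision, the two measures this abstract relates to reference divergence, are usually computed by comparing a call set with a truth set. The following is a minimal sketch under that assumption; representing a variant as a `(contig, position, ref, alt)` tuple and the function name `evaluate_calls` are illustrative choices, not the benchmarking code used in the study.

```python
def evaluate_calls(truth: set, called: set) -> dict:
    """Compare called variants with a truth set of the same representation."""
    tp = len(truth & called)   # called and true
    fp = len(called - truth)   # called but not true
    fn = len(truth - called)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f_score = 2 * precision * sensitivity / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "sensitivity": sensitivity,
            "f_score": f_score}


# Hypothetical variants: (contig, position, ref, alt)
truth = {("chr", 100, "A", "G"), ("chr", 200, "C", "T"),
         ("chr", 300, "G", "A"), ("chr", 400, "T", "C")}
called = {("chr", 100, "A", "G"), ("chr", 200, "C", "T"),
          ("chr", 500, "G", "C")}
print(evaluate_calls(truth, called))
```

Exact tuple matching is a simplification: real comparisons typically normalise variant representation first, since the same change can be written in more than one way.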
Affiliation(s)
- Stephen J Bush
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Dona Foster
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- David W Eyre
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Emily L Clark
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SH, UK
- Liam P Shaw
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Nicole Stoesser
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Tim E A Peto
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Derrick W Crook
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- A Sarah Walker
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
  - National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
|
19
|
Variant Calling Using Whole Genome Resequencing and Sequence Capture for Population and Evolutionary Genomic Inferences in Norway Spruce (Picea Abies). COMPENDIUM OF PLANT GENOMES 2020. [DOI: 10.1007/978-3-030-21001-4_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
|
20
|
Calarco L, Barratt J, Ellis J. Detecting sequence variants in clinically important protozoan parasites. Int J Parasitol 2019; 50:1-18. [PMID: 31857072 DOI: 10.1016/j.ijpara.2019.10.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/19/2019] [Revised: 09/29/2019] [Accepted: 10/01/2019] [Indexed: 02/06/2023]
Abstract
Second and third generation sequencing methods are crucial for population genetic studies, and variant detection is a popular approach for exploiting this sequence data. While mini- and microsatellites are historically useful markers for studying important Protozoa such as Toxoplasma and Plasmodium spp., detecting non-repetitive variants such as those found in genes can be fundamental to investigating a pathogen's biology. These variants, namely single nucleotide polymorphisms and insertions and deletions, can help elucidate the genetic basis of an organism's pathogenicity, identify selective pressures, and resolve phylogenetic relationships. They also have the added benefit of possessing a comparatively low mutation rate, which contributes to their stability. However, there is a plethora of variant analysis tools with nuanced pipelines and conflicting recommendations for best practise, which can be confounding. This lack of standardisation means that variant analysis requires careful parameter optimisation, an understanding of its limitations, and the availability of high quality data. This review explores the value of variant detection when applied to non-model organisms such as clinically important protozoan pathogens. The limitations of current methods are discussed, including special considerations that require the end-users' attention to ensure that the results generated are reproducible, and the biological conclusions drawn are valid.
Affiliation(s)
- Larissa Calarco
  - School of Life Sciences, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
- Joel Barratt
  - School of Life Sciences, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
- John Ellis
  - School of Life Sciences, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
|
21
|
Jin J, Liu J, Yin Y, Li Z, Lu P, Xu Y, Zhang J, Cao P, Hu D. PVCTools: parallel variation calling tools. Heliyon 2019; 5:e02530. [PMID: 31667383 PMCID: PMC6812194 DOI: 10.1016/j.heliyon.2019.e02530] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/23/2018] [Revised: 09/08/2019] [Accepted: 09/24/2019] [Indexed: 11/28/2022] Open
Abstract
With the development of sequencing technology, it is now possible to sequence individuals of each species. Although a number of different tools have been developed to detect individual variations, most of them cannot be run in parallel mode. To accelerate variation detection, PVCTools is introduced in this study. PVCTools splits the reference genome and alignment files into small pieces and processes them in parallel. Boundary noise is also considered in PVCTools. In the results from three different sets of test data, PVCTools performs much faster than most other current tools, while maintaining accuracy similar to theirs. PVCTools is free and open source software. The development of sequencing technology and growing sample numbers will make performance improvements such as PVCTools increasingly interesting.
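The idea of splitting a reference into pieces while guarding against boundary effects can be sketched as follows. This is not PVCTools code: the abstract does not say how boundary noise is handled, so the padding approach here (call variants on a padded window, keep only calls inside the unpadded core) is an assumption, and `split_region` and its parameters are hypothetical names.

```python
def split_region(length: int, chunk_size: int, padding: int = 0):
    """Yield (core_start, core_end, padded_start, padded_end) tuples.

    Coordinates are 0-based and half-open. The padded window would be
    handed to a variant caller, and only calls that fall inside the core
    interval would be kept, so every position is reported exactly once.
    """
    for core_start in range(0, length, chunk_size):
        core_end = min(core_start + chunk_size, length)
        yield (core_start, core_end,
               max(0, core_start - padding),
               min(length, core_end + padding))


# A hypothetical 250 bp contig split into 100 bp chunks with 10 bp padding
for piece in split_region(250, 100, padding=10):
    print(piece)
```

Because the core intervals tile the contig without overlap, the pieces are independent and could be dispatched to separate worker processes.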
Affiliation(s)
- Jingjing Jin
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 450001, China
- Jiajun Liu
  - Computer Science and Technology, Sichuan University, Chengdu, 450000, China
- Yelin Yin
  - Zhengzhou Tongbiao Environmental Testing Co., LTD, Zhengzhou, 450001, China
- Zefeng Li
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 450001, China
- Peng Lu
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 450001, China
- Yalong Xu
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 450001, China
- Jianfeng Zhang
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 450001, China
- Peijian Cao
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou, 450001, China
- Dasha Hu
  - Computer Science and Technology, Sichuan University, Chengdu, 450000, China
|
22
|
Huang CJ, Lu MY, Chang YW, Li WH. Experimental Evolution of Yeast for High-Temperature Tolerance. Mol Biol Evol 2019; 35:1823-1839. [PMID: 29684163 DOI: 10.1093/molbev/msy077] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 12/30/2022] Open
Abstract
Thermotolerance is a polygenic trait that contributes to cell survival and growth under unusually high temperatures. Although some genes associated with high-temperature growth (Htg+) have been identified, how cells accumulate mutations to achieve prolonged thermotolerance is still mysterious. Here, we conducted experimental evolution of a Saccharomyces cerevisiae laboratory strain with stepwise temperature increases for it to grow at 42 °C. Whole genome resequencing of 14 evolved strains and the parental strain revealed a total of 153 mutations in the evolved strains, including single nucleotide variants, small INDELs, and segmental duplication/deletion events. Some mutations persisted from an intermediate temperature to 42 °C, so they might be Htg+ mutations. Functional categorization of mutations revealed enrichment of exonic mutations in the SWI/SNF complex and F-type ATPase, pointing to their involvement in high-temperature tolerance. In addition, multiple mutations were found in a general stress-associated signal transduction network consisting of Hog1 mediated pathway, RAS-cAMP pathway, and Rho1-Pkc1 mediated cell wall integrity pathway, implying that cells can achieve Htg+ partly through modifying existing stress regulatory mechanisms. Using pooled segregant analysis of five Htg+ phenotype-orientated pools, we inferred causative mutations for growth at 42 °C and identified those mutations with stronger impacts on the phenotype. Finally, we experimentally validated a number of the candidate Htg+ mutations. This study increased our understanding of the genetic basis of yeast tolerance to high temperature.
Affiliation(s)
- Chih-Jen Huang
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
  - Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, Academia Sinica and National Chung-Hsing University, Taipei, Taiwan
  - Graduate Institute of Biotechnology, National Chung-Hsing University, Taichung, Taiwan
- Mei-Yeh Lu
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Ya-Wen Chang
  - Department of Clinical Laboratory Sciences and Medical Biotechnology, National Taiwan University, Taipei, Taiwan
- Wen-Hsiung Li
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
  - Molecular and Biological Agricultural Sciences Program, Taiwan International Graduate Program, Academia Sinica and National Chung-Hsing University, Taipei, Taiwan
  - Biotechnology Center, National Chung-Hsing University, Taichung, Taiwan
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL
|
23
|
Wang Y, Shahid MQ, Ghouri F, Ercişli S, Baloch FS, Nie F. Transcriptome analysis and annotation: SNPs identified from single copy annotated unigenes of three polyploid blueberry crops. PLoS One 2019; 14:e0216299. [PMID: 31034501 PMCID: PMC6488077 DOI: 10.1371/journal.pone.0216299] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/18/2019] [Accepted: 04/17/2019] [Indexed: 02/03/2023] Open
Abstract
Blueberry is an increasingly popular perennial fruit with high health value. It is of utmost importance to develop new blueberry varieties for different climatic zones to satisfy worldwide demand. Molecular marker assisted breeding is believed to be an ideal method for the development of new blueberry varieties because its breeding cycle is shorter than that of conventional breeding. Simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers are widely used molecular tools for marker assisted breeding, and can be detected at large scale by transcriptome sequencing. Here, we sequenced the leaf transcriptomes of 19 rabbiteye (Vaccinium ashei Reade), 13 southern highbush (Vaccinium corymbosum L × native southern Vaccinium spp.) and 22 northern highbush (Vaccinium corymbosum L) blueberry cultivars by using next generation sequencing technologies. A total of 80.825 Gb clean data with an average of about 12.525 million reads per cultivar were obtained. We assembled 58,968, 55,973 and 53,887 unigenes by using the clean data from rabbiteye, southern highbush and northern highbush blueberry cultivars, respectively. Among these unigenes, 3599, 3495 and 3513 unigenes were detected as candidate resistance genes in the three blueberry crops. Moreover, we identified more than 8756, 9020, and 9198 SSR markers from these unigenes, and 7665, 4861, and 13,063 SNPs from the annotated single copy unigenes, respectively. The results will be helpful for the molecular genetics and association analysis of blueberry, will provide basic molecular information on pest and disease resistance in blueberry, and would also offer a large number of molecular tools for marker assisted breeding to produce blueberry cultivars with different adaptive characteristics.
Affiliation(s)
- Yunsheng Wang
  - College of Life and Health Science, Kaili University, Kaili City, Guizhou Province, China
- Muhammad Qasim Shahid
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou, China
  - College of Agriculture, South China Agricultural University, Guangzhou, Guangdong Province, China
- Fozia Ghouri
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, South China Agricultural University, Guangzhou, China
  - College of Agriculture, South China Agricultural University, Guangzhou, Guangdong Province, China
- Sezai Ercişli
  - Department of Horticulture, Faculty of Agriculture, Ataturk University, Erzurum, Turkey
- Faheem Shehzad Baloch
  - Department of Field Crops, Faculty of Agricultural and Natural Sciences, Abant İzzet Baysal University, Bolu, Turkey
- Fei Nie
  - Biological Institute of Guizhou Province, Guiyang City, Guizhou Province, China
|
24
|
Tang M, Hasan MS, Zhu H, Zhang L, Wu X. vi-HMM: a novel HMM-based method for sequence variant identification in short-read data. Hum Genomics 2019; 13:9. [PMID: 30795817 PMCID: PMC6387560 DOI: 10.1186/s40246-019-0194-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/02/2018] [Accepted: 01/29/2019] [Indexed: 12/30/2022] Open
Abstract
Background Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in next-generation sequencing (NGS) applications. Existing methods for calling these variants often make simplified assumptions of positional independence and fail to leverage the dependence between genotypes at nearby loci that is caused by linkage disequilibrium (LD). Results and conclusion We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. This method allows transitions between hidden states (defined as “SNP,” “Ins,” “Del,” and “Match”) of adjacent genomic bases and determines an optimal hidden state path by using the Viterbi algorithm. The inferred hidden state path provides a direct solution to the identification of SNPs and INDELs. Simulation studies show that, under various sequencing depths, vi-HMM outperforms commonly used variant calling methods in terms of sensitivity and F1 score. When applied to the real data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs.
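The Viterbi step mentioned in this abstract can be illustrated with a generic, log-space implementation. This is a toy sketch, not the vi-HMM model: for brevity it uses only two of the four hidden states ("Match" and "SNP"), the observation at each base is reduced to whether the reads agree with the reference, and all probabilities are invented for the example.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a non-empty observation list."""
    # Log-probabilities avoid numerical underflow on long sequences
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    back = [{}]
    for obs in observations[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this position
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            cur[s] = (prev[best] + math.log(trans_p[best][s])
                      + math.log(emit_p[s][obs]))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back[1:]):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical parameters, chosen only for illustration
states = ["Match", "SNP"]
start_p = {"Match": 0.99, "SNP": 0.01}
trans_p = {"Match": {"Match": 0.99, "SNP": 0.01},
           "SNP": {"Match": 0.90, "SNP": 0.10}}
emit_p = {"Match": {"agree": 0.999, "differ": 0.001},
          "SNP": {"agree": 0.05, "differ": 0.95}}

print(viterbi(["agree", "agree", "differ", "agree"],
              states, start_p, trans_p, emit_p))
# → ['Match', 'Match', 'SNP', 'Match']
```

Because the path is scored as a whole, the call at one base depends on its neighbours through the transition probabilities, which is the dependence between adjacent positions that the abstract contrasts with position-independent callers.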
Affiliation(s)
- Man Tang
  - Department of Statistics, Virginia Tech, 250 Drillfield Drive, Blacksburg, 24061, VA, USA
- Mohammad Shabbir Hasan
  - Department of Computer Science, Virginia Tech, 225 Stanger Street, Blacksburg, 24060, VA, USA
- Hongxiao Zhu
  - Department of Statistics, Virginia Tech, 250 Drillfield Drive, Blacksburg, 24061, VA, USA
- Liqing Zhang
  - Department of Computer Science, Virginia Tech, 225 Stanger Street, Blacksburg, 24060, VA, USA
- Xiaowei Wu
  - Department of Statistics, Virginia Tech, 250 Drillfield Drive, Blacksburg, 24061, VA, USA
|
25
|
Muyas F, Bosio M, Puig A, Susak H, Domènech L, Escaramis G, Zapata L, Demidov G, Estivill X, Rabionet R, Ossowski S. Allele balance bias identifies systematic genotyping errors and false disease associations. Hum Mutat 2018; 40:115-126. [PMID: 30353964 PMCID: PMC6587442 DOI: 10.1002/humu.23674] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/29/2018] [Revised: 09/17/2018] [Accepted: 10/20/2018] [Indexed: 12/13/2022]
Abstract
In recent years, next‐generation sequencing (NGS) has become a cornerstone of clinical genetics and diagnostics. Many clinical applications require high precision, especially if rare events such as somatic mutations in cancer or genetic variants causing rare diseases need to be identified. Although random sequencing errors can be modeled statistically and deep sequencing minimizes their impact, systematic errors remain a problem even at high depth of coverage. Understanding their source is crucial to increase precision of clinical NGS applications. In this work, we studied the relation between recurrent biases in allele balance (AB), systematic errors, and false positive variant calls across a large cohort of human samples analyzed by whole exome sequencing (WES). We have modeled the AB distribution for biallelic genotypes in 987 WES samples in order to identify positions recurrently deviating significantly from the expectation, a phenomenon we termed allele balance bias (ABB). Furthermore, we have developed a genotype callability score based on ABB for all positions of the human exome, which detects false positive variant calls that passed state‐of‐the‐art filters. Finally, we demonstrate the use of ABB for detection of false associations proposed by rare variant association studies. Availability: https://github.com/Francesc-Muyas/ABB.
Affiliation(s)
- Francesc Muyas
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Mattia Bosio
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Anna Puig
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Hana Susak
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laura Domènech
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER in Epidemiology and Public Health (CIBERESP), Barcelona, Spain
- Georgia Escaramis
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER in Epidemiology and Public Health (CIBERESP), Barcelona, Spain
- Luis Zapata
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- German Demidov
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Xavier Estivill
- Sidra Medicine, Doha, Qatar; Women's Health Dexeus, Barcelona, Spain
- Raquel Rabionet
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; CIBER in Epidemiology and Public Health (CIBERESP), Barcelona, Spain; Institut de Recerca Sant Joan de Déu; Institut de Biomedicina de la Universitat de Barcelona (IBUB); and Department of Genetics, Microbiology and Statistics, University of Barcelona, Barcelona, Spain
- Stephan Ossowski
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain; Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
26
Barbosa S, Mestre F, White TA, Paupério J, Alves PC, Searle JB. Integrative approaches to guide conservation decisions: Using genomics to define conservation units and functional corridors. Mol Ecol 2018; 27:3452-3465. [PMID: 30030869 DOI: 10.1111/mec.14806] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 03/27/2017] [Revised: 07/01/2018] [Accepted: 07/05/2018] [Indexed: 01/13/2023]
Abstract
Climate change and increasing habitat loss greatly impact species survival, requiring range shifts, phenotypic plasticity and/or evolutionary change for long-term persistence, which may not readily occur unaided in threatened species. Therefore, defining conservation actions requires a detailed assessment of evolutionary factors. Existing genetic diversity needs to be thoroughly evaluated and spatially mapped to define conservation units (CUs) in an evolutionary context, and we address that here. We also propose a multidisciplinary approach to determine corridors and functional connectivity between CUs by including genetic diversity in the modelling while controlling for isolation by distance and phylogeographic history. We evaluate our approach on a Near Threatened Iberian endemic rodent by analysing genotyping-by-sequencing (GBS) genomic data from 107 Cabrera voles (Microtus cabrerae), screening the entire species distribution to define categories of CUs and their connectivity. We defined six management units (MUs), which can be grouped into four evolutionarily significant units (ESUs) and three (putatively) adaptive units (AUs). We demonstrate that the three different categories of CU can be objectively defined using genomic data, and their characteristics and connectivity can inform conservation decision-making. In particular, we show that connectivity of the Cabrera vole is very limited in eastern Iberia and that the pre-Pyrenean and part of the Betic geographic nuclei contribute the most to the species' genetic diversity. We argue that a multidisciplinary framework for CU definition is essential and that this framework needs a strong evolutionary basis.
Affiliation(s)
- Soraia Barbosa
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto/InBIO Laboratório Associado, Vairão, Portugal; Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, Portugal; Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York
- Frederico Mestre
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade de Évora/InBIO Laboratório Associado, Évora, Portugal
- Thomas A White
- Lancaster Environment Centre, Lancaster University, Lancaster, UK
- Joana Paupério
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto/InBIO Laboratório Associado, Vairão, Portugal
- Paulo C Alves
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto/InBIO Laboratório Associado, Vairão, Portugal; Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, Portugal
- Jeremy B Searle
- CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto/InBIO Laboratório Associado, Vairão, Portugal; Departamento de Biologia, Faculdade de Ciências da Universidade do Porto, Porto, Portugal; Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York
27
Abstract
Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target the digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects.
Insect guts harbor various microbes that are important for host digestion, immune response, and disease dispersal in certain cases. Bacteria, which are among the primary endosymbionts, have been studied extensively. However, fungi, which are also frequently encountered, are poorly known with respect to their biology within the insect guts. To understand the genomic features and related biology, we produced the whole-genome sequences of nine gut commensal fungi from disease-bearing insects (black flies, midges, and mosquitoes). The results show that insect gut fungi tend to have low GC content across their genomes. By comparing these commensals with entomopathogenic and free-living fungi that have available genome sequences, we found a universal core gene toolbox that is unique and thus potentially important for the insect-fungus symbiosis. This comparative work also uncovered different host invasion strategies employed by insect pathogens and commensals, as well as a model system to study ancient fungal genome duplication within the gut of insects.
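The GC content figures quoted above (26% to 37%) are percentages of G and C bases among the called bases of an assembly. A minimal Python sketch of that calculation follows. The function name and the decision to skip ambiguous bases such as N are assumptions of this illustration, not details taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored; that convention is an
    assumption of this sketch.
    """
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in called if base in "GC")
    return 100.0 * gc_count / len(called)


if __name__ == "__main__":
    # Toy sequence, not real Harpellales data.
    print(gc_content("AATTAATTGC"))
```

For a multi-contig assembly, the counts would be summed over all contigs before dividing, so that long contigs weigh more than short ones.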
28
Ye S, Yuan X, Lin X, Gao N, Luo Y, Chen Z, Li J, Zhang X, Zhang Z. Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population. J Anim Sci Biotechnol 2018; 9:30. [PMID: 29581880 PMCID: PMC5861640 DOI: 10.1186/s40104-018-0241-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 08/06/2017] [Accepted: 01/26/2018] [Indexed: 11/24/2022] [Open Access]
Abstract
Background: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation.
Results: We genotyped 450 chickens with a 600 K SNP array, and sequenced 24 key individuals by whole genome re-sequencing. Accuracy of imputation from putative 60 K and 600 K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. By increasing the sequencing cost from 24X to 144X, the imputation accuracy increased from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. With fixed sequence depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals resulted in a higher imputation accuracy compared with using randomly selected individuals as a reference population for re-sequencing. With fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. With a given total cost of genotyping, accuracy increased with the size of the reference population for FImpute, but this pattern did not hold for Beagle, which showed the highest accuracy at sixfold coverage for the scenarios used in this study.
Conclusions: We comprehensively investigated the impacts of several key factors on genotype imputation. Generally, increasing sequencing cost gave a higher imputation accuracy. With a fixed sequencing cost, however, an optimal imputation strategy can enhance the performance of WGP and GWAS. An optimal imputation strategy should take into consideration the size of the reference population, the imputation algorithm, marker density, the population structure of the target population, and the method used to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations.
Electronic supplementary material: The online version of this article (10.1186/s40104-018-0241-5) contains supplementary material, which is available to authorized users.
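The abstract reports imputation accuracy as values between 0 and 1 but does not state how accuracy was computed. A common choice is the correlation between true and imputed genotypes coded as allele counts (0, 1, 2), sometimes accompanied by the genotype concordance rate. The Python sketch below shows both under that assumption. The function names and toy genotypes are hypothetical, and the sketch does not reproduce Beagle or FImpute.

```python
from math import sqrt


def pearson_correlation(true_genotypes, imputed_genotypes) -> float:
    """Pearson correlation between two equally long genotype vectors."""
    n = len(true_genotypes)
    if n != len(imputed_genotypes) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_true = sum(true_genotypes) / n
    mean_imp = sum(imputed_genotypes) / n
    cov = sum(
        (t - mean_true) * (i - mean_imp)
        for t, i in zip(true_genotypes, imputed_genotypes)
    )
    var_true = sum((t - mean_true) ** 2 for t in true_genotypes)
    var_imp = sum((i - mean_imp) ** 2 for i in imputed_genotypes)
    if var_true == 0 or var_imp == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_true * var_imp)


def concordance(true_genotypes, imputed_genotypes) -> float:
    """Fraction of genotypes imputed exactly right."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("need two non-empty vectors of equal length")
    matches = sum(t == i for t, i in zip(true_genotypes, imputed_genotypes))
    return matches / len(true_genotypes)


if __name__ == "__main__":
    # Toy genotypes coded as alternate-allele counts; not data from the study.
    truth = [0, 1, 2, 1, 0, 2]
    imputed = [0, 1, 2, 2, 0, 2]
    print(pearson_correlation(truth, imputed), concordance(truth, imputed))
```

Correlation is less sensitive to allele frequency than concordance, since a rare variant can reach high concordance simply by imputing the common genotype everywhere.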
Affiliation(s)
- Shaopan Ye
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Xiaolong Yuan
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Xiran Lin
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Ning Gao
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Yuanyu Luo
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Zanmou Chen
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Jiaqi Li
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Xiquan Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
- Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, National Engineering Research Centre for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou, Guangdong, China
29
Reis VNDS, Kitajima JP, Tahira AC, Feio-dos-Santos AC, Fock RA, Lisboa BCG, Simões SN, Krepischi ACV, Rosenberg C, Lourenço NC, Passos-Bueno MR, Brentani H. Integrative Variation Analysis Reveals that a Complex Genotype May Specify Phenotype in Siblings with Syndromic Autism Spectrum Disorder. PLoS One 2017; 12:e0170386. [PMID: 28118382 PMCID: PMC5261619 DOI: 10.1371/journal.pone.0170386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/30/2016] [Accepted: 12/31/2016] [Indexed: 12/30/2022] [Open Access]
Abstract
It has been proposed that copy number variations (CNVs) are associated with increased risk of autism spectrum disorder (ASD) and, in conjunction with other genetic changes, contribute to the heterogeneity of ASD phenotypes. Array comparative genomic hybridization (aCGH) and exome sequencing, together with systems genetics and network analyses, are being used as tools for the study of complex disorders of unknown etiology, especially those characterized by significant genetic and phenotypic heterogeneity. Therefore, to characterize the complex genotype-phenotype relationship, we performed aCGH and sequenced the exomes of two affected siblings with ASD symptoms, dysmorphic features, and intellectual disability, searching for de novo CNVs, as well as for de novo and rare inherited point variations (single nucleotide variants (SNVs) or small insertions and deletions (indels)) with probable functional impacts. With aCGH, we identified, in both siblings, a duplication in the 4p16.3 region and a deletion at 8p23.3, inherited from a paternal balanced translocation, t(4;8)(p16;p23). Exome variant analysis found a total of 316 variants, of which 102 were shared by both siblings, 128 were found only in the male sibling's exome data, and 86 only in the female sibling's exome data. Our integrative network analysis showed that the siblings' shared translocation could explain their similar syndromic phenotype, including overgrowth, macrocephaly, and intellectual disability. However, the exome data add genes to those already connected through the translocation; these genes are important to the robustness of the network and contribute to the understanding of the broader spectrum of psychiatric symptoms. This study shows the importance of using an integrative approach to explore genotype-phenotype variability.
MESH Headings
- Autism Spectrum Disorder/genetics
- Child
- Chromosomes, Human, Pair 4/genetics
- Chromosomes, Human, Pair 4/ultrastructure
- Chromosomes, Human, Pair 8/genetics
- Chromosomes, Human, Pair 8/ultrastructure
- Comparative Genomic Hybridization
- DNA Copy Number Variations
- Exome/genetics
- Female
- Gene Duplication
- Gene Regulatory Networks
- Genetic Association Studies
- Humans
- In Situ Hybridization, Fluorescence
- Intellectual Disability/genetics
- Learning Disabilities/genetics
- Male
- Megalencephaly/genetics
- Nerve Tissue Proteins/genetics
- Nucleic Acid Amplification Techniques
- Sequence Deletion
- Siblings
- Syndrome
- Translocation, Genetic
Affiliation(s)
- Ana Carolina Tahira
- LIM23-Institute of Psychiatry, University of São Paulo School of Medicine, São Paulo, Brazil
- Rodrigo Ambrósio Fock
- Department of Morphology and Genetics, Federal University of São Paulo, São Paulo, Brazil
- Sérgio Nery Simões
- Department of Informatics, Federal Institute of Espírito Santo, Serra, Brazil
- Ana C. V. Krepischi
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, University of Sao Paulo, São Paulo, Brazil
- Carla Rosenberg
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, University of Sao Paulo, São Paulo, Brazil
- Naila Cristina Lourenço
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, University of Sao Paulo, São Paulo, Brazil
- Maria Rita Passos-Bueno
- Department of Genetics and Evolutionary Biology, Institute of Biosciences, University of Sao Paulo, São Paulo, Brazil
- Helena Brentani
- LIM23-Institute of Psychiatry, University of São Paulo School of Medicine, São Paulo, Brazil
30
Wright MN, Gola D, Ziegler A. Preprocessing and Quality Control for Whole-Genome Sequences from the Illumina HiSeq X Platform. Methods Mol Biol 2017; 1666:629-647. [PMID: 28980267 DOI: 10.1007/978-1-4939-7274-6_30] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022]
Abstract
The advancement of high-throughput sequencing technologies enables sequencing of human genomes at steadily decreasing costs and increasing quality. Before variants can be analyzed, e.g., in association studies, the raw data obtained from the sequencer need to be preprocessed. These preprocessing steps include the removal of adapters, duplicates, and contamination, alignment to a reference genome, and postprocessing of the alignment. All later steps, such as variant discovery, rely on high data quality and proper preprocessing, which underlines the importance of quality control. This chapter presents a workflow for preprocessing Illumina HiSeq X sequencing data. Code snippets illustrate all necessary steps, along with a brief description of the tools and underlying methods.
Affiliation(s)
- Marvin N Wright
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein - Campus Lübeck, Lübeck, Germany
- Damian Gola
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein - Campus Lübeck, Lübeck, Germany
- Andreas Ziegler
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein - Campus Lübeck, Lübeck, Germany
31
Abstract
The outcome of an Entamoeba histolytica infection is variable, and the contribution of genetic diversity within E. histolytica to human disease is not fully understood. The whole genome sequences of the E. histolytica reference laboratory strain (HM-1:IMSS) and thirteen additional laboratory strains have been made publicly available. In this review, theories on the source of the unexpected level of structural diversity found in E. histolytica are discussed.
Affiliation(s)
- Carol A Gilchrist
- Department of Medicine, School of Medicine, University of Virginia, Charlottesville, Virginia, United States of America
32
Yoshihara M, Saito D, Sato T, Ohara O, Kuramoto T, Suyama M. Design and application of a target capture sequencing of exons and conserved non-coding sequences for the rat. BMC Genomics 2016; 17:593. [PMID: 27506932 PMCID: PMC4979189 DOI: 10.1186/s12864-016-2975-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/08/2016] [Accepted: 07/28/2016] [Indexed: 12/22/2022] [Open Access]
Abstract
Background: Target capture sequencing is an efficient approach to directly identify the causative mutations of genetic disorders. To apply this strategy to laboratory rats exhibiting various phenotypes, we developed a novel target capture probe set, TargetEC (target capture for exons and conserved non-coding sequences), which can identify mutations not only in exonic regions but also in conserved non-coding sequences and thus can detect regulatory mutations.
Results: TargetEC covers 1,078,129 regions spanning 146.8 Mb of the genome. We applied TargetEC to four inbred rat strains (WTC/Kyo, WTC-swh/Kyo, PVG/Seac, and KFRS4/Kyo) maintained by the National BioResource Project for the Rat in Japan, and successfully identified mutations associated with these phenotypes, including one mutation detected in a conserved non-coding sequence.
Conclusions: The method developed in this study can be used to efficiently identify regulatory mutations, which cannot be detected using conventional exome sequencing, and will help to deepen our understanding of the relationships between regulatory mutations and associated phenotypes.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2975-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Minako Yoshihara
- Medical Institute of Bioregulation, Kyushu University, Maidashi 3-1-1, Higashi-ku, Fukuoka, 812-8582, Japan; AMED-CREST, Japan Agency for Medical Research and Development, Fukuoka, 812-8582, Japan
- Daisuke Saito
- Medical Institute of Bioregulation, Kyushu University, Maidashi 3-1-1, Higashi-ku, Fukuoka, 812-8582, Japan; AMED-CREST, Japan Agency for Medical Research and Development, Fukuoka, 812-8582, Japan
- Tetsuya Sato
- Medical Institute of Bioregulation, Kyushu University, Maidashi 3-1-1, Higashi-ku, Fukuoka, 812-8582, Japan; AMED-CREST, Japan Agency for Medical Research and Development, Fukuoka, 812-8582, Japan
- Osamu Ohara
- Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, 292-0818, Chiba, Japan
- Takashi Kuramoto
- Institute of Laboratory Animals, Graduate School of Medicine, Kyoto University, Kyoto, 606-8501, Japan
- Mikita Suyama
- Medical Institute of Bioregulation, Kyushu University, Maidashi 3-1-1, Higashi-ku, Fukuoka, 812-8582, Japan; AMED-CREST, Japan Agency for Medical Research and Development, Fukuoka, 812-8582, Japan
33
Yang S, Mercante DE, Zhang K, Fang Z. An Integrated Approach for RNA-seq Data Normalization. Cancer Inform 2016; 15:129-41. [PMID: 27385909 PMCID: PMC4924883 DOI: 10.4137/cin.s39781] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/29/2016] [Revised: 05/12/2016] [Accepted: 05/30/2016] [Indexed: 01/12/2023] [Open Access]
Abstract
Background: DNA copy number alteration is common in many cancers. Studies have shown that insertion or deletion of DNA sequences can directly alter gene expression, and significant correlation exists between DNA copy number and gene expression. Data normalization is a critical step in the analysis of gene expression generated by RNA-seq technology. Successful normalization reduces or removes unwanted nonbiological variation in the data, while keeping meaningful information intact. However, as far as we know, no attempt has been made to adjust for the variation due to DNA copy number changes in RNA-seq data normalization.
Results: In this article, we propose an integrated approach for RNA-seq data normalization. Comparisons show that the proposed normalization can improve power for downstream differentially expressed gene detection and generate more biologically meaningful results in gene profiling. In addition, our findings show that, due to the effects of copy number changes, some housekeeping genes are not always suitable internal controls for studying gene expression.
Conclusions: Using information from DNA copy number, the integrated approach is successful in reducing noise due to both biological and nonbiological causes in RNA-seq data, thus increasing the accuracy of gene profiling.
Affiliation(s)
- Shengping Yang
- Department of Pathology, School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA; Biostatistics Program, School of Public Health, LSU Health Sciences Center, New Orleans, LA, USA
- Donald E Mercante
- Biostatistics Program, School of Public Health, LSU Health Sciences Center, New Orleans, LA, USA
- Kun Zhang
- Department of Computer Science, Xavier University of Louisiana, New Orleans, LA, USA
- Zhide Fang
- Biostatistics Program, School of Public Health, LSU Health Sciences Center, New Orleans, LA, USA
34
Liu Y, Yan L, Li Z, Huang WF, Pokhrel S, Liu X, Su S. Larva-mediated chalkbrood resistance-associated single nucleotide polymorphism markers in the honey bee Apis mellifera. Insect Molecular Biology 2016; 25:239-250. [PMID: 26991518 DOI: 10.1111/imb.12216] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Chalkbrood is a disease affecting honey bees that seriously impairs brood growth and productivity of diseased colonies. Although honey bees can develop chalkbrood resistance naturally, the details underlying the mechanisms of resistance are not fully understood, and no easy method is currently available for selecting and breeding resistant bees. Finding the genes involved in the development of resistance and identifying single nucleotide polymorphisms (SNPs) that can be used as molecular markers of resistance is therefore a high priority. We conducted genome resequencing to compare resistant (Res) and susceptible (Sus) larvae that were selected following in vitro chalkbrood inoculation. Twelve genomic libraries, including 14.4 Gb of sequence data, were analysed using SNP-finding algorithms. Unique SNPs derived from chromosomes 2 and 11 were analysed in this study. SNPs from resistant individuals were confirmed by PCR and Sanger sequencing using in vitro reared larvae and resistant colonies. We found strong support for an association between the C allele at SNP C2587245T and chalkbrood resistance. SNP C2587245T may be useful as a genetic marker for the selection of chalkbrood resistance and high royal jelly production honey bee lines, thereby helping to minimize the negative effects of chalkbrood on managed honey bees.
Affiliation(s)
- Y Liu
- College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, China
- L Yan
- College of Animal Sciences, Zhejiang University, Hangzhou, China
- Z Li
- College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, China
- W-F Huang
- College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, China
- Illinois Natural History Survey, Prairie Research Institute, University of Illinois at Urbana-Champaign, Illinois, USA
- S Pokhrel
- College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, China
- X Liu
- College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, China
- S Su
- College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, China
- College of Animal Sciences, Zhejiang University, Hangzhou, China
35
Payseur BA, Rieseberg LH. A genomic perspective on hybridization and speciation. Mol Ecol 2016; 25:2337-60. [PMID: 26836441 PMCID: PMC4915564 DOI: 10.1111/mec.13557] [Citation(s) in RCA: 292] [Impact Index Per Article: 36.5] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/25/2016] [Indexed: 12/13/2022]
Abstract
Hybridization among diverging lineages is common in nature. Genomic data provide a special opportunity to characterize the history of hybridization and the genetic basis of speciation. We review existing methods and empirical studies to identify recent advances in the genomics of hybridization, as well as issues that need to be addressed. Notable progress has been made in the development of methods for detecting hybridization and inferring individual ancestries. However, few approaches reconstruct the magnitude and timing of gene flow, estimate the fitness of hybrids or incorporate knowledge of recombination rate. Empirical studies indicate that the genomic consequences of hybridization are complex, including a highly heterogeneous landscape of differentiation. Inferred characteristics of hybridization differ substantially among species groups. Loci showing unusual patterns - which may contribute to reproductive barriers - are usually scattered throughout the genome, with potential enrichment in sex chromosomes and regions of reduced recombination. We caution against the growing trend of interpreting genomic variation in summary statistics across genomes as evidence of differential gene flow. We argue that converting genomic patterns into useful inferences about hybridization will ultimately require models and methods that directly incorporate key ingredients of speciation, including the dynamic nature of gene flow, selection acting in hybrid populations and recombination rate variation.
Affiliation(s)
- Bret A. Payseur
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI 53706, USA
- Loren H. Rieseberg
- Department of Botany, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
36
37
Abstract
Genetic heterogeneity explains variation in predisposition for cancer. Whole-genome analysis allows risk to be quantified, giving better targeted screening and quantification of the personalized risk posed by environmental factors. Array-based approaches to whole-genome analysis are rapidly being overtaken by next-generation sequencing (NGS). In this review, the different platforms currently available for NGS are compared and the opportunities and risks of this approach are discussed, including the informatics packages required and the ethical issues. Methods applicable to the Personal Genome Machine (PGM) are given as an example of workflows.
Affiliation(s)
- Victoria Shaw
- NIHR Pancreatic Biomedical Research Unit, Molecular and Clinical Cancer Medicine, Royal Liverpool University Hospital, 5th Floor UCD Block, Daulby Street, Liverpool, L69 3GA, UK
- Katie Bullock
- NIHR Pancreatic Biomedical Research Unit, Molecular and Clinical Cancer Medicine, Royal Liverpool University Hospital, 5th Floor UCD Block, Daulby Street, Liverpool, L69 3GA, UK
- William Greenhalf
- NIHR Pancreatic Biomedical Research Unit, Molecular and Clinical Cancer Medicine, Royal Liverpool University Hospital, 5th Floor UCD Block, Daulby Street, Liverpool, L69 3GA, UK
|
38
|
Hugall AF, O'Hara TD, Hunjan S, Nilsen R, Moussalli A. An Exon-Capture System for the Entire Class Ophiuroidea. Mol Biol Evol 2015; 33:281-94. [PMID: 26474846 PMCID: PMC4693979 DOI: 10.1093/molbev/msv216] [Citation(s) in RCA: 73] [Impact Index Per Article: 8.1] [Indexed: 02/06/2023] Open
Abstract
Exon-capture studies have typically been restricted to relatively shallow phylogenetic scales due primarily to hybridization constraints. Here, we present an exon-capture system for an entire class of marine invertebrates, the Ophiuroidea, built upon a phylogenetically diverse transcriptome foundation. The system captures approximately 90% of the 1,552 exon target, across all major lineages of the quarter-billion-year-old extant crown group. Key features of our system are 1) basing the target on an alignment of orthologous genes determined from 52 transcriptomes spanning the phylogenetic diversity and trimmed to remove anything difficult to capture, map, or align; 2) use of multiple artificial representatives based on ancestral state reconstructions rather than exemplars to improve capture and mapping of the target; 3) mapping reads to a multi-reference alignment; and 4) using patterns of site polymorphism to distinguish among paralogy, polyploidy, allelic differences, and sample contamination. The resulting data give a well-resolved tree (currently standing at 417 samples, 275,352 sites, 91% data-complete) that will transform our understanding of ophiuroid evolution and biogeography.
Affiliation(s)
- Roger Nilsen
- Georgia Genomics Facility, University of Georgia
|
39
|
Čejková D, Strouhal M, Norris SJ, Weinstock GM, Šmajs D. A Retrospective Study on Genetic Heterogeneity within Treponema Strains: Subpopulations Are Genetically Distinct in a Limited Number of Positions. PLoS Negl Trop Dis 2015; 9:e0004110. [PMID: 26436423 PMCID: PMC4593590 DOI: 10.1371/journal.pntd.0004110] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/15/2015] [Accepted: 09/02/2015] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Pathogenic uncultivable treponemes comprise human and animal pathogens including agents of syphilis, yaws, bejel, pinta, and venereal spirochetosis in rabbits and hares. A set of 10 treponemal genome sequences including those of 4 Treponema pallidum ssp. pallidum (TPA) strains (Nichols, DAL-1, Mexico A, SS14), 4 T. p. ssp. pertenue (TPE) strains (CDC-2, Gauthier, Samoa D, Fribourg-Blanc), 1 T. p. ssp. endemicum (TEN) strain (Bosnia A) and one strain (Cuniculi A) of Treponema paraluisleporidarum ecovar Cuniculus (TPLC) were examined with respect to the presence of nucleotide intrastrain heterogeneous sites. METHODOLOGY/PRINCIPAL FINDINGS The number of identified intrastrain heterogeneous sites in individual genomes ranged between 0 and 7. Altogether, 23 intrastrain heterogeneous sites (in 17 genes) were found in 5 out of 10 investigated treponemal genomes including TPA strains Nichols (n = 5), DAL-1 (n = 4), and SS14 (n = 7), TPE strain Samoa D (n = 1), and TEN strain Bosnia A (n = 5). Although only one heterogeneous site was identified among 4 tested TPE strains, 16 such sites were identified among 4 TPA strains. Heterogeneous sites were mostly strain-specific and were identified in four tpr genes (tprC, GI, I, K), in genes involved in bacterial motility and chemotaxis (fliI, cheC-fliY), in genes involved in cell structure (murC), translation (prfA), general and DNA metabolism (putative SAM dependent methyltransferase, topA), and in seven hypothetical genes. CONCLUSIONS/SIGNIFICANCE Heterogeneous sites likely represent both the selection of adaptive changes during infection of the host as well as an ongoing diversifying evolutionary process.
Affiliation(s)
- Darina Čejková
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Department of Immunology, Veterinary Research Institute, Brno, Czech Republic
- Michal Strouhal
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Steven J. Norris
- Pathology & Laboratory Medicine, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
- George M. Weinstock
- The Genome Institute, Washington University in St. Louis, St. Louis, Missouri, United States of America
- David Šmajs
- Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
|
40
|
Børsting C, Morling N. Next generation sequencing and its applications in forensic genetics. Forensic Sci Int Genet 2015; 18:78-89. [DOI: 10.1016/j.fsigen.2015.02.002] [Citation(s) in RCA: 268] [Impact Index Per Article: 29.8] [Received: 10/24/2014] [Revised: 01/12/2015] [Accepted: 02/11/2015] [Indexed: 12/13/2022]
|
41
|
Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 2015. [PMID: 26217378 PMCID: PMC4493402 DOI: 10.3389/fgene.2015.00235] [Citation(s) in RCA: 106] [Impact Index Per Article: 11.8] [Indexed: 12/30/2022] Open
Abstract
Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.
Affiliation(s)
- Nathan D Olson
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Steven P Lund
- Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Rebecca E Colman
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Jeffrey T Foster
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jason W Sahl
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- James M Schupp
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Paul Keim
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jayne B Morrow
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Marc L Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
- Justin M Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
|
42
|
Mielczarek M, Szyda J. Review of alignment and SNP calling algorithms for next-generation sequencing data. J Appl Genet 2015; 57:71-9. [PMID: 26055432 DOI: 10.1007/s13353-015-0292-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 11/06/2014] [Revised: 02/27/2015] [Accepted: 05/15/2015] [Indexed: 01/21/2023]
Abstract
Application of the massive parallel sequencing technology has become one of the most important issues in life sciences. Therefore, it was crucial to develop bioinformatics tools for next-generation sequencing (NGS) data processing. Currently, two of the most significant tasks include alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithms (suffix tries and hash tables) have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower, but more sensitive. SNP and genotype callers may also be divided into two main different approaches: heuristic and probabilistic methods. A variety of software has been subsequently developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
Affiliation(s)
- M Mielczarek
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland
- J Szyda
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland
|
43
|
Brenndörfer J, Altmann A, Widner-Andrä R, Pütz B, Czamara D, Tilch E, Kam-Thong T, Weber P, Rex-Haffner M, Bettecken T, Bultmann A, Müller-Myhsok B, Binder EE, Landgraf R, Czibere L. Connecting Anxiety and Genomic Copy Number Variation: A Genome-Wide Analysis in CD-1 Mice. PLoS One 2015; 10:e0128465. [PMID: 26011321 PMCID: PMC4444327 DOI: 10.1371/journal.pone.0128465] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/04/2014] [Accepted: 04/27/2015] [Indexed: 12/05/2022] Open
Abstract
Genomic copy number variants (CNVs) have been implicated in multiple psychiatric disorders, but not much is known about their influence on anxiety disorders specifically. Using next-generation sequencing (NGS) and two additional array-based genotyping approaches, we detected CNVs in a mouse model consisting of two inbred mouse lines showing high (HAB) and low (LAB) anxiety-related behavior, respectively. An influence of CNVs on gene expression in the central (CeA) and basolateral (BLA) amygdala, paraventricular nucleus (PVN), and cingulate cortex (Cg) was shown by a two-proportion Z-test (p = 1.6 × 10⁻³¹), with a positive correlation in the CeA (p = 0.0062), PVN (p = 0.0046) and Cg (p = 0.0114), indicating a contribution of CNVs to the genetic predisposition to trait anxiety in the specific context of HAB/LAB mice. In order to confirm anxiety-relevant CNVs and corresponding genes in a second mouse model, we further examined CD-1 outbred mice. We revealed the distribution of CNVs by genotyping 64 CD-1 individuals using a high-density genotyping array (Jackson Laboratory). 78 genes within those CNVs were identified to show nominally significant association (48 genes), or a statistical trend in their association (30 genes) with the time animals spent on the open arms of the elevated plus-maze (EPM). Fifteen of them were considered promising candidate genes of anxiety-related behavior as we could show a significant overlap (permutation test, p = 0.0051) with genes within HAB/LAB CNVs. Thus, here we provide what is to our knowledge the first extensive catalogue of CNVs in CD-1 mice and potential corresponding candidate genes linked to anxiety-related behavior in mice.
Affiliation(s)
- Julia Brenndörfer
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- André Altmann
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Regina Widner-Andrä
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Benno Pütz
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Darina Czamara
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Erik Tilch
- Institute of Human Genetics, Helmholtz Zentrum München, Munich, Germany
- Institute of Human Genetics, Technische Universität München, Munich, Germany
- Tony Kam-Thong
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Peter Weber
- Department of Molecular Genetics of Affective Disorders, Max Planck Institute of Psychiatry, Munich, Germany
- Monika Rex-Haffner
- Department of Molecular Genetics of Affective Disorders, Max Planck Institute of Psychiatry, Munich, Germany
- Thomas Bettecken
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Andrea Bultmann
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Bertram Müller-Myhsok
- Department of Statistical Genetics, Max Planck Institute of Psychiatry, Munich, Germany
- Elisabeth E. Binder
- Department of Molecular Genetics of Affective Disorders, Max Planck Institute of Psychiatry, Munich, Germany
- Rainer Landgraf
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
- Ludwig Czibere
- Department of Behavioral Neuroendocrinology, Max Planck Institute of Psychiatry, Munich, Germany
|
44
|
PhyResSE: a Web Tool Delineating Mycobacterium tuberculosis Antibiotic Resistance and Lineage from Whole-Genome Sequencing Data. J Clin Microbiol 2015; 53:1908-14. [PMID: 25854485 DOI: 10.1128/jcm.00025-15] [Citation(s) in RCA: 199] [Impact Index Per Article: 22.1] [Received: 01/07/2015] [Accepted: 03/31/2015] [Indexed: 11/20/2022] Open
Abstract
Antibiotic-resistant tuberculosis poses a global threat, causing the deaths of hundreds of thousands of people annually. While whole-genome sequencing (WGS), with its unprecedented level of detail, promises to play an increasingly important role in diagnosis, data analysis is a daunting challenge. Here, we present a simple-to-use web service (free for academic use at http://phyresse.org). Delineating both lineage and resistance, it provides state-of-the-art methodology to life scientists and physicians untrained in bioinformatics. It combines elaborate data processing and quality control, as befits human diagnostics, with a treasure trove of validated resistance data collected from well-characterized samples in-house and worldwide.
|
45
|
Chan LF, Campbell DC, Novoselova TV, Clark AJL, Metherell LA. Whole-Exome Sequencing in the Differential Diagnosis of Primary Adrenal Insufficiency in Children. Front Endocrinol (Lausanne) 2015; 6:113. [PMID: 26300845 PMCID: PMC4525066 DOI: 10.3389/fendo.2015.00113] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/25/2015] [Accepted: 07/10/2015] [Indexed: 12/02/2022] Open
Abstract
Adrenal insufficiency is a rare but potentially fatal medical condition. In children, the cause is most commonly congenital, and in recent years a growing number of causative gene mutations have been identified, resulting in a myriad of syndromes that share adrenal insufficiency as one of the main characteristics. The evolution of adrenal insufficiency is dependent on the variant and the particular gene affected, meaning that rapid and accurate diagnosis is imperative for effective treatment of the patient. Common practice is for candidate genes to be sequenced individually, which is a time-consuming process and complicated by overlapping clinical phenotypes. However, with the availability and increasing cost-effectiveness of whole-exome sequencing (WES), there is the potential for this to become a powerful diagnostic tool. Here, we report the results of whole-exome sequencing of 43 patients referred to us with a diagnosis of familial glucocorticoid deficiency (FGD) who were mutation negative for MC2R, MRAP, and STAR, the most commonly mutated genes in FGD. WES provided a rapid genetic diagnosis in 17/43 sequenced patients; for the remaining 60%, the gene defect may be within intronic/regulatory regions not covered by WES or may be in gene(s) representing novel etiologies. The diagnosis of isolated or familial glucocorticoid deficiency was only confirmed in 3 of the 17 patients; other genetic diagnoses were adrenal hypo- and hyperplasia, Triple A, and autoimmune polyendocrinopathy syndrome type I, emphasizing both the difficulty of phenotypically distinguishing between disorders of PAI and the utility of WES as a tool to achieve this.
Affiliation(s)
- Li F. Chan
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Daniel C. Campbell
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Tatiana V. Novoselova
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Adrian J. L. Clark
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Louise A. Metherell
- Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- *Correspondence: Louise A. Metherell, Centre for Endocrinology, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
|
46
|
Vo NS, Tran Q, Phan V. An integrated approach for SNP calling based on population of genomes. BMC Bioinformatics 2014. [PMCID: PMC4196081 DOI: 10.1186/1471-2105-15-s10-p30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
47
|
Bahlo M, Tankard R, Lukic V, Oliver KL, Smith KR. Using familial information for variant filtering in high-throughput sequencing studies. Hum Genet 2014; 133:1331-41. [PMID: 25129038 PMCID: PMC4185103 DOI: 10.1007/s00439-014-1479-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/01/2014] [Accepted: 08/07/2014] [Indexed: 12/30/2022]
Abstract
High-throughput sequencing studies (HTS) have been highly successful in identifying the genetic causes of human disease, particularly those following Mendelian inheritance. Many HTS studies to date have been performed without utilizing available family relationships between samples. Here, we discuss the many merits and occasional pitfalls of using identity by descent information in conjunction with HTS studies. These methods are not only applicable to family studies but are also useful in cohorts of apparently unrelated, ‘sporadic’ cases and small families underpowered for linkage and allow inference of relationships between individuals. Incorporating familial/pedigree information not only provides powerful filtering options for the extensive variant lists that are usually produced by HTS but also allows valuable quality control checks, insights into the genetic model and the genotypic status of individuals of interest. In particular, these methods are valuable for challenging discovery scenarios in HTS analysis, such as in the study of populations poorly represented in variant databases typically used for filtering, and in the case of poor-quality HTS data.
Affiliation(s)
- Melanie Bahlo
- The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, 3052, Australia
|
48
|
Yi M, Zhao Y, Jia L, He M, Kebebew E, Stephens RM. Performance comparison of SNP detection tools with illumina exome sequencing data--an assessment using both family pedigree information and sample-matched SNP array data. Nucleic Acids Res 2014; 42:e101. [PMID: 24831545 PMCID: PMC4081058 DOI: 10.1093/nar/gku392] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 12/06/2013] [Revised: 03/27/2014] [Accepted: 04/22/2014] [Indexed: 12/30/2022] Open
Abstract
To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios--family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest.
Affiliation(s)
- Ming Yi
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA. Current address: Cancer Research and Technology Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc. PO Box B, Frederick, MD, 21702.
- Yongmei Zhao
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Li Jia
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
- Mei He
- Endocrine Oncology Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Electron Kebebew
- Endocrine Oncology Branch, Center for Cancer Research, National Cancer Institute, Bethesda, MD 20892, USA
- Robert M Stephens
- Advanced Biomedical Computing Center, SAIC-Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA. Current address: Cancer Research and Technology Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc. PO Box B, Frederick, MD, 21702.
|
49
|
Hwang KB, Lee IH, Park JH, Hambuch T, Choe Y, Kim M, Lee K, Song T, Neu MB, Gupta N, Kohane IS, Green RC, Kong SW. Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods. Hum Mutat 2014; 35:936-44. [PMID: 24829188 DOI: 10.1002/humu.22587] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 10/15/2013] [Accepted: 04/29/2014] [Indexed: 12/29/2022]
Abstract
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and an ensemble genotyping would be essential to minimize false-positive DNM candidates.
Affiliation(s)
- Kyu-Baek Hwang
- Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Boston Children's Hospital, Boston, Massachusetts; School of Computer Science and Engineering, Soongsil University, Seoul, 156-743, South Korea
|
50
|
D'Auria G, Schneider MV, Moya A. Live genomics for pathogen monitoring in public health. Pathogens 2014; 3:93-108. [PMID: 25437609 PMCID: PMC4235738 DOI: 10.3390/pathogens3010093] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/26/2013] [Revised: 12/16/2013] [Accepted: 01/07/2014] [Indexed: 02/07/2023] Open
Abstract
Whole genome analysis based on next generation sequencing (NGS) now represents an affordable framework in public health systems. Robust analytical pipelines for genomic data provide, within a short lapse of time (hours), information about taxonomy, comparative genomics (pan-genome) and single-nucleotide polymorphism profiles. Pathogenic organisms of interest can be tracked at the genomic level, allowing several variables to be monitored at once, including epidemiology, pathogenicity, resistance to antibiotics, virulence, persistence factors, mobile elements and adaptation features. Such information can be obtained not only across a broad spectrum, but also at the "local" level, such as in the event of a recurrent or emergency outbreak. This paper reviews the state of the art in infection diagnostics in the context of modern NGS methodologies. We describe how actuation protocols in a public health environment will benefit from a "streaming approach" (pipeline). Such a pipeline would include NGS data quality assessment, data mining for comparative analysis, searching for differential genetic features, such as virulence, resistance and persistence factors and mutation profiles (SNPs and InDels), and formatted "comprehensible" results. Such analytical protocols will enable a quick response to the needs of locally circumscribed outbreaks, providing information on the causes of resistance and genetic tracking elements for rapid detection, and monitoring actuations for present and future occurrences.
Affiliation(s)
- Giuseppe D'Auria
- Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO-Salud Pública), Avenida de Cataluña 21, 46020 Valencia, Spain
- Andrés Moya
- Genómica y Salud, Fundación para el Fomento de la Investigación Sanitaria y Biomédica de la Comunidad Valenciana (FISABIO-Salud Pública), Avenida de Cataluña 21, 46020 Valencia, Spain
|