1
A New Regression Model for the Analysis of Overdispersed and Zero-Modified Count Data. Entropy 2021; 23:e23060646. [PMID: 34064281 PMCID: PMC8224290 DOI: 10.3390/e23060646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2020] [Revised: 01/23/2021] [Accepted: 01/24/2021] [Indexed: 11/18/2022]
Abstract
Count datasets are traditionally analyzed using the ordinary Poisson distribution. However, said model has its applicability limited, as it can be somewhat restrictive to handling specific data structures. In this case, the need arises for obtaining alternative models that accommodate, for example, overdispersion and zero modification (inflation/deflation at the frequency of zeros). In practical terms, these are the most prevalent structures ruling the nature of discrete phenomena nowadays. Hence, this paper’s primary goal was to jointly address these issues by deriving a fixed-effects regression model based on the hurdle version of the Poisson–Sujatha distribution. In this framework, the zero modification is incorporated by considering that a binary probability model determines which outcomes are zero-valued, and a zero-truncated process is responsible for generating positive observations. Posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the g-prior method. Intensive Monte Carlo simulation studies were performed to assess the Bayesian estimators’ empirical properties, and the obtained results have been discussed. The proposed model was considered for analyzing a real dataset, and its competitiveness regarding some well-established fixed-effects models for count data was evaluated. A sensitivity analysis to detect observations that may impact parameter estimates was performed based on standard divergence measures. The Bayesian p-value and the randomized quantile residuals were considered for the task of model validation.
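The hurdle construction described in this abstract can be illustrated with a minimal sketch. It assumes an ordinary Poisson as a stand-in for the positive-count distribution (the paper uses the Poisson–Sujatha distribution, which is not reproduced here), and the names `hurdle_pmf`, `p_zero`, and `lam` are hypothetical.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """Ordinary Poisson probability mass at y (stand-in baseline, an assumption)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def hurdle_pmf(y: int, p_zero: float, lam: float) -> float:
    """Hurdle pmf: a binary part sets P(Y = 0); a zero-truncated part covers y > 0."""
    if y == 0:
        return p_zero
    # Zero-truncated baseline: renormalise the mass on the positive integers.
    truncated = poisson_pmf(y, lam) / (1.0 - poisson_pmf(0, lam))
    return (1.0 - p_zero) * truncated


# The probabilities should sum to one (up to a negligible truncated tail).
probs = [hurdle_pmf(y, p_zero=0.6, lam=2.0) for y in range(60)]
total = sum(probs)
```

Because `p_zero` is a free parameter, the same form covers both inflation and deflation of zeros relative to the baseline distribution.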
2
Bertoli W, Conceição KS, Andrade MG, Louzada F. A new mixed-effects regression model for the analysis of zero-modified hierarchical count data. Biom J 2020; 63:81-104. [PMID: 33073871 DOI: 10.1002/bimj.202000046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2020] [Revised: 08/06/2020] [Accepted: 08/14/2020] [Indexed: 11/07/2022]
Abstract
Count data sets are traditionally analyzed using the ordinary Poisson distribution. However, such a model has limited applicability, as it can be somewhat restrictive in handling specific data structures. In this case, the need arises for alternative models that accommodate, for example, (a) zero-modification (inflation or deflation at the frequency of zeros), (b) overdispersion, and (c) individual heterogeneity arising from clustering or repeated (correlated) measurements made on the same subject. Cases (a)-(b) and (b)-(c) are often treated together in the statistical literature with several practical applications, but models supporting all at once are less common. Hence, this paper's primary goal was to jointly address these issues by deriving a mixed-effects regression model based on the hurdle version of the Poisson-Lindley distribution. In this framework, the zero-modification is incorporated by assuming that a binary probability model determines which outcomes are zero-valued, and a zero-truncated process is responsible for generating positive observations. Approximate posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the Adaptive Metropolis algorithm. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators. The proposed model was considered for the analysis of a real data set, and its competitiveness regarding some well-established mixed-effects models for count data was evaluated. A sensitivity analysis to detect observations that may impact parameter estimates was performed based on standard divergence measures. The Bayesian p-value and the randomized quantile residuals were considered for model diagnostics.
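Randomized quantile residuals, mentioned above as a diagnostic, can be sketched for a discrete model as follows. This is only an illustration: it assumes a plain Poisson fit in place of the paper's mixed-effects hurdle model, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(y: int, lam: float) -> float:
    """P(Y <= y) under a Poisson(lam) model; zero for y < 0."""
    if y < 0:
        return 0.0
    return sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(y + 1))


def randomized_quantile_residual(y: int, lam: float, rng: random.Random) -> float:
    """Draw u uniformly between F(y - 1) and F(y), then map it to a normal quantile."""
    lower = poisson_cdf(y - 1, lam)
    upper = poisson_cdf(y, lam)
    u = rng.uniform(lower, upper)
    # Keep u strictly inside (0, 1) so the normal quantile is finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return NormalDist().inv_cdf(u)


rng = random.Random(0)
low_residual = randomized_quantile_residual(0, 2.0, rng)    # count below the mean
high_residual = randomized_quantile_residual(10, 2.0, rng)  # count far above the mean
```

If the fitted model is adequate, residuals computed this way over a whole data set should look approximately standard normal, which is what makes them convenient for diagnostic plots.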
Affiliation(s)
- Wesley Bertoli
- Department of Statistics, Federal University of Technology - Paraná, Curitiba, Brazil
- Katiane S Conceição
- Department of Applied Mathematics and Statistics, Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Marinho G Andrade
- Department of Applied Mathematics and Statistics, Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, Brazil
- Francisco Louzada
- Department of Applied Mathematics and Statistics, Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, Brazil
3
Bertoli W, Conceição KS, Andrade MG, Louzada F. Bayesian approach for the zero-modified Poisson–Lindley regression model. Braz J Probab Stat 2019. [DOI: 10.1214/19-bjps447] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
4
Bertoli W, Conceição KS, Andrade MG, Louzada F. A Bayesian approach for some zero-modified Poisson mixture models. Stat Model 2019. [DOI: 10.1177/1471082x19841984] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
In this article, we propose a class of zero-modified Poisson mixture models as an alternative to model overdispersed count data exhibiting inflation or deflation of zeros. A relevant feature of this class is that the zero modification can be incorporated using a zero truncation process and, consequently, the proposed models can be expressed in the hurdle version. As a result, the proposed models can be fitted without any previous information about the zero modification present in a given dataset. A fully Bayesian approach has been considered for estimation and inference concerns. Three different simulation studies have been conducted to illustrate the performance of the developed methodology. The usefulness of the proposed class of models has been assessed by using three real datasets provided by the literature. A general model comparison with some well-known discrete distributions has been presented.
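The link between a zero-modified model and its hurdle version can be checked numerically. The sketch below assumes the common zero-modified construction with a single modification parameter `p` and uses the Poisson–Lindley pmf as the baseline mixture; the parameter names and the exact parametrization are assumptions and may differ from the article's notation.

```python
def poisson_lindley_pmf(y: int, theta: float) -> float:
    """Poisson-Lindley probability mass at y for theta > 0."""
    return theta ** 2 * (y + theta + 2) / (theta + 1) ** (y + 3)


def zero_modified_pmf(y: int, theta: float, p: float) -> float:
    """Zero-modified pmf: p < 1 inflates zeros, p > 1 deflates them, p = 1 is the baseline."""
    p0 = poisson_lindley_pmf(0, theta)
    if not 0.0 <= p <= 1.0 / (1.0 - p0):
        raise ValueError("p must keep P(Y = 0) inside [0, 1]")
    if y == 0:
        return 1.0 - p * (1.0 - p0)
    return p * poisson_lindley_pmf(y, theta)


def hurdle_form_pmf(y: int, theta: float, omega: float) -> float:
    """Hurdle version: omega is P(Y > 0); positives follow the zero-truncated baseline."""
    p0 = poisson_lindley_pmf(0, theta)
    if y == 0:
        return 1.0 - omega
    return omega * poisson_lindley_pmf(y, theta) / (1.0 - p0)


theta, p = 1.5, 0.7
omega = p * (1.0 - poisson_lindley_pmf(0, theta))  # reparametrization linking the two forms
max_gap = max(
    abs(zero_modified_pmf(y, theta, p) - hurdle_form_pmf(y, theta, omega))
    for y in range(50)
)
```

Setting `omega = p * (1 - P(0))` makes the two forms coincide, which illustrates why the hurdle version can be fitted without knowing in advance whether zeros are inflated or deflated.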
Affiliation(s)
- Wesley Bertoli
- Department of Statistics, Federal University of Technology–Paraná, Curitiba, PR, Brazil
- Katiane S Conceição
- Department of Applied Mathematics and Statistics, Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
- Marinho G Andrade
- Department of Applied Mathematics and Statistics, Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
- Francisco Louzada
- Department of Applied Mathematics and Statistics, Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
5
Abstract
The human microbiome is associated with complex disorders such as diabetes, cancer, obesity and cardiovascular disorders. Recent technological developments have allowed researchers to fully quantify the composition of the microbiome using culture-independent approaches, resulting in a large amount of microbiome data, which provide invaluable opportunities to assess the important contributions of the microbiome to human health and disease. In this chapter, we discuss and evaluate multiple statistical approaches for processing, summarizing, and analyzing microbiome data. Specifically, we provide programming scripts for processing microbiome data using QIIME and calculating alpha and beta diversities, assessing the association between diversities and outcomes of interest using R programs, as well as interpretation of results. We illustrate the methods in the context of analyzing the foregut microbiome in esophageal adenocarcinoma.
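As a rough illustration of the alpha and beta diversities mentioned above, the sketch below computes the Shannon index (one alpha-diversity measure) and the Bray-Curtis dissimilarity (one beta-diversity measure) from toy taxon counts. The choice of these two measures, the use of Python in place of the chapter's QIIME and R scripts, and the sample data are all assumptions made for illustration.

```python
import math


def shannon_index(counts):
    """Shannon alpha diversity: -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same ordered taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


# Hypothetical taxon counts for two samples (same taxa, same order).
sample_even = [10, 10, 10, 10]
sample_dominated = [40, 0, 0, 0]

alpha_even = shannon_index(sample_even)
alpha_dominated = shannon_index(sample_dominated)
beta = bray_curtis(sample_even, sample_dominated)
```

An evenly spread community gets the higher Shannon index, while Bray-Curtis ranges from 0 for identical count profiles to 1 for samples sharing no taxa.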
6
Ivády G, Madar L, Dzsudzsák E, Koczok K, Kappelmayer J, Krulisova V, Macek M, Horváth A, Balogh I. Analytical parameters and validation of homopolymer detection in a pyrosequencing-based next generation sequencing system. BMC Genomics 2018; 19:158. [PMID: 29466940 PMCID: PMC5822529 DOI: 10.1186/s12864-018-4544-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/20/2017] [Accepted: 02/13/2018] [Indexed: 01/14/2023] Open
Abstract
Background Current technologies in next-generation sequencing are offering high throughput reads at low costs, but still suffer from various sequencing errors. Although pyro- and ion semiconductor sequencing both have the advantage of delivering long and high quality reads, problems might occur when sequencing homopolymer-containing regions, since the repeating identical bases are going to incorporate during the same synthesis cycle, which leads to uncertainty in base calling. The aim of this study was to evaluate the analytical performance of a pyrosequencing-based next-generation sequencing system in detecting homopolymer sequences using homopolymer-preintegrated plasmid constructs and human DNA samples originating from patients with cystic fibrosis. Results In the plasmid system average correct genotyping was 95.8% in 4-mers, 87.4% in 5-mers and 72.1% in 6-mers. Despite the experienced low genotyping accuracy in 5- and 6-mers, it was possible to generate amplicons with more than a 90% adequate detection rate in every homopolymer tract. When homopolymers in the CFTR gene were sequenced average accuracy was 89.3%, but varied in a wide range (52.2 – 99.1%). In all but one case, an optimal amplicon-sequencing primer combination could be identified. In that single case (7A tract in exon 14 (c.2046_2052)), none of the tested primer sets produced the required analytical performance. Conclusions Our results show that pyrosequencing is the most reliable in case of 4-mers and as homopolymer length gradually increases, accuracy deteriorates. With careful primer selection, the NGS system was able to correctly genotype all but one of the homopolymers in the CFTR gene. In conclusion, we configured a plasmid test system that can be used to assess genotyping accuracy of NGS devices and developed an accurate NGS assay for the molecular diagnosis of CF using self-designed primers for amplification and sequencing. 
Electronic supplementary material The online version of this article (10.1186/s12864-018-4544-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gergely Ivády
- Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- László Madar
- Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- Erika Dzsudzsák
- Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- Katalin Koczok
- Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary; Division of Clinical Genetics, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- János Kappelmayer
- Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
- Veronika Krulisova
- Department of Biology and Medical Genetics, Second Faculty of Medicine and University Hospital Motol, Charles University, Prague, Czech Republic
- Milan Macek
- Department of Biology and Medical Genetics, Second Faculty of Medicine and University Hospital Motol, Charles University, Prague, Czech Republic
- Attila Horváth
- Genomic Medicine and Bioinformatic Core Facility, University of Debrecen, Debrecen, Hungary
- István Balogh
- Department of Laboratory Medicine, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary; Division of Clinical Genetics, University of Debrecen, Nagyerdei krt. 98, Debrecen, H-4032, Hungary
7
On the zero-modified Poisson–Shanker regression model and its application to fetal deaths notification data. Comput Stat 2018. [DOI: 10.1007/s00180-017-0788-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
8
Cunha MLR, Meijers JCM, Middeldorp S. Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism. Thromb Haemost 2015; 114:920-32. [PMID: 26446408 DOI: 10.1160/th15-05-0411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/15/2015] [Accepted: 08/26/2015] [Indexed: 12/13/2022]
Abstract
Despite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. The application of data-driven searches such as GWAS has not been able to identify genetic variants with implications for clinical care, and unexplained heritability remains. In the past years, the development of several so-called next generation sequencing (NGS) platforms is offering the possibility of generating fast, inexpensive and accurate genomic information. However, so far their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss potentials and challenges of NGS-based studies.
Affiliation(s)
- Marisa L R Cunha
- Department of Experimental Vascular Medicine, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands, Tel.: +31 20 5662824, Fax: +31 20 6968833
9
Verbist B, Clement L, Reumers J, Thys K, Vapirev A, Talloen W, Wetzels Y, Meys J, Aerssens J, Bijnens L, Thas O. ViVaMBC: estimating viral sequence variation in complex populations from Illumina deep-sequencing data using model-based clustering. BMC Bioinformatics 2015; 16:59. [PMID: 25887734 PMCID: PMC4369097 DOI: 10.1186/s12859-015-0458-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/01/2014] [Accepted: 12/16/2014] [Indexed: 11/10/2022] Open
Abstract
Background Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Results Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. Conclusions ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. 
They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0458-7) contains supplementary material, which is available to authorized users.
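The quality scores referred to in this abstract are conventionally Phred-scaled, so a score Q corresponds to an error probability of 10^(-Q/10). The sketch below shows that conversion and the decoding of a FASTQ quality string; it assumes the Phred+33 ASCII encoding, and it does not reproduce the error model used by ViVaMBC itself.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_fastq_qualities(quality_string: str, offset: int = 33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def expected_errors(quality_string: str) -> float:
    """Expected number of miscalled bases in a read, summing per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in decode_fastq_qualities(quality_string))


# Hypothetical three-base quality string: 'I' is Q40, '5' is Q20, '!' is Q0.
scores = decode_fastq_qualities("I5!")
read_errors = expected_errors("I5!")
```

A filter or a clustering model can then weight each base by its error probability instead of treating all base calls as equally reliable.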
Affiliation(s)
- Bie Verbist
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium.
- Lieven Clement
- Department of Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, Gent, 9000, Belgium.
- Joke Reumers
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
- Kim Thys
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
- Alexander Vapirev
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium; ExaScience Life Lab, Kapeldreef 75, Leuven, 3001, Belgium.
- Willem Talloen
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
- Yves Wetzels
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
- Joris Meys
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium.
- Jeroen Aerssens
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
- Luc Bijnens
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
- Olivier Thas
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium; University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW, 2522, Australia.
10
454 screening of individual MHC variation in an endemic island passerine. Immunogenetics 2014; 67:149-62. [PMID: 25515684 PMCID: PMC4325181 DOI: 10.1007/s00251-014-0822-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/08/2014] [Accepted: 11/28/2014] [Indexed: 11/03/2022]
Abstract
Genes of the major histocompatibility complex (MHC) code for receptors that are central to the adaptive immune response of vertebrates. These genes are therefore important genetic markers with which to study adaptive genetic variation in the wild. Next-generation sequencing (NGS) has increasingly been used in the last decade to genotype the MHC. However, NGS methods are highly prone to sequencing errors, and although several methodologies have been proposed to deal with this, until recently there have been no standard guidelines for the validation of putative MHC alleles. In this study, we used the 454 NGS platform to screen MHC class I exon 3 variation in a population of the island endemic Berthelot's pipit (Anthus berthelotii). We were able to characterise MHC genotypes across 309 individuals with high levels of repeatability. We were also able to determine alleles that had low amplification efficiencies, whose identification within individuals may thus be less reliable. At the population level we found lower levels of MHC diversity in Berthelot's pipit than in its widespread continental sister species the tawny pipit (Anthus campestris), and observed trans-species polymorphism. Using the sequence data, we identified signatures of gene conversion and evidence of maintenance of functionally divergent alleles in Berthelot's pipit. We also detected positive selection at 10 codons. The present study therefore shows that we have an efficient method for screening individual MHC variation across large datasets in Berthelot's pipit, and provides data that can be used in future studies investigating spatio-temporal patterns and scales of selection on the MHC.
11
Verbist BMP, Thys K, Reumers J, Wetzels Y, Van der Borght K, Talloen W, Aerssens J, Clement L, Thas O. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering. Bioinformatics 2014; 31:94-101. [PMID: 25178459 DOI: 10.1093/bioinformatics/btu587] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 01/24/2023]
Abstract
MOTIVATION In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. RESULTS A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is imbedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. AVAILABILITY The VirVarSeq is available, together with a user's guide and test data, at sourceforge: http://sourceforge.net/projects/virtools/?source=directory.
Affiliation(s)
- Bie M P Verbist
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Kim Thys
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Joke Reumers
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Yves Wetzels
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Koen Van der Borght
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Willem Talloen
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Jeroen Aerssens
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Lieven Clement
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
- Olivier Thas
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
12
Gene discovery through transcriptome sequencing for the invasive mussel Limnoperna fortunei. PLoS One 2014; 9:e102973. [PMID: 25047650 PMCID: PMC4105566 DOI: 10.1371/journal.pone.0102973] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 02/20/2014] [Accepted: 06/24/2014] [Indexed: 11/22/2022] Open
Abstract
The success of the Asian bivalve Limnoperna fortunei as an invader in South America is related to its high acclimation capability. It can inhabit waters with a wide range of temperatures and salinity and handle long-term periods of air exposure. We describe the transcriptome of L. fortunei aiming to give a first insight into the phenotypic plasticity that allows non-native taxa to become established and widespread. We sequenced 95,219 reads from five main tissues of the mussel L. fortunei using Roche’s 454 and assembled them to form a set of 84,063 unigenes (contigs and singletons) representing partial or complete gene sequences. We annotated 24,816 unigenes using a BLAST sequence similarity search against a NCBI nr database. Unigenes were divided into 20 eggNOG functional categories and 292 KEGG metabolic pathways. From the total unigenes, 1,351 represented putative full-length genes of which 73.2% were functionally annotated. We described the first partial and complete gene sequences in order to start understanding bivalve invasiveness. An expansion of the hsp70 gene family, seen also in other bivalves, is present in L. fortunei and could be involved in its adaptation to extreme environments, e.g. during intertidal periods. The presence of toll-like receptors gives a first insight into an immune system that could be more complex than previously assumed and may be involved in the prevention of disease and extinction when population densities are high. Finally, the apparent lack of special adaptations to extremely low O2 levels is a target worth pursuing for the development of a molecular control approach.
13
Golan D, Medvedev P. Using state machines to model the Ion Torrent sequencing process and to improve read error rates. Bioinformatics 2013; 29:i344-51. [PMID: 23813003 PMCID: PMC3694666 DOI: 10.1093/bioinformatics/btt212] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
Motivation: The importance of fast and affordable DNA sequencing methods for current day life sciences, medicine and biotechnology is hard to overstate. A major player is Ion Torrent, a pyrosequencing-like technology which produces flowgrams – sequences of incorporation values – which are converted into nucleotide sequences by a base-calling algorithm. Because of its exploitation of ubiquitous semiconductor technology and innovation in chemistry, Ion Torrent has been gaining popularity since its debut in 2011. Despite the advantages, however, Ion Torrent read accuracy remains a significant concern. Results: We present FlowgramFixer, a new algorithm for converting flowgrams into reads. Our key observation is that the incorporation signals of neighboring flows, even after normalization and phase correction, carry considerable mutual information and are important in making the correct base-call. We therefore propose that base-calling of flowgrams should be done on a read-wide level, rather than one flow at a time. We show that this can be done in linear-time by combining a state machine with a Viterbi algorithm to find the nucleotide sequence that maximizes the likelihood of the observed flowgram. FlowgramFixer is applicable to any flowgram-based sequencing platform. We demonstrate FlowgramFixer’s superior performance on Ion Torrent Escherichia coli data, with a 4.8% improvement in the number of high-quality mapped reads and a 7.1% improvement in the number of uniquely mappable reads. Availability: Binaries and source code of FlowgramFixer are freely available at: http://www.cs.tau.ac.il/~davidgo5/flowgramfixer.html. Contact:davidgo5@post.tau.ac.il
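The idea of decoding a whole read at once with a state machine and the Viterbi algorithm can be illustrated with a generic log-space Viterbi decoder. This is not FlowgramFixer's model: the two states, the "lo"/"hi" signal alphabet, and all probabilities below are hypothetical toy values chosen only to show the mechanics.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path for the observations (log-space Viterbi)."""
    # best[t][s]: highest log-probability of any path that ends in state s at step t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy two-state machine (hypothetical): states tend to persist and emit noisy signals.
states = ["A", "B"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {
    "A": {"A": math.log(0.9), "B": math.log(0.1)},
    "B": {"A": math.log(0.1), "B": math.log(0.9)},
}
log_emit = {
    "A": {"lo": math.log(0.9), "hi": math.log(0.1)},
    "B": {"lo": math.log(0.1), "hi": math.log(0.9)},
}
decoded = viterbi(["lo", "lo", "hi", "hi"], states, log_start, log_trans, log_emit)
```

For a fixed number of states, the running time grows linearly with the number of observations, which is the property the abstract relies on for read-wide base-calling.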
Affiliation(s)
- David Golan
- Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel.
|
14
|
Besnard T, García-García G, Baux D, Vaché C, Faugère V, Larrieu L, Léonard S, Millan JM, Malcolm S, Claustres M, Roux AF. Experience of targeted Usher exome sequencing as a clinical test. Mol Genet Genomic Med 2013; 2:30-43. [PMID: 24498627 PMCID: PMC3907913 DOI: 10.1002/mgg3.25] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 04/17/2013] [Accepted: 06/06/2013] [Indexed: 12/15/2022]
Abstract
We show that massively parallel targeted sequencing of 19 genes provides a new and reliable strategy for molecular diagnosis of Usher syndrome (USH) and nonsyndromic deafness. It is particularly appropriate for these disorders, which are characterized by high clinical and genetic heterogeneity and by the complex structure of several of the genes involved. A series of 71 patients, including Usher patients previously screened by Sanger sequencing plus newly referred patients, was studied. Ninety-eight percent of the variants previously identified by Sanger sequencing were found by next-generation sequencing (NGS). NGS proved to be efficient, as it offers analysis of all relevant genes, which is laborious to achieve with Sanger sequencing. Among the 13 newly referred Usher patients, both mutations in the same gene were identified in 77% of cases (10 patients), and one candidate pathogenic variant was identified in two additional patients. This work can be considered a pilot for implementing NGS for genetically heterogeneous diseases in clinical service.
Affiliation(s)
- Thomas Besnard
- U827, Inserm, Montpellier, F-34000, France; Univ. Montpellier I, Montpellier, F-34000, France
- Gema García-García
- U827, Inserm, Montpellier, F-34000, France; Grupo de Investigación en Enfermedades Neurosensoriales, Instituto de Investigación Sanitaria IIS-La Fe and CIBERER, Valencia, Spain
- David Baux
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Christel Vaché
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Valérie Faugère
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Lise Larrieu
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Susana Léonard
- Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Jose M Millan
- Grupo de Investigación en Enfermedades Neurosensoriales, Instituto de Investigación Sanitaria IIS-La Fe and CIBERER, Valencia, Spain
- Sue Malcolm
- Clinical and Molecular Genetics, Institute of Child Health, University College London, London, United Kingdom
- Mireille Claustres
- U827, Inserm, Montpellier, F-34000, France; Univ. Montpellier I, Montpellier, F-34000, France; Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
- Anne-Françoise Roux
- U827, Inserm, Montpellier, F-34000, France; Laboratoire de Génétique Moléculaire, CHU Montpellier, Montpellier, F-34000, France
|