1. Yan A, Baricordi C, Nguyen Q, Barbarossa L, Loperfido M, Biasco L. IS-Seq: a bioinformatics pipeline for integration sites analysis with comprehensive abundance quantification methods. BMC Bioinformatics 2023; 24:286. [PMID: 37464281 DOI: 10.1186/s12859-023-05390-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 06/16/2023] [Indexed: 07/20/2023]
Abstract
BACKGROUND Integration site (IS) analysis is a fundamental analytical platform for evaluating the safety and efficacy of viral vector-based preclinical and clinical gene therapy (GT). A handful of groups have developed standardized bioinformatics pipelines to process IS sequencing data, to generate reports, and/or to perform comparative studies across different GT trials. Keeping pace with technological advances in IS analysis, several computational pipelines have been published over the past decade. These pipelines identify IS from single-read or paired-end sequencing data using either read-based or sonication fragment-based methods, but no bioinformatics tool automatically incorporates unique molecular identifiers (UMI) into IS abundance estimation and allows multiple quantification methods to be compared within one integrated pipeline. RESULTS Here we present IS-Seq, a bioinformatics pipeline that can process paired-end sequencing data from both older restriction site-based IS collection methods and newer sonication-based IS retrieval systems, while allowing the selection of different abundance estimation methods, including read-based, fragment-based and UMI-based systems. CONCLUSIONS We validated the performance of IS-Seq by testing it against the most popular analytical workflow available in the literature (INSPIIRED) under different scenarios. Lastly, through extensive simulation studies and a comprehensive wet-lab assessment of IS-Seq, we showed that in clinically relevant scenarios, UMI quantification provides better accuracy than sonication fragment counts, currently the most widely used method for IS abundance estimation.
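To make the three abundance-estimation strategies named above concrete, the following Python sketch counts each integration site in three ways. It is an illustration under stated assumptions, not IS-Seq's implementation: the `Read` record layout and the `abundance`/`relative` helpers are hypothetical, reads are assumed to be already aligned and assigned to a site, and UMIs are matched exactly with no error correction.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Read(NamedTuple):
    """One aligned read (hypothetical record layout for illustration)."""
    site: str        # integration site identifier, e.g. "chr1:12345:+"
    shear_pos: int   # genomic coordinate of the sonication breakpoint
    umi: str         # unique molecular identifier attached before PCR


def abundance(reads: Iterable[Read], method: str) -> Dict[str, int]:
    """Count each integration site by reads, distinct fragments or distinct UMIs."""
    if method not in ("read", "fragment", "umi"):
        raise ValueError(f"unknown method: {method}")
    per_site: Dict[str, List[Read]] = defaultdict(list)
    for r in reads:
        per_site[r.site].append(r)
    if method == "read":
        # Every read counts, including PCR duplicates of the same molecule.
        return {site: len(rs) for site, rs in per_site.items()}
    if method == "fragment":
        # One count per distinct sonication breakpoint at the site.
        return {site: len({r.shear_pos for r in rs}) for site, rs in per_site.items()}
    # method == "umi": one count per distinct tagged molecule.
    return {site: len({r.umi for r in rs}) for site, rs in per_site.items()}


def relative(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert absolute counts into fractions of the total."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


reads = [
    Read("A", 100, "AAA"), Read("A", 100, "AAA"),  # PCR duplicates of one molecule
    Read("A", 100, "CCC"),                         # second molecule, same breakpoint
    Read("A", 150, "GGG"),
    Read("B", 200, "TTT"), Read("B", 200, "TTT"),
]
for m in ("read", "fragment", "umi"):
    print(m, abundance(reads, m))
```

In this toy data, site A's reads fall on two distinct shear positions but carry three distinct UMIs: two molecules that happen to share a breakpoint are counted once by the fragment method yet separately by the UMI method, which is one way fragment counts can underestimate the size of expanded clones.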
Affiliation(s)
- Luca Biasco
- AVROBIO, Inc., Cambridge, MA, USA.
- Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK.
2. Artesi M, Hahaut V, Cole B, Lambrechts L, Ashrafi F, Marçais A, Hermine O, Griebel P, Arsic N, van der Meer F, Burny A, Bron D, Bianchi E, Delvenne P, Bours V, Charlier C, Georges M, Vandekerckhove L, Van den Broeke A, Durkin K. PCIP-seq: simultaneous sequencing of integrated viral genomes and their insertion sites with long reads. Genome Biol 2021; 22:97. [PMID: 33823910 PMCID: PMC8025556 DOI: 10.1186/s13059-021-02307-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/18/2020] [Accepted: 02/25/2021] [Indexed: 12/30/2022]
Abstract
The integration of a viral genome into the host genome has a major impact on the trajectory of the infected cell. Integration location and variation within the associated viral genome can influence both clonal expansion and persistence of infected cells. Methods based on short-read sequencing can identify viral insertion sites, but the sequence of the viral genomes within remains unobserved. We develop PCIP-seq, a method that leverages long reads to identify insertion sites and sequence their associated viral genome. We apply the technique to exogenous retroviruses HTLV-1, BLV, and HIV-1, endogenous retroviruses, and human papillomavirus.
Affiliation(s)
- Maria Artesi
- Unit of Animal Genomics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Laboratory of Experimental Hematology, Institut Jules Bordet, Université Libre de Bruxelles (ULB), Boulevard de Waterloo 121, 1000 Brussels, Belgium
- Laboratory of Human Genetics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Vincent Hahaut
- Unit of Animal Genomics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Laboratory of Experimental Hematology, Institut Jules Bordet, Université Libre de Bruxelles (ULB), Boulevard de Waterloo 121, 1000 Brussels, Belgium
- Basiel Cole
- HIV Cure Research Center, Department of Internal Medicine and Pediatrics, Ghent University Hospital and Ghent University, 9000 Ghent, Belgium
- Laurens Lambrechts
- HIV Cure Research Center, Department of Internal Medicine and Pediatrics, Ghent University Hospital and Ghent University, 9000 Ghent, Belgium
- BioBix, Department of Data Analysis and Mathematical Modelling, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
- Fereshteh Ashrafi
- Unit of Animal Genomics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran
- Ambroise Marçais
- Service d’hématologie, Hôpital Universitaire Necker, Université René Descartes, Assistance Publique Hôpitaux de Paris, Paris, France
- Olivier Hermine
- Service d’hématologie, Hôpital Universitaire Necker, Université René Descartes, Assistance Publique Hôpitaux de Paris, Paris, France
- Philip Griebel
- Vaccine and Infectious Disease Organization, VIDO-Intervac, University of Saskatchewan, 120 Veterinary Road, Saskatoon, S7N 5E3 Canada
- Natasa Arsic
- Vaccine and Infectious Disease Organization, VIDO-Intervac, University of Saskatchewan, 120 Veterinary Road, Saskatoon, S7N 5E3 Canada
- Frank van der Meer
- Faculty of Veterinary Medicine: Ecosystem and Public Health, Calgary, AB Canada
- Arsène Burny
- Laboratory of Experimental Hematology, Institut Jules Bordet, Université Libre de Bruxelles (ULB), Boulevard de Waterloo 121, 1000 Brussels, Belgium
- Dominique Bron
- Laboratory of Experimental Hematology, Institut Jules Bordet, Université Libre de Bruxelles (ULB), Boulevard de Waterloo 121, 1000 Brussels, Belgium
- Elettra Bianchi
- Department of Pathology, University Hospital (CHU), University of Liège, Liège, Belgium
- Philippe Delvenne
- Department of Pathology, University Hospital (CHU), University of Liège, Liège, Belgium
- Vincent Bours
- Laboratory of Human Genetics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Department of Human Genetics, University Hospital (CHU), University of Liège, Liège, Belgium
- Carole Charlier
- Unit of Animal Genomics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Michel Georges
- Unit of Animal Genomics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Linos Vandekerckhove
- HIV Cure Research Center, Department of Internal Medicine and Pediatrics, Ghent University Hospital and Ghent University, 9000 Ghent, Belgium
- Anne Van den Broeke
- Unit of Animal Genomics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Laboratory of Experimental Hematology, Institut Jules Bordet, Université Libre de Bruxelles (ULB), Boulevard de Waterloo 121, 1000 Brussels, Belgium
- Keith Durkin
- Unit of Animal Genomics, GIGA, Université de Liège (ULiège), Avenue de l’Hôpital 11, 4000 Liège, Belgium
- Laboratory of Experimental Hematology, Institut Jules Bordet, Université Libre de Bruxelles (ULB), Boulevard de Waterloo 121, 1000 Brussels, Belgium
3. Spinozzi G, Calabria A, Brasca S, Beretta S, Merelli I, Milanesi L, Montini E. VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites. BMC Bioinformatics 2017; 18:520. [PMID: 29178837 DOI: 10.1186/s12859-017-1937-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 08/09/2017] [Accepted: 11/14/2017] [Indexed: 01/09/2023]
Abstract
Background Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to assess the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The growing number of gene therapy clinical trials, combined with the increasing amount of next-generation sequencing data generated to identify integration sites, requires highly accurate and efficient software able to process “big data” correctly in a reasonable computational time. Results Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis, with the following features: (1) sequence analysis for integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after alignment to the target genome; (2) a heuristic algorithm that reduces false-positive integration sites at the nucleotide level caused by polymerase chain reaction (PCR) or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user-friendly web interface that lets researchers perform integration site analyses without computational skills; (5) faster execution of all steps through parallelization, without requiring Hadoop. Conclusions We tested VISPA2's performance using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial, and compared the results with those of preexisting integration site analysis tools. On the computational side, VISPA2 showed a > 6-fold speedup and improved precision and recall (1 and 0.97, respectively) compared with previously developed computational pipelines.
These results indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost- and time-effective fashion. Moreover, web access to VISPA2 (http://openserver.itb.cnr.it/vispa/) makes a complex analytical tool accessible and easy for researchers to use. We released the source code of VISPA2 in a public repository (https://bitbucket.org/andreacalabria/vispa2).
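Feature (2) addresses a common artifact: PCR and trimming/alignment errors scatter reads from a single true integration site across several neighbouring coordinates. The Python sketch below illustrates one generic way to collapse such satellite positions; it is not claimed to reproduce VISPA2's heuristic, and the `collapse_sites` name, the (chromosome, strand, position) key and the 3 bp default window are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Site = Tuple[str, str, int]  # (chromosome, strand, position): hypothetical key layout


def collapse_sites(counts: Dict[Site, int], window: int = 3) -> Dict[Site, int]:
    """Greedily merge sites within `window` bp of a better-supported site.

    Sites are grouped by chromosome and strand; the most-read position claims
    every unclaimed position within +/- window, then the next-best remaining
    position does the same, and so on.
    """
    by_locus: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for (chrom, strand, pos), n in counts.items():
        by_locus[(chrom, strand)].append((pos, n))

    merged: Dict[Site, int] = {}
    for (chrom, strand), sites in by_locus.items():
        unclaimed = dict(sites)  # position -> read count
        # Highest count first; ties broken by lower coordinate for determinism.
        for pos, _ in sorted(sites, key=lambda s: (-s[1], s[0])):
            if pos not in unclaimed:
                continue  # already absorbed by a stronger neighbour
            total = sum(unclaimed.pop(p, 0) for p in range(pos - window, pos + window + 1))
            merged[(chrom, strand, pos)] = total
    return merged


raw = {
    ("chr1", "+", 1000): 50,  # true site
    ("chr1", "+", 1002): 3,   # nearby artifacts
    ("chr1", "+", 998): 2,
    ("chr1", "+", 1010): 7,   # outside the window: kept as its own site
    ("chr1", "-", 1001): 4,   # other strand: never merged with "+"
}
print(collapse_sites(raw))
```

The greedy, count-ordered merge preserves the total read count while attributing satellite positions to their strongest neighbour; the window width is a tunable assumption that would need to match the error profile of the actual data.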
4. Kamboj A, Hallwirth CV, Alexander IE, McCowage GB, Kramer B. Ub-ISAP: a streamlined UNIX pipeline for mining unique viral vector integration sites from next generation sequencing data. BMC Bioinformatics 2017. [PMID: 28623888 PMCID: PMC5474025 DOI: 10.1186/s12859-017-1719-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND The analysis of viral vector genomic integration sites is an important component in assessing the safety and efficiency of patient treatment using gene therapy. Alongside this clinical application, integration site identification is a key step in the genetic mapping of viral elements in mutagenesis screens that aim to elucidate gene function. RESULTS We have developed a UNIX-based vector integration site analysis pipeline (Ub-ISAP) that automates integration site identification and annotation for both single-end and paired-end sequencing reads. Reads that contain viral sequences of interest are selected and aligned to the host genome, and unique integration sites are then classified as transcription start site-proximal, intragenic or intergenic. CONCLUSION Ub-ISAP provides a reliable and efficient pipeline to generate large datasets for assessing the safety and efficiency of integrating vectors in clinical settings, with broader applications in cancer research. Ub-ISAP is available as an open source software package at https://sourceforge.net/projects/ub-isap/.
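The final classification step described above can be sketched as follows. This Python example is a hypothetical illustration, not Ub-ISAP's code: the `Gene` record, the `classify` function, the precedence of TSS-proximal over intragenic, and the ±5 kb TSS window are all assumptions, since the abstract does not state Ub-ISAP's exact definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    chrom: str
    start: int   # lower genomic coordinate of the gene body
    end: int     # upper genomic coordinate of the gene body
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The transcription start site is the 5' end, which depends on strand.
        return self.start if self.strand == "+" else self.end


def classify(chrom: str, pos: int, genes: List[Gene], tss_window: int = 5000) -> str:
    """Label an integration site relative to a gene annotation."""
    on_chrom = [g for g in genes if g.chrom == chrom]
    # Checked first, so a site near a TSS is TSS-proximal even inside a gene body.
    if any(abs(pos - g.tss) <= tss_window for g in on_chrom):
        return "TSS-proximal"
    if any(g.start <= pos <= g.end for g in on_chrom):
        return "intragenic"
    return "intergenic"


genes = [
    Gene("GENE1", "chr1", 10_000, 50_000, "+"),    # hypothetical annotation
    Gene("GENE2", "chr1", 100_000, 150_000, "-"),  # minus strand: TSS at 150,000
]
for pos in (12_000, 30_000, 70_000, 148_000):
    print(pos, classify("chr1", pos, genes))
```

Handling strand in the `tss` property matters: for a minus-strand gene such as GENE2, a site just inside the upper coordinate is near the TSS, whereas one near the lower coordinate is near the 3' end.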
Affiliation(s)
- Atul Kamboj
- Children's Cancer Research Unit, Kids' Research Institute, The Children's Hospital at Westmead, Locked Bag 4001, Westmead, NSW, 2145, Australia.
- Claus V Hallwirth
- Gene Therapy Research Unit, Children's Medical Research Institute and The Children's Hospital at Westmead, Westmead, NSW, Australia
- Ian E Alexander
- Gene Therapy Research Unit, Children's Medical Research Institute and The Children's Hospital at Westmead, Westmead, NSW, Australia
- The University of Sydney, Discipline of Paediatrics and Child Health, Westmead, NSW, Australia
- Geoffrey B McCowage
- Cancer Centre for Children, The Children's Hospital, Westmead, NSW, Australia
- Belinda Kramer
- Children's Cancer Research Unit, Kids' Research Institute, The Children's Hospital at Westmead, Locked Bag 4001, Westmead, NSW, 2145, Australia