Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Döring A, Weese D, Rausch T, Reinert K. SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinformatics 2008;9:11. [PMID: 18184432 PMCID: PMC2246154 DOI: 10.1186/1471-2105-9-11] [Citation(s) in RCA: 166] [Impact Index Per Article: 10.4] [Received: 08/21/2007] [Accepted: 01/09/2008] [Indexed: 11/30/2022] Open

Cited by Other Article(s)

Seah BKB, Singh A, Vetter DE, Emmerich C, Peters M, Soltys V, Huettel B, Swart EC. Nuclear dualism without extensive DNA elimination in the ciliate Loxodes magnus. Proc Natl Acad Sci U S A 2024;121:e2400503121. [PMID: 39298487 PMCID: PMC11441545 DOI: 10.1073/pnas.2400503121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 08/08/2024] [Indexed: 09/21/2024] Open

Stankovic S, Shekari S, Huang QQ, Gardner EJ, Ivarsdottir EV, Owens NDL, Mavaddat N, Azad A, Hawkes G, Kentistou KA, Beaumont RN, Day FR, Zhao Y, Jonsson H, Rafnar T, Tragante V, Sveinbjornsson G, Oddsson A, Styrkarsdottir U, Gudmundsson J, Stacey SN, Gudbjartsson DF, Kennedy K, Wood AR, Weedon MN, Ong KK, Wright CF, Hoffmann ER, Sulem P, Hurles ME, Ruth KS, Martin HC, Stefansson K, Perry JRB, Murray A. Genetic links between ovarian ageing, cancer risk and de novo mutation rates. Nature 2024;633:608-614. [PMID: 39261734 PMCID: PMC11410666 DOI: 10.1038/s41586-024-07931-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/20/2022] [Accepted: 08/08/2024] [Indexed: 09/13/2024]

Affiliation(s)

Stasa Stankovic: MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
Saleh Shekari: University of Exeter Medical School, University of Exeter, Exeter, UK; School of Public Health, Faculty of Medicine, University of Queensland, Brisbane, Queensland, Australia
Qin Qin Huang: Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Eugene J Gardner: MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
Erna V Ivarsdottir: deCODE Genetics/Amgen, Reykjavik, Iceland
Nick D L Owens: University of Exeter Medical School, University of Exeter, Exeter, UK
Nasim Mavaddat: Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Ajuna Azad: DNRF Center for Chromosome Stability, Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Gareth Hawkes: University of Exeter Medical School, University of Exeter, Exeter, UK
Katherine A Kentistou: MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
Robin N Beaumont: University of Exeter Medical School, University of Exeter, Exeter, UK
Felix R Day: MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
Yajie Zhao: MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
Hakon Jonsson: deCODE Genetics/Amgen, Reykjavik, Iceland
Thorunn Rafnar: deCODE Genetics/Amgen, Reykjavik, Iceland
Vinicius Tragante: deCODE Genetics/Amgen, Reykjavik, Iceland
Gardar Sveinbjornsson: deCODE Genetics/Amgen, Reykjavik, Iceland
Asmundur Oddsson: deCODE Genetics/Amgen, Reykjavik, Iceland
Unnur Styrkarsdottir: deCODE Genetics/Amgen, Reykjavik, Iceland
Julius Gudmundsson: deCODE Genetics/Amgen, Reykjavik, Iceland
Simon N Stacey: deCODE Genetics/Amgen, Reykjavik, Iceland
Daniel F Gudbjartsson: deCODE Genetics/Amgen, Reykjavik, Iceland
Kitale Kennedy: University of Exeter Medical School, University of Exeter, Exeter, UK
Andrew R Wood: University of Exeter Medical School, University of Exeter, Exeter, UK
Michael N Weedon: University of Exeter Medical School, University of Exeter, Exeter, UK
Ken K Ong: MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK; Department of Paediatrics, University of Cambridge, Cambridge, UK
Caroline F Wright: University of Exeter Medical School, University of Exeter, Exeter, UK
Eva R Hoffmann: DNRF Center for Chromosome Stability, Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Patrick Sulem: deCODE Genetics/Amgen, Reykjavik, Iceland
Matthew E Hurles: Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Katherine S Ruth: University of Exeter Medical School, University of Exeter, Exeter, UK
Hilary C Martin: Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK
Kari Stefansson: deCODE Genetics/Amgen, Reykjavik, Iceland
John R B Perry: MRC Epidemiology Unit, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK; Metabolic Research Laboratory, Wellcome-MRC Institute of Metabolic Science, University of Cambridge, Cambridge, UK
Anna Murray: University of Exeter Medical School, University of Exeter, Exeter, UK


Yang C, Trivedi V, Dyson K, Gu T, Candelario KM, Yegorov O, Mitchell DA. Identification of tumor rejection antigens and the immunologic landscape of medulloblastoma. Genome Med 2024;16:102. [PMID: 39160595 PMCID: PMC11331754 DOI: 10.1186/s13073-024-01363-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Accepted: 07/12/2024] [Indexed: 08/21/2024] Open

Abstract

BACKGROUND

The current standard of care treatments for medulloblastoma are insufficient as these do not take tumor heterogeneity into account. Newer, safer, patient-specific treatment approaches are required to treat high-risk medulloblastoma patients who are not cured by the standard therapies. Immunotherapy is a promising treatment modality that could be key to improving survival and avoiding morbidity. For an effective immune response, appropriate tumor antigens must be targeted. While medulloblastoma patients with subgroup-specific genetic substitutions have been previously reported, the immunogenicity of these genetic alterations remains unknown. The aim of this study is to identify potential tumor rejection antigens for the development of antigen-directed cellular therapies for medulloblastoma.

METHODS

We developed a cancer immunogenomics pipeline and performed a comprehensive analysis of medulloblastoma subgroup-specific transcription profiles (n = 170, 18 WNT, 46 SHH, 41 Group 3, and 65 Group 4 patient tumors) available through International Cancer Genome Consortium (ICGC) and European Genome-Phenome Archive (EGA). We performed in silico antigen prediction across a broad array of antigen classes including neoantigens, tumor-associated antigens (TAAs), and fusion proteins. Furthermore, we evaluated the antigen processing and presentation pathway in tumor cells and the immune infiltrating cell landscape using the latest computational deconvolution methods.

RESULTS

Medulloblastoma patients were found to express multiple private and shared immunogenic antigens. The proportion of predicted TAAs was higher than neoantigens and gene fusions for all molecular subgroups, except for sonic hedgehog (SHH), which had a higher neoantigen burden. Importantly, cancer-testis antigens, as well as previously unappreciated neurodevelopmental antigens, were found to be expressed by most patients across all medulloblastoma subgroups. Despite being immunologically cold, medulloblastoma subgroups were found to have distinct immune cell gene signatures.

CONCLUSIONS

Using a custom antigen prediction pipeline, we identified potential tumor rejection antigens with important implications for the development of immunotherapy for medulloblastoma.


Rossini R, Oshaghi M, Nekrasov M, Bellanger A, Domaschenz R, Dijkwel Y, Abdelhalim M, Collas P, Tremethick D, Paulsen J. Loss of multi-level 3D genome organization during breast cancer progression. bioRxiv 2024:2023.11.26.568711. [PMID: 38076897 PMCID: PMC10705249 DOI: 10.1101/2023.11.26.568711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]

Pan B, Bruno M, Macfarlan TS, Akera T. Meiosis-specific decoupling of the pericentromere from the kinetochore. bioRxiv 2024:2024.07.21.604490. [PMID: 39091844 PMCID: PMC11291024 DOI: 10.1101/2024.07.21.604490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2024]

Abstract

The primary constriction site of the M-phase chromosome is an established marker for the kinetochore position, often used to determine the karyotype of each species. Underlying this observation is the concept that the kinetochore is spatially linked with the pericentromere where sister-chromatids are most tightly cohered. Here, we found an unconventional pericentromere specification with sister chromatids mainly cohered at a chromosome end, spatially separated from the kinetochore in Peromyscus mouse oocytes. This distal locus enriched cohesin protectors, such as the Chromosomal Passenger Complex (CPC) and PP2A, at a higher level compared to its centromere/kinetochore region, acting as the primary site for sister-chromatid cohesion. Chromosomes with the distal cohesion site exhibited enhanced cohesin protection at anaphase I compared to those without it, implying that these distal cohesion sites may have evolved to ensure sister-chromatid cohesion during meiosis. In contrast, mitotic cells enriched CPC only near the kinetochore and the distal locus was not cohered between sister chromatids, suggesting a meiosis-specific mechanism to protect cohesin at this distal locus. We found that this distal locus corresponds to an additional centromeric satellite block, located far apart from the centromeric satellite block that builds the kinetochore. Several Peromyscus species carry chromosomes with two such centromeric satellite blocks. Analyses on three Peromyscus species revealed that the internal satellite consistently assembles the kinetochore in both mitosis and meiosis, whereas the distal satellite selectively enriches cohesin protectors in meiosis to promote sister-chromatid cohesion at that site. Thus, our study demonstrates that pericentromere specification is remarkably flexible and can control chromosome segregation in a cell-type and context dependent manner.


Aguado-Puig Q, Doblas M, Matzoros C, Espinosa A, Moure JC, Marco-Sola S, Moreto M. WFA-GPU: gap-affine pairwise read-alignment using GPUs. Bioinformatics 2023;39:btad701. [PMID: 37975878 PMCID: PMC10697739 DOI: 10.1093/bioinformatics/btad701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 11/09/2023] [Accepted: 11/16/2023] [Indexed: 11/19/2023] Open

Abstract

MOTIVATION

Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio and Nanopore technologies. The recently proposed wavefront alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods. However, high-performance computing (HPC) platforms require efficient parallel algorithms and tools to exploit the computing resources available on modern accelerator-based architectures.

RESULTS

This paper presents WFA-GPU, a GPU (graphics processing unit)-accelerated tool to compute exact gap-affine alignments based on the WFA algorithm. We present the algorithmic adaptations and performance optimizations that allow exploiting the massively parallel capabilities of modern GPU devices to accelerate the alignment computations. In particular, we propose a CPU-GPU co-design capable of performing inter-sequence and intra-sequence parallel sequence alignment, combining a succinct WFA-data representation with an efficient GPU implementation. As a result, we demonstrate that our implementation outperforms the original multi-threaded WFA implementation by up to 4.3× and up to 18.2× when using heuristic methods on long and noisy sequences. Compared to other state-of-the-art tools and libraries, the WFA-GPU is up to 29× faster than other GPU implementations and up to four orders of magnitude faster than other CPU implementations. Furthermore, WFA-GPU is the only GPU solution capable of correctly aligning long reads using a commodity GPU.

AVAILABILITY AND IMPLEMENTATION

WFA-GPU code and documentation are publicly available at https://github.com/quim0/WFA-GPU.


Wei ZG, Bu PY, Zhang XD, Liu F, Qian Y, Wu FX. invMap: a sensitive mapping tool for long noisy reads with inversion structural variants. Bioinformatics 2023;39:btad726. [PMID: 38058196 PMCID: PMC11320709 DOI: 10.1093/bioinformatics/btad726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 11/02/2023] [Accepted: 12/05/2023] [Indexed: 12/08/2023] Open

Borozan L, Rojas Ringeling F, Kao SY, Nikonova E, Monteagudo-Mesas P, Matijević D, Spletter ML, Canzar S. Counting pseudoalignments to novel splicing events. Bioinformatics 2023;39:btad419. [PMID: 37432342 PMCID: PMC10348833 DOI: 10.1093/bioinformatics/btad419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 04/21/2023] [Accepted: 07/10/2023] [Indexed: 07/12/2023] Open

Berger B, Yu YW. Navigating bottlenecks and trade-offs in genomic data analysis. Nat Rev Genet 2023;24:235-250. [PMID: 36476810 PMCID: PMC10204111 DOI: 10.1038/s41576-022-00551-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/27/2022] [Indexed: 12/12/2022]

Yu C, Zhao Y, Zhao C, Ma H, Wang G. DiagAF: A More Accurate and Efficient Pre-Alignment Filter for Sequence Alignment. IEEE/ACM Trans Comput Biol Bioinform 2022;19:3404-3415. [PMID: 34780330 DOI: 10.1109/tcbb.2021.3127879] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]

Swain MT, Vickers M. Interpreting alignment-free sequence comparison: what makes a score a good score? NAR Genom Bioinform 2022;4:lqac062. [PMID: 36071721 PMCID: PMC9442500 DOI: 10.1093/nargab/lqac062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Revised: 07/01/2022] [Accepted: 08/16/2022] [Indexed: 11/13/2022] Open

Galaxy Dnpatterntools for Computational Analysis of Nucleosome Positioning Sequence Patterns. Int J Mol Sci 2022;23:ijms23094869. [PMID: 35563261 PMCID: PMC9102330 DOI: 10.3390/ijms23094869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 04/25/2022] [Accepted: 04/26/2022] [Indexed: 01/25/2023] Open

Patel RS, Romero R, Watson EV, Liang AC, Burger M, Westcott PMK, Mercer KL, Bronson RT, Wooten EC, Bhutkar A, Jacks T, Elledge SJ. A GATA4-regulated secretory program suppresses tumors through recruitment of cytotoxic CD8 T cells. Nat Commun 2022;13:256. [PMID: 35017504 PMCID: PMC8752777 DOI: 10.1038/s41467-021-27731-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/22/2021] [Accepted: 12/06/2021] [Indexed: 12/11/2022] Open

Affiliation(s)

Rupesh S Patel: Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA; Scripps Green Hospital, San Diego, CA, USA
Rodrigo Romero: David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Emma V Watson: Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
Anthony C Liang: Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
Megan Burger: David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Peter M K Westcott: David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Kim L Mercer: David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Roderick T Bronson: Harvard Medical School, Boston, MA, USA
Eric C Wooten: Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
Arjun Bhutkar: David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
Tyler Jacks: David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Stephen J Elledge: Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Genetics, Harvard Medical School, Boston, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA


Hao Y, Conant GC. POInT: A Tool for Modeling Ancient Polyploidies Using Multiple Polyploid Genomes. Methods Mol Biol 2022;2512:81-91. [PMID: 35818001 DOI: 10.1007/978-1-0716-2429-6_6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]

Flores SC, Alexiou A, Glaros A. Mining the Protein Data Bank to improve prediction of changes in protein-protein binding. PLoS One 2021;16:e0257614. [PMID: 34727109 PMCID: PMC8562805 DOI: 10.1371/journal.pone.0257614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2021] [Accepted: 09/05/2021] [Indexed: 12/23/2022] Open

Uhlitz F, Bischoff P, Peidli S, Sieber A, Trinks A, Lüthen M, Obermayer B, Blanc E, Ruchiy Y, Sell T, Mamlouk S, Arsie R, Wei T, Klotz‐Noack K, Schwarz RF, Sawitzki B, Kamphues C, Beule D, Landthaler M, Sers C, Horst D, Blüthgen N, Morkel M. Mitogen-activated protein kinase activity drives cell trajectories in colorectal cancer. EMBO Mol Med 2021;13:e14123. [PMID: 34409732 PMCID: PMC8495451 DOI: 10.15252/emmm.202114123] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 02/11/2021] [Revised: 07/27/2021] [Accepted: 07/30/2021] [Indexed: 01/07/2023] Open

Affiliation(s)

Florian Uhlitz: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; IRI Life Sciences, Humboldt University of Berlin, Berlin, Germany; German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
Philip Bischoff: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany
Stefan Peidli: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; IRI Life Sciences, Humboldt University of Berlin, Berlin, Germany
Anja Sieber: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; IRI Life Sciences, Humboldt University of Berlin, Berlin, Germany
Alexandra Trinks: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; BIH Bioportal Single Cells, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
Mareen Lüthen: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
Benedikt Obermayer: Core Unit Bioinformatics (CUBI), Berlin Institute of Health at Charité Universitätsmedizin – Berlin, Berlin, Germany
Eric Blanc: Core Unit Bioinformatics (CUBI), Berlin Institute of Health at Charité Universitätsmedizin – Berlin, Berlin, Germany
Yana Ruchiy: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany
Thomas Sell: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; IRI Life Sciences, Humboldt University of Berlin, Berlin, Germany
Soulafa Mamlouk: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
Roberto Arsie: Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
Tzu‐Ting Wei: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
Kathleen Klotz‐Noack: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; Institute of Medical Immunology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany
Roland F Schwarz: Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany; BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
Birgit Sawitzki: Institute of Medical Immunology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany
Carsten Kamphues: German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany; Department of Surgery, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany
Dieter Beule: Core Unit Bioinformatics (CUBI), Berlin Institute of Health at Charité Universitätsmedizin – Berlin, Berlin, Germany
Markus Landthaler: Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology (BIMSB), Berlin, Germany
Christine Sers: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
David Horst: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
Nils Blüthgen: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; IRI Life Sciences, Humboldt University of Berlin, Berlin, Germany; German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany
Markus Morkel: Institute of Pathology, Charité – Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt‐Universität zu Berlin, Berlin, Germany; German Cancer Consortium (DKTK) Partner Site Berlin, German Cancer Research Center (DKFZ), Heidelberg, Germany; BIH Bioportal Single Cells, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany


Shajii A, Numanagić I, Leighton AT, Greenyer H, Amarasinghe S, Berger B. A Python-based programming language for high-performance computational genomics. Nat Biotechnol 2021;39:1062-1064. [PMID: 34282326 PMCID: PMC8542382 DOI: 10.1038/s41587-021-00985-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 01/07/2023]

Eggertsson HP, Halldorsson BV. read_haps: using read haplotypes to detect same species contamination in DNA sequences. Bioinformatics 2021;37:2215-2217. [PMID: 33135043 DOI: 10.1093/bioinformatics/btaa936] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2020] [Revised: 09/10/2020] [Accepted: 10/22/2020] [Indexed: 11/12/2022] Open

Smith SR, Normandeau E, Djambazian H, Nawarathna PM, Berube P, Muir AM, Ragoussis J, Penney CM, Scribner KT, Luikart G, Wilson CC, Bernatchez L. A chromosome-anchored genome assembly for Lake Trout (Salvelinus namaycush). Mol Ecol Resour 2021;22:679-694. [PMID: 34351050 PMCID: PMC9291852 DOI: 10.1111/1755-0998.13483] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/23/2021] [Revised: 07/25/2021] [Accepted: 07/28/2021] [Indexed: 01/23/2023]

Abstract

Here, we present an annotated, chromosome‐anchored, genome assembly for Lake Trout (Salvelinus namaycush) – a highly diverse salmonid species of notable conservation concern and an excellent model for research on adaptation and speciation. We leveraged Pacific Biosciences long‐read sequencing, paired‐end Illumina sequencing, proximity ligation (Hi‐C) sequencing, and a previously published linkage map to produce a highly contiguous assembly composed of 7378 contigs (contig N50 = 1.8 Mb) assigned to 4120 scaffolds (scaffold N50 = 44.975 Mb). Long read sequencing data were generated using DNA from a female double haploid individual. 84.7% of the genome was assigned to 42 chromosome‐sized scaffolds and 93.2% of Benchmarking Universal Single Copy Orthologues were recovered, putting this assembly on par with the best currently available salmonid genomes. Estimates of genome size based on k‐mer frequency analysis were highly similar to the total size of the finished genome, suggesting that the entirety of the genome was recovered. A mitochondrial genome assembly was also produced. Self‐versus‐self synteny analysis allowed us to identify homeologs resulting from the salmonid specific autotetraploid event (Ss4R) as well as regions exhibiting delayed rediploidization. Alignment with three other salmonid genomes and the Northern Pike (Esox lucius) genome also allowed us to identify homologous chromosomes in related taxa. We also generated multiple resources useful for future genomic research on Lake Trout, including a repeat library and a sex‐averaged recombination map. A novel RNA sequencing data set for liver tissue was also generated in order to produce a publicly available set of annotations for 49,668 genes and pseudogenes. Potential applications of these resources to population genetics and the conservation of native populations are discussed.
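The contig and scaffold N50 values quoted in this abstract follow the usual definition: the length L such that sequences of length L or longer make up at least half of the total assembly. The sketch below illustrates only that definition. The function name `n50` and the toy lengths are assumptions made for illustration and are not data or code from the cited article.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70, because 80 + 70 = 150 reaches half of 300
```

Some tools use a strict "more than half" threshold instead of "at least half", which can change the result on small inputs such as this one.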


Wang Q, Boenigk S, Boehm V, Gehring NH, Altmueller J, Dieterich C. Single cell transcriptome sequencing on the Nanopore platform with ScNapBar. RNA 2021;27:rna.078154.120. [PMID: 33906975 PMCID: PMC8208055 DOI: 10.1261/rna.078154.120] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/16/2020] [Accepted: 04/20/2021] [Indexed: 06/12/2023]

Broseus L, Thomas A, Oldfield AJ, Severac D, Dubois E, Ritchie W. TALC: Transcript-level Aware Long-read Correction. Bioinformatics 2021;36:5000-5006. [PMID: 32910174 DOI: 10.1093/bioinformatics/btaa634] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/03/2020] [Revised: 05/08/2020] [Accepted: 07/09/2020] [Indexed: 02/06/2023] Open

Firtina C, Kim JS, Alser M, Senol Cali D, Cicek AE, Alkan C, Mutlu O. Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm. Bioinformatics 2020;36:3669-3679. [PMID: 32167530 DOI: 10.1093/bioinformatics/btaa179] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 02/13/2019] [Revised: 12/16/2019] [Accepted: 03/11/2020] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject's genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively.

RESULTS

We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts.
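The decoding step named in (iii) can be illustrated with a minimal Viterbi sketch. This is a toy two-state HMM, not Apollo's profile HMM (which the abstract does not specify in detail); the state names, probabilities and the function name `viterbi` are assumptions chosen purely for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""

    def log(p):
        # Work in log space to avoid underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximises the path score into s.
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two "sticky" states with noisy emissions.
states = ["A", "B"]
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit_p = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.2, "y": 0.8}}

print(viterbi(list("xxyxx"), states, start_p, trans_p, emit_p))
# prints ['A', 'A', 'A', 'A', 'A']
```

In the toy run the single `y` is treated as a noisy emission rather than a state change, because switching states twice costs more than one unlikely emission. That trade-off between local evidence and the model learned from alignments is the general idea behind decoding a polished sequence from a trained model.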

AVAILABILITY AND IMPLEMENTATION

Source code is available at https://github.com/CMU-SAFARI/Apollo.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Schmartz GP, Kern F, Fehlmann T, Wagner V, Fromm B, Keller A. Encyclopedia of tools for the analysis of miRNA isoforms. Brief Bioinform 2020;22:6032629. [PMID: 33313643 DOI: 10.1093/bib/bbaa346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2020] [Revised: 10/15/2020] [Accepted: 10/29/2020] [Indexed: 12/14/2022] Open

Yang H, Wang Y, Zhang Z, Li H. Identification of KIF18B as a Hub Candidate Gene in the Metastasis of Clear Cell Renal Cell Carcinoma by Weighted Gene Co-expression Network Analysis. Front Genet 2020;11:905. [PMID: 32973873 PMCID: PMC7468490 DOI: 10.3389/fgene.2020.00905] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2019] [Accepted: 07/21/2020] [Indexed: 12/13/2022] Open

Abstract

Background

Clear cell renal cell carcinoma (ccRCC) is a common type of fatal malignancy in the urinary system. As the therapeutic strategies for ccRCC are severely limited at present, the prognosis of patients with metastatic carcinoma is usually not promising. Revealing the pathogenesis and identifying hub candidate genes for prognosis prediction and precise treatment are urgently needed in metastatic ccRCC.

Methods

In the present study, we conducted a series of bioinformatics analyses using the gene expression profiles of ccRCC samples from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases to identify and validate the hub gene of metastatic ccRCC. We constructed a co-expression network, divided genes into co-expression modules, and identified ccRCC-related modules by weighted gene co-expression network analysis (WGCNA) with data from GEO. Then, we investigated the functions of genes in the ccRCC-related modules by enrichment analyses and built a sub-network accordingly. A hub candidate gene of metastatic ccRCC was identified by the maximal clique centrality (MCC) method. We validated the hub gene by differentially expressed gene analysis, overall survival analysis, and correlation analysis with clinical traits in the external dataset (TCGA). Finally, we explored the function of the hub gene by correlation analysis with targets of precise therapies and single-gene gene set enrichment analysis.

Results

We conducted WGCNA with the expression profiles of GSE73731 from GEO and divided all genes into 8 meaningful co-expression modules. One module proved to be positively correlated with the pathological stage and tumor grade of ccRCC. Genes in the ccRCC-related module were mainly enriched in functions of mitotic cell division and several well-known tumor-related signaling pathways. We then identified KIF18B as a hub gene of the metastasis of ccRCC. Validation analyses in the external dataset showed up-regulation of KIF18B in ccRCC and its correlation with worse outcomes. Further analyses found that the expression of KIF18B is related to that of targets of precise therapies.

Conclusion

Our study proposes KIF18B as a hub candidate gene of ccRCC for the first time. Our conclusion may provide a new clue for prognosis evaluation and precise treatment of ccRCC in the future.

Lees JA, Mai TT, Galardini M, Wheeler NE, Horsfield ST, Parkhill J, Corander J. Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions. mBio 2020;11:e01344-20. [PMID: 32636251 PMCID: PMC7343994 DOI: 10.1128/mbio.01344-20] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2020] [Accepted: 06/05/2020] [Indexed: 12/19/2022] Open

Abstract

Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially.

IMPORTANCE

Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
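For readers unfamiliar with elastic net penalization, the objective in its widely used glmnet-style form for a continuous phenotype is sketched below. The abstract does not give the exact parameterization the authors use, so the symbols here are assumptions: y is the phenotype vector for n isolates, X is the unitig presence/absence matrix, β holds the per-unitig effects, λ sets the overall penalty strength and α mixes the lasso (α = 1) and ridge (α = 0) penalties.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  \;+\; \lambda \left( \alpha \lVert \beta \rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
```

The L1 term drives most coefficients to exactly zero, which is what yields a small, interpretable set of selected variants, while the L2 term stabilizes the fit when many unitigs are strongly correlated.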

Wylie DC, Hofmann HA, Zemelman BV. SArKS: de novo discovery of gene expression regulatory motif sites and domains by suffix array kernel smoothing. Bioinformatics 2020;35:3944-3952. [PMID: 30903136 DOI: 10.1093/bioinformatics/btz198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2018] [Revised: 03/04/2019] [Accepted: 03/20/2019] [Indexed: 11/14/2022] Open

Ghaffaari A, Marschall T. Fully-sensitive seed finding in sequence graphs using a hybrid index. Bioinformatics 2020;35:i81-i89. [PMID: 31510650 PMCID: PMC6612829 DOI: 10.1093/bioinformatics/btz341] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023] Open

Rautiainen M, Mäkinen V, Marschall T. Bit-parallel sequence-to-graph alignment. Bioinformatics 2020;35:3599-3607. [PMID: 30851095 PMCID: PMC6761980 DOI: 10.1093/bioinformatics/btz162] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2018] [Revised: 01/19/2019] [Accepted: 03/07/2019] [Indexed: 01/16/2023] Open

Romero R, Sánchez-Rivera FJ, Westcott PMK, Mercer KL, Bhutkar A, Muir A, González Robles TJ, Lamboy Rodríguez S, Liao LZ, Ng SR, Li L, Colón CI, Naranjo S, Beytagh MC, Lewis CA, Hsu PP, Bronson RT, Vander Heiden MG, Jacks T. Keap1 mutation renders lung adenocarcinomas dependent on Slc33a1. NATURE CANCER 2020;1:589-602. [PMID: 34414377 PMCID: PMC8373048 DOI: 10.1038/s43018-020-0071-1] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/18/2019] [Accepted: 05/01/2020] [Indexed: 12/13/2022]

Affiliation(s)

Rodrigo Romero Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA
Francisco J Sánchez-Rivera Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
Peter M K Westcott Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
Kim L Mercer Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Howard Hughes Medical Institute, Chevy Chase, MD, USA
Arjun Bhutkar Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
Alexander Muir Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA
Tania J González Robles Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
Swanny Lamboy Rodríguez Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA
Laura Z Liao Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA
Sheng Rong Ng Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA
Leanne Li Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
Caterina I Colón Koch Institute for Integrative Cancer Research, Cambridge, MA, USA
Santiago Naranjo Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA
Mary Clare Beytagh Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA
Caroline A Lewis Whitehead Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
Peggy P Hsu Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Massachusetts General Hospital Cancer Center, Boston, MA, USA Dana-Farber Cancer Institute, Boston, MA, USA
Roderick T Bronson Tufts University, Boston, MA, USA Harvard Medical School, Boston, MA, USA
Matthew G Vander Heiden Koch Institute for Integrative Cancer Research, Cambridge, MA, USA Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA Dana-Farber Cancer Institute, Boston, MA, USA
Tyler Jacks Koch Institute for Integrative Cancer Research, Cambridge, MA, USA. Massachusetts Institute of Technology Department of Biology, Cambridge, MA, USA. Howard Hughes Medical Institute, Chevy Chase, MD, USA.

Urgese G, Parisi E, Scicolone O, Di Cataldo S, Ficarra E. BioSeqZip: a collapser of NGS redundant reads for the optimization of sequence analysis. Bioinformatics 2020;36:2705-2711. [PMID: 31999333 PMCID: PMC7203750 DOI: 10.1093/bioinformatics/btaa051] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2019] [Revised: 12/20/2019] [Accepted: 01/22/2020] [Indexed: 01/08/2023] Open

Meyer F, Bagchi S, Chaterji S, Gerlach W, Grama A, Harrison T, Paczian T, Trimble WL, Wilke A. MG-RAST version 4-lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis. Brief Bioinform 2020;20:1151-1159. [PMID: 29028869 DOI: 10.1093/bib/bbx105] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2017] [Revised: 07/21/2017] [Indexed: 11/12/2022] Open

Dietz C, Rueden CT, Helfrich S, Dobson ETA, Horn M, Eglinger J, Evans EL, McLean DT, Novitskaya T, Ricke WA, Sherer NM, Zijlstra A, Berthold MR, Eliceiri KW. Integration of the ImageJ Ecosystem in the KNIME Analytics Platform. FRONTIERS IN COMPUTER SCIENCE 2020;2. [PMID: 32905440 DOI: 10.3389/fcomp.2020.00008] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open

Abstract

Open-source software tools are often used for analysis of scientific image data due to their flexibility and transparency in dealing with rapidly evolving imaging technologies. The complex nature of image analysis problems frequently requires many tools to be used in conjunction, including image processing and analysis, data processing, machine learning and deep learning, statistical analysis of the results, visualization, correlation to heterogeneous but related data, and more. However, the development, and therefore application, of these computational tools is impeded by a lack of integration across platforms. Integration of tools goes beyond convenience, as it is impractical for one tool to anticipate and accommodate the current and future needs of every user. This problem is emphasized in the field of bioimage analysis, where various rapidly emerging methods are quickly being adopted by researchers. ImageJ is a popular open-source image analysis platform, with contributions from a global community resulting in hundreds of specialized routines for a wide array of scientific tasks. ImageJ's strength lies in its accessibility and extensibility, allowing researchers to easily improve the software to solve their image analysis tasks. However, ImageJ is not designed for development of complex end-to-end image analysis workflows. Scientists are often forced to create highly specialized and hard-to-reproduce scripts to orchestrate individual software fragments and cover the entire life-cycle of an analysis of an image dataset. KNIME Analytics Platform, a user-friendly data integration, analysis, and exploration workflow system, was designed to handle huge amounts of heterogeneous data in a platform-agnostic computing environment and has been successful in meeting complex end-to-end demands in several communities, such as cheminformatics and mass spectrometry. Similar needs within the bioimage analysis community led to the creation of the KNIME Image Processing extension, which integrates ImageJ into KNIME Analytics Platform, enabling researchers to develop reproducible and scalable workflows, integrating a diverse range of analysis tools. Here we present how users and developers alike can leverage the ImageJ ecosystem via the KNIME Image Processing extension to provide robust and extensible image analysis within KNIME workflows. We illustrate the benefits of this integration with examples, as well as representative scientific use cases.

Li R, He X, Dai C, Zhu H, Lang X, Chen W, Li X, Zhao D, Zhang Y, Han X, Niu T, Zhao Y, Cao R, He R, Lu Z, Chi X, Li W, Niu B. Gclust: A Parallel Clustering Tool for Microbial Genomic Data. GENOMICS PROTEOMICS & BIOINFORMATICS 2020;17:496-502. [PMID: 31917259 PMCID: PMC7056916 DOI: 10.1016/j.gpb.2018.10.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/07/2018] [Revised: 05/29/2018] [Accepted: 10/23/2018] [Indexed: 11/12/2022]

Affiliation(s)

Ruilin Li Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xiaoyu He Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Chuangchuang Dai Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Haidong Zhu Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xianyu Lang Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
Wei Chen Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xiaodong Li Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Dan Zhao Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Yu Zhang Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xinyin Han Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Tie Niu Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
Yi Zhao Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
Rongqiang Cao Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
Rong He Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
Zhonghua Lu Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
Xuebin Chi Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; Center of Scientific Computing Applications & Research, Chinese Academy of Sciences, Beijing 100190, China
Weizhong Li J. Craig Venter Institute, La Jolla, CA 92037, USA.
Beifang Niu Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China; Guizhou University School of Medicine, Guiyang 550025, China.

Guo H, Liu B, Guan D, Fu Y, Wang Y. Fast read alignment with incorporation of known genomic variants. BMC Med Inform Decis Mak 2019;19:265. [PMID: 31856811 PMCID: PMC6921400 DOI: 10.1186/s12911-019-0960-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open

Liao Y, Smyth GK, Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res 2019;47:e47. [PMID: 30783653 PMCID: PMC6486549 DOI: 10.1093/nar/gkz114] [Citation(s) in RCA: 1424] [Impact Index Per Article: 284.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2018] [Revised: 01/02/2019] [Accepted: 02/13/2019] [Indexed: 11/29/2022] Open

Loka TP, Tausch SH, Renard BY. Reliable variant calling during runtime of Illumina sequencing. Sci Rep 2019;9:16502. [PMID: 31712740 PMCID: PMC6848508 DOI: 10.1038/s41598-019-52991-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2018] [Accepted: 10/16/2019] [Indexed: 02/03/2023] Open

Leimeister CA, Dencker T, Morgenstern B. Accurate multiple alignment of distantly related genome sequences using filtered spaced word matches as anchor points. Bioinformatics 2019;35:211-218. [PMID: 29992260 PMCID: PMC6330006 DOI: 10.1093/bioinformatics/bty592] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2017] [Accepted: 07/09/2018] [Indexed: 01/30/2023] Open

Shajii A, Numanagić I, Baghdadi R, Berger B, Amarasinghe S. Seq: A High-Performance Language for Bioinformatics. PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES 2019;3:125. [PMID: 35775031 PMCID: PMC9241673 DOI: 10.1145/3360551] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]

Abstract

The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100 (a factor of over 10⁶), and the amount of data to be analyzed has increased proportionally. Yet, as Moore's Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers are primarily focused on biological analysis over computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines. Here, we introduce Seq, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. Seq starts with a subset of Python (and is in many cases a drop-in replacement), yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks like reverse complementation, k-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160× improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650× improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2× improvement, and with shorter, cleaner code. Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.
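Two of the benchmark tasks mentioned above, reverse complementation and k-mer manipulation, are easy to state in plain Python. The sketch below is ordinary CPython, not Seq syntax or Seq's built-in sequence types; the function names are assumptions made for illustration.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(reverse_complement("ACGTT"))  # prints AACGT
print(kmers("ACGTT", 3))            # prints ['ACG', 'CGT', 'GTT']
```

String slicing and list building of this kind are the sort of per-base work that is slow under an interpreter, which is the kind of workload the abstract's CPython comparison measures.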

Afik S, Raulet G, Yosef N. Reconstructing B-cell receptor sequences from short-read single-cell RNA sequencing with BRAPeS. Life Sci Alliance 2019;2:2/4/e201900371. [PMID: 31451449 PMCID: PMC6709718 DOI: 10.26508/lsa.201900371] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2019] [Revised: 08/13/2019] [Accepted: 08/14/2019] [Indexed: 12/17/2022] Open

Challenges of big data integration in the life sciences. Anal Bioanal Chem 2019;411:6791-6800. [PMID: 31463515 DOI: 10.1007/s00216-019-02074-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Revised: 07/08/2019] [Accepted: 08/06/2019] [Indexed: 10/26/2022]

Siragusa E, Haiminen N, Utro F, Parida L. Linear Time Algorithms to Construct Populations Fitting Multiple Constraint Distributions at Genomic Scales. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019;16:1132-1142. [PMID: 28991752 DOI: 10.1109/tcbb.2017.2760879] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Firtina C, Bar-Joseph Z, Alkan C, Cicek AE. Hercules: a profile HMM-based hybrid error correction algorithm for long reads. Nucleic Acids Res 2019;46:e125. [PMID: 30124947 PMCID: PMC6265270 DOI: 10.1093/nar/gky724] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2018] [Accepted: 08/07/2018] [Indexed: 01/15/2023] Open

Allmer J. Towards an Internet of Science. J Integr Bioinform 2019;16:/j/jib.ahead-of-print/jib-2019-0024/jib-2019-0024.xml. [PMID: 31145694 PMCID: PMC6798852 DOI: 10.1515/jib-2019-0024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Accepted: 04/25/2019] [Indexed: 11/15/2022] Open

Bayat A, Gaëta B, Ignjatovic A, Parameswaran S. Pairwise alignment of nucleotide sequences using maximal exact matches. BMC Bioinformatics 2019;20:261. [PMID: 31113356 PMCID: PMC6528274 DOI: 10.1186/s12859-019-2827-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2018] [Accepted: 04/17/2019] [Indexed: 12/30/2022] Open

Du N, Chen J, Sun Y. Improving the sensitivity of long read overlap detection using grouped short k-mer matches. BMC Genomics 2019;20:190. [PMID: 30967123 PMCID: PMC6456931 DOI: 10.1186/s12864-019-5475-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open

Abstract

Background

Single-molecule, real-time sequencing (SMRT) developed by Pacific Biosciences produces longer reads than second-generation sequencing technologies such as Illumina. The increased read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and characterize intra-species variations. It also holds the promise to decipher the community structure in complex microbial communities because long reads help metagenomic assembly. One key step in genome assembly using long reads is to quickly identify reads forming overlaps. Because PacBio data has a higher sequencing error rate and lower coverage than popular short read sequencing technologies (such as Illumina), efficient detection of true overlaps requires specially designed algorithms. In particular, there is still a need to improve the sensitivity of detecting small overlaps or overlaps with high error rates in both reads. Addressing this need will enable better assembly for metagenomic data produced by third-generation sequencing technologies.

Results

In this work, we designed and implemented an overlap detection program named GroupK for third-generation sequencing reads, based on grouped k-mer hits. While using k-mer hits for detecting read overlaps has been adopted by several existing programs, our method uses a group of short k-mer hits satisfying statistically derived distance constraints to increase the sensitivity of small overlap detection. Grouped k-mer hits were originally designed for homology search. We are the first to apply grouped hits to long read overlap detection. The experimental results of applying our pipeline to both simulated and real third-generation sequencing data showed that GroupK enables more sensitive overlap detection, especially for datasets of low sequencing coverage.
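The general idea of grouping k-mer hits can be sketched as follows. This is a simplified illustration and not GroupK's algorithm: the statistically derived distance constraints described above are replaced by a fixed, hypothetical diagonal tolerance (`max_diag_shift`), and all function and parameter names are assumptions.

```python
from collections import defaultdict


def kmer_hits(a, b, k):
    """Return (i, j) pairs where a[i:i+k] == b[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    hits = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            hits.append((i, j))
    return hits


def group_hits(hits, max_diag_shift, min_hits):
    """Group hits by diagonal (i - j) and keep groups with enough support.

    A hit joins the current group while its diagonal is within
    max_diag_shift of the group's first hit; indels in noisy reads
    shift the diagonal, so a small tolerance keeps related hits together.
    """
    groups = []
    current = []
    for i, j in sorted(hits, key=lambda h: (h[0] - h[1], h[0])):
        if current and (i - j) - (current[0][0] - current[0][1]) > max_diag_shift:
            if len(current) >= min_hits:
                groups.append(current)
            current = []
        current.append((i, j))
    if len(current) >= min_hits:
        groups.append(current)
    return groups


hits = kmer_hits("GATTACA", "TTACAGG", 3)
print(hits)                    # prints [(2, 0), (3, 1), (4, 2)]
print(group_hits(hits, 0, 3))  # prints [[(2, 0), (3, 1), (4, 2)]]
```

Requiring several short hits on nearly the same diagonal, instead of one long exact match, is what lets this style of filter tolerate high error rates: a single long k-mer is likely to contain an error, whereas a few short ones can still survive.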

Conclusions

GroupK is best used for detecting small overlaps for third-generation sequencing data. It provides a useful supplementary tool to existing ones for more sensitive and accurate overlap detection. The source code is freely available at https://github.com/Strideradu/GroupK.

Halldorsson BV, Palsson G, Stefansson OA, Jonsson H, Hardarson MT, Eggertsson HP, Gunnarsson B, Oddsson A, Halldorsson GH, Zink F, Gudjonsson SA, Frigge ML, Thorleifsson G, Sigurdsson A, Stacey SN, Sulem P, Masson G, Helgason A, Gudbjartsson DF, Thorsteinsdottir U, Stefansson K. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 2019;363:363/6425/eaau1043. [DOI: 10.1126/science.aau1043] [Citation(s) in RCA: 156] [Impact Index Per Article: 31.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2018] [Revised: 05/16/2018] [Accepted: 12/07/2018] [Indexed: 12/14/2022]

Pericard P, Dufresne Y, Couderc L, Blanquart S, Touzet H. MATAM: reconstruction of phylogenetic marker genes from short sequencing reads in metagenomes. Bioinformatics 2018;34:585-591. [PMID: 29040406 DOI: 10.1093/bioinformatics/btx644] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2017] [Accepted: 10/10/2017] [Indexed: 01/18/2023] Open

Single-cell mutation identification via phylogenetic inference. Nat Commun 2018;9:5144. [PMID: 30514897 PMCID: PMC6279798 DOI: 10.1038/s41467-018-07627-7] [Citation(s) in RCA: 52] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2018] [Accepted: 11/15/2018] [Indexed: 12/25/2022] Open

Patil RD, Ellison MJ, Wolff SM, Shearer C, Wright AM, Cockrum RR, Austin KJ, Lamberson WR, Cammack KM, Conant GC. Poor feed efficiency in sheep is associated with several structural abnormalities in the community metabolic network of their ruminal microbes. J Anim Sci 2018;96:2113-2124. [PMID: 29788417 DOI: 10.1093/jas/sky096] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2017] [Accepted: 03/14/2018] [Indexed: 12/19/2022] Open

Abstract

Ruminant animals have a symbiotic relationship with the microorganisms in their rumens. In this relationship, rumen microbes efficiently degrade complex plant-derived compounds into smaller digestible compounds, a process that is very likely associated with host animal feed efficiency. The resulting simpler metabolites can then be absorbed by the host and converted into other compounds by host enzymes. We used a microbial community metabolic network inferred from shotgun metagenomics data to assess how this metabolic system differs between animals that are able to turn ingested feedstuffs into body mass with high efficiency and those that are not. We conducted shotgun sequencing of microbial DNA from the rumen contents of 16 sheep that differed in their residual feed intake (RFI), a measure of feed efficiency. Metagenomic reads from each sheep were mapped onto a database-derived microbial metabolic network, which was linked to the sheep metabolic network by interface metabolites (metabolites transferred from microbes to host). No single enzyme was identified as being significantly different in abundance between the low and high RFI animals (P > 0.05, Wilcoxon test). However, when we analyzed the metabolic network as a whole, we found several differences between efficient and inefficient animals. Microbes from low RFI (efficient) animals use a suite of enzymes closer in network space to the host's reactions than those of the high RFI (inefficient) animals. Similarly, low RFI animals have microbial metabolic networks that, on average, contain reactions using shorter carbon chains than do those of high RFI animals, potentially allowing the host animals to extract metabolites more efficiently. Finally, the efficient animals possess community networks with greater Shannon diversity among their enzymes than do inefficient ones. Thus, our system approach to the ruminal microbiome identified differences attributable to feed efficiency in the structure of the microbes' community metabolic network that were undetected at the level of individual microbial taxa or reactions.
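The Shannon diversity mentioned above is the standard index H = -Σ p_i ln p_i over relative abundances p_i. A minimal sketch follows; the input here is a hypothetical list of per-enzyme read counts, and the abstract does not state how the authors normalized or filtered their counts.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over non-zero abundance counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Zero counts contribute nothing (p * ln p tends to 0 as p tends to 0).
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical enzyme abundance profiles for two samples.
even_profile = [25, 25, 25, 25]   # evenly spread over four enzymes
skewed_profile = [97, 1, 1, 1]    # dominated by one enzyme

print(shannon_diversity(even_profile) > shannon_diversity(skewed_profile))
# prints True
```

The index is highest when abundance is spread evenly across many categories (ln 4 for the even profile above), so a higher value for efficient animals indicates a more balanced enzyme repertoire, not simply a larger one.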

Sambo F, Finotello F, Lavezzo E, Baruzzo G, Masi G, Peta E, Falda M, Toppo S, Barzon L, Di Camillo B. Optimizing PCR primers targeting the bacterial 16S ribosomal RNA gene. BMC Bioinformatics 2018;19:343. [PMID: 30268091 PMCID: PMC6162885 DOI: 10.1186/s12859-018-2360-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2017] [Accepted: 09/09/2018] [Indexed: 02/01/2023] Open