1
Zhao Y, Yang M, Gong F, Pan Y, Hu M, Peng Q, Lu L, Lyu X, Sun K. Accelerating 3D genomics data analysis with Microcket. Commun Biol 2024; 7:675. [PMID: 38824179 PMCID: PMC11144199 DOI: 10.1038/s42003-024-06382-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Accepted: 05/24/2024] [Indexed: 06/03/2024] Open
Abstract
The three-dimensional (3D) organization of the genome is fundamental to cell biology. To explore the 3D genome, emerging high-throughput approaches have produced billions of sequencing reads, which are challenging and time-consuming to analyze. Here we present Microcket, a package for mapping and extracting interacting pairs from 3D genomics data, including Hi-C, Micro-C, and derivative protocols. Microcket uses a unique read-stitch strategy that takes advantage of the long read cycles in modern DNA sequencers; benchmark evaluations reveal that Microcket runs much faster than current tools, with improved mapping efficiency, and thus shows high potential for accelerating and enhancing biological investigations into the 3D genome. Microcket is freely available at https://github.com/hellosunking/Microcket.
Affiliation(s)
- Yu Zhao
  - Molecular Cancer Research Center, School of Medicine, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen, 518107, China
- Mengqi Yang
  - Institute of Cancer Research, Shenzhen Bay Laboratory, Shenzhen, 518132, China
  - Department of Chemical and Biological Engineering, Division of Life Science, Hong Kong University of Science and Technology, Hong Kong SAR, 999077, China
- Fanglei Gong
  - Institute of Cancer Research, Shenzhen Bay Laboratory, Shenzhen, 518132, China
- Yuqi Pan
  - Institute of Cancer Research, Shenzhen Bay Laboratory, Shenzhen, 518132, China
  - Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, 518055, China
- Minghui Hu
  - Molecular Cancer Research Center, School of Medicine, Shenzhen Campus of Sun Yat-sen University, Sun Yat-sen University, Shenzhen, 518107, China
- Qin Peng
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, 518132, China
- Leina Lu
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
- Xiaowen Lyu
  - State Key Laboratory of Cellular Stress Biology, Fujian Provincial Key Laboratory of Reproductive Health Research, Fujian Provincial Key Laboratory of Organ and Tissue Regeneration, School of Medicine, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, 361102, China
- Kun Sun
  - Institute of Cancer Research, Shenzhen Bay Laboratory, Shenzhen, 518132, China
2
Hou Y, Wang L, Pan W. Comparison of Hi-C-Based Scaffolding Tools on Plant Genomes. Genes (Basel) 2023; 14:2147. [PMID: 38136968 PMCID: PMC10742964 DOI: 10.3390/genes14122147] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/09/2023] [Revised: 11/03/2023] [Accepted: 11/13/2023] [Indexed: 12/24/2023] Open
Abstract
De novo genome assembly holds paramount significance in the field of genomics. Scaffolding, as a pivotal component within the genome assembly process, is instrumental in determining the orientation and arrangement of contigs, ultimately facilitating the generation of a chromosome-level assembly. Scaffolding is contingent on supplementary linkage information, including paired-end reads, bionano, physical mapping, genetic mapping, and Hi-C (an abbreviation for High-throughput Chromosome Conformation Capture). In recent years, Hi-C has emerged as the predominant source of linkage information in scaffolding, attributed to its capacity to offer long-range signals, leading to the development of numerous Hi-C-based scaffolding tools. However, to the best of our knowledge, there has been a paucity of comprehensive studies assessing and comparing the efficacy of these tools. In order to address this gap, we meticulously selected six tools, namely LACHESIS, pin_hic, YaHS, SALSA2, 3d-DNA, and ALLHiC, and conducted a comparative analysis of their performance across haploid, diploid, and polyploid genomes. This endeavor has yielded valuable insights in advancing the field of genome scaffolding research.
Affiliation(s)
- Yuze Hou
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Li Wang
  - College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
- Weihua Pan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
3
Erdmann-Pham DD, Batra SS, Turkalo TK, Durbin J, Blanchette M, Yeh I, Shain H, Bastian BC, Song YS, Rokhsar DS, Hockemeyer D. Tracing cancer evolution and heterogeneity using Hi-C. Nat Commun 2023; 14:7111. [PMID: 37932252 PMCID: PMC10628133 DOI: 10.1038/s41467-023-42651-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 10/09/2023] [Indexed: 11/08/2023] Open
Abstract
Chromosomal rearrangements can initiate and drive cancer progression, yet it has been challenging to evaluate their impact, especially in genetically heterogeneous solid cancers. To address this problem we developed HiDENSEC, a new computational framework for analyzing chromatin conformation capture in heterogeneous samples that can infer somatic copy number alterations, characterize large-scale chromosomal rearrangements, and estimate cancer cell fractions. After validating HiDENSEC with in silico and in vitro controls, we used it to characterize chromosome-scale evolution during melanoma progression in formalin-fixed tumor samples from three patients. The resulting comprehensive annotation of the genomic events includes copy number neutral translocations that disrupt tumor suppressor genes such as NF1, whole chromosome arm exchanges that result in loss of CDKN2A, and whole-arm copy-number neutral loss of homozygosity involving PTEN. These findings show that large-scale chromosomal rearrangements occur throughout cancer evolution and that characterizing these events yields insights into drivers of melanoma progression.
Affiliation(s)
- Dan Daniel Erdmann-Pham
  - Department of Mathematics, University of California, Berkeley, CA, 94720, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Sanjit Singh Batra
  - Computer Science Division, University of California, Berkeley, CA, 94720, USA
- Timothy K Turkalo
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- James Durbin
  - Dovetail Genomics, Enterprise Way, Scotts Valley, CA, 95066, USA
- Marco Blanchette
  - Dovetail Genomics, Enterprise Way, Scotts Valley, CA, 95066, USA
- Iwei Yeh
  - Department of Dermatology and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, 94143, USA
  - Department of Pathology, University of California, San Francisco, CA, 94143, USA
- Hunter Shain
  - Department of Dermatology and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, 94143, USA
- Boris C Bastian
  - Department of Dermatology and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, 94143, USA
  - Department of Pathology, University of California, San Francisco, CA, 94143, USA
- Yun S Song
  - Computer Science Division, University of California, Berkeley, CA, 94720, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
  - Department of Statistics, University of California, Berkeley, CA, 94720, USA
- Daniel S Rokhsar
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
  - Okinawa Institute for Science and Technology, Tancha, Okinawa, Japan
- Dirk Hockemeyer
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
  - Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
4
Wang S, Wang M, Chen L, Pan G, Wang Y, Li SC. SpecHLA enables full-resolution HLA typing from sequencing data. Cell Rep Methods 2023; 3:100589. [PMID: 37714157 PMCID: PMC10545945 DOI: 10.1016/j.crmeth.2023.100589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 06/20/2023] [Accepted: 08/21/2023] [Indexed: 09/17/2023]
Abstract
Reconstructing diploid sequences of human leukocyte antigen (HLA) genes, i.e., full-resolution HLA typing, from sequencing data is challenging. The high homogeneity across HLA genes and the high heterogeneity within HLA alleles complicate the identification of genomic source loci for sequencing reads. Here, we present SpecHLA, which utilizes fine-tuned reads binning and local assembly to achieve accurate full-resolution HLA typing. SpecHLA accepts sequencing data from paired-end, 10×-linked-reads, high-throughput chromosome conformation capture (Hi-C), Pacific Biosciences (PacBio), and Oxford Nanopore Technology (ONT). It can also incorporate pedigree data and genotype frequency to refine typing. In 32 Human Genome Structural Variation Consortium, Phase 2 (HGSVC2) samples, SpecHLA achieved 98.6% accuracy for G-group-resolution HLA typing, inferring entire HLA alleles with an average of three mismatches fewer, ten gaps fewer, and 590 bp less edit distance than HISAT-genotype per allele. Additionally, SpecHLA exhibited a 2-field typing accuracy of 98.6% in 875 real samples. Finally, SpecHLA detected HLA loss of heterozygosity with 99.7% specificity and 96.8% sensitivity in simulated samples of cancer cell lines.
Affiliation(s)
- Shuai Wang
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Mengyao Wang
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Lingxi Chen
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Guangze Pan
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Yanfei Wang
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
- Shuai Cheng Li
  - City University of Hong Kong, Department of Computer Science, Kowloon, Hong Kong
5
Ouchi S, Kajitani R, Itoh T. GreenHill: a de novo chromosome-level scaffolding and phasing tool using Hi-C. Genome Biol 2023; 24:162. [PMID: 37434204 DOI: 10.1186/s13059-023-03006-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 07/04/2023] [Indexed: 07/13/2023] Open
Abstract
Chromosome-level haplotype-resolved genome assembly is an important resource in molecular biology. However, current de novo haplotype assemblers require parental data or reference genomes and often fail to provide chromosome-level results. We present GreenHill, a novel scaffolding and phasing tool that considers various assemblers' contigs as input to reconstruct chromosome-level haplotypes using Hi-C without parental or reference data. Its unique functions include new error correction based on Hi-C contacts and the simultaneous use of Hi-C and long reads. Benchmarks reveal that GreenHill outperforms other approaches in contiguity and phasing accuracy, and the majority of chromosome arms are entirely phased.
Affiliation(s)
- Shun Ouchi
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan
- Rei Kajitani
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan
- Takehiko Itoh
  - School of Life Science and Technology, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-Ku, Tokyo, 152-8550, Japan
6
McCallum GE, Rossiter AE, Quraishi MN, Iqbal TH, Kuehne SA, van Schaik W. Noise reduction strategies in metagenomic chromosome confirmation capture to link antibiotic resistance genes to microbial hosts. Microb Genom 2023; 9:mgen001030. [PMID: 37272920 PMCID: PMC10327510 DOI: 10.1099/mgen.0.001030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/11/2022] [Accepted: 04/11/2023] [Indexed: 06/06/2023] Open
Abstract
The gut microbiota is a reservoir for antimicrobial resistance genes (ARGs). With current sequencing methods, it is difficult to assign ARGs to their microbial hosts, particularly if these ARGs are located on plasmids. Metagenomic chromosome conformation capture approaches (meta3C and Hi-C) have recently been developed to link bacterial genes to phylogenetic markers, thus potentially allowing the assignment of ARGs to their hosts on a microbiome-wide scale. Here, we generated a meta3C dataset of a human stool sample and used previously published meta3C and Hi-C datasets to investigate bacterial hosts of ARGs in the human gut microbiome. Sequence reads mapping to repetitive elements were found to cause problematic noise in, and may importantly skew interpretation of, meta3C and Hi-C data. We provide a strategy to improve the signal-to-noise ratio by discarding reads that map to insertion sequence elements and to the end of contigs. We also show the importance of using spike-in controls to quantify whether the cross-linking step in meta3C and Hi-C protocols has been successful. After filtering to remove artefactual links, 87 ARGs were assigned to their bacterial hosts across all datasets, including 27 ARGs in the meta3C dataset we generated. We show that commensal gut bacteria are an important reservoir for ARGs, with genes coding for aminoglycoside and tetracycline resistance being widespread in anaerobic commensals of the human gut.
Affiliation(s)
- Gregory E. McCallum
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Amanda E. Rossiter
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Tariq H. Iqbal
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Sarah A. Kuehne
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - School of Dentistry, Institute of Clinical Sciences, University of Birmingham, Birmingham, UK
- Willem van Schaik
  - Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
7
Fan S, Dang D, Ye Y, Zhang SW, Gao L, Zhang S. scHi-CSim: a flexible simulator that generates high-fidelity single-cell Hi-C data for benchmarking. J Mol Cell Biol 2023; 15:mjad003. [PMID: 36708167 PMCID: PMC10308180 DOI: 10.1093/jmcb/mjad003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 09/18/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome the sparseness of data, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim can generate high-fidelity data by comparing the performance of single-cell clustering and detection of chromosomal high-order structures with that obtained from raw data. Furthermore, scHi-CSim allows the sequencing depth and the number of simulated replicates to be varied flexibly. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
Affiliation(s)
- Shichen Fan
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Dachang Dang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yusen Ye
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shao-Wu Zhang
  - School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lin Gao
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shihua Zhang
  - NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
  - Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
8
Taylor D, Branco MR. Inferring Protein-DNA Binding Profiles at Interspersed Repeats Using HiChIP and PAtChER. Methods Mol Biol 2023; 2607:199-214. [PMID: 36449165 DOI: 10.1007/978-1-0716-2883-6_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
Alignment of short-read sequencing data to interspersed genomic repeats, such as transposable elements, can be problematic. This is especially true for evolutionarily young elements, which have not sufficiently diverged from each other to produce distinct and uniquely mappable reads. Mapping difficulties pose a challenge for studying the portfolio of epigenetic modifications and other chromatin regulators that bind to transposons and dictate their activity, which are typically studied using chromatin immunoprecipitation followed by sequencing (ChIP-seq). Since ChIP-seq requires chromatin fragmentation to achieve appropriate resolution, longer reads do not appreciably improve mappability. Here, we present an experimental and computational protocol that couples ChIP-seq with 3D genome folding information to produce protein binding profiles with dramatically increased coverage at interspersed repeats.
Affiliation(s)
- Darren Taylor
  - Blizard Institute, Faculty of Medicine and Dentistry, QMUL, London, UK
- Miguel R Branco
  - Faculty of Medicine and Dentistry, Blizard Institute, Queen Mary University of London, London, UK
9
Ivanova V, Chernevskaya E, Vasiluev P, Ivanov A, Tolstoganov I, Shafranskaya D, Ulyantsev V, Korobeynikov A, Razin SV, Beloborodova N, Ulianov SV, Tyakht A. Hi-C Metagenomics in the ICU: Exploring Clinically Relevant Features of Gut Microbiome in Chronically Critically Ill Patients. Front Microbiol 2022; 12:770323. [PMID: 35185811 PMCID: PMC8851603 DOI: 10.3389/fmicb.2021.770323] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/03/2021] [Accepted: 11/25/2021] [Indexed: 01/02/2023] Open
Abstract
Gut microbiome in critically ill patients shows profound dysbiosis. The most vulnerable is the subgroup of chronically critically ill (CCI) patients – those suffering from long-term dependence on support systems in intensive care units. It is important to investigate their microbiome as a potential reservoir of opportunistic taxa causing co-infections and a morbidity factor. We explored dynamics of microbiome composition in the CCI patients by combining “shotgun” metagenomics with chromosome conformation capture (Hi-C). Stool samples were collected at 2 time points from 2 patients with severe brain injury with different outcomes within a 1–2-week interval. The metagenome-assembled genomes (MAGs) were reconstructed based on the Hi-C data using a novel hicSPAdes method (along with the bin3c method for comparison), as well as independently of the Hi-C using MetaBAT2. The resistomes of the samples were derived using a novel assembly graph-based approach. Links of bacteria to antibiotic resistance genes, plasmids and viruses were analyzed using Hi-C-based networks. The gut community structure was enriched in opportunistic microorganisms. The binning using hicSPAdes was superior to the conventional WGS-based binning as well as to the bin3c in terms of the number, completeness and contamination of the reconstructed MAGs. Using Klebsiella pneumoniae as an example, we showed how chromosome conformation capture can aid comparative genomic analysis of clinically important pathogens. Diverse associations of resistome with antimicrobial therapy from the level of assembly graphs to gene content were discovered. Analysis of Hi-C networks suggested multiple “host-plasmid” and “host-phage” links. Hi-C metagenomics is a promising technique for investigating clinical microbiome samples. It provides a community composition profile with increased details on bacterial gene content and mobile genetic elements compared to conventional metagenomics. 
The ability of Hi-C binning to encompass the MAG’s plasmid content facilitates metagenomic evaluation of virulence and drug resistance dynamics in clinically relevant opportunistic pathogens. These findings will help to identify the targets for developing cost-effective and rapid tests for assessing microbiome-related health risks.
Affiliation(s)
- Valeriia Ivanova
  - Institute of Gene Biology Russian Academy of Sciences, Moscow, Russia
- Ekaterina Chernevskaya
  - Institute of Gene Biology Russian Academy of Sciences, Moscow, Russia
  - Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology, Moscow, Russia
- Petr Vasiluev
  - Institute of Gene Biology Russian Academy of Sciences, Moscow, Russia
  - Research Centre for Medical Genetics, Moscow, Russia
- Artem Ivanov
  - Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
- Ivan Tolstoganov
  - Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Daria Shafranskaya
  - Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Vladimir Ulyantsev
  - Computer Technologies Laboratory, ITMO University, Saint Petersburg, Russia
- Anton Korobeynikov
  - Center for Algorithmic Biotechnologies, Saint Petersburg State University, Saint Petersburg, Russia
- Sergey V. Razin
  - Institute of Gene Biology Russian Academy of Sciences, Moscow, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Natalia Beloborodova
  - Federal Research and Clinical Center of Intensive Care Medicine and Rehabilitology, Moscow, Russia
- Sergey V. Ulianov
  - Institute of Gene Biology Russian Academy of Sciences, Moscow, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Alexander Tyakht
  - Institute of Gene Biology Russian Academy of Sciences, Moscow, Russia
  - Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Institute of Gene Biology Russian Academy of Sciences, Moscow, Russia
  - *Correspondence: Alexander Tyakht
10
Lindsly S, Jia W, Chen H, Liu S, Ronquist S, Chen C, Wen X, Stansbury C, Dotson GA, Ryan C, Rehemtulla A, Omenn GS, Wicha M, Li SC, Muir L, Rajapakse I. Functional organization of the maternal and paternal human 4D Nucleome. iScience 2021; 24:103452. [PMID: 34877507 PMCID: PMC8633971 DOI: 10.1016/j.isci.2021.103452] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/30/2021] [Revised: 10/16/2021] [Accepted: 11/09/2021] [Indexed: 11/19/2022] Open
Abstract
Every human somatic cell inherits a maternal and a paternal genome, which work together to give rise to cellular phenotypes. However, the allele-specific relationship between gene expression and genome structure through the cell cycle is largely unknown. By integrating haplotype-resolved genome-wide chromosome conformation capture, mature and nascent mRNA, and protein binding data from a B lymphoblastoid cell line, we investigate this relationship both globally and locally. We introduce the maternal and paternal 4D Nucleome, enabling detailed analysis of the mechanisms and dynamics of genome structure and gene function for diploid organisms. Our analyses find significant coordination between allelic expression biases and local genome conformation, and notably absent expression bias in universally essential cell cycle and glycolysis genes. We propose a model in which coordinated biallelic expression reflects prioritized preservation of essential gene sets.
Affiliation(s)
- Stephen Lindsly
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Wenlong Jia
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Haiming Chen
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Sijia Liu
  - MIT-IBM Watson AI Lab, IBM Research, Cambridge, MA 02142, USA
- Scott Ronquist
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Can Chen
  - Department of Mathematics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
- Xingzhao Wen
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
- Cooper Stansbury
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Gabrielle A. Dotson
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Charles Ryan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Medical Scientist Training Program, University of Michigan, Ann Arbor, MI 48109, USA
  - Program in Cellular and Molecular Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Alnawaz Rehemtulla
  - Department of Hematology/Oncology, University of Michigan, Ann Arbor, MI 48109, USA
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Internal Medicine, Human Genetics, and School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Max Wicha
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Hematology/Oncology, University of Michigan, Ann Arbor, MI 48109, USA
- Shuai Cheng Li
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Lindsey Muir
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Indika Rajapakse
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Mathematics, University of Michigan, Ann Arbor, MI 48109, USA
  - Corresponding author
11
Taylor D, Lowe R, Philippe C, Cheng KCL, Grant OA, Zabet NR, Cristofari G, Branco MR. Locus-specific chromatin profiling of evolutionarily young transposable elements. Nucleic Acids Res 2021; 50:e33. [PMID: 34908129 PMCID: PMC8989514 DOI: 10.1093/nar/gkab1232] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/06/2021] [Revised: 11/15/2021] [Accepted: 12/02/2021] [Indexed: 01/13/2023] Open
Abstract
Despite a vast expansion in the availability of epigenomic data, our knowledge of the chromatin landscape at interspersed repeats remains highly limited by difficulties in mapping short-read sequencing data to these regions. In particular, little is known about the locus-specific regulation of evolutionarily young transposable elements (TEs), which have been implicated in genome stability, gene regulation and innate immunity in a variety of developmental and disease contexts. Here we propose an approach for generating locus-specific protein-DNA binding profiles at interspersed repeats, which leverages information on the spatial proximity between repetitive and non-repetitive genomic regions. We demonstrate that the combination of HiChIP and a newly developed mapping tool (PAtChER) yields accurate protein enrichment profiles at individual repetitive loci. Using this approach, we reveal previously unappreciated variation in the epigenetic profiles of young TE loci in mouse and human cells. Insights gained using our method will be invaluable for dissecting the molecular determinants of TE regulation and their impact on the genome.
Affiliation(s)
- Darren Taylor
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, QMUL, London E1 2AT, UK
- Robert Lowe
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, QMUL, London E1 2AT, UK
- Kevin C L Cheng
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, QMUL, London E1 2AT, UK
- Olivia A Grant
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, QMUL, London E1 2AT, UK
  - School of Life Sciences, University of Essex, Colchester, CO4 3SQ, UK
- Nicolae Radu Zabet
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, QMUL, London E1 2AT, UK
- Miguel R Branco
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, QMUL, London E1 2AT, UK
12
|
Hill BM, Bisht K, Atkins GR, Gomez AA, Rumbaugh KP, Wakeman CA, Brown AMV. Lysis-Hi-C as a method to study polymicrobial communities and eDNA. Mol Ecol Resour 2021; 22:1029-1042. [PMID: 34669257 PMCID: PMC9215119 DOI: 10.1111/1755-0998.13535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Revised: 10/06/2021] [Accepted: 10/11/2021] [Indexed: 11/30/2022]
Abstract
Microbes interact in natural communities in a spatially structured manner, particularly in biofilms and polymicrobial infections. While next generation sequencing approaches provide powerful insights into diversity, metabolic capacity, and mutational profiles of these communities, they generally fail to recover in situ spatial proximity between distinct genotypes in the interactome. Hi‐C is a promising method that has assisted in analysing complex microbiomes, by creating chromatin cross‐links in cells, that aid in identifying adjacent DNA, to improve de novo assembly. This study explored a modified Hi‐C approach involving an initial lysis phase prior to DNA cross‐linking, to test whether adjacent cell chromatin can be cross‐linked, anticipating that this could provide a new avenue for study of spatial‐mutational dynamics in structured microbial communities. An artificial polymicrobial mixture of Pseudomonas aeruginosa, Staphylococcus aureus, and Escherichia coli was lysed for 1–18 h, then prepared for Hi‐C. A murine biofilm infection model was treated with sonication, mechanical lysis, or chemical lysis before Hi‐C. Bioinformatic analyses of resulting Hi‐C interspecies chromatin links showed that while microbial species differed from one another, generally lysis significantly increased links between species and increased the distance of Hi‐C links within species, while also increasing novel plasmid‐chromosome links. The success of this modified lysis‐Hi‐C protocol in creating extracellular DNA links is a promising first step toward a new lysis‐Hi‐C based method to recover genotypic microgeography in polymicrobial communities, with potential future applications in diseases with localized resistance, such as cystic fibrosis lung infections and chronic diabetic ulcers.
Affiliation(s)
- Bravada M Hill
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
- Karishma Bisht
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
- Georgia Rae Atkins
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
- Amy A Gomez
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
- Kendra P Rumbaugh
- Department of Surgery, School of Medicine, Texas Tech Health Sciences Center, Lubbock, Texas, USA
- Catherine A Wakeman
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
- Amanda M V Brown
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
|
13
|
DeMaere MZ, Darling AE. qc3C: Reference-free quality control for Hi-C sequencing data. PLoS Comput Biol 2021; 17:e1008839. [PMID: 34634030 PMCID: PMC8530316 DOI: 10.1371/journal.pcbi.1008839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2021] [Revised: 10/21/2021] [Accepted: 09/16/2021] [Indexed: 11/19/2022] Open
Abstract
Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and more recently the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while key post-sequencing quality indicators used have, thus far, relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is to our knowledge the first to implement a reference-free Hi-C QC tool, and also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods.
The Hi-C sequencing technique offers the potential for significant scientific insight about the spatial arrangement of DNA; however, achieving such outcomes is highly dependent on the quality of the resulting sequencing library. Unlike conventional next-gen sequencing, only a fraction of a given Hi-C library contains this useful spatial information (the signal) with the remainder being effectively noise. As Hi-C remains a challenging laboratory technique, signal strength of resulting libraries can vary greatly. As a quality metric, the quantification of a library’s signal content is an essential asset in any quality mitigation strategy. Quality assessment of Hi-C data has until now relied on access to an (ideally refined) reference sequence, by which indirect indicators of quality are determined. Here we describe qc3C, a software tool capable of the direct, reference-free estimation of the signal content of a Hi-C library. In doing so, not only can researchers make informed decisions on how to progress based on library information content, but eliminating the reference also enables Hi-C quality management for non-model organism and metagenomics researchers.
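The observation behind the reference-free approach, that proximity ligation creates k-mers not otherwise expected in the sample, can be illustrated with a naive sketch. This is not qc3C's algorithm, which fits a statistical model; the function names here are hypothetical, and the junction rule assumes a palindromic enzyme that leaves a 5' overhang which is filled in before blunt-end ligation (for example DpnII, cutting before GATC).

```python
def ligation_junction(site: str, cut_offset: int) -> str:
    """Sequence formed when two filled-in ends of the same site are ligated.

    site: recognition sequence, e.g. "GATC"
    cut_offset: top-strand cut position within the site (0 for ^GATC)
    Assumes a palindromic site with a 5' overhang (illustrative assumption).
    """
    left_end = site[: len(site) - cut_offset]   # filled-in end of the upstream fragment
    right_end = site[cut_offset:]               # start of the downstream fragment
    return left_end + right_end


def junction_fraction(reads, junction: str) -> float:
    """Fraction of reads that contain the junction sequence at least once."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(junction in read.upper() for read in reads)
    return hits / len(reads)


if __name__ == "__main__":
    junction = ligation_junction("GATC", 0)
    toy_reads = ["ACGTGATCGATCAA", "ACGTACGTACGT"]
    print(junction, junction_fraction(toy_reads, junction))
```

A raw count like this underestimates the true ligation fraction, since many junctions fall outside the sequenced portion of a fragment; correcting for that is the kind of inference a dedicated tool performs.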
Affiliation(s)
- Matthew Z. DeMaere
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
- Aaron E. Darling
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
|
14
|
Magnitov MD, Kuznetsova VS, Ulianov SV, Razin SV, Tyakht AV. Benchmark of software tools for prokaryotic chromosomal interaction domain identification. Bioinformatics 2020; 36:4560-4567. [PMID: 32492116 PMCID: PMC7653553 DOI: 10.1093/bioinformatics/btaa555] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/21/2020] [Revised: 05/26/2020] [Accepted: 05/29/2020] [Indexed: 01/01/2023] Open
Abstract
Motivation: The application of genome-wide chromosome conformation capture (3C) methods to prokaryotes provided insights into the spatial organization of their genomes and identified patterns conserved across the tree of life, such as chromatin compartments and contact domains. Prokaryotic genomes vary in GC content and the density of restriction sites along the chromosome, suggesting that these properties should be considered when planning experiments and choosing appropriate software for data processing. Diverse algorithms are available for the analysis of eukaryotic chromatin contact maps, but their potential application to prokaryotic data has not yet been evaluated.
Results: Here, we present a comparative analysis of domain calling algorithms using available single-microbe experimental data. We evaluated the algorithms’ intra-dataset reproducibility, concordance with other tools and sensitivity to coverage and resolution of contact maps. Using RNA-seq as an example, we showed how orthogonal biological data can be utilized to validate the reliability and significance of annotated domains. We also suggest that in silico simulations of contact maps can be used to choose optimal restriction enzymes and estimate theoretical map resolutions before the experiment. Our results provide guidelines for researchers investigating microbes and microbial communities using high-throughput 3C assays such as Hi-C and 3C-seq.
Availability and implementation: The code of the analysis is available at https://github.com/magnitov/prokaryotic_cids.
Supplementary information: Supplementary data are available at Bioinformatics online.
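The idea of checking restriction-site density before an experiment can be sketched as a simple in silico digest. This is only an illustration, not the authors' analysis code: the function names are hypothetical, the sequence is treated as linear (most prokaryotic chromosomes are circular), and the cut is placed at the start of each recognition site.

```python
import re
from statistics import median


def fragment_lengths(genome: str, site: str) -> list:
    """Fragment lengths from cutting a linear sequence at every occurrence of site."""
    # A lookahead finds overlapping occurrences as well.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() for m in re.finditer(pattern, genome.upper())]
    bounds = [0] + cuts + [len(genome)]
    # Drop zero-length pieces, e.g. when the sequence starts with the site.
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def summarize_digest(genome: str, site: str) -> dict:
    """Fragment count and median fragment length for a non-empty sequence."""
    lengths = fragment_lengths(genome, site)
    return {"n_fragments": len(lengths), "median_length": median(lengths)}


if __name__ == "__main__":
    toy_genome = "AAGATCTTTTGATCCC"
    print(fragment_lengths(toy_genome, "GATC"))
    print(summarize_digest(toy_genome, "GATC"))
```

Comparing the median fragment length across candidate enzymes gives a rough lower bound on the bin size a contact map could support; a GC-rich genome will typically yield longer fragments for an AT-rich recognition site, and the reverse.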
Affiliation(s)
- Mikhail D Magnitov
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine
- Group of Genome Spatial Organization, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
- Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny 141700, Russia
- Veronika S Kuznetsova
- Department of Biological and Medical Physics, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny 141700, Russia
- Group of Bioinformatics
- Sergey V Ulianov
- Laboratory of Structural and Functional Organization of Chromosomes, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
- Department of Biology, Moscow State University, Moscow 119234, Russia
- Sergey V Razin
- Laboratory of Structural and Functional Organization of Chromosomes, Institute of Gene Biology, Russian Academy of Sciences, Moscow 119334, Russia
- Department of Biology, Moscow State University, Moscow 119234, Russia
- Alexander V Tyakht
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine
- Group of Bioinformatics
|
15
|
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
- Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
- J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
- Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
- Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
- Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
- Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
|
16
|
Rieber L, Mahony S. Joint inference and alignment of genome structures enables characterization of compartment-independent reorganization across cell types. Epigenetics Chromatin 2019; 12:61. [PMID: 31594535 PMCID: PMC6784335 DOI: 10.1186/s13072-019-0308-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/25/2019] [Accepted: 09/25/2019] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Comparisons of Hi-C data sets between cell types and conditions have revealed differences in topologically associated domains (TADs) and A/B compartmentalization, which are correlated with differences in gene regulation. However, previous comparisons have focused on known forms of 3D organization while potentially neglecting other functionally relevant differences. We aimed to create a method to quantify all locus-specific differences between two Hi-C data sets. RESULTS We developed MultiMDS to jointly infer and align 3D chromosomal structures from two Hi-C data sets, thereby enabling a new way to comprehensively quantify relocalization of genomic loci between cell types. We demonstrate this approach by comparing Hi-C data across a variety of cell types. We consistently find relocalization of loci with minimal difference in A/B compartment score. For example, we identify compartment-independent relocalizations between GM12878 and K562 cells that involve loci displaying enhancer-associated histone marks in one cell type and polycomb-associated histone marks in the other. CONCLUSIONS MultiMDS is the first tool to identify all loci that relocalize between two Hi-C data sets. Our method can identify 3D localization differences that are correlated with cell-type-specific regulatory activities and which cannot be identified using other methods.
Affiliation(s)
- Lila Rieber
- Department of Biochemistry and Molecular Biology and Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802 USA
- Shaun Mahony
- Department of Biochemistry and Molecular Biology and Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802 USA
|
17
|
Z/I1 Hybrid Virulence Plasmids Carrying Antimicrobial Resistance genes in S. Typhimurium from Australian Food Animal Production. Microorganisms 2019; 7:299. [PMID: 31470501 PMCID: PMC6780720 DOI: 10.3390/microorganisms7090299] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/11/2019] [Revised: 08/22/2019] [Accepted: 08/25/2019] [Indexed: 12/29/2022] Open
Abstract
Knowledge of mobile genetic elements that capture and disseminate antimicrobial resistance genes between diverse environments, particularly across human-animal boundaries, is key to understanding the role anthropogenic activities have in the evolution of antimicrobial resistance. Plasmids that circulate within the Enterobacteriaceae and the Proteobacteria more broadly are well placed to acquire resistance genes sourced from separate niche environments and provide a platform for smaller mobile elements such as IS26 to assemble these genes into large, complex genomic structures. Here, we characterised two atypical Z/I1 hybrid plasmids, pSTM32-108 and pSTM37-118, hosting antimicrobial resistance and virulence associated genes within endemic pathogen Salmonella enterica serovar Typhimurium 1,4,[5],12:i:-, sourced from Australian swine production facilities during 2013. We showed that the plasmids found in S. Typhimurium 1,4,[5],12:i:- are close relatives of two plasmids identified from Escherichia coli of human and bovine origin in Australia circa 1998. The older plasmids, pO26-CRL125 and pO111-CRL115, encoded a putative serine protease autotransporter and were host to a complex resistance region composed of a hybrid Tn21-Tn1721 mercury resistance transposon and composite IS26 transposon Tn6026. This gave a broad antimicrobial resistance profile keyed towards first generation antimicrobials used in Australian agriculture but also included a class 1 integron hosting the trimethoprim resistance gene dfrA5. Genes encoding resistance to ampicillin, trimethoprim, sulphonamides, streptomycin, aminoglycosides, tetracyclines and mercury were a feature of these plasmids. Phylogenetic analyses showed very little genetic drift in the sequences of these plasmids over the past 15 years; however, some alterations within the complex resistance regions present on each plasmid have led to the loss of various resistance genes, presumably as a result of the activity of IS26. 
These alterations may reflect the specific selective pressures placed on the host strains over time. Our studies suggest that these plasmids and variants of them are endemic in Australian food production systems.
|
18
|
DeMaere MZ, Darling AE. bin3C: exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes. Genome Biol 2019; 20:46. [PMID: 30808380 PMCID: PMC6391755 DOI: 10.1186/s13059-019-1643-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 08/17/2018] [Accepted: 01/29/2019] [Indexed: 11/10/2022] Open
Abstract
Most microbes cannot be easily cultured, and metagenomics provides a means to study them. Current techniques aim to resolve individual genomes from metagenomes, so-called metagenome-assembled genomes (MAGs). Leading approaches depend upon time series or transect studies, the efficacy of which is a function of community complexity, target abundance, and sequencing depth. We describe an unsupervised method that exploits the hierarchical nature of Hi-C interaction rates to resolve MAGs using a single time point. We validate the method and directly compare against a recently announced proprietary service, ProxiMeta. bin3C is an open-source pipeline and makes use of the Infomap clustering algorithm ( https://github.com/cerebis/bin3C ).
Affiliation(s)
- Matthew Z. DeMaere
- The ithree institute, University of Technology Sydney, 15 Broadway, Ultimo, 2007 NSW Australia
- Aaron E. Darling
- The ithree institute, University of Technology Sydney, 15 Broadway, Ultimo, 2007 NSW Australia
|
19
|
DeMaere MZ, Darling AE. Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies. Gigascience 2018; 7:4628124. [PMID: 29149264 PMCID: PMC5827349 DOI: 10.1093/gigascience/gix103] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/01/2017] [Accepted: 10/23/2017] [Indexed: 02/02/2023] Open
Abstract
Background: Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype.
Findings: We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error.
Conclusions: We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
Affiliation(s)
- Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2077, Australia
- Aaron E Darling
- The ithree institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2077, Australia
|