101
Espejo Valle-Inclan J, Besselink NJ, de Bruijn E, Cameron DL, Ebler J, Kutzera J, van Lieshout S, Marschall T, Nelen M, Priestley P, Renkens I, Roemer MG, van Roosmalen MJ, Wenger AM, Ylstra B, Fijneman RJ, Kloosterman WP, Cuppen E. A multi-platform reference for somatic structural variation detection. Cell Genomics 2022; 2:100139. [PMID: 36778136 PMCID: PMC9903816 DOI: 10.1016/j.xgen.2022.100139] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/10/2020] [Revised: 05/06/2021] [Accepted: 05/06/2022] [Indexed: 10/18/2022]
Abstract
Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is in part due to the lack of high-quality, gold-standard datasets that enable the benchmarking of experimental approaches and bioinformatic analysis pipelines. Here, we performed somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different sequencing technologies. Based on the evidence from multiple technologies combined with extensive experimental validation, we compiled a comprehensive set of carefully curated and validated somatic SVs, comprising all SV types. We demonstrate the utility of this resource by determining the SV detection performance as a function of tumor purity and sequence depth, highlighting the importance of assessing these parameters in cancer genomics projects. The truth somatic SV dataset as well as the underlying raw multi-platform sequencing data are freely available and are an important resource for community somatic benchmarking efforts.
Affiliation(s)
- Nicolle J.M. Besselink
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Daniel L. Cameron
- Hartwig Medical Foundation, Amsterdam, the Netherlands; Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Joachim Kutzera
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Marcel Nelen
- Department of Human Genetics, Radboud UMC, Nijmegen, the Netherlands
- Ivo Renkens
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Margaretha G.M. Roemer
- Department of Pathology, Amsterdam UMC, Vrije Universiteit Amsterdam, Cancer Center Amsterdam, Amsterdam, the Netherlands
- Bauke Ylstra
- Department of Pathology, Amsterdam UMC, Vrije Universiteit Amsterdam, Cancer Center Amsterdam, Amsterdam, the Netherlands
- Remond J.A. Fijneman
- Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Wigard P. Kloosterman
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands; Corresponding author
- Edwin Cuppen
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands; Hartwig Medical Foundation, Amsterdam, the Netherlands; Corresponding author
102
Linked-read whole-genome sequencing resolves common and private structural variants in multiple myeloma. Blood Adv 2022; 6:5009-5023. [PMID: 35675515 PMCID: PMC9631623 DOI: 10.1182/bloodadvances.2021006720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2021] [Accepted: 05/31/2022] [Indexed: 01/18/2023]
Abstract
Linked-read WGS can be performed without DNA purification and allows for resolution of the diverse structural variants found in MM. Linked-read WGS can, as a standalone assay, provide comprehensive genetics in myeloma and other diseases with complex genomes.
Multiple myeloma (MM) is an incurable and aggressive plasma cell malignancy characterized by a complex karyotype with multiple structural variants (SVs) and copy-number variations (CNVs). Linked-read whole-genome sequencing (lrWGS) allows for refined detection and reconstruction of SVs by providing long-range genetic information from standard short-read sequencing. This makes lrWGS an attractive solution for capturing the full genomic complexity of MM. Here we show that high-quality lrWGS data can be generated from low numbers of cells subjected to fluorescence-activated cell sorting (FACS) without DNA purification. Using this protocol, we analyzed MM cells after FACS from 37 patients with MM using lrWGS. We found high concordance between lrWGS and fluorescence in situ hybridization (FISH) for the detection of recurrent translocations and CNVs. Outside of the regions investigated by FISH, we identified >150 additional SVs and CNVs across the cohort. Analysis of the lrWGS data allowed for resolution of the structure of diverse SVs affecting the MYC and t(11;14) loci, causing the duplication of genes and gene regulatory elements. In addition, we identified private SVs causing the dysregulation of genes recurrently involved in translocations with the IGH locus and show that these can alter the molecular classification of MM. Overall, we conclude that lrWGS allows for the detection of aberrations critical for MM prognostics and provides a feasible route for providing comprehensive genetics. Implementing lrWGS could provide more accurate clinical prognostics, facilitate genomic medicine initiatives, and greatly improve the stratification of patients included in clinical trials.
103
Chiu R, Rajan-Babu IS, Birol I, Friedman JM. Linked-read sequencing for detecting short tandem repeat expansions. Sci Rep 2022; 12:9352. [PMID: 35672336 PMCID: PMC9174224 DOI: 10.1038/s41598-022-13024-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/09/2021] [Accepted: 05/19/2022] [Indexed: 11/09/2022]
Abstract
Detection of short tandem repeat (STR) expansions with standard short-read sequencing is challenging due to the difficulty in mapping multicopy repeat sequences. In this study, we explored how the long-range sequence information of barcode linked-read sequencing (BLRS) can be leveraged to improve repeat-read detection. We also devised a novel algorithm using BLRS barcodes for distance estimation and evaluated its application for STR genotyping. Both approaches were designed for genotyping large expansions (> 1 kb) that cannot be sized accurately by existing methods. Using simulated and experimental data of genomes with STR expansions from multiple BLRS platforms, we validated the utility of barcode and phasing information in attaining better STR genotypes compared to standard short-read sequencing. Although the coverage bias of extremely GC-rich STRs is an important limitation of BLRS, BLRS is an effective strategy for genotyping many other STR loci.
Affiliation(s)
- Readman Chiu
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Indhu-Shree Rajan-Babu
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada; Department of Medical and Molecular Genetics, King's College London, Strand, London, WC2R 2LS, UK
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Jan M Friedman
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada; BC Children's Hospital Research Institute, Vancouver, BC, V5Z 4H4, Canada
104
Smith SE, Huang W, Tiamani K, Unterer M, Khan Mirzaei M, Deng L. Emerging technologies in the study of the virome. Curr Opin Virol 2022; 54:101231. [DOI: 10.1016/j.coviro.2022.101231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/18/2022] [Revised: 04/16/2022] [Accepted: 04/19/2022] [Indexed: 11/03/2022]
105
Bhat GR, Sethi I, Rah B, Kumar R, Afroze D. Innovative in Silico Approaches for Characterization of Genes and Proteins. Front Genet 2022; 13:865182. [PMID: 35664302 PMCID: PMC9159363 DOI: 10.3389/fgene.2022.865182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 04/11/2022] [Indexed: 11/13/2022]
Abstract
Bioinformatics is an amalgamation of biology, mathematics and computer science. It gathers molecular information from biology and applies informatic techniques to understand and organize that information in a useful manner. With the help of bioinformatics, experimental data are stored in several online databases, such as nucleotide databases, protein databases, GenBank and others. The data stored in these databases are used as a reference for experimental evaluation and validation. To date, several online tools have been developed to analyze genomic, transcriptomic, proteomic, epigenomic and metabolomic data. These include Human Splicing Finder (HSF), Exonic Splicing Enhancer Mutation taster, and others. A number of SNPs are observed in non-coding, intronic regions and play a role in the regulation of genes, which may or may not directly affect protein expression. Many mutations are thought to influence the splicing mechanism by affecting existing splice sites or creating new sites. HSF was developed to predict the effect of a mutation (SNP) on the splicing mechanism or signal. The tool is thus helpful in predicting the effect of mutations on splicing signals and can provide data for a better understanding of intronic mutations, which can then be validated experimentally. Additionally, rapid advancement in proteomics has steered researchers to organize the study of protein structure, function, relationships, and dynamics in space and time. The effective integration of all of these technologies will eventually drive next-generation systems biology, which will provide valuable biological insights for research, diagnostics, therapeutics and the development of personalized medicine.
Affiliation(s)
- Gh. Rasool Bhat
- Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
- Itty Sethi
- Institute of Human Genetics, University of Jammu, Jammu, India
- Bilal Rah
- Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
- Rakesh Kumar
- School of Biotechnology, Shri Mata Vaishno Devi University, Katra, India
- Dil Afroze
- Advanced Centre for Human Genetics, Sher-I-Kashmir Institute of Medical Sciences, Soura, India
106
Gao Y, Ma L, Liu GE. Initial Analysis of Structural Variation Detections in Cattle Using Long-Read Sequencing Methods. Genes (Basel) 2022; 13:828. [PMID: 35627213 PMCID: PMC9142105 DOI: 10.3390/genes13050828] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/15/2022] [Revised: 05/01/2022] [Accepted: 05/04/2022] [Indexed: 02/01/2023]
Abstract
Structural variations (SVs), a major source of genetic variation, are widely distributed in the genome. SVs involve longer genomic sequences and potentially have stronger effects than SNPs, but they are not well captured by short-read sequencing owing to their size and their association with repeats. Improved characterization of SVs can provide deeper insight into complex traits. With the availability of long-read sequencing, it has become feasible to uncover the full range of SVs. Here, we sequenced one cattle individual using 10× Genomics (10×G) linked reads, Pacific Biosciences (PacBio) continuous long reads (CLR) and circular consensus sequencing (CCS), as well as Oxford Nanopore Technologies (ONT) PromethION. We evaluated the ability of various methods to detect SVs. We identified 21,164 SVs, which amount to 186 Mb and cover 7.07% of the whole genome. More SVs were inferred from long reads than from short reads. PacBio CLR identified the most large SVs and covered the largest share of the genome. SVs called with PacBio CCS and ONT data showed high uniformity. PacBio CCS calls overlapped most with the results obtained from short-read data. Together, we found that long reads outperformed short reads in terms of SV detection.
Affiliation(s)
- Yahui Gao
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- George E. Liu
- Animal Genomics and Improvement Laboratory, Beltsville Agricultural Research Center, Agricultural Research Service, U.S. Department of Agriculture, Beltsville, MD 20705, USA
107
Deng S, Feng Y, Pauklin S. 3D chromatin architecture and transcription regulation in cancer. J Hematol Oncol 2022; 15:49. [PMID: 35509102 PMCID: PMC9069733 DOI: 10.1186/s13045-022-01271-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 01/11/2022] [Accepted: 04/21/2022] [Indexed: 12/18/2022]
Abstract
Chromatin has distinct three-dimensional (3D) architectures important in key biological processes, such as cell cycle, replication, differentiation, and transcription regulation. In turn, aberrant 3D structures play a vital role in developing abnormalities and diseases such as cancer. This review discusses key 3D chromatin structures (topologically associating domain, lamina-associated domain, and enhancer-promoter interactions) and corresponding structural protein elements mediating 3D chromatin interactions [CCCTC-binding factor, polycomb group protein, cohesin, and Brother of the Regulator of Imprinted Sites (BORIS) protein] with a highlight of their associations with cancer. We also summarise the recent development of technologies and bioinformatics approaches to study the 3D chromatin interactions in gene expression regulation, including crosslinking and proximity ligation methods in the bulk cell population (ChIA-PET and HiChIP) or single-molecule resolution (ChIA-drop), and methods other than proximity ligation, such as GAM, SPRITE, and super-resolution microscopy techniques.
Affiliation(s)
- Siwei Deng
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
- Yuliang Feng
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
- Siim Pauklin
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
108
Shearman JR, Naktang C, Sonthirod C, Kongkachana W, U-Thoomporn S, Jomchai N, Maknual C, Yamprasai S, Promchoo W, Ruang-Areerate P, Pootakham W, Tangphatsornruang S. Assembly of a hybrid mangrove, Bruguiera hainesii, and its two ancestral contributors, Bruguiera cylindrica and Bruguiera gymnorhiza. Genomics 2022; 114:110382. [PMID: 35526741 DOI: 10.1016/j.ygeno.2022.110382] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2022] [Revised: 04/19/2022] [Accepted: 05/02/2022] [Indexed: 01/14/2023]
Abstract
Mangroves are plants that live in tropical and subtropical coastal regions of the world; they are adapted to high-salt environments and cyclic tidal flooding. Mangroves play important ecological roles, including acting as breeding grounds for many fish species and preventing coastal erosion. The genomes of three mangrove species, Bruguiera gymnorhiza, Bruguiera cylindrica, and a hybrid of the two, Bruguiera hainesii, were sequenced, assembled and annotated. The two progenitor species, B. gymnorhiza and B. cylindrica, were found to be highly similar to each other and sufficiently similar to B. parviflora to allow it to be used for reference-based scaffolding to generate chromosome-level scaffolds. The two subgenomes of B. hainesii were independently assembled and scaffolded. Analysis of B. hainesii confirms that it is a hybrid, and the hybridisation event was estimated at 2.4 to 3.5 million years ago using a Bayesian Relaxed Molecular Clock approach.
Affiliation(s)
- Jeremy R Shearman
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Chaiwat Naktang
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Chutima Sonthirod
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Wasitthee Kongkachana
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Sonicha U-Thoomporn
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Nukoon Jomchai
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Chatree Maknual
- Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Suchart Yamprasai
- Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Waratthaya Promchoo
- Department of Marine and Coastal Resources, 120 The Government Complex, Chaengwatthana Rd., Thung Song Hong, Bangkok 10210, Thailand
- Panthita Ruang-Areerate
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Wirulda Pootakham
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
- Sithichoke Tangphatsornruang
- National Omics Center, National Science and Technology Development Agency, 111 Thailand Science Park, Paholyothin Road, Khlong Nueng, Khlong Luang, Pathumthani 12120, Thailand
109
Chen J, Zhong J, He X, Li X, Ni P, Safner T, Šprem N, Han J. The de novo assembly of a European wild boar genome revealed unique patterns of chromosomal structural variations and segmental duplications. Anim Genet 2022; 53:281-292. [PMID: 35238061 PMCID: PMC9314987 DOI: 10.1111/age.13181] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/22/2021] [Revised: 02/12/2022] [Accepted: 02/12/2022] [Indexed: 02/05/2023]
Abstract
The rapid progress of sequencing technology has greatly facilitated the de novo genome assembly of pig breeds. However, an assembly of the wild boar genome is still lacking, hampering our understanding of chromosomal and genomic evolution during domestication from wild boars into domestic pigs. Here, we sequenced and de novo assembled a European wild boar genome (ASM2165605v1) using the long-range information provided by 10× Linked-Reads sequencing. We achieved a high-quality assembly with a contig N50 of 26.09 Mb. Additionally, 1.64% of the contigs (222), with lengths from 107.65 kb to 75.36 Mb, covered 90.3% of the total genome size of ASM2165605v1 (~2.5 Gb). Mapping analysis revealed that the contigs can fill 24.73% (93/376) of the gaps present in the orthologous regions of the updated pig reference genome (Sscrofa11.1). We further improved the contigs to chromosome level with a reference-assisted scaffolding method. Using the ‘assembly-to-assembly’ approach, we identified intra-chromosomal large structural variations (SVs, length >1 kb) between the ASM2165605v1 and Sscrofa11.1 assemblies. Interestingly, we found that the number of SV events on the X chromosome deviated significantly from the linear models fitting autosomes (R² > 0.64, p < 0.001). Specifically, deletions and insertions were deficient on the X chromosome by 66.14 and 58.41% respectively, whereas duplications and inversions were excessive on the X chromosome by 71.96 and 107.61% respectively. We further used large segmental duplication (SD, >1 kb) events as a proxy to understand large-scale inter-chromosomal evolution, by resolving parental-derived relationships for SD pairs. We revealed a significant excess of SD movements from the X chromosome to autosomes (p < 0.001), consistent with the expectation of meiotic sex chromosome inactivation. Enrichment analyses indicated that the genes within derived SD copies on autosomes were significantly related to biological processes involving the nervous system, lipid biosynthesis and sperm motility (p < 0.01). Together, our analyses of the de novo assembly of ASM2165605v1 provide insight into the SVs between European wild boar and domestic pig, in addition to the ongoing process of meiotic sex chromosome inactivation in driving inter-chromosomal interaction between the sex chromosome and autosomes.
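For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch of how such a value is conventionally computed: N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not the wild boar data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so N50 is 70
```

Because N50 depends only on the length distribution, it says nothing about correctness of the contigs, which is why the abstract also reports gap filling against Sscrofa11.1.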
Affiliation(s)
- Jianhai Chen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Jie Zhong
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Xuefei He
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Xiaoyu Li
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
- Pan Ni
- Animal Husbandry and Veterinary Institute of Keqiao District, Shaoxing, Zhejiang, China
- Toni Safner
- Faculty of Agriculture, University of Zagreb, Zagreb, Croatia; Centre of Excellence for Biodiversity and Molecular Plant Breeding (CoE CroP-BioDiv), Zagreb, Croatia
- Nikica Šprem
- Faculty of Agriculture, University of Zagreb, Zagreb, Croatia
- Jianlin Han
- International Livestock Research Institute, Nairobi, Kenya; CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
110
Leng Z, Li L, Zhou X, Dong G, Chen S, Shang G, Kou H, Yang B, Liu H. Novel Insights into the Stemness and Immune Privilege of Mesenchymal Stem Cells from Human Wharton Jelly by Single-Cell RNA Sequencing. Med Sci Monit 2022; 28:e934660. [PMID: 35153292 PMCID: PMC8855628 DOI: 10.12659/msm.934660] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/04/2021] [Accepted: 10/24/2021] [Indexed: 11/23/2022]
Abstract
BACKGROUND: Fundamental and clinical interest in mesenchymal stem cells (MSCs) has risen dramatically over the past 3 decades. The immunomodulatory and differentiation abilities are the main mechanisms in vitro and in vivo. However, increasing evidence casts doubt on the stemness and immunogenicity of MSCs.
MATERIAL AND METHODS: We conducted a high-throughput 10x RNA sequencing and Smart-seq2 scRNA-seq analysis to reveal gene expression of Wharton jelly MSCs (WJ-MSCs) at a single-cell level. Multipotent differentiation, subpopulations, marker genes, human leucocyte antigen (HLA) gene expression, and cell cluster trajectory analysis were evaluated.
RESULTS: The WJ-MSCs had considerable heterogeneity between cells in terms of gene expression. They highly, partially, and hardly expressed genes related to mesodermal differentiation, endodermal differentiation, and ectodermal differentiation, respectively. Some cells seem to be bipotent or unipotent stem cells. Further, Monocle and cell cluster trajectory analysis demonstrated that 1 of the 3 divided clusters performed as stem cells, accounting for 12.6% of the population. The marker genes for a stem cell cluster were CRIM1, GLS, PLOD2, NEXN, ACTR2, FN1, MBNL1, LMOD1, COL3A1, NCL, SEC62, EPRS, COL5A2, COL8A1, and VCAN. In addition, the MSCs also highly, partially, and hardly expressed HLA-I antigen genes, HLA-II genes, and the HLA-G gene, respectively, indicating that MSCs probably have immunogenicity. A Kyoto Encyclopedia of Genes and Genomes pathway analysis of the 3 clusters demonstrated that they were mainly connected with viral infectious diseases, cancer, and endocrine and metabolic disorders. The most expressed transcription factors were zf-C2H2, HMG/HMGY, and Homeobox.
CONCLUSIONS: We found that only a subpopulation of WJ-MSCs are real stem cells and WJ-MSCs probably do not have immune privilege.
Affiliation(s)
- Zikuan Leng
- Department of Orthopedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, PR China
- Longyu Li
- Department of Orthopedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, PR China
- Xiang Zhou
- Department of Orthopedics, The Third Affiliated Hospital of Southern Medical University, Guangzhou, Guangdong, PR China
- Guangyao Dong
- Department of Obstetrics, Kaifeng Maternal and Child Health Hospital, Kaifeng, Henan, PR China
- Songfeng Chen
- Department of Orthopedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, PR China
- Guowei Shang
- Department of Orthopedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, PR China
- Hongwei Kou
- Department of Orthopedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, PR China
- Bo Yang
- Department of Neurosurgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, PR China
- Hongjian Liu
- Department of Orthopedics, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, PR China
111
Mueller JC, Botero-Delgadillo E, Espíndola-Hernández P, Gilsenan C, Ewels P, Gruselius J, Kempenaers B. Local selection signals in the genome of Blue tits emphasize regulatory and neuronal evolution. Mol Ecol 2022; 31:1504-1514. [PMID: 34995389 DOI: 10.1111/mec.16345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Revised: 11/18/2021] [Accepted: 12/15/2021] [Indexed: 11/30/2022]
Abstract
Understanding the genomic landscape of adaptation is central to the understanding of microevolution in wild populations. Genomic targets of selection and the underlying genomic mechanisms of adaptation can be elucidated by genome-wide scans for past selective sweeps or by scans for direct fitness associations. We sequenced and assembled 150 haplotypes of 75 Blue tits (Cyanistes caeruleus) of a single central-European population by a linked-read technology. We used these genome data in combination with coalescent simulations (1) to estimate an historical effective population size of ~250,000, which recently declined to ~10,000, and (2) to identify genome-wide distributed selective sweeps of beneficial variants most likely originating from standing genetic variation (soft sweeps). The genes linked to these soft sweeps, but also the ones linked to hard sweeps based on new beneficial mutants, showed a significant enrichment for functions associated with gene expression and transcription regulation. This emphasizes the importance of regulatory evolution in the population's adaptive history. Soft sweeps were further enriched for genes related to axon and synapse development, indicating the significance of neuronal connectivity changes in the brain potentially linked to behavioural adaptations. A previous scan of heterozygosity-fitness correlations revealed a consistent negative effect on arrival date at the breeding site for a single microsatellite in the MDGA2 gene. Here, we used the haplotype structure around this microsatellite to explain the effect as a local and direct outbreeding effect of a gene involved in synapse development.
Affiliation(s)
- Jakob C Mueller
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Esteban Botero-Delgadillo
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Pamela Espíndola-Hernández
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Carol Gilsenan
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Phil Ewels
- Science for Life Laboratory (SciLifeLab), Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Joel Gruselius
- Science for Life Laboratory, Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden; current address: Vanadis Diagnostics, PerkinElmer, Sollentuna, Sweden
- Bart Kempenaers
- Department of Behavioural Ecology and Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
112
Yuan Y. Applications of Optical Mapping for Plant Genome Assembly and Structural Variation Detection. Methods Mol Biol 2022; 2443:245-257. [PMID: 35037210 DOI: 10.1007/978-1-0716-2067-0_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Optical mapping plays an important role in plant genomics, particularly in plant genome assembly and large-scale structural variation detection. While DNA sequencing provides base-by-base nucleotide information, optical mapping shows the physical locations of selected enzyme restriction sites in a genome. The long single-molecule maps produced by optical mapping make it a useful auxiliary technique to DNA sequencing, which generally cannot span large and complex genomic regions. Although optical mapping, therefore, offers unique advantages to researchers, there are few dedicated tools to assist in optical mapping analyses. In this chapter, we present runBNG2, a successor of runBNG to help optical-mapping data analysis for diverse datasets.
Affiliation(s)
- Yuxuan Yuan
- School of Life Sciences, The Chinese University of Hong Kong, Hong Kong, SAR, China.
- State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, SAR, China.
|
113
|
Abstract
The recent emergence of "third-generation" sequencing platforms which address shortcomings of standard short reads has allowed the resolution of complex genomic regions during genome assembly. However, sequencing costs for third-generation platforms continue to be high. Novel approaches that leverage the low cost of short-read sequencing while capturing long-range information have been developed. In this chapter, we focus on one such approach, the 10x Genomics' Chromium system. We demonstrate the assembly of the B73 maize reference genome using the Supernova assembler. We also offer suggestions on how one might improve the resulting assembly through analysis of assembly metrics.
Affiliation(s)
- Paul Visendi
- Centre for Agriculture and the Bioeconomy, Queensland University of Technology, Brisbane, QLD, Australia.
|
114
|
Tran TM, Kim SC, Modavi C, Abate AR. Robotic automation of droplet microfluidics. Biomicrofluidics 2022; 16:014102. [PMID: 35145570 PMCID: PMC8816516 DOI: 10.1063/5.0064265] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/21/2021] [Accepted: 11/23/2021] [Indexed: 06/14/2023]
Abstract
Droplet microfluidics enables powerful analytic capabilities but often requires workflows involving macro- and microfluidic processing steps that are cumbersome to perform manually. Here, we demonstrate the automation of droplet microfluidics with commercial fluid-handling robotics. The workflows incorporate common microfluidic devices including droplet generators, mergers, and sorters and utilize the robot's native capabilities for thermal control, incubation, and plate scanning. The ability to automate microfluidic devices using commercial fluid handling will speed up the integration of these methods into biological workflows.
Affiliation(s)
- Tuan M. Tran
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California 94158, USA
- Samuel C. Kim
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California 94158, USA
- Cyrus Modavi
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California 94158, USA
|
115
|
Athanasopoulou K, Boti MA, Adamopoulos PG, Skourou PC, Scorilas A. Third-Generation Sequencing: The Spearhead towards the Radical Transformation of Modern Genomics. Life (Basel) 2021; 12:life12010030. [PMID: 35054423 PMCID: PMC8780579 DOI: 10.3390/life12010030] [Citation(s) in RCA: 68] [Impact Index Per Article: 22.7] [Received: 11/15/2021] [Revised: 12/20/2021] [Accepted: 12/23/2021] [Indexed: 12/14/2022]
Abstract
Although next-generation sequencing (NGS) technology revolutionized sequencing, offering a tremendous sequencing capacity with groundbreaking depth and accuracy, it continues to demonstrate serious limitations. In the early 2010s, the introduction of a novel set of sequencing methodologies, presented by two platforms, Pacific Biosciences (PacBio) and Oxford Nanopore Sequencing (ONT), gave birth to third-generation sequencing (TGS). The innovative long-read technologies turn genome sequencing into an ease-of-handle procedure by greatly reducing the average time of library construction workflows and simplifying the process of de novo genome assembly due to the generation of long reads. Long sequencing reads produced by both TGS methodologies have already facilitated the decipherment of transcriptional profiling since they enable the identification of full-length transcripts without the need for assembly or the use of sophisticated bioinformatics tools. Long-read technologies have also provided new insights into the field of epitranscriptomics, by allowing the direct detection of RNA modifications on native RNA molecules. This review highlights the advantageous features of the newly introduced TGS technologies, discusses their limitations and provides an in-depth comparison regarding their scientific background and available protocols as well as their potential utility in research and clinical applications.
|
116
|
Prunier J, Carrier A, Gilbert I, Poisson W, Albert V, Taillon J, Bourret V, Côté SD, Droit A, Robert C. CNVs with adaptive potential in Rangifer tarandus: genome architecture and new annotated assembly. Life Sci Alliance 2021; 5:5/3/e202101207. [PMID: 34911809 PMCID: PMC8711850 DOI: 10.26508/lsa.202101207] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/23/2021] [Revised: 11/29/2021] [Accepted: 11/29/2021] [Indexed: 01/13/2023]
Abstract
Rangifer tarandus has experienced recent drastic population size reductions throughout its circumpolar distribution and preserving the species implies genetic diversity conservation. To facilitate genomic studies of the species populations, we improved the genome assembly by combining long read and linked read and obtained a new highly accurate and contiguous genome assembly made of 13,994 scaffolds (L90 = 131 scaffolds). Using de novo transcriptome assembly of RNA-sequencing reads and similarity with annotated human gene sequences, 17,394 robust gene models were identified. As copy number variations (CNVs) likely play a role in adaptation, we additionally investigated these variations among 20 genomes representing three caribou ecotypes (migratory, boreal and mountain). A total of 1,698 large CNVs (length > 1 kb) showing a genome distribution including hotspots were identified. 43 large CNVs were particularly distinctive of the migratory and sedentary ecotypes and included genes annotated for functions likely related to the expected adaptations. This work includes the first publicly available annotation of the caribou genome and the first assembly allowing genome architecture analyses, including the likely adaptive CNVs reported here.
Affiliation(s)
- Julien Prunier
- Département de Médecine Moléculaire, Faculté de Médecine, Université Laval, Quebec City, Canada
- Alexandra Carrier
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
- Isabelle Gilbert
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
- William Poisson
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
- Vicky Albert
- Ministère des Forêts, de la Faune et des Parcs du Québec, Quebec City, Canada
- Joëlle Taillon
- Ministère des Forêts, de la Faune et des Parcs du Québec, Quebec City, Canada
- Vincent Bourret
- Ministère des Forêts, de la Faune et des Parcs du Québec, Quebec City, Canada
- Steeve D Côté
- Caribou Ungava, département de biologie, Faculté des Sciences et de Génie, Université Laval, Quebec City, Canada
- Arnaud Droit
- Département de Médecine Moléculaire, Faculté de Médecine, Université Laval, Quebec City, Canada
- Claude Robert
- Département des sciences animales, Faculté des Sciences de l'Agriculture et de l'Alimentation, Université Laval, Quebec City, Canada
|
117
|
Xiong X, Kelkar YD, Geden CJ, Zhang C, Wang Y, Jongepier E, Martinson EO, Verhulst EC, Gadau J, Werren JH, Wang X. Long-Read Assembly and Annotation of the Parasitoid Wasp Muscidifurax raptorellus, a Biological Control Agent for Filth Flies. Front Genet 2021; 12:748135. [PMID: 34868218 PMCID: PMC8633841 DOI: 10.3389/fgene.2021.748135] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2021] [Accepted: 10/04/2021] [Indexed: 12/30/2022]
Abstract
The parasitoid wasp Muscidifurax raptorellus (Hymenoptera: Pteromalidae) is a gregarious species that has received extensive attention for its potential in biological pest control against house fly, stable fly, and other filth flies. It has a high reproductive capacity and can be reared easily. However, genome assembly is not available for M. raptorellus or any other species in this genus. Previously, we assembled a complete circular mitochondrial genome with a length of 24,717 bp. Here, we assembled and annotated a high-quality nuclear genome of M. raptorellus, using a combination of long-read (104× genome coverage) and short-read (326× genome coverage) sequencing technologies. The assembled genome size is 314 Mbp in 226 contigs, with a 97.9% BUSCO completeness score and a contig N50 of 4.67 Mb, suggesting excellent continuity of this assembly. Our assembly builds the foundation for comparative and evolutionary genomic analysis in the genus of Muscidifurax and possible future biocontrol applications.
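The contiguity figures quoted in abstracts like the one above (contig N50, and L90 in the Rangifer tarandus entry) follow a simple definition: sort sequences from longest to shortest and accumulate lengths until a given percentage of the total assembly size is reached. The sketch below is illustrative only and is not taken from any of the cited papers; the function name `nx_lx` and the toy lengths are hypothetical.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Nx is the length of the sequence at which the longest-first cumulative
    sum first reaches `percent` % of the total; Lx is how many sequences
    were needed to get there. `percent` is an integer such as 50 or 90.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths, total = 300)
toy = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = nx_lx(toy, 50)
n90, l90 = nx_lx(toy, 90)
```

A larger N50 together with a smaller L50 or L90 means that fewer, longer sequences cover most of the assembly, which is why these values are read as indicators of continuity.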
Affiliation(s)
- Xiao Xiong
- Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, United States.,School of Life Sciences and Technology, Tongji University, Shanghai, China
- Yogeshwar D Kelkar
- Department of Biology, University of Rochester, Rochester, NY, United States
- Chris J Geden
- Center for Medical, Agricultural and Veterinary Entomology, USDA Agricultural Research Service, Gainesville, FL, United States
- Chao Zhang
- Department of Plastic and Reconstructive Surgery, Shanghai Ninth People's Hospital, Shanghai Institute of Precision Medicine, Shanghai JiaoTong University School of Medicine, Shanghai, China
- Yidong Wang
- Laboratory of Entomology, Wageningen University, Wageningen, Netherlands
- Evelien Jongepier
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, Netherlands
- Ellen O Martinson
- Department of Biology, University of Rochester, Rochester, NY, United States; Department of Biology, University of New Mexico, Albuquerque, NM, United States
- Eveline C Verhulst
- Laboratory of Entomology, Wageningen University, Wageningen, Netherlands
- Jürgen Gadau
- Institute for Evolution & Biodiversity, University of Münster, Münster, Germany
- John H Werren
- Department of Biology, University of Rochester, Rochester, NY, United States
- Xu Wang
- Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, United States; Alabama Agricultural Experiment Station, Center for Advanced Science, Innovation and Commerce, Auburn, AL, United States; HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States
|
118
|
Tarabichi M, Demeulemeester J, Verfaillie A, Flanagan AM, Van Loo P, Konopka T. A pan-cancer landscape of somatic mutations in non-unique regions of the human genome. Nat Biotechnol 2021; 39:1589-1596. [PMID: 34282324 PMCID: PMC7612106 DOI: 10.1038/s41587-021-00971-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/16/2020] [Accepted: 06/02/2021] [Indexed: 12/27/2022]
Abstract
A substantial fraction of the human genome displays high sequence similarity with at least one other genomic sequence, posing a challenge for the identification of somatic mutations from short-read sequencing data. Here we annotate genomic variants in 2,658 cancers from the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort with links to similar sites across the human genome. We train a machine learning model to use signals distributed over multiple genomic sites to call somatic events in non-unique regions and validate the data against linked-read sequencing in an independent dataset. Using this approach, we uncover previously hidden mutations in ~1,700 coding sequences and in thousands of regulatory elements, including in known cancer genes, immunoglobulins and highly mutated gene families. Mutations in non-unique regions are consistent with mutations in unique regions in terms of mutation burden and substitution profiles. The analysis provides a systematic summary of the mutation events in non-unique regions at a genome-wide scale across multiple human cancers.
Affiliation(s)
- Maxime Tarabichi
- The Francis Crick Institute, London, UK.
- Institute for Interdisciplinary Research, Université Libre de Bruxelles, Brussels, Belgium.
- Jonas Demeulemeester
- The Francis Crick Institute, London, UK
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Adrienne M Flanagan
- Research Department of Pathology, Cancer Institute, University College London, London, UK
- Department of Cellular and Molecular Pathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
- Tomasz Konopka
- The Francis Crick Institute, London, UK.
- William Harvey Research Institute, Queen Mary University of London, London, UK.
|
119
|
Miller DB, Piccolo SR. trioPhaser: using Mendelian inheritance logic to improve genomic phasing of trios. BMC Bioinformatics 2021; 22:559. [PMID: 34809557 PMCID: PMC8607709 DOI: 10.1186/s12859-021-04470-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Accepted: 11/08/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: When analyzing DNA sequence data of an individual, knowing which nucleotide was inherited from each parent can be beneficial when trying to identify certain types of DNA variants. Mendelian inheritance logic can be used to accurately phase (haplotype) the majority (67-83%) of an individual's heterozygous nucleotide positions when genotypes are available for both parents (trio). However, when all members of a trio are heterozygous at a position, Mendelian inheritance logic cannot be used to phase. For such positions, a computational phasing algorithm can be used. Existing phasing algorithms use a haplotype reference panel, sequencing reads, and/or parental genotypes to phase an individual; however, they are limited in that they can only phase certain types of variants, require a specific genotype build, require large amounts of storage capacity, and/or require long run times. We created trioPhaser to address these challenges. RESULTS: trioPhaser uses gVCF files from an individual and their parents as initial input, and then outputs a phased VCF file. Input trio data are first phased using Mendelian inheritance logic. Then, the positions that cannot be phased using inheritance information alone are phased by the SHAPEIT4 phasing algorithm. Using whole-genome sequencing data of 52 trios, we show that trioPhaser, on average, increases the total number of phased positions by 21.0% and 10.5%, respectively, when compared to the number of positions that SHAPEIT4 or Mendelian inheritance logic can phase when either is used alone. In addition, we show that the accuracy of the phased calls output by trioPhaser is similar to linked-read and read-backed phasing. CONCLUSION: trioPhaser is a containerized software tool that uses both Mendelian inheritance logic and SHAPEIT4 to phase trios when gVCF files are available. By implementing both phasing methods, more variant positions are phased compared to what either method is able to phase alone.
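The Mendelian inheritance logic described in this abstract can be illustrated with a minimal sketch. This is not trioPhaser's code; the function name `phase_by_inheritance` and the tuple-based genotype representation are assumptions made for illustration. The idea is to enumerate every way the child could have received one allele from each parent and to accept the phase only when exactly one assignment matches the child's genotype.

```python
from itertools import product


def phase_by_inheritance(child, mother, father):
    """Phase one site of a trio using Mendelian logic only.

    Genotypes are unordered allele pairs such as ("A", "G").
    Returns (maternal_allele, paternal_allele), or None when the site
    cannot be resolved (for example, all three members heterozygous for
    the same alleles, or a genotype that violates Mendelian inheritance).
    """
    if child[0] == child[1]:
        return child  # homozygous site: nothing to phase
    # Every combination of one maternal and one paternal allele that
    # reproduces the child's genotype, regardless of order.
    consistent = {
        (m, f)
        for m, f in product(set(mother), set(father))
        if sorted((m, f)) == sorted(child)
    }
    if len(consistent) == 1:
        return consistent.pop()
    return None  # ambiguous or inconsistent: leave for a statistical phaser


resolved = phase_by_inheritance(("A", "G"), ("A", "A"), ("A", "G"))
unresolved = phase_by_inheritance(("A", "G"), ("A", "G"), ("A", "G"))
```

In the first call the mother can only contribute `A`, so the child's `G` must be paternal. In the second call both assignments are possible, which is the situation the abstract says is handed to SHAPEIT4.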
Affiliation(s)
- Dustin B Miller
- Department of Biology, Brigham Young University, Provo, UT, 84602, USA
- Stephen R Piccolo
- Department of Biology, Brigham Young University, Provo, UT, 84602, USA.
|
120
|
Wu C, Yin Y, Zhu L, Zhang Y, Li YZ. Metagenomic sequencing-driven multidisciplinary approaches to shed light on the untapped microbial natural products. Drug Discov Today 2021; 27:730-742. [PMID: 34775105 DOI: 10.1016/j.drudis.2021.11.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2020] [Revised: 10/07/2021] [Accepted: 11/08/2021] [Indexed: 11/17/2022]
Abstract
The advantage of metagenomics over the culture-based natural product (NP) discovery pipeline is the ability to access the biosynthetic potential of uncultivable microbes. Advances in DNA sequencing are revolutionizing conventional metagenomics approaches for microbial NP discovery. The genomes of (in)cultivable bugs can be resolved straightforwardly from environmental samples, enabling in situ prediction of biosynthetic gene clusters (BGCs). The predicted chemical diversities could be realized not only by heterologous expression of gene clusters originating from DNA synthesis or direct cloning, but also potentially by bioinformatic-directed organic synthesis or chemoenzymatic total synthesis. In this review, we suggest that metagenomic sequencing in tandem with multidisciplinary approaches will form a versatile platform to shed light on a plethora of microbial 'dark matter'.
Affiliation(s)
- Changsheng Wu
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China.
- Yizhen Yin
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Lele Zhu
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Youming Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Yue-Zhong Li
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China.
|
121
|
Yu Y, Chen L, Miao X, Li SC. SpecHap: a diploid phasing algorithm based on spectral graph theory. Nucleic Acids Res 2021; 49:e114. [PMID: 34403470 PMCID: PMC8565328 DOI: 10.1093/nar/gkab709] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/16/2020] [Revised: 07/25/2021] [Accepted: 08/02/2021] [Indexed: 11/30/2022]
Abstract
Haplotype phasing plays an important role in understanding the genetic data of diploid eukaryotic organisms. Different sequencing technologies (such as next-generation sequencing or third-generation sequencing) produce various genetic data that require haplotype assembly. Although multiple diploid haplotype phasing algorithms exist, only a few will work equally well across all sequencing technologies. In this work, we propose SpecHap, a novel haplotype assembly tool that leverages spectral graph theory. On both in silico and whole-genome sequencing datasets, SpecHap consumed less memory and required less CPU time, yet achieved comparable accuracy with state-of-the-art methods across all the test instances, which comprise sequencing data from next-generation sequencing, linked-reads, high-throughput chromosome conformation capture, PacBio single-molecule real-time, and Oxford Nanopore long-reads. Furthermore, SpecHap successfully phased an individual Ambystoma mexicanum, a species with a gigantic diploid genome, within 6 CPU hours and 945 MB peak memory usage, while other tools failed to yield results either due to memory overflow (40 GB) or time limit exceeded (5 days). Our results demonstrated that SpecHap is scalable, efficient, and accurate for diploid phasing across many sequencing platforms.
Affiliation(s)
- Yonghan Yu
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
- Lingxi Chen
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
- Xinyao Miao
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
- Shuai Cheng Li
- Computer Science, City University of Hong Kong, Kowloon, Hong Kong 999077, China
|
122
|
Jia W, Xu C, Li SC. Resolving complex structures at oncovirus integration loci with conjugate graph. Brief Bioinform 2021; 22:6359003. [PMID: 34463709 DOI: 10.1093/bib/bbab359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2021] [Revised: 08/10/2021] [Accepted: 08/12/2021] [Indexed: 01/10/2023]
Abstract
Oncovirus integrations cause copy number variations and complex structural variations (SVs) on host genomes. However, the understanding of how inserted viral DNA impacts the local genome remains limited. The linear structure of the oncovirus integrated local genomic map (LGM) will lay the foundations to understand how oncovirus integrations emerge and compromise the host genome's functioning. We propose a conjugate graph model to reconstruct the rearranged LGM at integrated loci. Simulation tests prove the reliability and credibility of the algorithm. Applications of the algorithm to whole-genome sequencing data of human papillomavirus (HPV) and hepatitis B virus (HBV)-infected cancer samples gained biological insights on oncovirus integrations. We observed four affection patterns of oncovirus integrations from the HPV and HBV-integrated cancer samples, including the coding-frame truncation, hyper-amplification of tumor gene, the viral cis-regulation inserted at the single intron and at the intergenic region. We found that the focal duplicates and host SVs are frequent in the HPV-integrated LGMs, while the focal deletions are prevalent in HBV-integrated LGMs. Furthermore, with the results yielded by our method, we found the enhanced microhomology-mediated end joining might lead to both HPV and HBV integrations and conjectured that the HPV integrations might mainly occur during the DNA replication process. The conjugate graph algorithm code and LGM construction pipeline are available at https://github.com/deepomicslab/FuseSV.
Affiliation(s)
- Wenlong Jia
- Department of Computer Science, City University of Hong Kong, Hong Kong
- Chang Xu
- Department of Computer Science, City University of Hong Kong, Hong Kong
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Hong Kong
|
123
|
Abbasi A, Alexandrov LB. Significance and limitations of the use of next-generation sequencing technologies for detecting mutational signatures. DNA Repair (Amst) 2021; 107:103200. [PMID: 34411908 PMCID: PMC9478565 DOI: 10.1016/j.dnarep.2021.103200] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/21/2021] [Revised: 07/30/2021] [Accepted: 08/03/2021] [Indexed: 12/13/2022]
Abstract
Next generation sequencing technologies (NGS) have been critical in characterizing the genomic landscape and untangling the genetic heterogeneity of human cancer. Since its advent, NGS has played a pivotal role in identifying the patterns of somatic mutations imprinted on cancer genomes and in deciphering the signatures of the mutational processes that have generated these patterns. Mutational signatures serve as phenotypic molecular footprints of exposures to environmental factors as well as deficiency and infidelity of DNA replication and repair pathways. Since the first roadmap of mutational signatures in human cancer was generated from whole-genome and whole-exome sequencing data, there has been a growing interest to extract mutational signatures from other NGS technologies such as targeted panel sequencing, RNA sequencing, single-cell sequencing, duplex sequencing, reduced representation sequencing, and long-read sequencing. Many of these technologies have their inherent sequencing biases and produce technical artifacts that can confound the extraction of reliable and interpretable mutational signatures. In this review, we highlight the relevance, limitations, and prospects of using different NGS technologies for examining mutational patterns and for deciphering mutational signatures.
Affiliation(s)
- Ammal Abbasi
- Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA; Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA; Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA
- Ludmil B Alexandrov
- Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, 92093, USA; Department of Bioengineering, UC San Diego, La Jolla, CA, 92093, USA; Moores Cancer Center, UC San Diego, La Jolla, CA, 92037, USA.
|
124
|
Bodrug-Schepers A, Stralis-Pavese N, Buerstmayr H, Dohm JC, Himmelbauer H. Quinoa genome assembly employing genomic variation for guided scaffolding. Theor Appl Genet 2021; 134:3577-3594. [PMID: 34365519 PMCID: PMC8519820 DOI: 10.1007/s00122-021-03915-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/04/2020] [Accepted: 07/06/2021] [Indexed: 06/13/2023]
Abstract
We propose to use the natural variation between individuals of a population for genome assembly scaffolding. In today's genome projects, multiple accessions get sequenced, leading to variant catalogs. Using such information to improve genome assemblies is attractive both cost-wise as well as scientifically, because the value of an assembly increases with its contiguity. We conclude that haplotype information is a valuable resource to group and order contigs toward the generation of pseudomolecules. Quinoa (Chenopodium quinoa) has been under cultivation in Latin America for more than 7500 years. Recently, quinoa has gained increasing attention due to its stress resistance and its nutritional value. We generated a novel quinoa genome assembly for the Bolivian accession CHEN125 using PacBio long-read sequencing data (assembly size 1.32 Gbp, initial N50 size 608 kbp). Next, we re-sequenced 50 quinoa accessions from Peru and Bolivia. This set of accessions differed at 4.4 million single-nucleotide variant (SNV) positions compared to CHEN125 (1.4 million SNV positions on average per accession). We show how to exploit variation in accessions that are distantly related to establish a genome-wide ordered set of contigs for guided scaffolding of a reference assembly. The method is based on detecting shared haplotypes and their expected continuity throughout the genome (i.e., the effect of linkage disequilibrium), as an extension of what is expected in mapping populations where only a few haplotypes are present. We test the approach using Arabidopsis thaliana data from different populations. After applying the method on our CHEN125 quinoa assembly we validated the results with mate-pairs, genetic markers, and another quinoa assembly originating from a Chilean cultivar. 
We show consistency between these information sources and the haplotype-based relations as determined by us and obtain an improved assembly with an N50 size of 1079 kbp and ordered contig groups of up to 39.7 Mbp. We conclude that haplotype information in distantly related individuals of the same species is a valuable resource to group and order contigs according to their adjacency in the genome toward the generation of pseudomolecules.
Affiliation(s)
- Alexandrina Bodrug-Schepers
- Institute of Computational Biology, Department of Biotechnology, Universität für Bodenkultur, Vienna, Austria
- Nancy Stralis-Pavese
- Institute of Computational Biology, Department of Biotechnology, Universität für Bodenkultur, Vienna, Austria
- Hermann Buerstmayr
- Institute of Biotechnology in Plant Production, Department of Agrobiotechnology and Department of Crop Sciences, Universität für Bodenkultur, Tulln, Austria
- Juliane C Dohm
- Institute of Computational Biology, Department of Biotechnology, Universität für Bodenkultur, Vienna, Austria.
- Heinz Himmelbauer
- Institute of Computational Biology, Department of Biotechnology, Universität für Bodenkultur, Vienna, Austria.
|
125
|
Hill BM, Bisht K, Atkins GR, Gomez AA, Rumbaugh KP, Wakeman CA, Brown AMV. Lysis-Hi-C as a method to study polymicrobial communities and eDNA. Mol Ecol Resour 2021; 22:1029-1042. [PMID: 34669257 PMCID: PMC9215119 DOI: 10.1111/1755-0998.13535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2020] [Revised: 10/06/2021] [Accepted: 10/11/2021] [Indexed: 11/30/2022]
Abstract
Microbes interact in natural communities in a spatially structured manner, particularly in biofilms and polymicrobial infections. While next generation sequencing approaches provide powerful insights into diversity, metabolic capacity, and mutational profiles of these communities, they generally fail to recover in situ spatial proximity between distinct genotypes in the interactome. Hi‐C is a promising method that has assisted in analysing complex microbiomes, by creating chromatin cross‐links in cells, that aid in identifying adjacent DNA, to improve de novo assembly. This study explored a modified Hi‐C approach involving an initial lysis phase prior to DNA cross‐linking, to test whether adjacent cell chromatin can be cross‐linked, anticipating that this could provide a new avenue for study of spatial‐mutational dynamics in structured microbial communities. An artificial polymicrobial mixture of Pseudomonas aeruginosa, Staphylococcus aureus, and Escherichia coli was lysed for 1–18 h, then prepared for Hi‐C. A murine biofilm infection model was treated with sonication, mechanical lysis, or chemical lysis before Hi‐C. Bioinformatic analyses of resulting Hi‐C interspecies chromatin links showed that while microbial species differed from one another, generally lysis significantly increased links between species and increased the distance of Hi‐C links within species, while also increasing novel plasmid‐chromosome links. The success of this modified lysis‐Hi‐C protocol in creating extracellular DNA links is a promising first step toward a new lysis‐Hi‐C based method to recover genotypic microgeography in polymicrobial communities, with potential future applications in diseases with localized resistance, such as cystic fibrosis lung infections and chronic diabetic ulcers.
Affiliation(s)
- Bravada M Hill
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
| | - Karishma Bisht
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
| | - Georgia Rae Atkins
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
| | - Amy A Gomez
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
| | - Kendra P Rumbaugh
- Department of Surgery, School of Medicine, Texas Tech Health Sciences Center, Lubbock, Texas, USA
| | - Catherine A Wakeman
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
| | - Amanda M V Brown
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas, USA
|
126
|
Dias GB, Aldossary AM, El-Shafie HAF, Alhoshani FM, Al-Fageeh MB, Bergman CM, Manee MM. Complete mitochondrial genome of the longhorn date palm stem borer Jebusaea hammerschmidtii (Reiche, 1878). Mitochondrial DNA B Resour 2021; 6:3214-3216. [PMID: 34676292 PMCID: PMC8525966 DOI: 10.1080/23802359.2021.1989334] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Accepted: 08/05/2021] [Indexed: 12/02/2022]
Abstract
The 15,619 bp mitochondrial genome of Jebusaea hammerschmidtii was assembled from short reads, annotated, and compared to the genomes of other longhorn beetles (Cerambycidae). Gene content was typical of animal mitochondrial genomes and contained 13 protein-coding, 22 tRNA, and 2 rRNA genes. Gene organization was identical to that of other longhorn beetles. Phylogenetic analysis placed J. hammerschmidtii within the subfamily Cerambycinae, and strongly supported the monophyly of the Cerambycinae, Lamiinae, and Prioninae subfamilies.
Affiliation(s)
- Guilherme B Dias
- Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Ahmad M Aldossary
- National Center for Biotechnology, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
| | | | - Fahad M Alhoshani
- National Center for Biotechnology, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
| | - Mohamed B Al-Fageeh
- Life Sciences and Environment Research Institute, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
| | - Casey M Bergman
- Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Manee M Manee
- National Center for Bioinformatics, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
|
127
|
Miller DB, Robison R, Piccolo SR. Toward a methodology for evaluating DNA variants in nuclear families. PLoS One 2021; 16:e0258375. [PMID: 34624066 PMCID: PMC8500447 DOI: 10.1371/journal.pone.0258375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Accepted: 09/27/2021] [Indexed: 11/22/2022]
Abstract
The genetic underpinnings of most pediatric-cancer cases are unknown. Population-based studies use large sample sizes but have accounted for only a small proportion of the estimated heritability of pediatric cancers. Pedigree-based studies are infeasible for most human populations. One alternative is to collect genetic data from a single nuclear family and use inheritance patterns within the family to filter candidate variants. This approach can be applied to common and rare variants, including those that are private to a given family or to an affected individual. We evaluated this approach using genetic data from three nuclear families with 5, 4, and 7 children, respectively. Only one child in each nuclear family had been diagnosed with cancer, and neither parent had been affected. Diagnoses for the affected children were benign low-grade astrocytoma, Wilms tumor (stage 2), and Burkitt's lymphoma, respectively. We used whole-genome sequencing to profile normal cells from each family member and a linked-read technology for genomic phasing. For initial variant filtering, we used global minor allele frequencies, deleteriousness scores, and functional-impact annotations. Next, we used genetic variation in the unaffected siblings as a guide to filter the remaining variants. As a way to evaluate our ability to detect variant(s) that may be relevant to disease status, the corresponding author blinded the primary author to affected status; the primary author then assigned a risk score to each child. Based on this evidence, the primary author predicted which child had been affected in each family. The primary author's prediction was correct for the child who had been diagnosed with a Wilms tumor; the child with Burkitt's lymphoma had the second-highest risk score among the seven children in that family. This study demonstrates a methodology for filtering and evaluating candidate genomic variants and genes within nuclear families that may merit further exploration.
Affiliation(s)
- Dustin B. Miller
- Department of Biology, Brigham Young University, Provo, UT, United States of America
| | - Reid Robison
- Department of Biology, Brigham Young University, Provo, UT, United States of America
- Department of Psychiatry, University of Utah, Salt Lake City, UT, United States of America
| | - Stephen R. Piccolo
- Department of Biology, Brigham Young University, Provo, UT, United States of America
|
128
|
Arias CF, Dikow RB, McMillan WO, De León LF. De Novo Genome Assembly of the Electric Fish Brachyhypopomus occidentalis (Hypopomidae, Gymnotiformes). Genome Biol Evol 2021; 13:6377337. [PMID: 34581791 PMCID: PMC8536545 DOI: 10.1093/gbe/evab223] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Accepted: 09/21/2021] [Indexed: 11/20/2022]
Abstract
The bluntnose knifefish Brachyhypopomus occidentalis is a primary freshwater fish from north-western South America and Lower Central America. Like other Gymnotiformes, it has an electric organ that generates electric discharges used for both communication and electrolocation. We assembled a high-quality reference genome sequence of B. occidentalis by combining Oxford Nanopore and 10X Genomics linked-reads technologies. We also describe its demographic history in the context of the rise of the Isthmus of Panama. The assembled genome is 540.3 Mb in size with a scaffold N50 of 5.4 Mb, and it contains 93.8% complete, 0.7% fragmented, and 5.5% missing vertebrate/Actinopterygii Benchmarking Universal Single-Copy Orthologs. Repetitive elements account for 11.04% of the genome, and 34,347 protein-coding genes were predicted, of which 23,935 have been functionally annotated. Demographic analysis suggests a rapid effective population expansion between 3 and 5 Myr ago, corresponding to the final closure of the Isthmus of Panama (2.8–3.5 Myr ago). This event was followed by a sudden and constant population decline during the last 1 Myr, likely associated with strong shifts in both precipitation and sea level during the Pleistocene glacial-interglacial cycles. The de novo genome assembly of B. occidentalis will provide novel insights into the molecular basis of both electric signal production and detection and will be fundamental for understanding the processes that have shaped the diversity of Neotropical freshwater environments.
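The N50 scaffold length quoted in this abstract is, by the usual definition, the length L such that scaffolds of length at least L together hold at least half of the assembled bases. A minimal Python sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the B. occidentalis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical values, arbitrary units).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

For the toy list the total is 290, and the two longest scaffolds (80 + 70 = 150) already cover half of it, so the sketch reports 70.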
Affiliation(s)
- Carlos F Arias
- Department of Biology, University of Massachusetts, Boston, Massachusetts, USA.,Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA.,Smithsonian Tropical Research Institute, Panamá, Panamá
| | - Rebecca B Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
| | | | - Luis F De León
- Department of Biology, University of Massachusetts, Boston, Massachusetts, USA.,Smithsonian Tropical Research Institute, Panamá, Panamá.,Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT-AIP), Panamá, Panamá
|
129
|
Westfall AK, Telemeco RS, Grizante MB, Waits DS, Clark AD, Simpson DY, Klabacka RL, Sullivan AP, Perry GH, Sears MW, Cox CL, Cox RM, Gifford ME, John-Alder HB, Langkilde T, Angilletta MJ, Leaché AD, Tollis M, Kusumi K, Schwartz TS. A chromosome-level genome assembly for the eastern fence lizard (Sceloporus undulatus), a reptile model for physiological and evolutionary ecology. Gigascience 2021; 10:6380105. [PMID: 34599334 PMCID: PMC8486681 DOI: 10.1093/gigascience/giab066] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/06/2020] [Revised: 04/16/2021] [Accepted: 09/07/2021] [Indexed: 12/15/2022]
Abstract
Background: High-quality genomic resources facilitate investigations into behavioral ecology, morphological and physiological adaptations, and the evolution of genomic architecture. Lizards in the genus Sceloporus have a long history as important ecological, evolutionary, and physiological models, making them a valuable target for the development of genomic resources. Findings: We present a high-quality chromosome-level reference genome assembly, SceUnd1.0 (using 10X Genomics Chromium, HiC, and Pacific Biosciences data), and tissue/developmental stage transcriptomes for the eastern fence lizard, Sceloporus undulatus. We performed synteny analysis with other snake and lizard assemblies to identify broad patterns of chromosome evolution including the fusion of micro- and macrochromosomes. We also used this new assembly to provide improved reference-based genome assemblies for 34 additional Sceloporus species. Finally, we used RNAseq and whole-genome resequencing data to compare 3 assemblies, each representing an increased level of cost and effort: Supernova Assembly with data from 10X Genomics Chromium, HiRise Assembly that added data from HiC, and PBJelly Assembly that added data from Pacific Biosciences sequencing. We found that the Supernova Assembly contained the full genome and was a suitable reference for RNAseq and single-nucleotide polymorphism calling, but the chromosome-level scaffolds provided by the addition of HiC data allowed synteny and whole-genome association mapping analyses. The subsequent addition of PacBio data doubled the contig N50 but provided negligible gains in scaffold length. Conclusions: These new genomic resources provide valuable tools for advanced molecular analysis of an organism that has become a model in physiology and evolutionary ecology.
Affiliation(s)
- Aundrea K Westfall
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| | - Rory S Telemeco
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA.,Department of Biology, California State University Fresno, Fresno, CA 93740, USA
| | | | - Damien S Waits
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| | - Amanda D Clark
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| | - Dasia Y Simpson
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| | - Randy L Klabacka
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
| | - Alexis P Sullivan
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
| | - George H Perry
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA.,Department of Anthropology, Pennsylvania State University, University Park, PA 16802, USA.,Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
| | - Michael W Sears
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
| | - Christian L Cox
- Department of Biology, Georgia Southern University, Statesboro, GA 30460, USA.,Department of Biological Sciences, Florida International University, Miami, FL 33199, USA
| | - Robert M Cox
- Department of Biology, University of Virginia, Charlottesville, VA 22904, USA
| | - Matthew E Gifford
- Department of Biology, University of Central Arkansas, Conway, AR 72035, USA
| | - Henry B John-Alder
- Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, NJ 08901, USA
| | - Tracy Langkilde
- Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
| | | | - Adam D Leaché
- Department of Biology, University of Washington, Seattle, WA 98195, USA.,Burke Museum of Natural History and Culture, University of Washington, Seattle, WA 98195, USA
| | - Marc Tollis
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA.,School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Kenro Kusumi
- School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
| | - Tonia S Schwartz
- Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
|
130
|
Morisse P, Lemaitre C, Legeai F. LRez: a C++ API and toolkit for analyzing and managing Linked-Reads data. BIOINFORMATICS ADVANCES 2021; 1:vbab022. [PMID: 36700107 PMCID: PMC9710615 DOI: 10.1093/bioadv/vbab022] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/13/2021] [Revised: 09/09/2021] [Accepted: 09/20/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, through the use of barcodes that tag reads originating from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. Results: We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes various functionalities for computing numbers of common barcodes between genomic regions, extracting barcodes from BAM files, and indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve its performance. Availability and implementation: LRez is implemented in C++, supported on Unix-based platforms and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
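Two of the operations this abstract lists, counting barcodes shared between genomic regions and fetching every alignment that carries a given barcode, can be illustrated with a short plain-Python sketch. This is not LRez's C++ API: the function names, the `(chromosome, position, barcode)` tuple layout, and the barcode strings are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict


def index_by_barcode(alignments):
    """Map each barcode to the list of alignments that carry it."""
    index = defaultdict(list)
    for chrom, pos, barcode in alignments:
        index[barcode].append((chrom, pos))
    return index


def barcodes_in_region(alignments, chrom, start, end):
    """Collect barcodes of alignments starting in the half-open range [start, end)."""
    return {bc for c, pos, bc in alignments if c == chrom and start <= pos < end}


def shared_barcodes(alignments, region_a, region_b):
    """Return the barcodes observed in both regions.

    Regions that share many barcodes were probably sampled from the same
    long DNA molecules, which is the long-range signal linked reads provide.
    """
    return (barcodes_in_region(alignments, *region_a)
            & barcodes_in_region(alignments, *region_b))


# Toy alignments: (chromosome, position, barcode); all values are made up.
toy = [
    ("chr1", 100, "AAAC"), ("chr1", 150, "AAAG"), ("chr1", 180, "AACT"),
    ("chr1", 9000, "AAAC"), ("chr1", 9100, "AACT"), ("chr1", 9200, "TTTG"),
]
common = shared_barcodes(toy, ("chr1", 0, 1000), ("chr1", 9000, 10000))
print(sorted(common))
print(index_by_barcode(toy)["AAAC"])
```

A production tool would read barcodes from BAM tags or FASTQ headers and keep the index on disk; the in-memory dictionary here only demonstrates the lookup pattern.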
Affiliation(s)
- Pierre Morisse
- Univ Rennes, Inria, CNRS, IRISA, Rennes 35000, France,To whom correspondence should be addressed.
| | | | - Fabrice Legeai
- Univ Rennes, Inria, CNRS, IRISA, Rennes 35000, France,IGEPP, INRAE, Institut Agro, Univ Rennes, Rennes 35000, France
|
131
|
Freire R, Weisweiler M, Guerreiro R, Baig N, Hüttel B, Obeng-Hinneh E, Renner J, Hartje S, Muders K, Truberg B, Rosen A, Prigge V, Bruckmüller J, Lübeck J, Stich B. Chromosome-scale reference genome assembly of a diploid potato clone derived from an elite variety. G3-GENES GENOMES GENETICS 2021; 11:6371871. [PMID: 34534288 PMCID: PMC8664475 DOI: 10.1093/g3journal/jkab330] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/31/2021] [Accepted: 09/08/2021] [Indexed: 01/27/2023]
Abstract
Potato (Solanum tuberosum L.) is one of the most important crops with a worldwide production of 370 million metric tons. The objectives of this study were (1) to create a high-quality consensus sequence across the two haplotypes of a diploid clone derived from a tetraploid elite variety and assess the sequence divergence from the available potato genome assemblies, as well as among the two haplotypes; (2) to evaluate the new assembly’s usefulness for various genomic methods; and (3) to assess the performance of phasing in diploid and tetraploid clones, using linked-read sequencing technology. We used PacBio long reads coupled with 10x Genomics reads and proximity ligation scaffolding to create the dAg1_v1.0 reference genome sequence. With a final assembly size of 812 Mb, where 750 Mb are anchored to 12 chromosomes, our assembly is larger than other available potato reference sequences and high proportions of properly paired reads were observed for clones unrelated by pedigree to dAg1. Comparisons of the new dAg1_v1.0 sequence to other potato genome sequences point out the high divergence between the different potato varieties and illustrate the potential of using dAg1_v1.0 sequence in breeding applications.
Affiliation(s)
- Ruth Freire
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
| | - Marius Weisweiler
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
| | - Ricardo Guerreiro
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
| | - Nadia Baig
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany
| | - Bruno Hüttel
- Max Planck-Genome-centre Cologne, Max Planck Institute for Plant Breeding, Carl-von-Linne-Weg 10, 50829 Köln, Germany
| | - Evelyn Obeng-Hinneh
- Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
| | - Juliane Renner
- Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
| | - Stefanie Hartje
- Böhm-Nordkartoffel Agrarproduktion GmbH & Co. OHG, Strehlow 19, 17111 Hohenmocker, Germany
| | - Katja Muders
- Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
| | - Bernd Truberg
- Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
| | - Arne Rosen
- Nordring- Kartoffelzucht- und Vermehrungs- GmbH, Parkweg 4, 18190 Sanitz, Germany
| | - Vanessa Prigge
- SaKa Pflanzenzucht GmbH & Co. KG, Zuchtstation Windeby, Eichenallee 9, 24340 Windeby, Germany
| | | | - Jens Lübeck
- Solana Research GmbH, Eichenallee 9, 24340 Windeby, Germany
| | - Benjamin Stich
- Institute for Quantitative Genetics and Genomics of Plants, Universitätsstraße 1, 40225 Düsseldorf, Germany.,Cluster of Excellence on Plant Sciences, From Complex Traits towards Synthetic Modules, Universitätsstraße 1, 40225 Düsseldorf, Germany
|
132
|
Sène MA, Kiesslich S, Djambazian H, Ragoussis J, Xia Y, Kamen AA. Haplotype-resolved de novo assembly of the Vero cell line genome. NPJ Vaccines 2021; 6:106. [PMID: 34417462 PMCID: PMC8379168 DOI: 10.1038/s41541-021-00358-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/11/2020] [Accepted: 07/12/2021] [Indexed: 01/13/2023]
Abstract
The Vero cell line is the most used continuous cell line for viral vaccine manufacturing, with more than 40 years of accumulated experience in the vaccine industry. Additionally, the Vero cell line has shown a high affinity for infection by MERS-CoV, SARS-CoV, and recently SARS-CoV-2, emerging as an important discovery and screening tool to support the global research and development efforts during the COVID-19 pandemic. However, the lack of a reference genome for the Vero cell line has limited our understanding of the host–virus interactions underlying this affinity of the Vero cell towards key emerging pathogens, and more importantly our ability to redesign high-yield vaccine production processes using Vero genome editing. In this paper, we present an annotated, highly contiguous 2.9 Gb assembly of the Vero cell genome. In addition, several viral genome insertions, including Adeno-associated virus serotypes 3, 4, 7, and 8, have been identified, giving valuable insights into quality control considerations for cell-based vaccine production systems. Variant calling revealed that, in addition to interferon genes, chemokine- and caspase-related genes have lost their functions. Surprisingly, the ACE2 gene, which was previously identified as the host cell entry receptor for SARS-CoV and SARS-CoV-2, has also lost function in the Vero genome due to structural variations.
Affiliation(s)
| | - Sascha Kiesslich
- Department of Bioengineering, McGill University, Montreal, QC, Canada
| | | | | | - Yu Xia
- Department of Bioengineering, McGill University, Montreal, QC, Canada
| | - Amine A Kamen
- Department of Bioengineering, McGill University, Montreal, QC, Canada.
|
133
|
Gao X, Mo W, Shi J, Song N, Liang P, Chen J, Shi Y, Guo W, Li X, Yang X, Xin B, Zhao H, Song W, Lai J. HITAC-seq enables high-throughput cost-effective sequencing of plasmids and DNA fragments with identity. J Genet Genomics 2021; 48:671-680. [PMID: 34417123 DOI: 10.1016/j.jgg.2021.05.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Revised: 05/03/2021] [Accepted: 05/13/2021] [Indexed: 01/13/2023]
Abstract
DNA sequencing is vital for many aspects of biological research and diagnostics. Despite the development of second and third generation sequencing technologies, Sanger sequencing has long been the only choice when each sequenced plasmid or DNA fragment must be tracked precisely. Here, we report a novel barcoding and assembly system, Highly-parallel Indexed Tagmentation-reads Assembled Consensus sequencing (HITAC-seq), that can sequence samples massively in parallel while tracking the identity of each individual sample. At a cost much lower than that of a single Sanger sequencing read, HITAC-seq can generate high-quality contiguous sequences of 10 kilobases or longer. The capability of HITAC-seq was confirmed through large-scale sequencing of thousands of plasmid clones and hundreds of amplicon fragments using approximately 100 pg of input DNA. Due to its long synthetic read length, HITAC-seq was effective in detecting relatively large structural variations, as demonstrated by the identification of a ∼1.3 kb Copia retrotransposon insertion upstream of a likely maize domestication gene. Besides being a practical alternative to traditional Sanger sequencing, HITAC-seq is suitable for many high-throughput sequencing and genotyping applications.
Affiliation(s)
- Xiang Gao
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Weipeng Mo
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Junpeng Shi
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Ning Song
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Pei Liang
- Department of Microbiology and Immunology, College of Biological Sciences, China Agricultural University, Beijing 100193, PR China
| | - Jian Chen
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Yiting Shi
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, PR China
| | - Weilong Guo
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, PR China
| | - Xinchen Li
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Xiaohong Yang
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, PR China
| | - Beibei Xin
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Haiming Zhao
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Weibin Song
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China
| | - Jinsheng Lai
- State Key Laboratory of Plant Physiology and Biochemistry and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, PR China; Center for Crop Functional Genomics and Molecular Breeding, China Agricultural University, Beijing 100193, PR China.
|
134
|
Kwak SH, Powe CE, Jang SS, Callahan MJ, Bernstein SN, Lee SM, Kang S, Park KS, Jang HC, Florez JC, Kim JI, Chae JH. Sequencing Cell-free Fetal DNA in Pregnant Women With GCK-MODY. J Clin Endocrinol Metab 2021; 106:2678-2689. [PMID: 34406393 PMCID: PMC8660061 DOI: 10.1210/clinem/dgab265] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/23/2021] [Indexed: 11/19/2022]
Abstract
CONTEXT: Individuals with monogenic diabetes due to inactivating glucokinase (GCK) variants typically do not require treatment, except potentially during pregnancy. In pregnancy, fetal GCK genotype determines whether treatment is indicated, but noninvasive methods are not clinically available. OBJECTIVE: This work aims to develop a method to determine fetal GCK genotype noninvasively using maternal cell-free fetal DNA. METHODS: This was a proof-of-concept study involving 3 pregnant women with a causal GCK variant that used information from 1) massively parallel sequencing of maternal plasma cell-free DNA, 2) direct haplotype sequences of maternal genomic DNA, and 3) the paternal genotypes to estimate relative haplotype dosage of the pathogenic variant-linked haplotype. Statistical testing of variant inheritance was performed using a sequential probability ratio test (SPRT). RESULTS: In each of the 3 cases, plasma cell-free DNA was extracted once between gestational weeks 24 and 36. The fetal fraction of cell-free DNA ranged from 21.8% to 23.0%. Paternal homozygous alleles that were identical to the maternal GCK variant-linked allele were not overrepresented in the cell-free DNA. Paternal homozygous alleles that were identical to the maternal wild-type-linked allele were significantly overrepresented. Based on the SPRT, we predicted that none of the 3 fetuses had inherited the GCK variant. Postnatal infant genotyping confirmed our prediction in each case. CONCLUSION: We have successfully implemented a noninvasive method to predict fetal GCK genotype using cell-free DNA in 3 pregnant women carrying an inactivating GCK variant. This method could guide tailoring of hyperglycemia treatment in pregnancies of women with GCK monogenic diabetes.
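The sequential probability ratio test mentioned in this abstract can be sketched in Python as a generic Wald SPRT on read counts. This is not the authors' implementation. It assumes a simple textbook dosage model for sites where the father is homozygous for the allele on the tested maternal haplotype: that haplotype accounts for one half of the reads if the fetus did not inherit it and for 0.5 + f/2 if it did, where f is the fetal fraction. The function name, the error rates `alpha` and `beta`, and the read counts are hypothetical; the fetal fraction of 0.22 is only chosen to lie inside the range the abstract reports.

```python
import math


def sprt_haplotype_dosage(hap_reads, total_reads, fetal_fraction,
                          alpha=0.001, beta=0.001):
    """Classify one maternal haplotype as balanced or overrepresented.

    H0: the haplotype supplies a fraction q0 = 0.5 of informative reads.
    H1: the haplotype supplies q1 = 0.5 + fetal_fraction / 2 (assumed model).
    Returns "overrepresented", "balanced", or "undecided".
    """
    if not 0.0 < fetal_fraction < 1.0:
        raise ValueError("fetal_fraction must lie strictly between 0 and 1")
    q0 = 0.5
    q1 = 0.5 + fetal_fraction / 2.0
    other_reads = total_reads - hap_reads
    # Log-likelihood ratio of H1 versus H0 for binomially distributed reads.
    llr = (hap_reads * math.log(q1 / q0)
           + other_reads * math.log((1.0 - q1) / (1.0 - q0)))
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    if llr >= upper:
        return "overrepresented"
    if llr <= lower:
        return "balanced"
    return "undecided"


# Hypothetical read counts at informative sites.
for hap_reads in (610, 500, 555):
    print(hap_reads, sprt_haplotype_dosage(hap_reads, 1000, 0.22))
```

In a sequential setting the counts would be accumulated site by site along the haplotype block and the test repeated until one of the two boundaries is crossed; "undecided" means more reads are needed.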
Affiliation(s)
- Soo Heon Kwak
- Department of Internal Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Camille E Powe
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA 02114-2696, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Se Song Jang
- Department of Pediatrics, Seoul National University Children’s Hospital, Seoul 03080, Korea
| | - Michael J Callahan
- Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA 02114-2696, USA
| | - Sarah N Bernstein
- Harvard Medical School, Boston, MA 02115, USA
- Department of Obstetrics and Gynecology, Division of Maternal Fetal Medicine, Massachusetts General Hospital, Boston, MA 02114-2696, USA
| | - Seung Mi Lee
- Department of Obstetrics and Gynecology, Seoul National University Hospital, Seoul 03080, Korea
| | - Sunyoung Kang
- Department of Internal Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul 03080, Korea
| | - Kyong Soo Park
- Department of Internal Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul 03080, Korea
- Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul 03080, Korea
| | - Hak C Jang
- Department of Internal Medicine, Seoul National University College of Medicine, Seoul 03080, Korea
- Department of Internal Medicine, Seoul National University Bundang Hospital, Seongnam 13620, Korea
| | - Jose C Florez
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, MA 02114-2696, USA
- Harvard Medical School, Boston, MA 02115, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114-2696, USA
| | - Jong-Il Kim
- Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul 03080, Korea
- Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul 03080, Korea
| | - Jong Hee Chae
- Department of Pediatrics, Seoul National University Children’s Hospital, Seoul 03080, Korea
- Department of Genomic Medicine, Seoul National University Hospital, Seoul 03080, Korea
|
135
|
Hiltunen M, Ryberg M, Johannesson H. ARBitR: an overlap-aware genome assembly scaffolder for linked reads. Bioinformatics 2021; 37:2203-2205. [PMID: 33216122 PMCID: PMC8352505 DOI: 10.1093/bioinformatics/btaa975] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/23/2020] [Revised: 10/22/2020] [Accepted: 11/10/2020] [Indexed: 12/02/2022]
Abstract
Summary: Linked genomic sequencing reads contain information that can be used to join sequences together into scaffolds in draft genome assemblies. Existing software for this purpose performs the scaffolding by joining sequences with a gap between them, without considering potential overlaps between contigs. We developed ARBitR to create scaffolds where overlaps are taken into account and show that it can accurately recreate regions where draft assemblies are broken. Availability and implementation: ARBitR is written and implemented in Python3 for Unix-based operating systems. All source code is available at https://github.com/markhilt/ARBitR under the GNU General Public License v3. Supplementary information: Supplementary data are available at Bioinformatics online.
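The distinction this abstract draws, merging two contigs at an overlap instead of always inserting a gap, can be shown with a small Python sketch. It is not ARBitR's algorithm: it uses exact suffix-prefix string matching where a real scaffolder would align the contig ends and tolerate mismatches, and it leaves out the linked-read barcode evidence that decides which contigs are adjacent. The function name, `min_overlap`, `gap_size`, and the sequences are hypothetical.

```python
def join_contigs(left, right, min_overlap=3, gap_size=10):
    """Join two adjacent contigs, merging at an exact end overlap if present.

    The longest suffix of `left` that equals a prefix of `right` and is at
    least `min_overlap` bases long is collapsed into a single copy.
    If no such overlap exists, the contigs are separated by a run of N.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate first so the largest overlap wins.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return left + "N" * gap_size + right


# Toy sequences (made up): the first pair overlaps by "TAC", the second does not.
print(join_contigs("ACGTAC", "TACGG"))
print(join_contigs("AAAA", "CCCC", gap_size=5))
```

A gap-only scaffolder would emit both copies of the shared "TAC" with N bases between them, duplicating sequence at the junction; the overlap-aware join removes that duplication.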
Affiliation(s)
- Markus Hiltunen
- Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden
| | - Martin Ryberg
- Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden
| | - Hanna Johannesson
- Department of Organismal Biology, Uppsala University, 75236 Uppsala, Sweden
| |
|
136
|
Tedersoo L, Albertsen M, Anslan S, Callahan B. Perspectives and Benefits of High-Throughput Long-Read Sequencing in Microbial Ecology. Appl Environ Microbiol 2021; 87:e0062621. [PMID: 34132589 PMCID: PMC8357291 DOI: 10.1128/aem.00626-21] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Indexed: 12/11/2022]
Abstract
Short-read, high-throughput sequencing (HTS) methods have yielded numerous important insights into microbial ecology and function. Yet, in many instances short-read HTS techniques are suboptimal, for example, by providing insufficient phylogenetic resolution or low integrity of assembled genomes. Single-molecule and synthetic long-read (SLR) HTS methods have successfully ameliorated these limitations. In addition, nanopore sequencing has generated a number of unique analysis opportunities, such as rapid molecular diagnostics and direct RNA sequencing, and both Pacific Biosciences (PacBio) and nanopore sequencing support detection of epigenetic modifications. Although initially suffering from relatively low sequence quality, recent advances have greatly improved the accuracy of long-read sequencing technologies. In spite of great technological progress in recent years, the long-read HTS methods (PacBio and nanopore sequencing) are still relatively costly, require large amounts of high-quality starting material, and commonly need specific solutions in various analysis steps. Despite these challenges, long-read sequencing technologies offer high-quality, cutting-edge alternatives for testing hypotheses about microbiome structure and functioning as well as assembly of eukaryote genomes from complex environmental DNA samples.
Affiliation(s)
- Leho Tedersoo
- Mycology and Microbiology Center, University of Tartu, Tartu, Estonia
| | - Mads Albertsen
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
| | - Sten Anslan
- Mycology and Microbiology Center, University of Tartu, Tartu, Estonia
- Braunschweig University of Technology, Zoological Institute, Braunschweig, Germany
| | - Benjamin Callahan
- Department of Population Health and Pathobiology, College of Veterinary Medicine and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| |
|
137
|
Sakamoto Y, Zaha S, Suzuki Y, Seki M, Suzuki A. Application of long-read sequencing to the detection of structural variants in human cancer genomes. Comput Struct Biotechnol J 2021; 19:4207-4216. [PMID: 34527193 PMCID: PMC8350331 DOI: 10.1016/j.csbj.2021.07.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/06/2021] [Revised: 07/20/2021] [Accepted: 07/25/2021] [Indexed: 01/02/2023]
Abstract
In recent years, the so-called long-read sequencing technology has had a substantial impact on various aspects of genome sciences. Here, we introduce recent studies of cancerous structural variants (SVs) using long-read sequencing technologies, namely Pacific Biosciences (PacBio) sequencers, Oxford Nanopore Technologies (ONT) sequencers, and linked-read methods. By taking advantage of long-read lengths, these technologies have enabled the precise detection of SVs, including long insertions by transposable elements, such as LINE-1. In addition to SV detection, the epigenome status (including DNA methylation and haplotype information) surrounding SV loci has also been unveiled by long-read sequencing technologies, to identify the effects of SVs. Among the various research fields in which long-read sequencing has been applied, cancer genomics has shown the most remarkable advances. In fact, many studies are beginning to shed light on the detection of SVs and the elucidation of their complex structures in various types of cancer. In the particular case of cancers, we summarize the technical limitations of the application of this technology to the analysis of clinical samples. We will introduce recent achievements from this viewpoint. However, a similar approach will be started for other applications in the near future. Therefore, by complementing the current short-read sequencing analysis, long-read sequencing should reveal the complex nature of human genomes in their healthy and disease states, which will open a new opportunity for a better understanding of disease development and for a novel strategy for drug development.
Affiliation(s)
- Yoshitaka Sakamoto
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
| | - Suzuko Zaha
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
| | - Yutaka Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
| | - Masahide Seki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
| | - Ayako Suzuki
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
| |
|
138
|
Musunuri R, Arora K, Corvelo A, Shah M, Shelton J, Zody MC, Narzisi G. Somatic variant analysis of linked-reads sequencing data with Lancet. Bioinformatics 2021; 37:1918-1919. [PMID: 33241313 DOI: 10.1093/bioinformatics/btaa888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2020] [Revised: 09/03/2020] [Accepted: 10/02/2020] [Indexed: 11/14/2022]
Abstract
SUMMARY: We present a new version of the popular somatic variant caller, Lancet, that supports the analysis of linked-reads sequencing data. By seamlessly integrating barcodes and haplotype read assignments within the colored De Bruijn graph local-assembly framework, Lancet computes a barcode-aware coverage and identifies variants that disagree with the local haplotype structure. AVAILABILITY AND IMPLEMENTATION: Lancet is implemented in C++ and available for academic and non-commercial research purposes as an open-source package at https://github.com/nygenome/lancet. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rajeeva Musunuri
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Kanika Arora
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - André Corvelo
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Minita Shah
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Jennifer Shelton
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Michael C Zody
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| | - Giuseppe Narzisi
- Computational Biology Lab, New York Genome Center, New York, NY 10013, USA
| |
|
139
|
Xu Z, Dixon JR. Genome reconstruction and haplotype phasing using chromosome conformation capture methodologies. Brief Funct Genomics 2021; 19:139-150. [PMID: 31875884 DOI: 10.1093/bfgp/elz026] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 12/22/2022]
Abstract
Genomic analysis of individuals or organisms is predicated on the availability of high-quality reference and genotype information. With the rapidly dropping costs of high-throughput DNA sequencing, this is becoming readily available for diverse organisms and for increasingly large populations of individuals. Despite these advances, there are still aspects of genome sequencing that remain challenging for existing sequencing methods. This includes the generation of long-range contiguity during genome assembly, identification of structural variants in both germline and somatic tissues, the phasing of haplotypes in diploid organisms and the resolution of genome sequence for organisms derived from complex samples. These types of information are valuable for understanding the role of genome sequence and genetic variation on genome function, and numerous approaches have been developed to address them. Recently, chromosome conformation capture (3C) experiments, such as the Hi-C assay, have emerged as powerful tools to aid in these challenges for genome reconstruction. We will review the current use of Hi-C as a tool for aiding in genome sequencing, addressing the applications, strengths, limitations and potential future directions for the use of 3C data in genome analysis. We argue that unique features of Hi-C experiments make this data type a powerful tool to address challenges in genome sequencing, and that future integration of Hi-C data with alternative sequencing assays will facilitate the continuing revolution in genomic analysis and genome sequencing.
|
140
|
Tan KT, Kim H, Carrot-Zhang J, Zhang Y, Kim WJ, Kugener G, Wala JA, Howard TP, Chi YY, Beroukhim R, Li H, Ha G, Alper SL, Perlman EJ, Mullen EA, Hahn WC, Meyerson M, Hong AL. Haplotype-resolved germline and somatic alterations in renal medullary carcinomas. Genome Med 2021; 13:114. [PMID: 34261517 PMCID: PMC8281718 DOI: 10.1186/s13073-021-00929-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/13/2020] [Accepted: 06/25/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Renal medullary carcinomas (RMCs) are rare kidney cancers that occur in adolescents and young adults of African ancestry. Although RMC is associated with the sickle cell trait and somatic loss of the tumor suppressor, SMARCB1, the ancestral origins of RMC remain unknown. Further, characterization of structural variants (SVs) involving SMARCB1 in RMC remains limited. METHODS: We used linked-read genome sequencing to reconstruct germline and somatic haplotypes in 15 unrelated patients with RMC registered on the Children's Oncology Group (COG) AREN03B2 study between 2006 and 2017 or from our prior study. We performed fine-mapping of the HBB locus and assessed the germline for cancer predisposition genes. Subsequently, we assessed the tumor samples for mutations outside of SMARCB1 and integrated RNA sequencing to interrogate the structural variants at the SMARCB1 locus. RESULTS: We find that the haplotype of the sickle cell mutation in patients with RMC originated from three geographical regions in Africa. In addition, fine-mapping of the HBB locus identified the sickle cell mutation as the sole candidate variant. We further identify that the SMARCB1 structural variants are characterized by blunt or 1-bp homology events. CONCLUSIONS: Our findings suggest that RMC does not arise from a single founder population and that the HbS allele is a strong candidate germline allele which confers risk for RMC. Furthermore, we find that the SVs that disrupt SMARCB1 function are likely repaired by non-homologous end-joining. These findings highlight how haplotype-based analyses using linked-read genome sequencing can be applied to identify potential risk variants in small and rare disease cohorts and provide nucleotide resolution to structural variants.
Affiliation(s)
- Kar-Tong Tan
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Hyunji Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Jian Carrot-Zhang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Yuxiang Zhang
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Won Jun Kim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | | | - Jeremiah A Wala
- Department of Medicine, University of California San Francisco, San Francisco, CA, USA
| | - Thomas P Howard
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Yueh-Yun Chi
- Department of Pediatrics, University of Southern California, Los Angeles, CA, USA
| | - Rameen Beroukhim
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Heng Li
- Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Gavin Ha
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Seth L Alper
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | | | - Elizabeth A Mullen
- Department of Hematology and Oncology, Boston Children's Hospital, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - William C Hahn
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matthew Meyerson
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
| | - Andrew L Hong
- Department of Pediatrics, Emory University, Atlanta, GA, USA
- Aflac Center for Cancer and Blood Disorders, Children's Healthcare of Atlanta, Atlanta, GA, USA
| |
|
141
|
Comparative Genomics of Clinical Isolates of the Emerging Tick-Borne Pathogen Neoehrlichia mikurensis. Microorganisms 2021; 9:microorganisms9071488. [PMID: 34361922 PMCID: PMC8303192 DOI: 10.3390/microorganisms9071488] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/17/2021] [Revised: 07/07/2021] [Accepted: 07/08/2021] [Indexed: 11/17/2022]
Abstract
Tick-borne ‘Neoehrlichia (N.) mikurensis’ is the cause of neoehrlichiosis, an infectious vasculitis of humans. This strict intracellular pathogen is a member of the family Anaplasmataceae and has been unculturable until recently. The only available genetic data on this new pathogen are six partially sequenced housekeeping genes. The aim of this study was to advance the knowledge regarding ‘N. mikurensis’ genomic relatedness with other Anaplasmataceae members, intra-species genotypic variability and potential virulence factors explaining its tropism for vascular endothelium. Here, we present the de novo whole-genome sequences of three ‘N. mikurensis’ strains derived from Swedish patients diagnosed with neoehrlichiosis. The genomes were obtained by extraction of DNA from patient plasma, library preparation using 10× Chromium technology, and sequencing by Illumina Hiseq-4500. ‘N. mikurensis’ was found to have the second-smallest genome of the Anaplasmataceae family (1.1 Mbp with 27% GC content), consisting of 845 protein-coding genes, one third of which have unknown function. Comparative genomic analyses revealed that ‘N. mikurensis’ was more closely related to Ehrlichia chaffeensis than to Ehrlichia ruminantium, the opposite of what 16S rRNA sequence-based phylogenetic analyses determined. The genetic variability of the three whole-genome-sequenced ‘N. mikurensis’ strains was extremely low, between 0.14 and 0.22‰, a variation that was associated with geographic origin. No protein-coding genes exclusively shared by N. mikurensis and E. ruminantium were identified to explain their common tropism for vascular endothelium.
|
142
|
Lin B, Hui J, Mao H. Nanopore Technology and Its Applications in Gene Sequencing. BIOSENSORS-BASEL 2021; 11:bios11070214. [PMID: 34208844 PMCID: PMC8301755 DOI: 10.3390/bios11070214] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 05/31/2021] [Revised: 06/22/2021] [Accepted: 06/25/2021] [Indexed: 12/14/2022]
Abstract
In recent years, nanopore technology has become increasingly important in the field of life science and biomedical research. By embedding a nano-scale hole in a thin membrane and measuring the electrochemical signal, nanopore technology can be used to investigate the nucleic acids and other biomacromolecules. One of the most successful applications of nanopore technology, the Oxford Nanopore Technology, marks the beginning of the fourth generation of gene sequencing technology. In this review, the operational principle and the technology for signal processing of the nanopore gene sequencing are documented. Moreover, this review focuses on the applications using nanopore gene sequencing technology, including the diagnosis of cancer, detection of viruses and other microbes, and the assembly of genomes. These applications show that nanopore technology is promising in the field of biological and biomedical sensing.
Affiliation(s)
- Bo Lin
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jianan Hui
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
| | - Hongju Mao
- State Key Laboratory of Transducer Technology, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
- Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Correspondence: Tel.: +86-21-62511070-8707
| |
|
143
|
Meier JI, Salazar PA, Kučka M, Davies RW, Dréau A, Aldás I, Box Power O, Nadeau NJ, Bridle JR, Rolian C, Barton NH, McMillan WO, Jiggins CD, Chan YF. Haplotype tagging reveals parallel formation of hybrid races in two butterfly species. Proc Natl Acad Sci U S A 2021; 118:e2015005118. [PMID: 34155138 PMCID: PMC8237668 DOI: 10.1073/pnas.2015005118] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Indexed: 11/18/2022]
Abstract
Genetic variation segregates as linked sets of variants or haplotypes. Haplotypes and linkage are central to genetics and underpin virtually all genetic and selection analysis. Yet, genomic data often omit haplotype information due to constraints in sequencing technologies. Here, we present "haplotagging," a simple, low-cost linked-read sequencing technique that allows sequencing of hundreds of individuals while retaining linkage information. We apply haplotagging to construct megabase-size haplotypes for over 600 individual butterflies (Heliconius erato and H. melpomene), which form overlapping hybrid zones across an elevational gradient in Ecuador. Haplotagging identifies loci controlling distinctive high- and lowland wing color patterns. Divergent haplotypes are found at the same major loci in both species, while chromosome rearrangements show no parallelism. Remarkably, in both species, the geographic clines for the major wing-pattern loci are displaced by 18 km, leading to the rise of a novel hybrid morph in the center of the hybrid zone. We propose that shared warning signaling (Müllerian mimicry) may couple the cline shifts seen in both species and facilitate the parallel coemergence of a novel hybrid morph in both comimetic species. Our results show the power of efficient haplotyping methods when combined with large-scale sequencing data from natural populations.
Affiliation(s)
- Joana I Meier
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom
- St. John's College, University of Cambridge, Cambridge CB2 1TP, United Kingdom
| | - Patricio A Salazar
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom
| | - Marek Kučka
- Friedrich Miescher Laboratory of the Max Planck Society, 72076 Tübingen, Germany
| | | | - Andreea Dréau
- Friedrich Miescher Laboratory of the Max Planck Society, 72076 Tübingen, Germany
| | | | - Olivia Box Power
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom
| | - Nicola J Nadeau
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield S10 2TN, United Kingdom
| | - Jon R Bridle
- Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom
| | - Campbell Rolian
- Department of Comparative Biology and Experimental Medicine, Faculty of Veterinary Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
| | - Nicholas H Barton
- Institute of Science and Technology Austria, 3400 Klosterneuburg, Austria
| | - W Owen McMillan
- Smithsonian Tropical Research Institute, Panamá, Apartado Postal 0843-00153, República de Panamá
| | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom
- Smithsonian Tropical Research Institute, Panamá, Apartado Postal 0843-00153, República de Panamá
| | - Yingguang Frank Chan
- Friedrich Miescher Laboratory of the Max Planck Society, 72076 Tübingen, Germany
| |
|
144
|
Liu YH, Grubbs GL, Zhang L, Fang X, Dill DL, Sidow A, Zhou X. Aquila_stLFR: diploid genome assembly based structural variant calling package for stLFR linked-reads. BIOINFORMATICS ADVANCES 2021; 1:vbab007. [PMID: 36700103 PMCID: PMC9710574 DOI: 10.1093/bioadv/vbab007] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/08/2021] [Revised: 06/07/2021] [Accepted: 06/14/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Identifying structural variants (SVs) is critical in health and disease; however, detecting them remains a challenge. Several linked-read sequencing technologies, including 10X Genomics, TELL-Seq and single tube long fragment read (stLFR), have been recently developed as cost-effective approaches to reconstruct multi-megabase haplotypes (phase blocks) from sequence data of a single sample. These technologies provide an optimal sequencing platform to characterize SVs, though few computational algorithms can utilize them. Thus, we developed Aquila_stLFR, an approach that resolves SVs through haplotype-based assembly of stLFR linked-reads. Results: Aquila_stLFR first partitions long fragment reads into two haplotype-specific blocks with the assistance of the high-quality reference genome, by taking advantage of the potential phasing ability of the linked-read itself. Each haplotype is then assembled independently, to achieve a complete diploid assembly to finally reconstruct the genome-wide SVs. We benchmarked Aquila_stLFR on a well-studied sample, NA24385, and showed Aquila_stLFR can detect medium to large size deletions (50 bp-10 kb) with high sensitivity and medium-size insertions (50 bp-1 kb) with high specificity. Availability and implementation: Source code and documentation are available on https://github.com/maiziex/Aquila_stLFR. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Yichen Henry Liu
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, USA
| | - Griffin L Grubbs
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
| | - Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
| | | | - David L Dill
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
| | - Arend Sidow
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
| | - Xin Zhou
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, USA
- Department of Biomedical Engineering, Vanderbilt University, Nashville, TN 37235, USA
- To whom correspondence should be addressed.
| |
|
145
|
Yang Y, Huang L, Xu C, Qi L, Wu Z, Li J, Chen H, Wu Y, Fu T, Zhu H, Saand MA, Li J, Liu L, Fan H, Zhou H, Qin W. Chromosome-scale genome assembly of areca palm (Areca catechu). Mol Ecol Resour 2021; 21:2504-2519. [PMID: 34133844 DOI: 10.1111/1755-0998.13446] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/16/2020] [Revised: 06/08/2021] [Accepted: 06/11/2021] [Indexed: 11/28/2022]
Abstract
Areca palm (Areca catechu L.; family Arecaceae) is an important tropical medicinal crop and is also used for masticatory and religious purposes in Asia. Improvements to areca properties made by traditional breeding tools have been very slow, and further advances in its cultivation and practical use require genomic information, which is still unavailable. Here, we present a chromosome-scale reference genome assembly for areca by combining Illumina and PacBio data with Hi-C mapping technologies, covering the predicted A. catechu genome length (2.59 Gb, variety "Reyan#1") to an estimated 240× read depth. The assembly was 2.51 Gb in length with a scaffold N50 of 1.7 Mb. The scaffolds were then further assembled into 16 pseudochromosomes, with an N50 of 172 Mb. Transposable elements comprised 80.37% of the areca genome, and 68.68% of them were long-terminal repeat retrotransposon elements. The areca palm genome was predicted to harbour 31,571 protein-coding genes and overall, 92.92% of genes were functionally annotated, including enriched and expanded families of genes responsible for biosynthesis of flavonoid, anthocyanin, monoterpenoid and their derivatives. Comparative analyses indicated that A. catechu probably diverged from its close relatives Elaeis guineensis and Cocos nucifera approximately 50.3 million years ago (Ma). Two whole genome duplication events in areca palm were found to be shared by palms and monocots, respectively. This genome assembly and associated resources represent an important addition to the palm genomics community and will be a valuable resource that will facilitate areca palm breeding and improve our understanding of areca palm biology and evolution.
Affiliation(s)
- Yaodong Yang
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Liyun Huang
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Chunyan Xu
- BGI Genomics, BGI-Shenzhen, Shenzhen, China
| | - Lan Qi
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | | | - Jia Li
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | | | - Yi Wu
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Tao Fu
- BGI Genomics, BGI-Shenzhen, Shenzhen, China
| | - Hui Zhu
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Mumtaz Ali Saand
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Jing Li
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Liyun Liu
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Haikou Fan
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Huanqi Zhou
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| | - Weiquan Qin
- Hainan Key Laboratory of Tropical Oil Crops Biology/Coconut Research Institute, Chinese Academy of Tropical Agricultural Sciences, Wenchang, China
| |
|
146
|
Callahan BJ, Grinevich D, Thakur S, Balamotis MA, Yehezkel TB. Ultra-accurate microbial amplicon sequencing with synthetic long reads. MICROBIOME 2021; 9:130. [PMID: 34090540 PMCID: PMC8179091 DOI: 10.1186/s40168-021-01072-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 03/29/2021] [Accepted: 04/06/2021] [Indexed: 05/08/2023]
Abstract
BACKGROUND: Out of the many pathogenic bacterial species that are known, only a fraction are readily identifiable directly from a complex microbial community using standard next generation DNA sequencing. Long-read sequencing offers the potential to identify a wider range of species and to differentiate between strains within a species, but attaining sufficient accuracy in complex metagenomes remains a challenge. METHODS: Here, we describe and analytically validate LoopSeq, a commercially available synthetic long-read (SLR) sequencing technology that generates highly accurate long reads from standard short reads. RESULTS: LoopSeq reads are sufficiently long and accurate to identify microbial genes and species directly from complex samples. LoopSeq perfectly recovered the full diversity of 16S rRNA genes from known strains in a synthetic microbial community. Full-length LoopSeq reads had a per-base error rate of 0.005%, which exceeds the accuracy reported for other long-read sequencing technologies. 18S-ITS and genomic sequencing of fungal and bacterial isolates confirmed that LoopSeq sequencing maintains that accuracy for reads up to 6 kb in length. LoopSeq full-length 16S rRNA reads could accurately classify organisms down to the species level in rinsate from retail meat samples, and could differentiate strains within species identified by the CDC as potential foodborne pathogens. CONCLUSIONS: The order-of-magnitude improvement in length and accuracy over standard Illumina amplicon sequencing achieved with LoopSeq enables accurate species-level and strain identification from complex- to low-biomass microbiome samples. The ability to generate accurate and long microbiome sequencing reads using standard short read sequencers will accelerate the building of quality microbial sequence databases and removes a significant hurdle on the path to precision microbial genomics.
Affiliation(s)
- Benjamin J. Callahan
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Dmitry Grinevich
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- Siddhartha Thakur
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
|
147
|
Sun W, Modica S, Dong H, Wolfrum C. Plasticity and heterogeneity of thermogenic adipose tissue. Nat Metab 2021; 3:751-761. [PMID: 34158657 DOI: 10.1038/s42255-021-00417-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/10/2021] [Accepted: 05/19/2021] [Indexed: 12/13/2022]
Abstract
The perception of adipose tissue, both in the scientific community and in the general population, has changed dramatically in the past 20 years. While adipose tissue was thought for a long time to be a rather simple lipid storage entity, it is now recognized as a highly heterogeneous organ and a critical regulator of systemic metabolism, composed of many different subtypes of cells, with important endocrine functions. Additionally, adipose tissue is nowadays recognized to contribute to energy turnover, due to the presence of specialized thermogenic adipocytes, which can be found in many adipose depots. This review discusses the unprecedented insights that we have gained into the heterogeneity of thermogenic adipocytes and their respective precursors due to the technical developments in single-cell and nucleus technologies. These methodological advances have increased our understanding of how adipose tissue catabolic function is influenced by developmental and intercellular communication events.
Affiliation(s)
- Wenfei Sun
- Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
- Salvatore Modica
- Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
- Hua Dong
- Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
- Christian Wolfrum
- Institute of Food, Nutrition and Health, ETH Zurich, Schwerzenbach, Switzerland
|
148
|
Srivastava K, Fratzscher AS, Lan B, Flegel WA. Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database. BMC Bioinformatics 2021; 22:273. [PMID: 34039276 PMCID: PMC8150616 DOI: 10.1186/s12859-021-04169-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/18/2020] [Accepted: 05/04/2021] [Indexed: 12/18/2022]
Abstract
Background: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 different populations and, using computerized genotype phasing, reported haplotype data. In contrast, we identified long reference sequences by analyzing the homozygous genomic regions in this online database, a concept that has rarely been reported since next generation sequencing data became available. Study design and methods: Phased genotype data for an 80.6 kb region of chromosome 1 were downloaded for all 2,504 unrelated individuals of the 1000 Genomes Project Phase 3 cohort. The data were centered on the ACKR1 gene and bordered by the CADM3 and FCER1A genes. Individuals with heterozygosity at a single site or with complete homozygosity allowed unambiguous assignment of an ACKR1 haplotype. A computer algorithm was developed for extracting these haplotypes from the 1000 Genomes Project in an automated fashion. A manual analysis validated the data extracted by the algorithm. Results: We confirmed 902 ACKR1 haplotypes of varying lengths, the longest at 80,584 nucleotides and the shortest at 1,901 nucleotides. The combined length of haplotype sequences comprised 19,895,388 nucleotides with a median of 16,014 nucleotides. Based on our approach, all haplotypes can be considered experimentally confirmed and not affected by the known errors of computerized genotype phasing. Conclusions: Tracts of homozygosity can provide definitive reference sequences for any gene. They are particularly useful when observed in unrelated individuals of large scale sequence databases. As a proof of principle, we explored the 1000 Genomes Project database for ACKR1 gene data and mined long haplotypes. These haplotypes are useful for high throughput analysis with next generation sequencing. Our approach is scalable, using automated bioinformatics tools, and can be applied to any gene.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04169-6.
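The assignment rule stated in the abstract above (an individual who is completely homozygous, or heterozygous at exactly one site, yields haplotypes that do not depend on phasing) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name, the input layout (one unphased allele pair per variant site) and the string encoding of haplotypes are assumptions made for illustration, and the sketch treats the whole region at once rather than extracting shorter homozygous tracts.

```python
from typing import List, Tuple


def unambiguous_haplotypes(genotypes: List[Tuple[str, str]]) -> List[str]:
    """Return the haplotypes that can be assigned without phasing.

    genotypes: one (allele_a, allele_b) pair per variant site, unphased.
    With zero heterozygous sites both chromosomes carry the same haplotype;
    with exactly one, the two haplotypes differ only at that site, so the
    pairing of alleles across sites is never in question. With two or more
    heterozygous sites the assignment depends on phasing, so nothing is
    returned.
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    if len(het_sites) > 1:
        return []
    hap_a = "".join(a for a, _ in genotypes)
    hap_b = "".join(b for _, b in genotypes)
    # A set removes the duplicate when the individual is fully homozygous.
    return sorted({hap_a, hap_b})


if __name__ == "__main__":
    print(unambiguous_haplotypes([("A", "A"), ("C", "C"), ("G", "G")]))
    print(unambiguous_haplotypes([("A", "A"), ("C", "T"), ("G", "G")]))
    print(unambiguous_haplotypes([("A", "G"), ("C", "T"), ("G", "G")]))
```

The third call returns an empty list because two heterozygous sites admit two different haplotype pairs, which is exactly the ambiguity that statistical phasing would otherwise have to resolve.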
Affiliation(s)
- Kshitij Srivastava
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Anne-Sophie Fratzscher
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Bo Lan
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Willy Albert Flegel
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
|
149
|
Mukhtar M, Sargazi S, Barani M, Madry H, Rahdar A, Cucchiarini M. Application of Nanotechnology for Sensitive Detection of Low-Abundance Single-Nucleotide Variations in Genomic DNA: A Review. Nanomaterials (Basel) 2021; 11:1384. [PMID: 34073904 PMCID: PMC8225127 DOI: 10.3390/nano11061384] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/27/2021] [Revised: 05/20/2021] [Accepted: 05/21/2021] [Indexed: 01/02/2023]
Abstract
Single-nucleotide polymorphisms (SNPs) are the simplest and most common type of DNA variation in the human genome. This class of attractive genetic markers, along with point mutations, has been associated with the risk of developing a wide range of diseases, including cancer, cardiovascular diseases, autoimmune diseases, and neurodegenerative diseases. Several existing methods to detect SNPs and mutations in body fluids have faced limitations. Therefore, there is a need to focus on developing future noninvasive, polymerase chain reaction (PCR)-free tools to detect low-abundance SNPs in such specimens. The detection of small concentrations of SNPs in the presence of a large background of wild-type genes is the biggest hurdle. Hence, the screening and detection of SNPs need efficient and straightforward strategies. Suitable amplification methods are being explored to avoid high-throughput settings and laborious efforts. Therefore, DNA sensing methods based on nanotechnology are currently being explored for the ultrasensitive detection of SNPs. Owing to their small size and improved surface area, nanomaterials hold extensive capacity to be used as biosensors in the genotyping and highly sensitive recognition of single-base mismatches in the presence of incomparable wild-type DNA fragments. Different nanomaterials have been combined with imaging and sensing techniques and amplification methods to facilitate less time-consuming and easier detection of SNPs in different diseases. This review aims to highlight some of the most recent findings on nanotechnology-based SNP sensing methods used for the specific and ultrasensitive detection of low-concentration SNPs and rare mutations.
Affiliation(s)
- Mahwash Mukhtar
- Faculty of Pharmacy, Institute of Pharmaceutical Technology and Regulatory Affairs, University of Szeged, 6720 Szeged, Hungary
- Saman Sargazi
- Cellular and Molecular Research Center, Resistant Tuberculosis Institute, Zahedan University of Medical Sciences, Zahedan 98167-43463, Iran
- Mahmood Barani
- Department of Chemistry, Shahid Bahonar University of Kerman, Kerman 76169-14111, Iran
- Henning Madry
- Center of Experimental Orthopaedics, Saarland University Medical Center, D-66421 Homburg/Saar, Germany
- Abbas Rahdar
- Department of Physics, Faculty of Science, University of Zabol, Zabol 538-98615, Iran
- Magali Cucchiarini
- Center of Experimental Orthopaedics, Saarland University Medical Center, D-66421 Homburg/Saar, Germany
|
150
|
Wu CY, Lau BT, Kim HS, Sathe A, Grimes SM, Ji HP, Zhang NR. Integrative single-cell analysis of allele-specific copy number alterations and chromatin accessibility in cancer. Nat Biotechnol 2021; 39:1259-1269. [PMID: 34017141 DOI: 10.1038/s41587-021-00911-w] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 10/26/2020] [Accepted: 04/01/2021] [Indexed: 12/12/2022]
Abstract
Cancer progression is driven by both somatic copy number aberrations (CNAs) and chromatin remodeling, yet little is known about the interplay between these two classes of events in shaping the clonal diversity of cancers. We present Alleloscope, a method for allele-specific copy number estimation that can be applied to single-cell DNA- and/or transposase-accessible chromatin-sequencing (scDNA-seq, ATAC-seq) data, enabling combined analysis of allele-specific copy number and chromatin accessibility. On scDNA-seq data from gastric, colorectal and breast cancer samples, with validation using matched linked-read sequencing, Alleloscope finds pervasive occurrence of highly complex, multiallelic CNAs, in which cells that carry varying allelic configurations adding to the same total copy number coevolve within a tumor. On scATAC-seq from two basal cell carcinoma samples and a gastric cancer cell line, Alleloscope detected multiallelic copy number events and copy-neutral loss-of-heterozygosity, enabling dissection of the contributions of chromosomal instability and chromatin remodeling to tumor evolution.
Affiliation(s)
- Chi-Yun Wu
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
- Department of Statistics, University of Pennsylvania, Philadelphia, PA, USA
- Billy T Lau
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, USA
- Heon Seok Kim
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Anuja Sathe
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Susan M Grimes
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, USA
- Nancy R Zhang
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
- Department of Statistics, University of Pennsylvania, Philadelphia, PA, USA
|