1
Miljkovic M, Seguin A, Jia X, Cox JE, Catrow JL, Bergonia H, Phillips JD, Stephens WZ, Ward DM. Loss of the mitochondrial protein Abcb10 results in altered arginine metabolism in MEL and K562 cells and nutrient stress signaling through ATF4. J Biol Chem 2023; 299:104877. [PMID: 37269954 PMCID: PMC10316008 DOI: 10.1016/j.jbc.2023.104877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 05/11/2023] [Accepted: 05/25/2023] [Indexed: 06/05/2023]
Abstract
Abcb10 is a mitochondrial membrane protein involved in the hemoglobinization of red cells. Abcb10 topology and ATPase domain localization suggest that it exports a substrate, likely biliverdin, out of mitochondria that is necessary for hemoglobinization. In this study, we generated Abcb10 deletion cell lines in both murine erythroleukemia (MEL) cells and human myelogenous leukemia (K562) erythroid precursor cells to better understand the consequences of Abcb10 loss. Loss of Abcb10 resulted in an inability to hemoglobinize upon differentiation in both K562 and MEL cells, with reduced heme and intermediate porphyrins and decreased aminolevulinic acid synthase 2 activity. Metabolomic and transcriptional analyses revealed that Abcb10 loss decreased cellular arginine levels and increased transcripts for cationic and neutral amino acid transporters, while reducing levels of the citrulline-to-arginine converting enzymes argininosuccinate synthetase and argininosuccinate lyase. The reduced arginine levels in Abcb10-null cells decreased their proliferative capacity. Arginine supplementation improved both the proliferation of Abcb10-null cells and their hemoglobinization upon differentiation. Abcb10-null cells showed increased phosphorylation of eukaryotic translation initiation factor 2 subunit alpha and increased expression of the nutrient-sensing transcription factor ATF4 and its downstream targets DNA damage inducible transcript 3 (Chop), ChaC glutathione specific gamma-glutamylcyclotransferase 1 (Chac1), and arginyl-tRNA synthetase 1 (Rars). These results suggest that when the Abcb10 substrate is trapped in the mitochondria, the nutrient-sensing machinery is turned on, remodeling transcription to block the protein synthesis necessary for proliferation and hemoglobin biosynthesis in erythroid models.
Affiliation(s)
- Marisa Miljkovic
- Division of Microbiology and Immunology, Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, USA
- Alexandra Seguin
- Division of Microbiology and Immunology, Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, USA
- Xuan Jia
- Division of Microbiology and Immunology, Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, USA
- James E Cox
- Department of Biochemistry, University of Utah School of Medicine, Salt Lake City, Utah, USA
- Metabolomics Core Research Facility, University of Utah School of Medicine, Salt Lake City, Utah, USA
- Jonathan Leon Catrow
- Metabolomics Core Research Facility, University of Utah School of Medicine, Salt Lake City, Utah, USA
- Hector Bergonia
- Iron and Heme Core Research Facility, University of Utah School of Medicine, Salt Lake City, Utah, USA
- John D Phillips
- Division of Hematology, Department of Medicine, University of Utah School of Medicine, Salt Lake City, Utah, USA
- W Zac Stephens
- Division of Microbiology and Immunology, Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, USA
- Diane M Ward
- Division of Microbiology and Immunology, Department of Pathology, University of Utah School of Medicine, Salt Lake City, Utah, USA

2
Song W, Podicheti R, Rusch DB, Tracey WD. Transcriptome-wide analysis of pseudouridylation in Drosophila melanogaster. G3 (BETHESDA, MD.) 2023; 13:jkac333. [PMID: 36534986 PMCID: PMC9997552 DOI: 10.1093/g3journal/jkac333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 12/10/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022]
Abstract
Pseudouridine (Psi) is one of the most frequent post-transcriptional modifications of RNA. Enzymatic Psi modification occurs on rRNA, snRNA, snoRNA, tRNA, and non-coding RNA, and has recently been discovered on mRNA. Transcriptome-wide detection of Psi (Psi-seq) had not previously been performed for the widely studied model organism Drosophila melanogaster. Here, we optimized Psi-seq analysis for this species and identified thousands of Psi modifications throughout the female fly head transcriptome. We find that Psi is widespread on both cellular and mitochondrial rRNAs. In addition, more than a thousand Psi sites were found on mRNAs. Pseudouridylated mRNAs frequently carried many Psi sites. Many mRNA Psi sites are present in genes encoding ribosomal proteins, and many are found in mitochondrially encoded RNAs, further implicating pseudouridylation in ribosome and mitochondrial function. The 7SL RNA of the signal recognition particle is the non-coding RNA most enriched for Psi. The three mRNAs most enriched for Psi encode the highly expressed yolk proteins (Yp1, Yp2, and Yp3). By comparing the pseudouridine profiles of the RluA-2 mutant and the w1118 control genotype, we identified Psi sites that were missing in the mutant RNA as potential RluA-2 targets. Finally, differential gene expression analysis of the mutant transcriptome indicates a major impact of RluA-2 loss on the ribosome and translational machinery.
Affiliation(s)
- Wan Song
- Gill Center for Biomolecular Research, Indiana University, Bloomington, IN 47405, USA
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Ram Podicheti
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- Douglas B Rusch
- Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47405, USA
- William Daniel Tracey
- Gill Center for Biomolecular Research, Indiana University, Bloomington, IN 47405, USA
- Department of Biology, Indiana University, Bloomington, IN 47405, USA

3
Albarqi MMY, Ryder SP. The endogenous mex-3 3´UTR is required for germline repression and contributes to optimal fecundity in C. elegans. PLoS Genet 2021; 17:e1009775. [PMID: 34424904 PMCID: PMC8412283 DOI: 10.1371/journal.pgen.1009775] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/20/2021] [Revised: 09/02/2021] [Accepted: 08/11/2021] [Indexed: 11/18/2022]
Abstract
RNA regulation is essential to successful reproduction. Messenger RNAs delivered from parent to progeny govern early embryonic development. RNA-binding proteins (RBPs) are the key effectors of this process, regulating the translation and stability of parental transcripts to control cell fate specification events prior to zygotic gene activation. The KH-domain RBP MEX-3 is conserved from nematodes to humans. It was first discovered in Caenorhabditis elegans, where it is essential for anterior cell fate and embryo viability. Here, we show that loss of the endogenous mex-3 3´UTR disrupts its germline expression pattern. An allelic series of 3´UTR deletion variants identifies repressing regions of the UTR and demonstrates that repression is not precisely coupled to reproductive success. We also show that several RBPs regulate mex-3 mRNA through its 3´UTR to define its unique spatiotemporal expression pattern in the germline. Additionally, we find that both poly(A) tail length control and the translation initiation factor IFE-3 contribute to its expression pattern. Together, our results establish the importance of the mex-3 3´UTR for reproductive health and for its expression in the germline. Our results suggest that additional mechanisms control MEX-3 function when 3´UTR regulation is compromised.

In sexually reproducing organisms, germ cells undergo meiosis and differentiate to form oocytes or sperm. Coordination of this process requires a gene regulatory program that acts while the genome is undergoing chromatin condensation. As such, RNA regulatory pathways are an important contributor. The germline of the nematode Caenorhabditis elegans is a suitable model system for studying germ cell differentiation. Several RBPs coordinate each transition in the germline, such as the transition from mitosis to meiosis. MEX-3 is a conserved RNA-binding protein found in most animals, including humans. In C. elegans, MEX-3 displays a highly restricted pattern of expression. Here, we define the importance of the 3´UTR in regulating the MEX-3 expression pattern in vivo and characterize the RBPs involved in this regulation. Our results show that deleting various mex-3 3´UTR regions alters the germline expression pattern in different ways. These mutations also reduced, but did not eliminate, reproductive capacity. Finally, we demonstrate that multiple post-transcriptional mechanisms control MEX-3 levels in different domains of the germline. Our data suggest that coordination of MEX-3 activity requires multiple layers of regulation to ensure reproductive robustness.
Affiliation(s)
- Mennatallah M. Y. Albarqi
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Sean P. Ryder
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America

4
Li RY, Guan J, Zhou S. Boosting scRNA-seq data clustering by cluster-aware feature weighting. BMC Bioinformatics 2021; 22:130. [PMID: 34078287 PMCID: PMC8171019 DOI: 10.1186/s12859-021-04033-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2021] [Accepted: 02/16/2021] [Indexed: 12/26/2022]
Abstract
BACKGROUND The rapid development of single-cell RNA sequencing (scRNA-seq) enables the exploration of cell heterogeneity, which is usually done by clustering scRNA-seq data. The essence of scRNA-seq data clustering is to group cells by measuring the similarity of their gene/transcript expression profiles. The selection of features for evaluating cell similarity is therefore of great importance, as it significantly affects clustering effectiveness and efficiency. RESULTS In this paper, we propose a novel method called CaFew that selects genes based on cluster-aware feature weighting. By optimizing the clustering objective function, CaFew obtains a feature weight matrix, which is then used for feature selection: genes that have large weights in at least one cluster, or whose weights vary greatly across clusters, are selected. Experiments on 8 real scRNA-seq datasets show that CaFew clearly improves the clustering performance of existing scRNA-seq data clustering methods. In particular, the combination of CaFew with SC3 achieves state-of-the-art performance. Furthermore, CaFew also benefits the visualization of scRNA-seq data. CONCLUSION CaFew is an effective scRNA-seq data clustering method due to its gene selection mechanism based on cluster-aware feature weighting, and it is a useful tool for scRNA-seq data analysis.
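To make the gene selection rule concrete, here is a minimal Python sketch of cluster-aware feature weighting. It is not CaFew's published objective: as an assumption, the per-cluster weights come from an entropy-style rule, w[k][j] ∝ exp(-D[k][j] / gamma), where D[k][j] is the within-cluster dispersion of gene j in cluster k. The function names `cluster_feature_weights` and `select_genes` and the `n_top` cutoff are hypothetical.

```python
import math
from typing import List, Sequence


def cluster_feature_weights(
    X: Sequence[Sequence[float]], labels: Sequence[int], gamma: float = 1.0
) -> List[List[float]]:
    """Per-cluster feature weights: low within-cluster dispersion -> high weight.

    Assumption: an entropy-style weighting, w[k][j] proportional to
    exp(-D[k][j] / gamma), stands in for CaFew's actual objective.
    Rows follow sorted label order and each row sums to 1.
    """
    n_features = len(X[0])
    weights = []
    for k in sorted(set(labels)):
        members = [x for x, lab in zip(X, labels) if lab == k]
        dispersion = []
        for j in range(n_features):
            mean = sum(m[j] for m in members) / len(members)
            dispersion.append(sum((m[j] - mean) ** 2 for m in members))
        d_min = min(dispersion)  # shift so the largest term is exp(0) = 1 (avoids underflow)
        raw = [math.exp(-(d - d_min) / gamma) for d in dispersion]
        total = sum(raw)
        weights.append([r / total for r in raw])
    return weights


def select_genes(W: List[List[float]], n_top: int) -> List[int]:
    """Union of the n_top genes with the largest weight in any cluster and
    the n_top genes whose weights vary most across clusters."""
    n_clusters, n_features = len(W), len(W[0])
    max_w, var_w = [], []
    for j in range(n_features):
        col = [W[c][j] for c in range(n_clusters)]
        mean = sum(col) / n_clusters
        max_w.append(max(col))
        var_w.append(sum((v - mean) ** 2 for v in col) / n_clusters)
    by_max = sorted(range(n_features), key=lambda j: max_w[j], reverse=True)[:n_top]
    by_var = sorted(range(n_features), key=lambda j: var_w[j], reverse=True)[:n_top]
    return sorted(set(by_max) | set(by_var))
```

In use, labels from an initial clustering would feed `select_genes(cluster_feature_weights(X, labels), n_top=...)`, and the returned column indices would be kept before re-clustering; the cutoff is a free parameter in this sketch.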
Affiliation(s)
- Rui-Yi Li
- Department of Computer Science and Technology, Tongji University, 4800 Caoan Road, Shanghai, 201804 China
- Jihong Guan
- Department of Computer Science and Technology, Tongji University, 4800 Caoan Road, Shanghai, 201804 China
- Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 220 Handan Road, Shanghai, 200433 China

5
Mariani M, Zimmerman C, Rodriguez P, Hasenohr E, Aimola G, Gerrard DL, Richman A, Dest A, Flamand L, Kaufer B, Frietze S. Higher-Order Chromatin Structures of Chromosomally Integrated HHV-6A Predict Integration Sites. Front Cell Infect Microbiol 2021; 11:612656. [PMID: 33718266 PMCID: PMC7953476 DOI: 10.3389/fcimb.2021.612656] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2020] [Accepted: 01/20/2021] [Indexed: 12/31/2022]
Abstract
Human herpesviruses 6A and 6B (HHV-6A/B) can integrate their genomes into the telomeres of human chromosomes. Viral integration can occur in several cell types, including germ cells, resulting in individuals who harbor the viral genome in every cell of their body. The integrated genome is efficiently silenced but can sporadically reactivate, resulting in various clinical symptoms. To date, the integration mechanism and the subsequent silencing of HHV-6A/B genes remain poorly understood. Here we investigate the genome-wide chromatin contacts of integrated HHV-6A in latently infected cells. We show that HHV-6A becomes transcriptionally silent over the course of seven days after infection of these cells. In addition, we established an HHV-6-specific 4C-seq approach, revealing that the HHV-6A 3D interactome is associated with quiescent chromatin states in cells harboring integrated virus. Furthermore, we observed that the majority of virus chromatin interactions occur toward the distal ends of specific human chromosomes. Exploiting this finding, we established a 4C-seq method that accurately detects the chromosomal integration sites. We further implemented long-read MinION sequencing in the 4C-seq assay and developed a method to identify HHV-6A/B integration sites in clinical samples.
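The integration-site reasoning (viral contacts concentrate near the distal end of the host chromosome that carries the virus) can be illustrated with a toy Python sketch. The input format (a list of `(chromosome, position)` contacts), the `window` size and the function names are hypothetical, and the sketch is not the authors' pipeline; a real 4C-seq analysis would also normalize against background contact frequency, which is omitted here.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Contact = Tuple[str, int]  # (chromosome, position) of one viral-host contact read


def end_contact_counts(
    contacts: List[Contact], chrom_sizes: Dict[str, int], window: int
) -> Counter:
    """Count contacts falling within `window` bp of either end of each chromosome."""
    counts: Counter = Counter()
    for chrom, pos in contacts:
        size = chrom_sizes.get(chrom)
        if size is None:
            continue  # unknown or unplaced contig: ignore
        if pos <= window:
            counts[(chrom, "start")] += 1
        elif pos >= size - window:
            counts[(chrom, "end")] += 1
    return counts


def predict_integration_end(
    contacts: List[Contact], chrom_sizes: Dict[str, int], window: int = 5_000_000
) -> Optional[Tuple[str, str]]:
    """Return the chromosome end with the most nearby contacts, or None if there are none."""
    counts = end_contact_counts(contacts, chrom_sizes, window)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Contacts in the chromosome interior are deliberately ignored, since the abstract reports that most viral interactions fall toward chromosome ends.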
Affiliation(s)
- Michael Mariani
- Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Cosima Zimmerman
- Institute of Virology, Freie Universität Berlin, Berlin, Germany
- Princess Rodriguez
- Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Ellie Hasenohr
- Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Giulia Aimola
- Institute of Virology, Freie Universität Berlin, Berlin, Germany
- Diana Lea Gerrard
- Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Alyssa Richman
- Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Andrea Dest
- Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- Louis Flamand
- Division of Infectious Disease and Immunity, CHU de Québec Research Center-Université Laval, Quebec City, QC, Canada
- Benedikt Kaufer
- Institute of Virology, Freie Universität Berlin, Berlin, Germany
- Seth Frietze
- Department of Biomedical and Health Sciences, College of Nursing and Health Sciences, University of Vermont, Burlington, VT, United States
- University of Vermont Cancer Center, Burlington, VT, United States

6
Liu Y, Hou T, Miao Y, Liu M, Liu F. IM-c-means: a new clustering algorithm for clusters with skewed distributions. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-020-00932-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]

7
The ParaHox gene Cdx4 induces acute erythroid leukemia in mice. Blood Adv 2020; 3:3729-3739. [PMID: 31770439 DOI: 10.1182/bloodadvances.2019000761] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/29/2019] [Accepted: 10/04/2019] [Indexed: 11/20/2022]
Abstract
Acute erythroid leukemia (AEL) is a rare and aggressive form of acute leukemia, the biology of which remains poorly understood. Here we demonstrate that the ParaHox gene CDX4 is expressed in patients with AEL, and that aberrant expression of Cdx4 induced a homogeneous, transplantable AEL in mice. Gene expression analyses demonstrated upregulation of genes involved in stemness and leukemogenesis, with parallel downregulation of Gata1 and Gata2 target genes responsible for erythroid differentiation. Cdx4 induced a proteomic profile that overlapped with a cluster of proteins previously defined to represent the most primitive human erythroid progenitors. Whole-exome sequencing of diseased mice identified recurrent mutations significantly enriched for transcription factors involved in erythroid lineage specification, as well as TP53 target genes partly overlapping with those reported in patients with AEL. In summary, our data indicate that Cdx4 is able to induce stemness and inhibit terminal erythroid differentiation, leading to the development of AEL in association with co-occurring mutations.

8
Shang J, Sun Y. CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning. Methods 2020; 189:95-103. [PMID: 32454212 PMCID: PMC7255349 DOI: 10.1016/j.ymeth.2020.05.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/03/2020] [Revised: 05/05/2020] [Accepted: 05/17/2020] [Indexed: 02/07/2023]
Abstract
The fast accumulation of viral metagenomic data has contributed significantly to new RNA virus discovery. However, the short read length, complex composition, and large data size can all make taxonomic analysis difficult. In particular, commonly used alignment-based methods are not ideal choices for detecting new viral species. In this work, we present a novel hierarchical classification model named CHEER, which can conduct read-level taxonomic classification from order to genus for new species. By combining k-mer embedding-based encoding, hierarchically organized CNNs, and a carefully trained rejection layer, CHEER is able to assign correct taxonomic labels to reads from new species. We tested CHEER on both simulated and real sequencing data. The results show that CHEER can achieve higher accuracy than popular alignment-based and alignment-free taxonomic assignment tools. The source code, scripts, and pre-trained parameters for CHEER are available via GitHub: https://github.com/KennthShang/CHEER.
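The k-mer encoding step that would feed an embedding layer can be sketched in Python as below. This is an assumption about the general approach, not CHEER's code: each overlapping k-mer of a read is mapped to an integer index over the 4^k possible k-mers, with one extra index reserved for k-mers that contain ambiguous bases. A trainable embedding layer would then turn these indices into vectors. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List


def kmer_vocab(k: int) -> Dict[str, int]:
    """Map every k-mer over ACGT to an index in [0, 4**k), in lexicographic order."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def encode_read(read: str, k: int, vocab: Dict[str, int]) -> List[int]:
    """Encode a read as the indices of its overlapping k-mers.

    k-mers containing anything other than A/C/G/T (e.g. N) map to the extra
    index len(vocab); a read of length L >= k yields L - k + 1 indices.
    """
    unknown = len(vocab)
    read = read.upper()
    return [vocab.get(read[i : i + k], unknown) for i in range(len(read) - k + 1)]
```

For example, `encode_read("ACA", 2, kmer_vocab(2))` returns `[1, 4]`, the indices of "AC" and "CA".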
Affiliation(s)
- Jiayu Shang
- Electrical Engineering Dept., City University of Hong Kong, Kowloon, Hong Kong Special Administrative Region
- Yanni Sun
- Electrical Engineering Dept., City University of Hong Kong, Kowloon, Hong Kong Special Administrative Region

9
Li K, Lu Y, Deng L, Wang L, Shi L, Wang Z. Deconvolute individual genomes from metagenome sequences through short read clustering. PeerJ 2020; 8:e8966. [PMID: 32296615 PMCID: PMC7150542 DOI: 10.7717/peerj.8966] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/31/2019] [Accepted: 03/24/2020] [Indexed: 12/17/2022]
Abstract
Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads by species before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false negatives (under-clustering) or false positives (over-clustering). Here we extended our previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce under-clustering. Using synthetic and real-world datasets, we demonstrated that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage. The improved read clustering in turn leads to improved downstream genome assembly quality.
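Read clustering by shared k-mers can be sketched as follows. This is a simplification, not SpaRC itself: reads that share any k-mer are merged with a union-find structure (connected components), and the hypothetical `max_kmer_reads` cap skips very frequent k-mers, which are likely repeats that would cause over-clustering. Reverse complements and the multi-sample statistics described above are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_reads(reads: List[str], k: int, max_kmer_reads: int = 50) -> List[List[int]]:
    """Group read indices whose reads share at least one k-mer (transitively)."""
    kmer_to_reads: Dict[str, Set[int]] = defaultdict(set)
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer_to_reads[read[i : i + k]].add(idx)

    uf = UnionFind(len(reads))
    for members in kmer_to_reads.values():
        # Skip k-mers seen in only one read and over-shared (likely repetitive) k-mers.
        if 1 < len(members) <= max_kmer_reads:
            first, *rest = members
            for other in rest:
                uf.union(first, other)

    clusters: Dict[int, List[int]] = defaultdict(list)
    for idx in range(len(reads)):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())
```

Lowering `max_kmer_reads` trades under-clustering for protection against repeat-driven over-clustering, which is the tension the abstract describes.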
Affiliation(s)
- Kexue Li
- School of Mechanics Engineering and Automation, Shanghai University, Shanghai, China
- Shanghai Key Laboratory of Power Station Automation Technology, Shanghai, China
- Yakang Lu
- School of Mechanics Engineering and Automation, Shanghai University, Shanghai, China
- Shanghai Key Laboratory of Power Station Automation Technology, Shanghai, China
- Li Deng
- School of Mechanics Engineering and Automation, Shanghai University, Shanghai, China
- Shanghai Key Laboratory of Power Station Automation Technology, Shanghai, China
- Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
- Lili Wang
- School of Mechanics Engineering and Automation, Shanghai University, Shanghai, China
- Shanghai Key Laboratory of Power Station Automation Technology, Shanghai, China
- Lizhen Shi
- Department of Computer Science, Florida State University, Tallahassee, FL, USA
- Zhong Wang
- Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- School of Natural Sciences, University of California at Merced, Merced, CA, USA

10
Zhou Y, Zhang W, Wu H, Huang K, Jin J. A high-resolution genomic composition-based method with the ability to distinguish similar bacterial organisms. BMC Genomics 2019; 20:754. [PMID: 31638897 PMCID: PMC6805505 DOI: 10.1186/s12864-019-6119-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/26/2019] [Accepted: 09/20/2019] [Indexed: 12/03/2022]
Abstract
Background Genomic composition has been found to be species specific and is used to differentiate bacterial species. To date, almost no published composition-based approaches are able to distinguish between most closely related organisms, including intra-genus species and intra-species strains. Thus, it is necessary to develop a novel approach to address this problem. Results Here, we initially determine that the “tetranucleotide-derived z-value Pearson correlation coefficient” (TETRA) approach is representative of other published statistical methods. Then, we devise a novel method called “Tetranucleotide-derived Z-value Manhattan Distance” (TZMD) and compare it with the TETRA approach. Our results show that TZMD reflects the maximal genome difference, while TETRA does not in most conditions, demonstrating in theory that TZMD provides improved resolution. Additionally, our analysis of real data shows that TZMD improves species differentiation and clearly differentiates similar organisms, including similar species belonging to the same genospecies, subspecies and intraspecific strains, most of which cannot be distinguished by TETRA. Furthermore, TZMD is able to determine clonal strains with the TZMD = 0 criterion, which intrinsically encompasses identical composition, high average nucleotide identity and high percentage of shared genomes. Conclusions Our extensive assessment demonstrates that TZMD has high resolution. This study is the first to propose a composition-based method for differentiating bacteria at the strain level and to demonstrate that composition is also strain specific. TZMD is a powerful tool and the first easy-to-use approach for differentiating clonal and non-clonal strains. Therefore, as the first composition-based algorithm for strain typing, TZMD will facilitate bacterial studies in the future.
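The difference between TETRA and TZMD is easiest to see in code: both start from the same vector of 256 tetranucleotide z-scores, and TETRA compares two genomes with a Pearson correlation while TZMD uses a Manhattan (L1) distance. The Python sketch below assumes the maximal-order Markov estimate of the expected tetranucleotide count, E = N(n1n2n3) · N(n2n3n4) / N(n2n3), with its associated variance; it counts the forward strand only, and the function names are hypothetical rather than taken from either tool.

```python
import math
from collections import Counter
from itertools import product
from typing import List

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def _counts(seq: str, n: int) -> Counter:
    return Counter(seq[i : i + n] for i in range(len(seq) - n + 1))


def tetra_zscores(seq: str) -> List[float]:
    """Z-scores of the 256 tetranucleotides (forward strand only).

    Expected count E = N(n1n2n3) * N(n2n3n4) / N(n2n3); variance
    E * (N(n2n3) - N(n1n2n3)) * (N(n2n3) - N(n2n3n4)) / N(n2n3)**2.
    """
    seq = seq.upper()
    c4, c3, c2 = _counts(seq, 4), _counts(seq, 3), _counts(seq, 2)
    z = []
    for t in TETRAMERS:
        n123, n234, n23 = c3[t[:3]], c3[t[1:]], c2[t[1:3]]
        if n23 == 0:
            z.append(0.0)
            continue
        expected = n123 * n234 / n23
        variance = expected * (n23 - n123) * (n23 - n234) / n23**2
        z.append((c4[t] - expected) / math.sqrt(variance) if variance > 0 else 0.0)
    return z


def tetra_correlation(z1: List[float], z2: List[float]) -> float:
    """TETRA-style similarity: Pearson correlation of two z-score vectors."""
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1, z2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in z1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in z2))
    if s1 == 0 or s2 == 0:
        raise ValueError("constant z-score vector: correlation is undefined")
    return cov / (s1 * s2)


def tzmd(z1: List[float], z2: List[float]) -> float:
    """TZMD-style distance: Manhattan (L1) distance between z-score vectors."""
    return sum(abs(a - b) for a, b in zip(z1, z2))
```

Identical sequences give a distance of exactly 0, which corresponds to the TZMD = 0 criterion for clonal strains. A correlation, by contrast, is insensitive to shifts and rescaling of the z-score vector, which is one way two different genomes can still look nearly identical under TETRA.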
Affiliation(s)
- Yizhuang Zhou
- Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, People's Republic of China
- Wenting Zhang
- Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Huixian Wu
- China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Kai Huang
- Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Junfei Jin
- Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China

11
Chiara M, Placido A, Picardi E, Ceci LR, Horner DS, Pesole G. A-GAME: improving the assembly of pooled functional metagenomics sequence data. BMC Genomics 2018; 19:44. [PMID: 29329522 PMCID: PMC5767027 DOI: 10.1186/s12864-017-4369-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/27/2017] [Accepted: 12/08/2017] [Indexed: 02/06/2023]
Abstract
Background Expression screening of environmental DNA (eDNA) libraries is a popular approach for the identification and characterization of novel microbial enzymes with promising biotechnological properties. In such “functional metagenomics” experiments, inserts selected on the basis of activity assays are sequenced with high-throughput sequencing technologies. Assembly is followed by gene prediction, annotation and identification of candidate genes that are subsequently evaluated for biotechnological applications. Results Here we present A-GAME (A GAlaxy suite for functional MEtagenomics), a web service incorporating state-of-the-art tools and workflows for the analysis of eDNA sequence data. We illustrate the potential of A-GAME workflows using real functional metagenomics data, showing that they outperform alternative metagenomics assemblers. Dedicated tools available in A-GAME allow efficient analysis of pooled libraries and rapid identification of candidate genes, reducing sequencing costs and removing the need for laborious manual annotation. Conclusion We believe A-GAME will constitute a valuable resource for the functional metagenomics community. A-GAME is publicly available at http://beaconlab.it/agame. Electronic supplementary material: The online version of this article (10.1186/s12864-017-4369-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Matteo Chiara
- Department of Biosciences, University of Milan, via Celoria 26, 20133, Milan, Italy
- Antonio Placido
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, via Amendola 165A, 70126, Bari, Italy
- Ernesto Picardi
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, via Amendola 165A, 70126, Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "A. Moro", via Orabona, 4, 70126, Bari, Italy
- Luigi Ruggiero Ceci
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, via Amendola 165A, 70126, Bari, Italy
- David Stephen Horner
- Department of Biosciences, University of Milan, via Celoria 26, 20133, Milan, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, via Amendola 165A, 70126, Bari, Italy
- Graziano Pesole
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, via Amendola 165A, 70126, Bari, Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "A. Moro", via Orabona, 4, 70126, Bari, Italy

12
Lu YY, Chen T, Fuhrman JA, Sun F. COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge. Bioinformatics 2017; 33:791-798. [PMID: 27256312 DOI: 10.1093/bioinformatics/btw290] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 04/11/2016] [Accepted: 04/29/2016] [Indexed: 02/04/2023]
Abstract
Motivation The advent of next-generation sequencing technologies enables researchers to sequence complex microbial communities directly from the environment. Because assembly typically produces only genome fragments, also known as contigs, instead of entire genomes, it is crucial to group them into operational taxonomic units (OTUs) for further taxonomic profiling and downstream functional analysis. OTU clustering is also referred to as binning. We present COCACOLA, a general framework that automatically bins contigs into OTUs based on sequence composition and coverage across multiple samples. Results The effectiveness of COCACOLA is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, GroopM, MaxBin and MetaBAT. The superior performance of COCACOLA relies on two aspects. One is the use of L1 distance instead of Euclidean distance for better taxonomic identification during initialization. More importantly, COCACOLA takes advantage of both hard clustering and soft clustering through sparsity regularization. In addition, the COCACOLA framework seamlessly incorporates customized knowledge to improve binning accuracy. In our study, we investigated two types of additional knowledge, the co-alignment to reference genomes and the linkage of contigs provided by paired-end reads, as well as the ensemble of both. We find that both co-alignment and linkage information further improve binning in the majority of cases. COCACOLA is scalable and faster than CONCOCT, GroopM, MaxBin and MetaBAT. Availability and implementation The software is available at https://github.com/younglululu/COCACOLA. Contact fsun@usc.edu. Supplementary information Supplementary data are available at Bioinformatics online.
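To illustrate why the choice between L1 and Euclidean distance matters when contigs are first assigned to bins, the Python sketch below runs a small k-medians loop (the L1 analogue of k-means, since the coordinatewise median minimizes total L1 distance) on contig feature vectors such as composition plus coverage. It is a stand-in for illustration only: COCACOLA's sparsity-regularized combination of hard and soft clustering is not reproduced, and the function names are hypothetical.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(
    points: List[Vector], centroids: List[Vector], dist: Callable[[Vector, Vector], float] = l1
) -> List[int]:
    """Index of the nearest centroid for each point under `dist` (ties go to the lowest index)."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c])) for p in points]


def k_medians(
    points: List[Vector], init: List[Vector], n_iter: int = 10
) -> Tuple[List[int], List[List[float]]]:
    """Alternate L1 assignment with coordinatewise-median centroid updates."""
    centroids = [list(c) for c in init]
    labels: List[int] = []
    for _ in range(n_iter):
        labels = assign(points, centroids, l1)
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [statistics.median(col) for col in zip(*members)]
    return labels, centroids
```

The two metrics can disagree: for a contig at (0, 0) and candidate centroids (3, 0) and (2, 2), `assign` picks the first under `l1` (3 versus 4) but the second under `l2` (3 versus about 2.83).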
Affiliation(s)
- Yang Young Lu
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Ting Chen
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA; Center for Synthetic and Systems Biology, TNLIST, Beijing, China
- Jed A Fuhrman
- Department of Biological Sciences and Wrigley Institute for Environmental Studies, University of Southern California, Los Angeles, CA, USA
- Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA; Center for Computational Systems Biology, Fudan University, Shanghai, China
13
Fierst JL, Murdock DA. Decontaminating eukaryotic genome assemblies with machine learning. BMC Bioinformatics 2017; 18:533. [PMID: 29191179 PMCID: PMC5709863 DOI: 10.1186/s12859-017-1941-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/04/2017] [Accepted: 11/14/2017] [Indexed: 12/17/2022]
Abstract
BACKGROUND High-throughput sequencing has made it theoretically possible to obtain high-quality de novo assembled genome sequences, but in practice DNA extracts are often contaminated with sequences from other organisms. Currently, there are few methods for rigorously decontaminating eukaryotic assemblies. Those that do exist filter sequences based on nucleotide similarity to contaminants and risk eliminating sequences from the target organism. RESULTS We introduce a novel application of an established machine learning method, a decision tree, that can rigorously classify sequences. The major strength of the decision tree is that it can take any measured feature as input and does not require a priori identification of significant descriptors. We use the decision tree to classify de novo assembled sequences and compare the method to published protocols. CONCLUSIONS A decision tree performs better than existing methods when classifying sequences in eukaryotic de novo assemblies. It is efficient, readily implemented, and accurately identifies target and contaminant sequences. Importantly, a decision tree can be used to classify sequences according to measured descriptors and has potentially many uses in distilling biological datasets.
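The abstract describes classifying assembled sequences with a decision tree over arbitrary measured features. A minimal sketch of the core step, choosing the single split that minimises weighted Gini impurity (a depth-1 tree, or stump), is shown below. The features (GC fraction and mean read coverage), the labels and the data are hypothetical, and this is not the authors' pipeline; a full tree would apply the same split search recursively to each side.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, str, str]  # (feature index, threshold, label if <=, label if >)

def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows: List[List[float]], labels: List[str]) -> Stump:
    """Exhaustively search (feature, threshold) for the lowest weighted Gini impurity."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        raise ValueError("no feature separates the samples")
    return best[1], best[2], best[3], best[4]

def predict(stump: Stump, row: Sequence[float]) -> str:
    """Label a new sequence by comparing one feature against the learned threshold."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

# Hypothetical training data: [GC fraction, mean coverage] per assembled sequence.
rows = [[0.35, 30], [0.36, 28], [0.34, 32], [0.62, 5], [0.60, 4], [0.65, 6]]
labels = ["target"] * 3 + ["contaminant"] * 3
stump = fit_stump(rows, labels)
print(stump)                      # (0, 0.36, 'target', 'contaminant')
print(predict(stump, [0.61, 7]))  # contaminant
```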
Affiliation(s)
- Janna L Fierst
- Department of Biological Sciences, University of Alabama, Tuscaloosa, 35487, AL, USA.
- Duncan A Murdock
- Department of Biological Sciences, University of Alabama, Tuscaloosa, 35487, AL, USA
14
Liu Y, Hou T, Kang B, Liu F. Unsupervised Binning of Metagenomic Assembled Contigs Using Improved Fuzzy C-Means Method. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1459-1467. [PMID: 27295684 DOI: 10.1109/tcbb.2016.2576452] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
Metagenomic contig binning is a necessary step in metagenome analysis. After assembly, the numbers of contigs belonging to different genomes are usually unequal, so a metagenomic contig dataset is an imbalanced dataset, and the traditional fuzzy c-means method (FCM) fails to handle it well. In this paper, we introduce an improved version of the fuzzy c-means method (IFCM) into metagenomic contig binning. First, tetranucleotide frequencies are calculated for every contig. Second, the number of bins is roughly estimated from the distribution of genome lengths of a complete set of non-draft sequenced microbial genomes from NCBI. Then, IFCM is used to cluster the DNA contigs using the estimated number of bins. Finally, a clustering validity function is used to determine the binning result. We tested this method on one synthetic and two real datasets, and the experimental results show the effectiveness of this method compared with other tools.
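The first step above, computing tetranucleotide frequencies per contig, and the membership update of standard fuzzy c-means can be sketched as follows. This is plain FCM, not the paper's IFCM: the abstract does not say how IFCM adapts FCM to imbalanced bins. The fuzzifier m = 2, skipping windows that contain non-ACGT symbols, and all helper names are assumptions.

```python
from itertools import product
from typing import List, Sequence

# All 256 tetranucleotides in lexicographic order; the index fixes the feature order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRA_INDEX = {k: i for i, k in enumerate(TETRAMERS)}

def tetranucleotide_frequencies(seq: str) -> List[float]:
    """Relative frequencies of the 256 tetranucleotides (windows with non-ACGT symbols are skipped)."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = TETRA_INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def fcm_memberships(x: Sequence[float], centroids: Sequence[Sequence[float]],
                    m: float = 2.0, eps: float = 1e-12) -> List[float]:
    """Standard FCM membership of x in each cluster: u_j = 1 / sum_k (d_j / d_k)^(2/(m-1))."""
    # eps avoids division by zero when x coincides with a centroid.
    d = [max(sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, eps) for c in centroids]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]

freqs = tetranucleotide_frequencies("ACGTACGTNN")
print(len(freqs), round(sum(freqs), 6))        # 256 1.0
print(fcm_memberships([0.0], [[1.0], [3.0]]))  # memberships close to [0.9, 0.1]
```

In a full FCM loop these memberships would alternate with a centroid update until convergence; the number of clusters would come from the genome-length estimate described in the abstract.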
15
Dubinkina VB, Ischenko DS, Ulyantsev VI, Tyakht AV, Alexeev DG. Assessment of k-mer spectrum applicability for metagenomic dissimilarity analysis. BMC Bioinformatics 2016; 17:38. [PMID: 26774270 PMCID: PMC4715287 DOI: 10.1186/s12859-015-0875-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 09/07/2015] [Accepted: 12/14/2015] [Indexed: 12/11/2022]
Abstract
BACKGROUND A rapidly increasing flow of genomic data requires the development of efficient methods for obtaining compact representations of it. Feature extraction facilitates classification, clustering and model analysis for testing and refining biological hypotheses. The "shotgun" metagenome is an analytically challenging type of genomic data, containing sequences of all genes from the totality of a complex microbial community. Recently, researchers have started to analyze metagenomes using reference-free methods based on the frequency spectrum of oligonucleotides (k-mers), previously applied to isolated genomes. However, little is known about how these methods correlate with existing approaches for metagenomic feature extraction, or about their limits of applicability. Here we evaluated a metagenomic pairwise dissimilarity measure based on the short k-mer spectrum, using the human gut microbiota, a biomedically significant object of study, as an example. RESULTS We developed a method for calculating the pairwise dissimilarity (beta-diversity) of "shotgun" metagenomes based on short k-mer spectra (5 ≤ k ≤ 11). The method was validated on simulated metagenomes and further applied to a large collection of human gut metagenomes from populations around the world (n=281). The k-mer spectrum-based measure was found to behave similarly to one based on mapping to a reference gene catalog, but differently from one using a genome catalog. This difference turned out to be associated with a significant presence of viral reads in a number of metagenomes. Simulations showed a limited impact of bacterial genetic variability and of sequencing errors on k-mer spectra. Specific differences between the datasets from individual populations were identified. CONCLUSIONS Our approach allows rapid estimation of pairwise dissimilarity between metagenomes. Though we applied this technique to gut microbiota, it should be useful for arbitrary metagenomes, even metagenomes with novel microbiota.
A dissimilarity measure based on the k-mer spectrum provides a wider perspective than ones based on alignment against reference sequence sets. It helps avoid missing notable features of metagenomic composition, particularly those related to the presence of unknown bacteria, viruses or eukaryotes, as well as technical artifacts (sample contamination, reads of non-biological origin, etc.) at the early stages of bioinformatic analysis. Our method is complementary to reference-based approaches and can be easily integrated into metagenomic analysis pipelines.
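A k-mer spectrum dissimilarity between two metagenomes can be sketched as below. The abstract does not name the dissimilarity used, so Bray-Curtis on relative k-mer frequencies is an assumption here, as are the function names; the study used 5 ≤ k ≤ 11, and the tiny k in the usage lines only keeps the example readable.

```python
from collections import Counter
from typing import Iterable, Mapping

def kmer_spectrum(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer over all reads, skipping k-mers with non-ACGT symbols."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def bray_curtis(a: Mapping[str, int], b: Mapping[str, int]) -> float:
    """Bray-Curtis dissimilarity of two spectra after normalising each to relative frequencies."""
    na, nb = sum(a.values()), sum(b.values())
    if na == 0 or nb == 0:
        raise ValueError("empty k-mer spectrum")
    # For relative frequencies, Bray-Curtis equals 1 minus the summed shared abundance.
    shared = sum(min(a.get(x, 0) / na, b.get(x, 0) / nb) for x in set(a) | set(b))
    return 1.0 - shared

# Toy "metagenomes" (hypothetical reads).
sample1 = kmer_spectrum(["AAAA"], k=2)          # {'AA': 3}
sample2 = kmer_spectrum(["AAA", "CCCC"], k=2)   # {'AA': 2, 'CC': 3}
print(round(bray_curtis(sample1, sample2), 3))  # 0.6
```

Normalising to relative frequencies keeps samples sequenced at different depths comparable; a value of 0 means identical spectra and 1 means no shared k-mers.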
Affiliation(s)
- Veronika B Dubinkina
- Research Institute of Physico-Chemical Medicine, Malaya Pirogovskaya, Moscow, 119435, Russia; Moscow Institute of Physics and Technology (State University), Institutskiy per., Dolgoprudny, 141700, Russia
- Dmitry S Ischenko
- Research Institute of Physico-Chemical Medicine, Malaya Pirogovskaya, Moscow, 119435, Russia; Moscow Institute of Physics and Technology (State University), Institutskiy per., Dolgoprudny, 141700, Russia
- Alexander V Tyakht
- Research Institute of Physico-Chemical Medicine, Malaya Pirogovskaya, Moscow, 119435, Russia; Moscow Institute of Physics and Technology (State University), Institutskiy per., Dolgoprudny, 141700, Russia
- Dmitry G Alexeev
- Research Institute of Physico-Chemical Medicine, Malaya Pirogovskaya, Moscow, 119435, Russia; Moscow Institute of Physics and Technology (State University), Institutskiy per., Dolgoprudny, 141700, Russia
16
A New Binning Method for Metagenomics by One-Dimensional Cellular Automata. Int J Genomics 2015; 2015:197895. [PMID: 26557648 PMCID: PMC4628670 DOI: 10.1155/2015/197895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2015] [Accepted: 02/09/2015] [Indexed: 11/29/2022]
Abstract
Increasingly advanced and inexpensive next-generation sequencing (NGS) technologies allow us to extract vast amounts of sequence data from a sample containing multiple species. Characterizing the taxonomic diversity of such planet-scale data plays an important role in metagenomic studies, and a crucial step in doing so is binning, which groups sequence reads from similar species or taxonomic classes. Metagenomic binning remains challenging because of both the various sources of read noise and the tremendous data volume. In this work, we propose an unsupervised binning method for NGS reads based on the one-dimensional cellular automaton (1D-CA). Our binning method helps reduce memory usage because a 1D-CA requires only linear space. Experiments on a synthetic dataset show that our method helps identify species of lower abundance compared to the proposed tool.
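The abstract does not say how reads are encoded as cell states or which rule the 1D-CA uses, so the sketch below shows only a generic elementary (two-state, radius-1) cellular automaton update with periodic boundaries; it does not reproduce the binning method. It illustrates why the linear-space claim is plausible: each step needs only the current row of n cells to produce the next row. Wolfram rule numbering is used, and the choice of rule 90 is purely illustrative.

```python
from typing import List

def ca_step(cells: List[int], rule: int) -> List[int]:
    """One synchronous update of an elementary 1D cellular automaton with periodic boundaries.

    The new state of cell i is bit (4*left + 2*centre + right) of the Wolfram rule number.
    """
    n = len(cells)
    return [
        (rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

state = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):
    print(state)
    state = ca_step(state, rule=90)
```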
17
Zhang R, Cheng Z, Guan J, Zhou S. Exploiting topic modeling to boost metagenomic reads binning. BMC Bioinformatics 2015; 16 Suppl 5:S2. [PMID: 25859745 PMCID: PMC4402587 DOI: 10.1186/1471-2105-16-s5-s2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND With the rapid development of high-throughput technologies, researchers can sequence the whole metagenome of a microbial community sampled directly from the environment. The assignment of these metagenomic reads to different species or taxonomic classes, referred to as binning of metagenomic data, is a vital step in metagenomic analysis. RESULTS In this paper, we propose a new method, TM-MCluster, for binning metagenomic reads. First, we represent each metagenomic read as a set of "k-mers" with their frequencies of occurrence in the read. Then, we apply a probabilistic topic model, the Latent Dirichlet Allocation (LDA) model, to the reads, which generates a number of hidden "topics" such that each read can be represented by a distribution vector over the generated topics. Finally, as in the MCluster method, we apply SKWIC, a variant of the classical K-means algorithm with an automatic feature-weighting mechanism, to cluster these reads represented by topic distributions. CONCLUSIONS Experiments show that the new method TM-MCluster outperforms major existing methods, including AbundanceBin, MetaCluster 3.0/5.0 and MCluster. This result indicates that exploiting topic modeling can effectively improve the binning performance for metagenomic reads.
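The first step of TM-MCluster, turning each read into a "document" of k-mer counts, can be sketched as below. The sparse (token id, count) lists it returns are the kind of bag-of-words corpus that topic-model libraries such as gensim accept for LDA training. The value of k, the function names and the decision to skip k-mers containing non-ACGT symbols are assumptions; the LDA and SKWIC stages are not shown.

```python
from collections import Counter
from typing import List, Tuple

def read_to_document(read: str, k: int = 4) -> Counter:
    """Treat a read as a 'document' whose 'words' are its overlapping k-mers."""
    read = read.upper()
    kmers = (read[i:i + k] for i in range(len(read) - k + 1))
    return Counter(km for km in kmers if set(km) <= set("ACGT"))

def build_corpus(reads: List[str], k: int = 4) -> Tuple[List[str], List[List[Tuple[int, int]]]]:
    """Return a sorted k-mer vocabulary and one sparse (token id, count) list per read."""
    docs = [read_to_document(r, k) for r in reads]
    vocab = sorted(set().union(*docs))
    index = {w: j for j, w in enumerate(vocab)}
    corpus = [sorted((index[w], c) for w, c in d.items()) for d in docs]
    return vocab, corpus

vocab, corpus = build_corpus(["ACGTA", "CGTAC"], k=3)
print(vocab)   # ['ACG', 'CGT', 'GTA', 'TAC']
print(corpus)  # [[(0, 1), (1, 1), (2, 1)], [(1, 1), (2, 1), (3, 1)]]
```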
18
Ghosh TS, Mehra V, Mande SS. Grid-Assembly: An oligonucleotide composition-based partitioning strategy to aid metagenomic sequence assembly. J Bioinform Comput Biol 2015; 13:1541004. [PMID: 25790784 DOI: 10.1142/s0219720015410048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/23/2023]
Abstract
The metagenomic approach involves the extraction, sequencing and characterization of the genomic content of the entire community of microbes present in a given environment. In contrast to single-genome data, accurate assembly of metagenomic sequences is a challenging task. Given the huge volume and the diverse taxonomic origin of metagenomic sequences, directly applying single-genome assembly methods to metagenomes is likely not only to greatly increase computational infrastructure requirements but also to produce chimeric contigs. A strategy to address this challenge is to partition metagenomic sequence datasets into clusters and separately assemble the sequences in each cluster using any single-genome assembly method. The current study presents such an approach, which uses tetranucleotide usage patterns to first represent sequences as points in a three-dimensional (3D) space. The 3D space is subsequently partitioned into "Grids". Sequences within overlapping grids are then progressively assembled using any available assembler. We demonstrate the applicability of the Grid-Assembly method using various categories of assemblers as well as different simulated metagenomic datasets. Validation results indicate that the Grid-Assembly approach helps improve the overall quality of assembly, in terms of the purity and volume of the assembled contigs.
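The grid-partitioning step can be sketched as below. The abstract does not say how tetranucleotide usage is projected to 3D, how large a grid is, or how grids overlap, so the sketch assumes the 3D coordinates are already computed, uses cubic cells of a hypothetical edge length `size`, and treats a cell plus its 26 neighbours as one hypothetical "overlapping" batch of sequences to hand to an assembler.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int, int]

def grid_cell(point: Sequence[float], size: float) -> Cell:
    """Integer grid coordinates of a 3D point for cubic cells of edge length `size`."""
    return tuple(math.floor(c / size) for c in point)

def partition(points: Sequence[Sequence[float]], size: float) -> Dict[Cell, List[int]]:
    """Map each occupied grid cell to the indices of the sequences whose points fall in it."""
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        grid[grid_cell(p, size)].append(idx)
    return dict(grid)

def neighbourhood(cell: Cell) -> List[Cell]:
    """The cell plus its 26 neighbours; one hypothetical way to make adjacent grids overlap."""
    return [tuple(c + d for c, d in zip(cell, delta)) for delta in product((-1, 0, 1), repeat=3)]

# Hypothetical 3D coordinates already derived from tetranucleotide usage.
points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.9), (1.5, 0.2, 0.3)]
grids = partition(points, size=1.0)
print(grids)  # {(0, 0, 0): [0, 1], (1, 0, 0): [2]}
# Sequences that would be assembled together for cell (0, 0, 0) and its neighbours:
batch = sorted(i for cell in neighbourhood((0, 0, 0)) for i in grids.get(cell, []))
print(batch)  # [0, 1, 2]
```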
Affiliation(s)
- Tarini Shankar Ghosh
- Biosciences R&D Division, TCS Innovation Labs, 54-B Hadapsar Industrial Estate, Pune, Maharashtra 411013, India
19
Vinh LV, Lang TV, Binh LT, Hoai TV. A two-phase binning algorithm using l-mer frequency on groups of non-overlapping reads. Algorithms Mol Biol 2015; 10:2. [PMID: 25648210 PMCID: PMC4304631 DOI: 10.1186/s13015-014-0030-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 07/10/2014] [Accepted: 10/20/2014] [Indexed: 01/17/2023]
Abstract
BACKGROUND Metagenomics is the study of genetic material derived directly from complex microbial samples rather than from culture. One of the crucial steps in metagenomic analysis, referred to as "binning", is to separate reads into clusters that represent genomes from closely related organisms. Among existing binning methods, unsupervised methods base the classification on features extracted from the reads, which is especially advantageous when reference databases are limited. However, their performance in various respects is still being investigated in recent theoretical and empirical studies. This paper is one such effort to enhance the accuracy of the classification. RESULTS This paper presents an unsupervised algorithm, called BiMeta, for binning reads from different species in a metagenomic dataset. The algorithm consists of two phases. In the first phase, reads are grouped based on overlap information between the reads. The second phase merges the groups using an observation on the l-mer frequency distribution of sets of non-overlapping reads. The experimental results on simulated and real datasets showed that BiMeta outperforms three state-of-the-art binning algorithms on both short-read and long-read (≥700 bp) datasets. CONCLUSIONS This paper develops a novel and efficient algorithm for binning metagenomic reads that does not require any reference database. The software implementing the algorithm and all test datasets mentioned in this paper can be downloaded at http://it.hcmute.edu.vn/bioinfo/bimeta/index.htm.
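BiMeta's first phase groups reads by overlap information. A minimal sketch is below, assuming exact suffix-prefix overlaps of at least `min_len` bases and ignoring reverse complements and sequencing errors, since the abstract gives no overlap criteria. A union-find structure ensures that chains of overlapping reads end up in one group. The names and threshold are hypothetical, the all-pairs loop is quadratic, and the second phase (merging groups by l-mer frequency) is not shown.

```python
from typing import List

def suffix_prefix_overlap(a: str, b: str, min_len: int) -> bool:
    """True if some suffix of `a` of length >= min_len equals the prefix of `b` of the same length."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return True
    return False

def group_by_overlap(reads: List[str], min_len: int = 20) -> List[List[int]]:
    """Union reads linked by exact overlaps (both read orders checked) into groups of indices."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(len(reads)):
            if i != j and suffix_prefix_overlap(reads[i], reads[j], min_len):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(reads)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

reads = ["AAACCCGGG", "CCGGGTTTT", "ACACACACA"]
print(group_by_overlap(reads, min_len=4))  # [[0, 1], [2]]
```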
Affiliation(s)
- Le Van Vinh
- Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam
- Tran Van Lang
- Institute of Applied Mechanics and Informatics, Vietnam Academy of Science and Technology (VAST), 01 Mac Dinh Chi, Q1, Ho Chi Minh City, Vietnam
- Faculty of Information Technology, Lac Hong University, 10 Huynh Van Nghe, Bien Hoa, Dong Nai, Vietnam
- Le Thanh Binh
- Institute of Biotechnology, Vietnam Academy of Science and Technology (VAST), 18 Hoang Quoc Viet, Cau Giay, Ha Noi, Vietnam
- Tran Van Hoai
- Faculty of Computer Science and Engineering, HCMC University of Technology, 268 Ly Thuong Kiet, Q10, Ho Chi Minh City, Vietnam