1
Immune environment and antigen specificity of the T cell receptor repertoire of malignant ascites in ovarian cancer. PLoS One 2023; 18:e0279590. [PMID: 36607962 PMCID: PMC9821423 DOI: 10.1371/journal.pone.0279590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 12/10/2022] [Indexed: 01/07/2023] Open
Abstract
We evaluated the association of disease outcome with T cell immune-related characteristics and T cell receptor (TCR) repertoire in malignant ascites from patients with high-grade epithelial ovarian cancer. Ascitic fluid samples were collected from 47 high-grade epithelial ovarian cancer patients and analyzed using flow cytometry and TCR sequencing to characterize the complementarity determining region 3 TCR β-chain. TCR functions were analyzed using the McPAS-TCR and VDJ databases. TCR clustering was implemented using Grouping of Lymphocyte Interactions by Paratope Hotspots software. Patients with poor prognosis had ascites characterized by an increased ratio of CD8+ T cells to regulatory T cells, which correlated with an increased productive frequency of the top 100 clones and decreased productive entropy. TCRs enriched in patients with an excellent or good prognosis were more likely to recognize cancer antigens and contained more TCR reads predicted to recognize epithelial ovarian cancer antigens. In addition, a TCR motif that is predicted to bind the TP53 neoantigen was identified, and this motif was enriched in patients with an excellent or good prognosis. Ascitic fluid in high-grade epithelial ovarian cancer patients with an excellent or good prognosis is enriched with TCRs that may recognize ovarian cancer-specific neoantigens, including mutated TP53 and TEAD1. These results suggest that an effective antigen-specific immune response in ascites is vital for a good outcome in high-grade epithelial ovarian cancer.
2
Abstract
The maximal information coefficient (MIC) measures associations between pairs of variables in complex relationships. It estimates the correlation by optimizing the partition of each axis. However, under certain kinds of noise, MIC may overestimate the correlation value, which leads to a noisy relationship being misidentified as a noiseless one. In this article, a novel method, the weighted information coefficient mean (WICM), is proposed to detect unbiased associations in large data sets. First, we mathematically analyze why an abnormal correlation value is assigned to a noisy relationship. Then, the WICM is presented in two core steps. One is to detect potential overestimation among relationships with high values, and the other is to rectify the overestimation by calculating an information coefficient mean instead of just selecting the maximum element of the characteristic matrix. Finally, experiments on functional relationships and real-world data relationships show that WICM resolves the overestimation feasibly and effectively.
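The contrast between taking the maximum of the characteristic matrix and averaging over it can be illustrated with a rough Python sketch. This is not the authors' WICM: it assumes equal-frequency binning instead of MIC's optimized partition, uses a plain unweighted mean as a stand-in for the weighted mean (the abstract does not give the weights), and the function names are hypothetical. The grid-size bound n^0.6 follows the default commonly used for MIC.

```python
import numpy as np

def grid_mutual_information(x, y, nx, ny):
    """Mutual information (bits) of x and y on an nx-by-ny equal-frequency grid."""
    # Assumption: rank-based bins, not the optimized partition used by real MIC.
    xi = np.argsort(np.argsort(x)) * nx // len(x)
    yi = np.argsort(np.argsort(y)) * ny // len(y)
    joint = np.zeros((nx, ny))
    np.add.at(joint, (xi, yi), 1.0)          # count points per grid cell
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)    # marginal over x bins
    py = joint.sum(axis=0, keepdims=True)    # marginal over y bins
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def characteristic_values(x, y, max_cells=None):
    """Normalized grid scores for all grids with nx * ny <= max_cells."""
    n = len(x)
    if max_cells is None:
        max_cells = int(n ** 0.6)            # common default bound B(n) = n^0.6
    values = []
    for nx in range(2, max_cells // 2 + 1):
        for ny in range(2, max_cells // nx + 1):
            mi = grid_mutual_information(x, y, nx, ny)
            values.append(mi / np.log2(min(nx, ny)))
    return np.array(values)

# Illustration on a noisy linear relationship (hypothetical data).
rng = np.random.default_rng(0)
x = rng.uniform(size=300)
y = x + rng.normal(scale=0.3, size=300)
scores = characteristic_values(x, y)
max_score = scores.max()     # MIC-style summary: the single best grid
mean_score = scores.mean()   # mean-style summary: unweighted stand-in for WICM
```

The mean can never exceed the maximum, so an averaged summary is less sensitive to one grid that happens to fit the noise well.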
Affiliation(s)
- Chuanlu Liu
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Shuliang Wang
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Institute of E-Government, Beijing Institute of Technology, Beijing, China
- Hanning Yuan
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Xiaojia Liu
- Department of Data Science and Knowledge Engineering, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
3
Giulia A, Anna S, Antonia B, Dario P, Maurizio C. Extending Association Rule Mining to Microbiome Pattern Analysis: Tools and Guidelines to Support Real Applications. Frontiers in Bioinformatics 2022; 1:794547. [PMID: 36303759 PMCID: PMC9580939 DOI: 10.3389/fbinf.2021.794547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Accepted: 12/07/2021] [Indexed: 11/24/2022] Open
Abstract
Boosted by the exponential growth of microbiome-based studies, the analysis of microbiome patterns is now a hot topic with applications in many fields. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised-machine learning procedure, to extract patterns (in this work, intended as groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, making it challenging to remove spurious information and visualize results. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing similar microbiome strategies that help scientists to integrate ARM in microbiome applications. With this work, we aimed to create a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.
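To make the idea of rules and interest measures concrete, the following Python sketch computes support, confidence, and lift for pairwise rules over presence/absence data. It is only an illustration under stated assumptions: it does not use or reproduce the microFIM API, the function names are hypothetical, the taxa and samples are invented, and it is limited to rules between single taxa.

```python
from itertools import combinations

def support(samples, itemset):
    """Fraction of samples that contain every taxon in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= s for s in samples) / len(samples)

def pair_rules(samples, min_support=0.5):
    """Rules lhs -> rhs between single taxa, with support, confidence, lift."""
    taxa = sorted(set().union(*samples))
    rules = []
    for a, b in combinations(taxa, 2):
        s_ab = support(samples, {a, b})
        if s_ab < min_support:
            continue  # drop infrequent itemsets before building rules
        for lhs, rhs in ((a, b), (b, a)):
            confidence = s_ab / support(samples, {lhs})
            lift = confidence / support(samples, {rhs})
            rules.append((lhs, rhs, s_ab, confidence, lift))
    return rules

# Hypothetical presence/absence data: one set of detected taxa per sample.
samples = [
    {"Bacteroides", "Prevotella"},
    {"Bacteroides", "Prevotella", "Roseburia"},
    {"Bacteroides"},
    {"Roseburia"},
]
rules = pair_rules(samples, min_support=0.5)
```

A lift close to 1 means the two taxa co-occur about as often as expected by chance, which is one way an interest measure can flag a rule as uninformative.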
Affiliation(s)
- Agostinetto Giulia
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- *Correspondence: Agostinetto Giulia
- Bruno Antonia
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Pescini Dario
- Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
- Casiraghi Maurizio
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
4
Abstract
Over the past decade, it has become exceedingly clear that the microbiome is a critical factor in human health and disease and thus should be investigated to develop innovative treatment strategies. The field of metagenomics has come a long way in leveraging the advances of next-generation sequencing technologies resulting in the capability to identify and quantify all microorganisms present in human specimens. However, the field of metagenomics is still in its infancy, specifically in regard to the limitations in computational analysis, statistical assessments, standardization, and validation due to vast variability in the cohorts themselves, experimental design, and bioinformatic workflows. This review summarizes the methods, technologies, computational tools, and model systems for characterizing and studying the microbiome. We also discuss important considerations investigators must make when interrogating the involvement of the microbiome in health and disease in order to establish robust results and mechanistic insights before moving into therapeutic design and intervention.
5
Mitra A, Grossman Biegert GW, Delgado AY, Karpinets TV, Solley TN, Mezzari MP, Yoshida-Court K, Petrosino JF, Mikkelson MD, Lin L, Eifel P, Zhang J, Ramondetta LM, Jhingran A, Sims TT, Schmeler K, Okhuysen P, Colbert LE, Klopp AH. Microbial Diversity and Composition Is Associated with Patient-Reported Toxicity during Chemoradiation Therapy for Cervical Cancer. Int J Radiat Oncol Biol Phys 2020; 107:163-171. [PMID: 31987960 DOI: 10.1016/j.ijrobp.2019.12.040] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 11/04/2019] [Revised: 12/20/2019] [Accepted: 12/30/2019] [Indexed: 12/30/2022]
Abstract
PURPOSE: Patients receiving pelvic radiation for cervical cancer experience high rates of acute gastrointestinal (GI) toxicity. The association of changes in the gut microbiome with bowel toxicity from radiation is not well characterized. METHODS AND MATERIALS: Thirty-five patients undergoing definitive chemoradiation therapy (CRT) underwent longitudinal sampling (baseline and weeks 1, 3, and 5) of the gut microbiome and prospective assessment of patient-reported GI toxicity. DNA was isolated from stool obtained at rectal examination and analyzed with 16S rRNA sequencing. GI toxicity was assessed with the Expanded Prostate Cancer Index Composite instrument to evaluate frequency, urgency, and discomfort associated with bowel function. The Shannon diversity index was used to characterize alpha (within-sample) diversity. Weighted UniFrac principal coordinates analysis was used to compare beta (between-sample) diversity using permutational multivariate analysis of variance. Linear discriminant analysis effect size highlighted microbial features that best distinguish categorized patient samples. RESULTS: Gut microbiome diversity continuously decreased over the course of CRT, with the largest decrease at week 5. Expanded Prostate Cancer Index Composite bowel function scores also declined over the course of treatment, reflecting increased symptom burden. At all individual time points, higher diversity of the gut microbiome was linearly correlated with better patient-reported GI function, but baseline diversity was not predictive of eventual outcome. Patients with high toxicity demonstrated different compositional changes during CRT in addition to compositional differences in Clostridia species. CONCLUSIONS: Over time, increased radiation toxicity is associated with decreased gut microbiome diversity. Baseline diversity is not predictive of end-of-treatment bowel toxicity, but composition may identify patients at risk for developing high toxicity.
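For readers unfamiliar with the alpha diversity measure named in the methods, the Shannon index is H = -sum(p_i * ln p_i) over the relative abundances p_i of the taxa in one sample. A minimal Python sketch follows; the function name and the example counts are hypothetical, and the natural logarithm is an assumption, since the abstract does not state the base used.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    # Taxa with zero counts contribute nothing to the sum and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical read counts per taxon for two samples.
even_sample = [25, 25, 25, 25]    # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]     # one dominant taxon

h_even = shannon_diversity(even_sample)
h_skewed = shannon_diversity(skewed_sample)
```

With the same number of taxa, the evenly distributed sample scores higher than the sample dominated by one taxon, which is the sense in which diversity can fall during treatment even if few taxa disappear entirely.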
Affiliation(s)
- Aparna Mitra
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Andrea Y Delgado
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Tatiana V Karpinets
- Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, Texas
- Travis N Solley
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Melissa P Mezzari
- Alkek Center for Metagenomics and Microbiome Research Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas
- Kyoko Yoshida-Court
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Joe F Petrosino
- Alkek Center for Metagenomics and Microbiome Research Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, Texas
- Megan D Mikkelson
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Lilie Lin
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Patricia Eifel
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Jianhua Zhang
- Department of Genomic Medicine, University of Texas MD Anderson Cancer Center, Houston, Texas
- Lois M Ramondetta
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Anuja Jhingran
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Travis T Sims
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
- Kathleen Schmeler
- Department of Gynecologic Oncology and Reproductive Medicine, University of Texas MD Anderson Cancer Center, Houston, Texas
- Pablo Okhuysen
- Department of Infectious Diseases, University of Texas MD Anderson Cancer Center, Houston, Texas
- Lauren E Colbert
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas.
- Ann H Klopp
- Division of Radiation Oncology University of Texas MD Anderson Cancer Center, Houston, Texas
6
Aslam S, Lan XR, Zhang BW, Chen ZL, Wang L, Niu DK. Aerobic prokaryotes do not have higher GC contents than anaerobic prokaryotes, but obligate aerobic prokaryotes have. BMC Evol Biol 2019; 19:35. [PMID: 30691392 PMCID: PMC6350292 DOI: 10.1186/s12862-019-1365-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/21/2018] [Accepted: 01/17/2019] [Indexed: 12/17/2022] Open
Abstract
Background: Among the four bases, guanine is the most susceptible to damage from oxidative stress. Replication of DNA containing damaged guanines results in G to T mutations. Therefore, the mutations resulting from oxidative DNA damage are generally expected to predominantly consist of G to T (and C to A when the damaged guanine is not in the reference strand) and result in decreased GC content. However, the opposite pattern was reported 16 years ago in a study of prokaryotic genomes. Although that result has been widely cited and confirmed by nine later studies with similar methods, the omission of the effect of shared ancestry requires a re-examination of the reliability of the results. Results: When aerobic and obligate aerobic prokaryotes were mixed together and anaerobic and obligate anaerobic prokaryotes were mixed together, phylogenetically controlled analyses did not detect a significant difference in GC content between aerobic and anaerobic prokaryotes. This result is consistent with two generally neglected studies that had accounted for the phylogenetic relationship. However, when obligate aerobic prokaryotes were compared with aerobic prokaryotes, anaerobic prokaryotes, and obligate anaerobic prokaryotes separately using phylogenetic regression analysis, a significant positive association was observed between aerobiosis and GC content, regardless of whether it was calculated from whole genome sequences or the 4-fold degenerate sites of protein-coding genes. Obligate aerobes have significantly higher GC content than aerobes, anaerobes, and obligate anaerobes. Conclusions: The positive association between aerobiosis and GC content could be attributed to a mutational force resulting from incorporation of damaged deoxyguanosine during DNA replication rather than oxidation of the guanine nucleotides within DNA sequences. Our results indicate a grade in the aerobiosis-associated mutational force, strong in obligate aerobes, moderate in aerobes, weak in anaerobes and obligate anaerobes.
Affiliation(s)
- Sidra Aslam
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Xin-Ran Lan
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Bo-Wen Zhang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Zheng-Lin Chen
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Li Wang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Deng-Ke Niu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.
7
Karpinets TV, Gopalakrishnan V, Wargo J, Futreal AP, Schadt CW, Zhang J. Linking Associations of Rare Low-Abundance Species to Their Environments by Association Networks. Front Microbiol 2018; 9:297. [PMID: 29563898 PMCID: PMC5850922 DOI: 10.3389/fmicb.2018.00297] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/20/2017] [Accepted: 02/08/2018] [Indexed: 01/07/2023] Open
Abstract
Studies of microbial communities by targeted sequencing of rRNA genes lead to recovering numerous rare low-abundance taxa with unknown biological roles. We propose to study associations of such rare organisms with their environments by a computational framework based on transformation of the data into qualitative variables. Namely, we analyze the sparse table of putative species or OTUs (operational taxonomic units) and samples generated in such studies, also known as an OTU table, by collecting statistics on co-occurrences of the species and on shared species richness across samples. Based on the statistics we built two association networks, of the rare putative species and of the samples respectively, using a known computational technique, Association networks (Anets) developed for analysis of qualitative data. Clusters of samples and clusters of OTUs are then integrated and combined with metadata of the study to produce a map of associated putative species in their environments. We tested and validated the framework on two types of microbiomes, of human body sites and that of the Populus tree root systems. We show that in both studies the associations of OTUs can separate samples according to environmental or physiological characteristics of the studied systems.
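The two statistics collected from the OTU table, co-occurrence of species and shared species richness across samples, can be sketched with NumPy as below. This only illustrates the counting step after converting abundances to presence/absence; it is not the Anets method itself, and the function name, the table layout (OTUs in rows, samples in columns), and the example counts are assumptions.

```python
import numpy as np

def cooccurrence_statistics(otu_table):
    """Return (OTU co-occurrence counts, shared richness between samples)."""
    # Qualitative transformation: abundance counts become presence (1) or absence (0).
    presence = (np.asarray(otu_table) > 0).astype(int)
    # Entry [i, j]: number of samples in which OTU i and OTU j are both present.
    otu_cooccurrence = presence @ presence.T
    # Entry [a, b]: number of OTUs present in both sample a and sample b.
    shared_richness = presence.T @ presence
    return otu_cooccurrence, shared_richness

# Hypothetical sparse OTU table: 3 OTUs (rows) by 3 samples (columns).
table = [
    [5, 0, 2],
    [1, 0, 0],
    [0, 3, 4],
]
otu_co, shared = cooccurrence_statistics(table)
```

The diagonals of the two matrices give, respectively, the number of samples each OTU occurs in and the species richness of each sample.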
Affiliation(s)
- Tatiana V Karpinets
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Vancheswaran Gopalakrishnan
- Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas School of Public Health, Dallas, TX, United States
- Jennifer Wargo
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Andrew P Futreal
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Christopher W Schadt
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States; Department of Microbiology, University of Tennessee, Knoxville, Knoxville, TN, United States
- Jianhua Zhang
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
8
Wang C, Dai D, Li X, Wang A, Zhou X. SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:783-795. [PMID: 27076457 DOI: 10.1109/tcbb.2016.2550430] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
The maximal information coefficient (MIC) has been proposed to discover relationships and associations between pairs of variables. Accelerating the MIC calculation poses significant challenges for bioinformatics scientists, especially in genome sequencing and biological annotation. In this paper, we explore a parallel approach that uses the MapReduce framework to improve the computing efficiency and throughput of the MIC computation. The acceleration system includes biological data storage on HDFS, preprocessing algorithms, a distributed memory cache mechanism, and the partitioning of MapReduce jobs. Based on this acceleration approach, we extend the traditional two-variable algorithm to a multiple-variable algorithm. The experimental results show that our parallel solution provides a linear speedup compared with the original algorithm without affecting correctness or sensitivity.
9
10
A family of interaction-adjusted indices of community similarity. ISME Journal 2016; 11:791-807. [PMID: 27935587 PMCID: PMC5322292 DOI: 10.1038/ismej.2016.139] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 03/31/2016] [Revised: 06/16/2016] [Accepted: 08/30/2016] [Indexed: 12/27/2022]
Abstract
Interactions between taxa are essential drivers of ecological community structure and dynamics, but they are not taken into account by traditional indices of β diversity. In this study, we propose a novel family of indices that quantify community similarity in the context of taxa interaction networks. Using publicly available datasets, we assessed the performance of two specific indices that are Taxa INteraction-Adjusted (TINA, based on taxa co-occurrence networks), and Phylogenetic INteraction-Adjusted (PINA, based on phylogenetic similarities). TINA and PINA outperformed traditional indices when partitioning human-associated microbial communities according to habitat, even for extremely downsampled datasets, and when organising ocean micro-eukaryotic plankton diversity according to geographical and physicochemical gradients. We argue that interaction-adjusted indices capture novel aspects of diversity outside the scope of traditional approaches, highlighting the biological significance of ecological association networks in the interpretation of community similarity.
11
Naulaerts S, Moens S, Engelen K, Berghe WV, Goethals B, Laukens K, Meysman P. Practical Approaches for Mining Frequent Patterns in Molecular Datasets. Bioinform Biol Insights 2016; 10:37-47. [PMID: 27168722 PMCID: PMC4856181 DOI: 10.4137/bbi.s38419] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/19/2016] [Revised: 03/13/2016] [Accepted: 03/16/2016] [Indexed: 12/31/2022] Open
Abstract
Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still slow down promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations on common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should decide their software choice based on their research question, programming proficiency, and added value of extra features.
Affiliation(s)
- Stefan Naulaerts
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerpen (Biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
- Sandy Moens
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Kristof Engelen
- Department of Computational Biology, Fondazione Edmund Mach, San Michele all'Adige, Trento, Italy
- Wim Vanden Berghe
- Department of Biomedical Sciences, Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), University of Antwerp, Antwerp, Belgium
- Bart Goethals
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Kris Laukens
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerpen (Biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
- Pieter Meysman
- Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium; Biomedical Informatics Research Center Antwerpen (Biomina), University of Antwerp/Antwerp University Hospital, Antwerp, Belgium
12
Land M, Hauser L, Jun SR, Nookaew I, Leuze MR, Ahn TH, Karpinets T, Lund O, Kora G, Wassenaar T, Poudel S, Ussery DW. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics 2015; 15:141-61. [PMID: 25722247 PMCID: PMC4361730 DOI: 10.1007/s10142-015-0433-4] [Citation(s) in RCA: 412] [Impact Index Per Article: 45.8] [Received: 01/19/2015] [Revised: 02/11/2015] [Accepted: 02/12/2015] [Indexed: 12/18/2022]
Abstract
Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90% of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
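The pan- and core-genome calculation mentioned above reduces to a set union and a set intersection once each genome is represented by the gene families it contains. The Python sketch below shows only that final step under simplifying assumptions: gene families are taken as already assigned, and the function name, strain names, and family labels are hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan-genome, core genome) as sets of gene family identifiers."""
    families = [set(fams) for fams in genomes.values()]
    if not families:
        return set(), set()
    pan = set().union(*families)        # families found in at least one genome
    core = set.intersection(*families)  # families found in every genome
    return pan, core

# Hypothetical gene family content of three strains of one species.
genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
pan, core = pan_and_core(genomes)
```

As more genomes are added, the core can only shrink or stay the same while the pan-genome can only grow or stay the same, which is consistent with the large gap between the two figures quoted for E. coli.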
Affiliation(s)
- Miriam Land
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Loren Hauser
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
- Department of Microbiology, University of Tennessee, Knoxville, TN 37996 USA
- Se-Ran Jun
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Intawat Nookaew
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Michael R. Leuze
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Tae-Hyuk Ahn
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Tatiana Karpinets
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Ole Lund
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
- Guruprased Kora
- Computer Science and Mathematics Division, Computer Science Research Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Trudy Wassenaar
- Molecular Microbiology and Genomics Consultants, Tannenstr 7, 55576 Zotzenheim, Germany
- Suresh Poudel
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
- David W. Ussery
- Comparative Genomics Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA
- Joint Institute for Biological Sciences, University of Tennessee, Knoxville, TN 37996 USA
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Kgs. Lyngby, 2800 Denmark
- Genome Science and Technology, University of Tennessee, Knoxville, TN 37996 USA
13
Zhang W, Zhang Q, Zhang M, Zhang Y, Li F, Lei P. Network analysis in the identification of special mechanisms between small cell lung cancer and non-small cell lung cancer. Thorac Cancer 2014; 5:556-64. [PMID: 26767052 DOI: 10.1111/1759-7714.12134] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/29/2014] [Accepted: 05/04/2014] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND: To explore the similarities and differences in pathogenesis between non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). METHODS: This study used bioinformatics methods, including functional enrichment analysis, compared the topological features of SCLC and NSCLC in the human protein interaction network from a systems perspective, and analyzed the highly intense modules from an integrated network. RESULTS: This study included 5082 and 2781 significantly differentially expressed genes for NSCLC and SCLC, respectively. The differentially expressed genes of NSCLC are mainly distributed in the extracellular region and synapse. By contrast, the genes of SCLC are located in the organelle, macromolecular complex, membrane-enclosed lumen, cell part, envelope, and synapse. Compared with SCLC, the differentially expressed genes of NSCLC act in biological regulation, multicellular organismal process, and viral reproduction and locomotion, which shows that NSCLC is more likely than SCLC to cause a wide range of cancer cell proliferation and virus infection. The network topological properties of SCLC and NSCLC are similar, except for the average shortest path length, which indicates that most of the genes of the two lung cancers play a similar function in the entire body. The commonly expressed genes show that all of the genes in the module may cause both NSCLC and SCLC. CONCLUSIONS: The proteins in a module are involved in the same or similar biological functions, and the interactions among them induce the occurrence of lung cancer. Moreover, a potential biomarker of SCLC is the interaction between APIP and apoptotic protease activating factor 1 (APAF1), which share a common module.
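The average shortest path length, the one topological property reported to differ between the two networks, can be computed for an unweighted interaction network with breadth-first search. The Python sketch below is a generic illustration, not the study's pipeline: the function name and the toy network are hypothetical, and it assumes an undirected graph given as an adjacency mapping, averaging over reachable ordered pairs of distinct nodes.

```python
from collections import deque

def average_shortest_path_length(adjacency):
    """Mean hop distance over all reachable ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        # Breadth-first search gives hop distances from source in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1   # exclude the source itself
    return total / pairs if pairs else float("nan")

# Hypothetical toy interaction network: a chain of four proteins P1-P2-P3-P4.
network = {
    "P1": ["P2"],
    "P2": ["P1", "P3"],
    "P3": ["P2", "P4"],
    "P4": ["P3"],
}
aspl = average_shortest_path_length(network)
```

A shorter average path means that proteins in the network reach one another through fewer intermediate interactions.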
Affiliation(s)
- Weisan Zhang, Department of Geriatrics, Tianjin Geriatric Institute, Tianjin Medical University General Hospital, Tianjin, China
- Qiang Zhang, Department of Geriatrics, Tianjin Geriatric Institute, Tianjin Medical University General Hospital, Tianjin, China
- Mingpeng Zhang, Department of Geriatrics, Tianjin Geriatric Institute, Tianjin Medical University General Hospital, Tianjin, China
- Yun Zhang, Department of Geriatrics, Tianjin Geriatric Institute, Tianjin Medical University General Hospital, Tianjin, China
- Fengtan Li, Department of Radiology, Tianjin Medical University General Hospital, Tianjin, China
- Ping Lei, Department of Geriatrics, Tianjin Geriatric Institute, Tianjin Medical University General Hospital, Tianjin, China
|
14
|
Karpinets TV, Park BH, Syed MH, Klotz MG, Uberbacher EC. Metabolic environments and genomic features associated with pathogenic and mutualistic interactions between bacteria and plants. Molecular Plant-Microbe Interactions (MPMI) 2014; 27:664-677. [PMID: 24580106 DOI: 10.1094/mpmi-12-13-0368-r] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 06/03/2023]
Abstract
Genomic characteristics discriminating parasitic and mutualistic relationships of bacterial symbionts with plants are poorly understood. This study comparatively analyzed the genomes of 54 mutualists and pathogens to discover genomic markers associated with the different phenotypes. Using metabolic network models, we predict external environments associated with free-living and symbiotic lifestyles and quantify dependences of symbionts on the host in terms of the consumed metabolites. We show that specific differences between the phenotypes are pronounced at the levels of metabolic enzymes, especially carbohydrate-active enzymes, and protein functions. Overall, biosynthetic functions are enriched and more diverse in plant mutualists, whereas processes and functions involved in degradation and host invasion are enriched and more diverse in pathogens. A distinctive characteristic of plant pathogens is a putative novel secretion system with a circadian rhythm regulator. A specific marker of plant mutualists is the co-residence of genes encoding nitrogenase and ribulose bisphosphate carboxylase/oxygenase (RuBisCO). We predict that RuBisCO is likely used in a putative metabolic pathway to supplement carbon obtained heterotrophically with low-cost assimilation of carbon from CO2. We validate the results of the comparative analysis by predicting the correct phenotype, pathogenic or mutualistic, for 20 symbionts in an independent set of 30 pathogens, mutualists, and commensals.
|
15
|
Tang D, Wang M, Zheng W, Wang H. RapidMic: Rapid Computation of the Maximal Information Coefficient. Evol Bioinform Online 2014; 10:11-6. [PMID: 24526831 PMCID: PMC3921152 DOI: 10.4137/ebo.s13121] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 09/03/2013] [Revised: 11/14/2013] [Accepted: 11/17/2013] [Indexed: 11/05/2022]
Abstract
To discover relationships and associations rapidly in large-scale datasets, we propose a cross-platform tool for the rapid computation of the maximal information coefficient based on parallel computing methods. Through parallel processing, the provided tool can effectively analyze large-scale biological datasets with a markedly reduced computing time. The experimental results show that the proposed tool is notably fast, and is able to perform an all-pairs analysis of a large biological dataset using a normal computer. The source code and guidelines can be downloaded from https://github.com/HelloWorldCN/RapidMic.
Affiliation(s)
- Dongming Tang, Institute of Information Research, Southwest Jiaotong University, Chengdu, China
- Mingwen Wang, School of Mathematics, Southwest Jiaotong University, Chengdu, China
- Weifan Zheng, Institute of Information Research, Southwest Jiaotong University, Chengdu, China
- Hongjun Wang, School of Information Science and Technology, Southwest Jiaotong University, Chengdu, China
|
16
|
Abstract
The constantly increasing volume and complexity of available biological data require new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross- and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, which are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that the produced rules represent meaningful and quite reliable associations.
|
17
|
Naulaerts S, Meysman P, Bittremieux W, Vu TN, Vanden Berghe W, Goethals B, Laukens K. A primer to frequent itemset mining for bioinformatics. Brief Bioinform 2013; 16:216-31. [PMID: 24162173 PMCID: PMC4364064 DOI: 10.1093/bib/bbt074] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 01/17/2023]
Abstract
Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and their derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences.
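The shopping-basket example in this abstract can be illustrated with a minimal level-wise (Apriori-style) sketch. This is not code from the cited primer: the function name `frequent_itemsets`, the `min_support` parameter, and the toy baskets are all assumptions made for illustration, and the candidate generation is kept deliberately simple instead of optimized.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Hypothetical helper for illustration only; level-wise search in the
    spirit of Apriori, with no performance optimizations.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Build k-item candidates from pairs of frequent (k-1)-itemsets and
        # keep only those whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each surviving candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


# Toy data (made up): four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent = frequent_itemsets(baskets, min_support=2)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In a bioinformatics setting the "baskets" might be, for instance, sets of annotations attached to each gene; that mapping is likewise only an assumption here.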
|
18
|
Albanese D, Filosi M, Visintainer R, Riccadonna S, Jurman G, Furlanello C. Minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics 2012; 29:407-8. [PMID: 23242262 DOI: 10.1093/bioinformatics/bts707] [Citation(s) in RCA: 129] [Impact Index Per Article: 10.8] [Indexed: 11/13/2022]
Abstract
We introduce a novel implementation in ANSI C of the MINE family of algorithms for computing maximal information-based measures of dependence between two variables in large datasets, with the aim of a low memory footprint and ease of integration within bioinformatics pipelines. We provide the libraries minerva (with the R interface) and minepy for Python, MATLAB, Octave and C++. The C solution reduces the large memory requirement of the original Java implementation, has good upscaling properties and offers a native parallelization for the R interface. Low memory requirements are demonstrated on the MINE benchmarks as well as on large (n = 1340) microarray and Illumina GAII RNA-seq transcriptomics datasets. AVAILABILITY AND IMPLEMENTATION: Source code and binaries are freely available for download under GPL3 licence at http://minepy.sourceforge.net for minepy and through the CRAN repository http://cran.r-project.org for the R package minerva. All software is multiplatform (MS Windows, Linux and OSX).
Affiliation(s)
- Davide Albanese, Fondazione Bruno Kessler, via Sommarive 18, I-38123 Povo (Trento), Italy
|