1
Kang Y, Yuan L, Shi X, Chu Y, He Z, Jia X, Lin Q, Ma Q, Wang J, Xiao J, Hu S, Gao Z, Chen F, Yu J. A fine-scale map of genome-wide recombination in divergent Escherichia coli population. Brief Bioinform 2020; 22:6034796. [PMID: 33319232 DOI: 10.1093/bib/bbaa335] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/16/2020] [Revised: 10/19/2020] [Accepted: 10/23/2020] [Indexed: 01/09/2023]
Abstract
Recombination is one of the most important molecular mechanisms of prokaryotic genome evolution, but its exact roles are still debated. Here we infer genome-wide recombination within a species, using a dataset of 149 complete genomes of Escherichia coli from diverse animal hosts and geographic origins, including 45 sequenced in-house on the single-molecule real-time platform. Two major clades identified based on physiological, clinical and ecological characteristics form distinct genetic lineages based on the scarcity of interclade gene exchanges. By defining gene-based syntenies for genomic segments within and between the two clades, we build a fine-scale recombination map for this representative global E. coli population. The map suggests extensive within-clade recombination that often breaks physical linkages among individual genes but seldom interrupts the structure of genome organizational frameworks or the primary metabolic portfolios supported by framework integrity, possibly due to strong natural selection for both physiological compatibility and ecological fitness. In contrast, between-clade recombination declines drastically as phylogenetic distance increases, to the extent that a 10-fold reduction can be observed, establishing a firm genetic barrier between clades. Our empirical data suggest a critical role for such recombination events in the early stage of speciation, where recombination rate is associated with phylogenetic distance in addition to sequence and gene variations. The extensive intraclade recombination binds sister strains into a quasisexual group and optimizes genes or alleles to streamline physiological activities, whereas the sharply declined interclade recombination splits the population into clades adapted to divergent ecological niches.
Affiliation(s)
- Yu Kang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, PR China
  - China National Center for Bioinformation, Beijing 100101, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lina Yuan
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xing Shi
  - Department of Respiratory & Critical Care Medicine, Peking University People's Hospital, Beijing, 100044, PR China
- Yanan Chu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, PR China
  - China National Center for Bioinformation, Beijing 100101, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zilong He
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Interdisciplinary Innovation Institute of Medicine and Engineering, Beihang University, Beijing, 100191, PR China
- Xinmiao Jia
  - Medical Research Center, Peking Union Medical College Hospital, Peking Union Medical College & Chinese Academy of Medical Sciences, Beijing 100730, PR China
- Qiang Lin
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Qin Ma
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, 57007, USA
- Jian Wang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, PR China
  - China National Center for Bioinformation, Beijing 100101, PR China
- Jingfa Xiao
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, PR China
  - China National Center for Bioinformation, Beijing 100101, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Songnian Hu
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, 100101, Beijing, PR China
- Zhancheng Gao
  - Department of Respiratory & Critical Care Medicine, Peking University People's Hospital, Beijing, 100044, PR China
- Fei Chen
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, PR China
  - China National Center for Bioinformation, Beijing 100101, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Jun Yu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, 100101, Beijing, PR China
  - China National Center for Bioinformation, Beijing 100101, PR China
  - University of Chinese Academy of Sciences, Beijing 100049, China
2
Liu Z, Feng J, Yu B, Ma Q, Liu B. The functional determinants in the organization of bacterial genomes. Brief Bioinform 2020; 22:5892344. [PMID: 32793986 DOI: 10.1093/bib/bbaa172] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/30/2020] [Revised: 06/30/2020] [Accepted: 07/07/2020] [Indexed: 12/13/2022]
Abstract
Bacterial genomes are now recognized as interacting intimately with cellular processes. Uncovering the organizational mechanisms of bacterial genomes has been a primary focus of researchers seeking to reveal potential cellular activities. Advances in both experimental techniques and computational models provide a tremendous opportunity for understanding these mechanisms, and various recent studies have explored the organizational rules of bacterial genomes in relation to function. This review focuses mainly on the principles that shape the organization of bacterial genomes, both locally and globally. We first illustrate local structures such as operons/transcription units, which facilitate co-transcription and horizontal transfer of genes. We then clarify the constraints that globally shape bacterial genomes, such as metabolism, transcription and replication. Finally, we highlight challenges and opportunities to advance bacterial genomic studies and provide application perspectives of genome organization, including pathway hole assignment, genome assembly and understanding disease mechanisms.
Affiliation(s)
- Bin Yu
  - College of Mathematics and Physics, Qingdao University of Science and Technology
- Qin Ma
  - Department of Biomedical Informatics, the Ohio State University
3
McDermaid A, Monier B, Zhao J, Liu B, Ma Q. Interpretation of differential gene expression results of RNA-seq data: review and integration. Brief Bioinform 2019; 20:2044-2054. [PMID: 30099484 PMCID: PMC6954399 DOI: 10.1093/bib/bby067] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 04/23/2018] [Revised: 06/21/2018] [Accepted: 07/04/2018] [Indexed: 12/23/2022]
Abstract
Differential gene expression (DGE) analysis is one of the most common applications of RNA-sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes across two or more conditions and is widely used in many applications of RNA-seq data analysis. Interpretation of the DGE results can be nonintuitive and time consuming due to the variety of formats based on the tool of choice and the numerous pieces of information provided in these results files. Here we reviewed DGE results analysis from a functional point of view for various visualizations. We also provide an R/Bioconductor package, Visualization of Differential Gene Expression Results using R, which generates information-rich visualizations for the interpretation of DGE results from three widely used tools, Cuffdiff, DESeq2 and edgeR. The implemented functions are also tested on five real-world data sets, consisting of one human, one Malus domestica and three Vitis riparia data sets.
Affiliation(s)
- Adam McDermaid
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
- Brandon Monier
  - Department of Biology and Microbiology, South Dakota State University, SD, USA
- Jing Zhao
  - Department of Internal Medicine, Sanford Research, University of South Dakota Sanford School of Medicine
- Qin Ma
  - Department of Agronomy, Horticulture, and Plant Science, Bioinformatics and Mathematical Biosciences Lab, South Dakota State University
  - Department of Mathematics and Statistics of SDSU, BioSNTR and Sanford Research, USA
4
Ma Q, Bücking H, Gonzalez Hernandez JL, Subramanian S. Single-Cell RNA Sequencing of Plant-Associated Bacterial Communities. Front Microbiol 2019; 10:2452. [PMID: 31736899 PMCID: PMC6828647 DOI: 10.3389/fmicb.2019.02452] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/05/2019] [Accepted: 10/11/2019] [Indexed: 11/29/2022]
Abstract
Plants in soil are not solitary; they continually interact with and obtain benefits from a community of microbes ("microbiome"). The meta-functional output from the microbiome results from complex interactions among the different community members with distinct taxonomic identities and metabolic capacities. In particular, the bacterial communities of the root surface are spatially organized structures composed of root-attached biofilms and planktonic cells arranged in complex layers. With the distinct but coordinated roles among the different member cells, bacterial communities resemble a multicellular organism in their properties. High throughput sequencing technologies have allowed rapid and large-scale analysis of taxonomic composition and metabolic capacities of bacterial communities. However, these methods are generally unable to reconstruct the assembly of these communities, or how the gene expression patterns in individual cells/species are coordinated within these communities. Single-cell transcriptomes of community members can identify how gene expression patterns vary among members of the community, including differences among different cells of the same species. This information can be used to classify cells based on functional gene expression patterns, and predict the spatial organization of the community. Here we discuss strategies for the isolation of single bacterial cells, mRNA enrichment, library construction, and analysis and interpretation of the resulting single-cell RNA-Seq datasets. Unraveling regulatory and metabolic processes at the single cell level is expected to yield an unprecedented discovery of mechanisms involved in bacterial recruitment, attachment, assembly, organization of the community, or in the specific interactions among the different members of these communities.
Affiliation(s)
- Qin Ma
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
- Heike Bücking
  - Biology and Microbiology Department, South Dakota State University, Brookings, SD, United States
- Jose L. Gonzalez Hernandez
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
  - Biology and Microbiology Department, South Dakota State University, Brookings, SD, United States
- Senthil Subramanian
  - Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
  - Biology and Microbiology Department, South Dakota State University, Brookings, SD, United States
5
Abstract
Affordable, high-throughput DNA sequencing has accelerated the pace of genome assembly over the past decade. Genome assemblies from high-throughput, short-read sequencing, however, are often not as contiguous as the first generation of genome assemblies. Whereas early genome assembly projects were often aided by clone maps or other mapping data, many current assembly projects forego these scaffolding data and only assemble genomes into smaller segments. Recently, new technologies have been invented that allow chromosome-scale assembly at a lower cost and faster speed than traditional methods. Here, we give an overview of the problem of chromosome-scale assembly and traditional methods for tackling this problem. We then review new technologies for chromosome-scale assembly and recent genome projects that used these technologies to create highly contiguous genome assemblies at low cost.
Affiliation(s)
- Edward S. Rice
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Richard E. Green
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
  - Dovetail Genomics, LLC, Santa Cruz, California 95060, USA
6
McDermaid A, Chen X, Zhang Y, Wang C, Gu S, Xie J, Ma Q. A New Machine Learning-Based Framework for Mapping Uncertainty Analysis in RNA-Seq Read Alignment and Gene Expression Estimation. Front Genet 2018; 9:313. [PMID: 30154828 PMCID: PMC6102479 DOI: 10.3389/fgene.2018.00313] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/25/2018] [Accepted: 07/23/2018] [Indexed: 11/29/2022]
Abstract
One of the main benefits of using modern RNA-Sequencing (RNA-Seq) technology is the more accurate gene expression estimations compared with previous generations of expression data, such as the microarray. However, numerous issues can result in the possibility that an RNA-Seq read can be mapped to multiple locations on the reference genome with the same alignment scores, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). The impact of these MMRs is reflected in gene expression estimation and all downstream analyses, including differential gene expression, functional enrichment, etc. Current analysis pipelines lack the tools to effectively test the reliability of gene expression estimations, and are thus incapable of ensuring the validity of all downstream analyses. Our investigation into 95 RNA-Seq datasets from seven plant and animal species (totaling 1,951 GB) indicates that an average of roughly 22% of all reads are MMRs. Here we present a machine learning-based tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene's expression level derived from an RNA-Seq dataset. The underlying algorithm is designed based on extracted genomic and transcriptomic features, which are then combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to determine reliable expression estimations and conduct further analysis on the gene expression that is of sufficient quality. This tool also enables researchers to investigate continued re-alignment methods to determine more accurate gene expression estimates for those with low reliability. Application of GeneQC reveals a high level of mapping uncertainty in plant samples and limited, severe mapping uncertainty in animal samples. GeneQC is freely available at http://bmbl.sdstate.edu/GeneQC/home.html.
Affiliation(s)
- Adam McDermaid
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, United States
- Xin Chen
  - Center for Applied Mathematics, Tianjin University, Tianjin, China
- Yiran Zhang
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
  - Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD, United States
- Cankun Wang
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
- Shaopeng Gu
  - Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD, United States
- Juan Xie
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, United States
- Qin Ma
  - Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture, and Plant Science, South Dakota State University, Brookings, SD, United States
  - Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, United States