1
|
Genome-powered classification of microbial eukaryotes: focus on coral algal symbionts. Trends Microbiol 2022. [PMID: 35227551 DOI: 10.1016/j.tim.2022.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
Modern microbial taxonomy generally relies on the use of single marker genes or sets of concatenated genes to generate a framework for the delineation and classification of organisms at different taxonomic levels. However, given that DNA is the 'blueprint of life', and hence the ultimate arbiter of taxonomy, classification systems should attempt to use as much of the blueprint as possible to capture a comprehensive phylogenetic signal. Recent analysis of whole-genome sequences from coral reef symbionts (dinoflagellates of the family Symbiodiniaceae) and other microalgal groups has uncovered extensive divergence not recognised by current algal taxonomic approaches. In the era of 'sequence everything', we argue that whole-genome data are pivotal to guide informed taxonomic inference, particularly for microbial eukaryotes.
|
2
|
Genome-powered classification of microbial eukaryotes: focus on coral algal symbionts. Trends Microbiol 2022; 30:831-840. [DOI: 10.1016/j.tim.2022.02.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/10/2021] [Revised: 01/20/2022] [Accepted: 02/01/2022] [Indexed: 12/20/2022]
|
3
|
Comparison of 15 dinoflagellate genomes reveals extensive sequence and structural divergence in family Symbiodiniaceae and genus Symbiodinium. BMC Biol 2021; 19:73. [PMID: 33849527 PMCID: PMC8045281 DOI: 10.1186/s12915-021-00994-6] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 11/24/2020] [Accepted: 02/25/2021] [Indexed: 02/07/2023]
Abstract
Background: Dinoflagellates in the family Symbiodiniaceae are important photosynthetic symbionts in cnidarians (such as corals) and other coral reef organisms. Breakdown of the coral-dinoflagellate symbiosis due to environmental stress (i.e. coral bleaching) can lead to coral death and the potential collapse of reef ecosystems. However, evolution of Symbiodiniaceae genomes, and its implications for the coral, is little understood. Genome sequences of Symbiodiniaceae remain scarce due in part to their large genome sizes (1–5 Gbp) and idiosyncratic genome features. Results: Here, we present de novo genome assemblies of seven members of the genus Symbiodinium, of which two are free-living, one is an opportunistic symbiont, and the remainder are mutualistic symbionts. Integrating other available data, we compare 15 dinoflagellate genomes revealing high sequence and structural divergence. Divergence among some Symbiodinium isolates is comparable to that among distinct genera of Symbiodiniaceae. We also recovered hundreds of gene families specific to each lineage, many of which encode unknown functions. An in-depth comparison between the genomes of the symbiotic Symbiodinium tridacnidorum (isolated from a coral) and the free-living Symbiodinium natans reveals a greater prevalence of transposable elements, genetic duplication, structural rearrangements, and pseudogenisation in the symbiotic species. Conclusions: Our results underscore the potential impact of lifestyle on lineage-specific gene-function innovation, genome divergence, and the diversification of Symbiodinium and Symbiodiniaceae. The divergent features we report, and their putative causes, may also apply to other microbial eukaryotes that have undergone symbiotic phases in their evolutionary history.
|
4
|
Genomic signatures in the coral holobiont reveal host adaptations driven by Holocene climate change and reef specific symbionts. Sci Adv 2020; 6:eabc6318. [PMID: 33246955 PMCID: PMC7695477 DOI: 10.1126/sciadv.abc6318] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/09/2020] [Accepted: 10/15/2020] [Indexed: 05/24/2023]
Abstract
Genetic signatures caused by demographic and adaptive processes during past climatic shifts can inform predictions of species' responses to anthropogenic climate change. To identify these signatures in Acropora tenuis, a reef-building coral threatened by global warming, we first assembled the genome from long reads and then used shallow whole-genome resequencing of 150 colonies from the central inshore Great Barrier Reef to inform population genomic analyses. We identify population structure in the host that reflects a Pleistocene split, whereas photosymbiont differences between reefs most likely reflect contemporary (Holocene) conditions. Signatures of selection in the host were associated with genes linked to diverse processes including osmotic regulation, skeletal development, and the establishment and maintenance of symbiosis. Our results suggest that adaptation to post-glacial climate change in A. tenuis has involved selection on many genes, while differences in symbiont specificity between reefs appear to be unrelated to host population structure.
|
5
|
Comparative transcriptomic analyses of Chromera and Symbiodiniaceae. Environ Microbiol Rep 2020; 12:435-443. [PMID: 32452166 DOI: 10.1111/1758-2229.12859] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/04/2019] [Revised: 05/12/2020] [Accepted: 05/21/2020] [Indexed: 06/11/2023]
Abstract
Reef-building corals live in a mutualistic relationship with photosynthetic algae (family Symbiodiniaceae) that usually provide most of the energy required by the coral host. This relationship is sensitive to temperature stress; as little as a 1°C increase often leads to the collapse of the association. This sensitivity has led to an interest in the potential of more stress-tolerant algae to supplement or substitute for the normal Symbiodiniaceae mutualists. In this respect, the apicomplexan-like microalga Chromera is of particular interest due to its greater temperature tolerance. We generated a de novo transcriptome for a Chromera strain isolated from a GBR coral ('GBR Chromera') and compared with those of the reference strain of Chromera ('Sydney Chromera'), and to those of Symbiodiniaceae (Fugacium kawagutii, Cladocopium goreaui and Breviolum minutum), as well as the apicomplexan parasite, Plasmodium falciparum. In contrast to the high sequence divergence amongst representatives of different genera within the family Symbiodiniaceae, the two Chromera strains featured low sequence divergence at orthologous genes, implying that they are likely to be conspecifics. Although KEGG categories provide few criteria by which true coral mutualists might be identified, they do supply a molecular rationalization that explains the ecological dominance of Cladocopium spp. amongst Indo-Pacific reef corals. The presence of HSP20 genes may contribute to the high thermal tolerance of Chromera.
|
6
|
Genomes of the dinoflagellate Polarella glacialis encode tandemly repeated single-exon genes with adaptive functions. BMC Biol 2020; 18:56. [PMID: 32448240 PMCID: PMC7245778 DOI: 10.1186/s12915-020-00782-8] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 11/03/2019] [Accepted: 04/20/2020] [Indexed: 12/26/2022]
Abstract
BACKGROUND Dinoflagellates are taxonomically diverse and ecologically important phytoplankton that are ubiquitously present in marine and freshwater environments. Mostly photosynthetic, dinoflagellates provide the basis of aquatic primary production; most taxa are free-living, while some can form symbiotic and parasitic associations with other organisms. However, knowledge of the molecular mechanisms that underpin the adaptation of these organisms to diverse ecological niches is limited by the scarce availability of genomic data, partly due to their large genome sizes estimated up to 250 Gbp. Currently available dinoflagellate genome data are restricted to Symbiodiniaceae (particularly symbionts of reef-building corals) and parasitic lineages, from taxa that have smaller genome size ranges, while genomic information from more diverse free-living species is still lacking. RESULTS Here, we present two draft diploid genome assemblies of the free-living dinoflagellate Polarella glacialis, isolated from the Arctic and Antarctica. We found that about 68% of the genomes are composed of repetitive sequence, with long terminal repeats likely contributing to intra-species structural divergence and distinct genome sizes (3.0 and 2.7 Gbp). For each genome, guided using full-length transcriptome data, we predicted > 50,000 high-quality protein-coding genes, of which ~40% are in unidirectional gene clusters and ~25% comprise single exons. Multi-genome comparison unveiled genes specific to P. glacialis and a common, putatively bacterial origin of ice-binding domains in cold-adapted dinoflagellates. CONCLUSIONS Our results elucidate how selection acts within the context of a complex genome structure to facilitate local adaptation. 
Because most dinoflagellate genes are constitutively expressed, Polarella glacialis has enhanced transcriptional responses via unidirectional, tandem duplication of single-exon genes that encode functions critical to survival in cold, low-light polar environments. These genomes provide a foundational reference for future research on dinoflagellate evolution.
|
7
|
Alignment-free inference of hierarchical and reticulate phylogenomic relationships. Brief Bioinform 2019; 20:426-435. [PMID: 28673025 PMCID: PMC6433738 DOI: 10.1093/bib/bbx067] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 02/08/2017] [Revised: 05/04/2017] [Indexed: 11/22/2022]
Abstract
We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.
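As a rough illustration of the k-mer idea mentioned in this abstract, the sketch below compares two unaligned sequences by the overlap of their k-mer sets. The Jaccard distance, the function names, and the default k = 3 are assumptions chosen for brevity; the review covers a range of k-mer statistics, and this is not presented as any specific method from it.

```python
# Illustrative alignment-free comparison: Jaccard distance between k-mer sets.
# The statistic and all names here are assumptions, not taken from the review.

def kmers(sequence, k):
    """Return the set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def jaccard_distance(seq_a, seq_b, k=3):
    """1 - |A ∩ B| / |A ∪ B| over the k-mer sets of two unaligned sequences."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k; treat them as indistinguishable.
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Two sequences sharing two of four distinct 3-mers.
distance = jaccard_distance("ACGTA", "ACGTT", k=3)
```

A matrix of such pairwise distances could then be passed to a distance-based tree or network method. No alignment step is needed, which is the scalability argument the abstract makes.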
|
8
|
Genome Evolution of Coral Reef Symbionts as Intracellular Residents. Trends Ecol Evol 2019; 34:799-806. [DOI: 10.1016/j.tree.2019.04.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 12/06/2018] [Revised: 04/10/2019] [Accepted: 04/15/2019] [Indexed: 02/07/2023]
|
9
|
Abstract
BACKGROUND Whole-genome sequencing (WGS) is a powerful method for revealing the diversity and complexity of the somatic mutation burden of tumours. Here, we investigated the utility of tumour and matched germline WGS for understanding aetiology and treatment opportunities for high-risk individuals with familial breast cancer. PATIENTS AND METHODS We carried out WGS on 78 paired germline and tumour DNA samples from individuals carrying pathogenic variants in BRCA1 (n = 26) or BRCA2 (n = 22) or from non-carriers (non-BRCA1/2; n = 30). RESULTS Matched germline/tumour WGS and somatic mutational signature analysis revealed patients with unreported, dual pathogenic germline variants in cancer risk genes (BRCA1/BRCA2; BRCA1/MUTYH). The strategy identified that 100% of tumours from BRCA1 carriers and 91% of tumours from BRCA2 carriers exhibited biallelic inactivation of the respective gene, together with somatic mutational signatures suggestive of a functional deficiency in homologous recombination. A set of non-BRCA1/2 tumours also had somatic signatures indicative of BRCA-deficiency, including tumours with BRCA1 promoter methylation, and tumours from carriers of a PALB2 pathogenic germline variant and a BRCA2 variant of uncertain significance. A subset of 13 non-BRCA1/2 tumours from early onset cases were BRCA-proficient, yet displayed complex clustered structural rearrangements associated with the amplification of oncogenes and pathogenic germline variants in TP53, ATM and CHEK2. CONCLUSIONS Our study highlights the role that WGS of matched germline/tumour DNA and the somatic mutational signatures can play in the discovery of pathogenic germline variants and for providing supporting evidence for variant pathogenicity. WGS-derived signatures were more robust than germline status and other genomic predictors of homologous recombination deficiency, thus impacting the selection of platinum-based or PARP inhibitor therapy. 
In this first examination of non-BRCA1/2 tumours by WGS, we illustrate the considerable heterogeneity of these tumour genomes and highlight that complex genomic rearrangements may drive tumourigenesis in a subset of cases.
|
10
|
Whole-genome sequence of the bovine blood fluke Schistosoma bovis supports interspecific hybridization with S. haematobium. PLoS Pathog 2019; 15:e1007513. [PMID: 30673782 PMCID: PMC6361461 DOI: 10.1371/journal.ppat.1007513] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 06/05/2018] [Revised: 02/04/2019] [Accepted: 12/07/2018] [Indexed: 11/18/2022]
Abstract
Mesenteric infection by the parasitic blood fluke Schistosoma bovis is a common veterinary problem in Africa and the Middle East and occasionally in the Mediterranean Region. The species also has the ability to form interspecific hybrids with the human parasite S. haematobium with natural hybridisation observed in West Africa, presenting possible zoonotic transmission. Additionally, this exchange of alleles between species may dramatically influence disease dynamics and parasite evolution. We have generated a 374 Mb assembly of the S. bovis genome using Illumina and PacBio-based technologies. Despite infecting different hosts and organs, the genome sequences of S. bovis and S. haematobium appeared strikingly similar with 97% sequence identity. The two species share 98% of protein-coding genes, with an average sequence identity of 97.3% at the amino acid level. Genome comparison identified large continuous parts of the genome (up to several 100 kb) showing almost 100% sequence identity between S. bovis and S. haematobium. It is unlikely that this is a result of genome conservation and provides further evidence of natural interspecific hybridization between S. bovis and S. haematobium. Our results suggest that foreign DNA obtained by interspecific hybridization was maintained in the population through multiple meiosis cycles and that hybrids were sexually reproductive, producing viable offspring. The S. bovis genome assembly forms a highly valuable resource for studying schistosome evolution and exploring genetic regions that are associated with species-specific phenotypic traits. In this article we detail the assembly and functional annotation of the Schistosoma bovis genome. S. bovis is a parasitic flatworm that primarily infects bovines, with important economic consequences in affected countries. 
However, it is also a close relative of the human carcinogenic parasite Schistosoma haematobium which is a serious health issue in many endemic countries in Sub-Saharan Africa. The close relationship and overlapping geographical distribution of S. bovis and S. haematobium allows these to hybridise in the wild increasing their genetic diversity and presenting the risk of zoonotic transmission, i.e. the transmission from animals to humans. The hybridization between human and ruminant schistosomes is of particular interest as interspecific hybridization may have dramatic impacts on transmission rates, disease dynamics, control interventions and parasite evolution. By whole-genome sequencing and comparative genomics we present evidence that fertile hybrids are indeed present in the wild, presenting the potential risk of transmission from animal reservoirs to humans.
|
11
|
Whole-genome sequence of the oriental lung fluke Paragonimus westermani. Gigascience 2019; 8:5232231. [PMID: 30520948 PMCID: PMC6329441 DOI: 10.1093/gigascience/giy146] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 05/28/2018] [Accepted: 11/19/2018] [Indexed: 01/16/2023]
Abstract
Background: Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available. Results: We sequenced and assembled the genome of P. westermani, which is among the largest of the known pathogen genomes with an estimated size of 1.1 Gb. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of the estimated genome size. The genome has a high proportion (45%) of repeat-derived DNA, particularly of the long interspersed element and long terminal repeat subtypes, and the expansion of these elements may explain some of the large size. We predicted 12,852 protein coding genes, showing a high level of conservation with related trematode species. The majority of proteins (80%) had homologs in the human liver fluke Opisthorchis viverrini, with an average sequence identity of 64.1%. Assembly of the P. westermani mitochondrial genome from long PacBio reads resulted in a single high-quality circularized 20.6 kb contig. The contig harbored a 6.9 kb region of non-coding repetitive DNA comprised of three distinct repeat units. Our results suggest that the region is highly polymorphic in P. westermani, possibly even within single worm isolates. Conclusions: The generated assembly represents the first Paragonimus genome sequence and will facilitate future molecular studies of this important, but neglected, parasite group.
|
12
|
Quantitative Modelling of the Waddington Epigenetic Landscape. Methods Mol Biol 2019; 1975:157-171. [PMID: 31062309 DOI: 10.1007/978-1-4939-9224-9_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
Abstract
C.H. Waddington introduced the epigenetic landscape as a metaphor to represent cellular decision-making during development. Like a population of balls rolling down a rough hillside, developing cells follow specific trajectories (valleys) and eventually come to rest in one or another low-energy state that represents a mature cell type. Waddington depicted the topography of this landscape as determined by interactions among gene products, thereby connecting genotype to phenotype. In modern terms, each point on the landscape represents a state of the underlying genetic regulatory network, which in turn is described by a gene expression profile. In this chapter we demonstrate how the mathematical formalism of Hopfield networks can be used to model this epigenetic landscape. Hopfield networks are auto-associative artificial neural networks; input patterns are stored as attractors of the network and can be recalled from noisy or incomplete inputs. The resulting models capture the temporal dynamics of a gene regulatory network, yielding quantitative insight into cellular development and phenotype.
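The attractor behaviour described in this abstract can be sketched with a toy Hopfield network. The example below is a minimal illustration, not the chapter's implementation: the ±1 vectors are hypothetical stand-ins for binarised gene expression profiles, and the Hebbian learning rule, asynchronous update scheme, and function names are assumptions.

```python
# Toy Hopfield network (illustrative only; not the chapter's implementation).
# Patterns are lists of +1/-1 values, hypothetical binarised expression profiles.

def train_hopfield(patterns):
    """Hebbian rule: W[i][j] = (1/n) * sum over patterns of p[i]*p[j], zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights

def recall(weights, state, max_sweeps=20):
    """Update units one at a time until a full sweep changes nothing (an attractor)."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Two orthogonal stored patterns, standing in for two mature cell states.
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

# The first pattern with one entry flipped, i.e. a noisy input.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recovered = recall(weights, noisy)
```

Here `recovered` settles back onto the first stored pattern, which is the "recall from noisy input" property the abstract refers to. Real expression data would need a binarisation step and many more units.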
|
13
|
k-mer Similarity, Networks of Microbial Genomes, and Taxonomic Rank. mSystems 2018; 3:e00257-18. [PMID: 30505941 PMCID: PMC6247013 DOI: 10.1128/msystems.00257-18] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 10/12/2018] [Accepted: 11/02/2018] [Indexed: 01/27/2023]
Abstract
Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences, and a tree-like structure does not adequately capture how these processes impact microbial physiology. Here we adopted alignment-free approaches based on k-mer statistics to infer phylogenomic networks involving 2,783 completely sequenced bacterial and archaeal genomes and compared the contributions of rRNA, protein-coding, and plasmid sequences to these networks. Our results show that the phylogenomic signal arising from ribosomal RNAs is strong and extends broadly across all taxa, whereas that from plasmids is strong but restricted to closely related groups, particularly Proteobacteria. However, the signal from the other chromosomal regions is restricted in breadth. We show that mean k-mer similarity can correlate with taxonomic rank. We also link the implicated k-mers to genome annotation (thus, functions) and define core k-mers (thus, core functions) in specific phyletic groups. Highly conserved functions in most phyla include amino acid metabolism and transport as well as energy production and conversion. Intracellular trafficking and secretion are the most prominent core functions among Spirochaetes, whereas energy production and conversion are not highly conserved among the largely parasitic or commensal Tenericutes. These observations suggest that differential conservation of functions relates to niche specialization and evolutionary diversification of microbes. Our results demonstrate that k-mer approaches can be used to efficiently identify phylogenomic signals and conserved core functions at the multigenome scale. IMPORTANCE Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. 
This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly Proteobacteria. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.
|
14
|
Symbiodinium genomes reveal adaptive evolution of functions related to coral-dinoflagellate symbiosis. Commun Biol 2018; 1:95. [PMID: 30271976 PMCID: PMC6123633 DOI: 10.1038/s42003-018-0098-3] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 12/13/2017] [Accepted: 06/21/2018] [Indexed: 12/20/2022]
Abstract
Symbiosis between dinoflagellates of the genus Symbiodinium and reef-building corals forms the trophic foundation of the world’s coral reef ecosystems. Here we present the first draft genome of Symbiodinium goreaui (Clade C, type C1: 1.03 Gbp), one of the most ubiquitous endosymbionts associated with corals, and an improved draft genome of Symbiodinium kawagutii (Clade F, strain CS-156: 1.05 Gbp) to further elucidate genomic signatures of this symbiosis. Comparative analysis of four available Symbiodinium genomes against other dinoflagellate genomes led to the identification of 2460 nuclear gene families (containing 5% of Symbiodinium genes) that show evidence of positive selection, including genes involved in photosynthesis, transmembrane ion transport, synthesis and modification of amino acids and glycoproteins, and stress response. Further, we identify extensive sets of genes for meiosis and response to light stress. These draft genomes provide a foundational resource for advancing our understanding of Symbiodinium biology and the coral-algal symbiosis. Huanle Liu et al. report draft genomes of two Symbiodinium species, one from the most dominant type of symbionts in reef-building corals. They find evidence of positive selection in genes related to stress response, meiosis and other traits required for forming successful symbiotic relationships.
|
15
|
Abstract
We have developed an alignment-free method that calculates phylogenetic distances using a maximum-likelihood approach for a model of sequence change on patterns that are discovered in unaligned sequences. To evaluate the phylogenetic accuracy of our method, and to conduct a comprehensive comparison of existing alignment-free methods (freely available as Python package decaf+py at http://www.bioinformatics.org.au), we have created a data set of reference trees covering a wide range of phylogenetic distances. Amino acid sequences were evolved along the trees and input to the tested methods; from their calculated distances we inferred trees whose topologies we compared to the reference trees. We find our pattern-based method statistically superior to all other tested alignment-free methods. We also demonstrate the general advantage of alignment-free methods over an approach based on automated alignments when sequences violate the assumption of collinearity. Similarly, we compare methods on empirical data from an existing alignment benchmark set that we used to derive reference distances and trees. Our pattern-based approach yields distances that show a linear relationship to reference distances over a substantially longer range than other alignment-free methods. The pattern-based approach outperforms alignment-free methods and its phylogenetic accuracy is statistically indistinguishable from alignment-based distances.
|
16
|
Signatures of adaptation and symbiosis in genomes and transcriptomes of Symbiodinium. Sci Rep 2017; 7:15021. [PMID: 29101370 PMCID: PMC5670126 DOI: 10.1038/s41598-017-15029-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 07/14/2017] [Accepted: 10/19/2017] [Indexed: 12/02/2022]
Abstract
Symbiodinium is best-known as the photosynthetic symbiont of corals, but some clades are symbiotic in other organisms or include free-living forms. Identifying similarities and differences among these clades can help us understand their relationship with corals, and thereby inform on measures to manage coral reefs in a changing environment. Here, using sequences from 24 publicly available transcriptomes and genomes of Symbiodinium, we assessed 78,389 gene families in Symbiodinium clades and the immediate outgroup Polarella glacialis, and identified putative overrepresented functions in gene families that (1) distinguish Symbiodinium from other members of Order Suessiales, (2) are shared by all of the Symbiodinium clades for which we have data, and (3) based on available information, are specific to each clade. Our findings indicate that transmembrane transport, mechanisms of response to reactive oxygen species, and protection against UV radiation are functions enriched in all Symbiodinium clades but not in P. glacialis. Enrichment of these functions indicates the capability of Symbiodinium to establish and maintain symbiosis, and to respond and adapt to its environment. The observed differences in lineage-specific gene families imply extensive genetic divergence among clades. Our results provide a platform for future investigation of lineage- or clade-specific adaptation of Symbiodinium to their environment.
|
17
|
Evolutionary conservation of a core root microbiome across plant phyla along a tropical soil chronosequence. Nat Commun 2017; 8:215. [PMID: 28790312 PMCID: PMC5548757 DOI: 10.1038/s41467-017-00262-8] [Citation(s) in RCA: 146] [Impact Index Per Article: 20.9] [Received: 11/14/2016] [Accepted: 06/14/2017] [Indexed: 11/30/2022]
Abstract
Culture-independent molecular surveys of plant root microbiomes indicate that soil type generally has a stronger influence on microbial communities than host phylogeny. However, these studies have mostly focussed on model plants and crops. Here, we examine the root microbiomes of multiple plant phyla including lycopods, ferns, gymnosperms, and angiosperms across a soil chronosequence using 16S rRNA gene amplicon profiling. We confirm that soil type is the primary determinant of root-associated bacterial community composition, but also observe a significant correlation with plant phylogeny. A total of 47 bacterial genera are associated with roots relative to bulk soil microbial communities, including well-recognized plant-associated genera such as Bradyrhizobium, Rhizobium, and Burkholderia, and major uncharacterized lineages such as WPS-2, Ellin329, and FW68. We suggest that these taxa collectively constitute an evolutionarily conserved core root microbiome at this site. This lends support to the inference that a core root microbiome has evolved with terrestrial plants over their 400 million year history. Yeoh et al. study root microbiomes of different plant phyla across a tropical soil chronosequence. They confirm that soil type is the primary determinant of root-associated bacterial communities, but also observe a clear correlation with plant phylogeny and define a core root microbiome at this site.
|
18
|
Erratum to: Introducing BASE: the Biomes of Australian Soil Environments soil microbial diversity database. Gigascience 2017; 6:3806414. [PMID: 30137319 PMCID: PMC5437940 DOI: 10.1093/gigascience/gix021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
|
19
|
Modeling the Attractor Landscape of Disease Progression: a Network-Based Approach. Front Genet 2017; 8:48. [PMID: 28458684 PMCID: PMC5394169 DOI: 10.3389/fgene.2017.00048] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 08/22/2016] [Accepted: 03/31/2017] [Indexed: 12/25/2022] Open
Abstract
Genome-wide regulatory networks enable cells to function, develop, and survive. Perturbation of these networks can lead to appearance of a disease phenotype. Inspired by Conrad Waddington's epigenetic landscape of cell development, we use a Hopfield network formalism to construct an attractor landscape model of disease progression based on protein- or gene-correlation networks of Parkinson's disease, glioma, and colorectal cancer. Attractors in this landscape correspond to normal and disease states of the cell. We introduce approaches to estimate the size and robustness of these attractors, and take a network-based approach to study their biological features such as the key genes and their functions associated with the attractors. Our results show that the attractor of cancer cells is wider than the attractor of normal cells, suggesting a heterogeneous nature of cancer. Perturbation analysis shows that robustness depends on characteristics of the input data (number of samples per time-point, and the fraction which converge to an attractor). We identify unique gene interactions at each stage, which reflect the temporal rewiring of the gene regulatory network (GRN) with disease progression. Our model of the attractor landscape, constructed from large-scale gene expression profiles of individual patients, captures snapshots of disease progression and identifies gene interactions specific to different stages, opening the way for development of stage-specific therapeutic strategies.
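The Hopfield formalism named in this abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes expression profiles have been binarised to +1/-1, uses plain Hebbian weights, and the `normal` and `disease` patterns and all function names are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian weights averaged over the stored patterns, with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / len(patterns)
    return weights


def energy(weights, state):
    """Hopfield energy; stored patterns sit at local minima (attractors)."""
    n = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(n) for j in range(n)
    )


def converge(weights, state, max_sweeps=100):
    """Asynchronous sign updates until no unit changes (an attractor is reached)."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Hypothetical binarised profiles standing in for "normal" and "disease" states.
normal = [1, 1, 1, 1, -1, -1, -1, -1]
disease = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hopfield([normal, disease])

# A perturbed sample: the normal profile with its first gene flipped.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recovered = converge(W, noisy)
```

In this toy setting the perturbed sample falls back into the `normal` attractor, and its energy drops as it does so. Counting how many such perturbations an attractor absorbs is one simple way to think about the attractor size and robustness the abstract refers to.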
|
20
|
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF. Front Microbiol 2017; 8:21. [PMID: 28154557 PMCID: PMC5243798 DOI: 10.3389/fmicb.2017.00021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/25/2016] [Accepted: 01/04/2017] [Indexed: 11/13/2022] Open
Abstract
Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
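To make the TF-IDF idea concrete, the sketch below applies the generic document-analysis weighting to k-mers, treating each genome as a document and each word of length k as a term. It is an illustration only and does not reproduce the published LGT-inference procedure; the toy sequences and the `tfidf` helper are hypothetical.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping words of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(genomes, k):
    """Return {genome: {kmer: weight}} with weight = term frequency * log(N / document frequency)."""
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_docs = len(genomes)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each genome contributes once per distinct k-mer
    weights = {}
    for name, c in counts.items():
        total = sum(c.values())
        weights[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        }
    return weights


# Hypothetical toy genomes; "B" carries a stretch absent from the others.
genomes = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGTTTTT",
    "C": "ACGTACGTACGA",
}
w = tfidf(genomes, k=4)
```

Words shared by every genome receive zero weight, while words confined to one genome (here `TTTT` in `B`) score highly. In the abstract's setting, atypical words of this kind are the signal used to flag regions of possible lateral origin, with the word length k as the key parameter.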
|
21
|
The metastasis suppressor RARRES3 as an endogenous inhibitor of the immunoproteasome expression in breast cancer cells. Sci Rep 2017; 7:39873. [PMID: 28051153 PMCID: PMC5209724 DOI: 10.1038/srep39873] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 07/26/2016] [Accepted: 11/28/2016] [Indexed: 01/17/2023] Open
Abstract
In breast cancer metastasis, the dynamic continuum involving pro- and anti-inflammatory regulators can become compromised. Over 600 genes have been implicated in metastasis to bone, lung or brain but how these genes might contribute to perturbation of immune function is poorly understood. To gain insight, we adopted a gene co-expression network approach that draws on the functional parallels between naturally occurring bone marrow-derived mesenchymal stem cells (BM-MSCs) and cancer stem cells (CSCs). Our network analyses indicate a key role for metastasis suppressor RARRES3, including potential to regulate the immunoproteasome (IP), a specialized proteasome induced under inflammatory conditions. Knockdown of RARRES3 in near-normal mammary epithelial and breast cancer cell lines increases overall transcript and protein levels of the IP subunits, but not of their constitutively expressed counterparts. RARRES3 mRNA expression is controlled by interferon regulatory factor IRF1, an inducer of the IP, and is sensitive to depletion of the retinoid-related receptor RORA that regulates various physiological processes including immunity through modulation of gene expression. Collectively, these findings identify a novel regulatory role for RARRES3 as an endogenous inhibitor of IP expression, and contribute to our evolving understanding of potential pathways underlying breast cancer driven immune modulation.
|
22
|
Abstract
Lateral genetic transfer (LGT) is the process by which genetic material moves between organisms (and viruses) in the biosphere. Among the many approaches developed for the inference of LGT events from DNA sequence data, methods based on the comparison of phylogenetic trees remain the gold standard for many types of problem. Identifying LGT events from sequenced genomes typically involves a series of steps in which homologous sequences are identified and aligned, phylogenetic trees are inferred, and their topologies are compared to identify unexpected or conflicting relationships. These types of approach have been used to elucidate the nature and extent of LGT and its physiological and ecological consequences throughout the Tree of Life. Advances in DNA sequencing technology have led to enormous increases in the number of sequenced genomes, including ultra-deep sampling of specific taxonomic groups and single cell-based sequencing of unculturable "microbial dark matter." Environmental shotgun sequencing enables the study of LGT among organisms that share the same habitat. This abundance of genomic data offers new opportunities for scientific discovery, but poses two key problems. As ever more genomes are generated, the assembly and annotation of each individual genome receives less scrutiny; and with so many genomes available it is tempting to include them all in a single analysis, but thousands of genomes and millions of genes can overwhelm key algorithms in the analysis pipeline. Identifying LGT events of interest therefore depends on choosing the right dataset, and on algorithms that appropriately balance speed and accuracy given the size and composition of the chosen set of genomes.
|
23
|
Abstract
Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
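A network built from shared k-mers can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: it counts distinct k-mers common to each pair of sequences and keeps an edge when the count reaches a threshold. The toy sequences, the `min_shared` parameter and the function names are hypothetical, and the abstract does not say whether raw counts or a normalised statistic were used.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Distinct overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_network(genomes, k, min_shared=1):
    """Return {(genome_a, genome_b): shared k-mer count} for pairs at or above the threshold."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    edges = {}
    for a, b in combinations(sorted(sets), 2):
        shared = len(sets[a] & sets[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy genomes: g1 and g2 are related, g3 is unrelated.
genomes = {"g1": "ACGTACGGA", "g2": "ACGTACGTT", "g3": "TTTTTTTTT"}
edges = shared_kmer_network(genomes, k=4)
```

Raising `min_shared` prunes weak edges, which is one way to explore such a network at different levels of relatedness. No alignment step is involved, which is why approaches of this kind scale to many genomes.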
|
24
|
Integrating Multi-omics Data to Dissect Mechanisms of DNA repair Dysregulation in Breast Cancer. Sci Rep 2016; 6:34000. [PMID: 27666291 PMCID: PMC5036051 DOI: 10.1038/srep34000] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/02/2016] [Accepted: 09/01/2016] [Indexed: 12/20/2022] Open
Abstract
DNA repair genes and pathways that are transcriptionally dysregulated in cancer provide the first line of evidence for the altered DNA repair status in tumours, and hence have been explored intensively as a source for biomarker discovery. The molecular mechanisms underlying DNA repair dysregulation, however, have not been systematically investigated in any cancer type. In this study, we performed a statistical analysis to dissect the roles of DNA copy number alteration (CNA), DNA methylation (DM) at gene promoter regions and the expression changes of transcription factors (TFs) in the differential expression of individual DNA repair genes in normal versus tumour breast samples. These gene-level results were summarised at pathway level to assess whether different DNA repair pathways are affected in distinct manners. Our results suggest that CNA and expression changes of TFs are major causes of DNA repair dysregulation in breast cancer, and that a subset of the identified TFs may exert global impacts on the dysregulation of multiple repair pathways. Our work hence provides novel insights into DNA repair dysregulation in breast cancer. These insights improve our understanding of the molecular basis of the DNA repair biomarkers identified thus far, and have potential to inform future biomarker discovery.
|
25
|
Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer. Sci Rep 2016; 6:28970. [PMID: 27363362 PMCID: PMC4929450 DOI: 10.1038/srep28970] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 03/16/2016] [Accepted: 06/13/2016] [Indexed: 12/22/2022] Open
Abstract
Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
|
26
|
Introducing BASE: the Biomes of Australian Soil Environments soil microbial diversity database. Gigascience 2016; 5:21. [PMID: 27195106 PMCID: PMC4870752 DOI: 10.1186/s13742-016-0126-5] [Citation(s) in RCA: 108] [Impact Index Per Article: 13.5] [Received: 10/15/2015] [Accepted: 05/02/2016] [Indexed: 01/27/2023] Open
Abstract
Background Microbial inhabitants of soils are important to ecosystem and planetary functions, yet there are large gaps in our knowledge of their diversity and ecology. The ‘Biomes of Australian Soil Environments’ (BASE) project has generated a database of microbial diversity with associated metadata across extensive environmental gradients at continental scale. As the characterisation of microbes rapidly expands, the BASE database provides an evolving platform for interrogating and integrating microbial diversity and function. Findings BASE currently provides amplicon sequences and associated contextual data for over 900 sites encompassing all Australian states and territories, a wide variety of bioregions, vegetation and land-use types. Amplicons target bacteria, archaea and general and fungal-specific eukaryotes. The growing database will soon include metagenomics data. Data are provided in both raw sequence (FASTQ) and analysed OTU table formats and are accessed via the project’s data portal, which provides a user-friendly search tool to quickly identify samples of interest. Processed data can be visually interrogated and intersected with other Australian diversity and environmental data using tools developed by the ‘Atlas of Living Australia’. Conclusions Developed within an open data framework, the BASE project is the first Australian soil microbial diversity database. The database will grow and link to other global efforts to explore microbial, plant, animal, and marine biodiversity. Its design and open access nature ensures that BASE will evolve as a valuable tool for documenting an often overlooked component of biodiversity and the many microbe-driven processes that are essential to sustain soil function and ecosystem services.
|
27
|
PhySortR: a fast, flexible tool for sorting phylogenetic trees in R. PeerJ 2016; 4:e2038. [PMID: 27190724 PMCID: PMC4868591 DOI: 10.7717/peerj.2038] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 01/11/2016] [Accepted: 04/24/2016] [Indexed: 12/13/2022] Open
Abstract
A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, particularly robust clades of specific taxa, as evidence of monophyletic relationship and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for classifying phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa based on tip labels (i.e., leaves) on a tree, with customisable options to assess clades within the context of the whole tree. Using simulated and empirical datasets, we demonstrate the potential and scalability of PhySortR in analysis of thousands of phylogenetic trees without a priori assumption of tree-rooting, and in yielding readily interpretable trees that unambiguously satisfy the query. PhySortR is a command-line tool that is freely available and easily automatable.
|
28
|
Abstract PD6-05: Identifying genetic vulnerabilities in cancers driven by defects in DNA-damage response. Cancer Res 2016. [DOI: 10.1158/1538-7445.sabcs15-pd6-05] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Although defects in cancer susceptibility genes within the DNA-damage response (DDR) machinery including BRCA1 and BRCA2 account for only 5-10% of all breast cancer cases, these defects are highly penetrant and significantly increase the risk of breast (60-80%) and also ovarian (35%) cancers [1]. Together with defects in the DNA-damage sensor ATM, apoptosis effector TP53, and PTEN and CDH1 with roles in regulation of DDR, these account for considerable proportions of sporadic breast (63%) and ovarian (85%) cancers. To compensate for these DDR defects and to avoid cell death triggered from a genomic catastrophe, cancer cells rewire their DDR network while also selecting (during clonal expansion) the optimal combination of oncogenic events. Deciphering these combinations of events would aid in mapping the vulnerabilities of cancer cells harbouring defects in DDR.
While there have been several studies screening for essentiality of genes across DDR-deficient cell-lines, the essential genes so identified are either restricted only to these cell-line models or are not frequently (over)expressed in cancers. Here, we observe that oncogenic events that are mutually exclusive to DDR defects in large proportions of cancers constitute the (clonally) selected combinations that are amenable to cancer-cell survival, and therefore by systematically mining for these events, we infer vulnerability genes that if targeted in conjunction with DDR defects could induce a genomic catastrophe and trigger cancer-cell death.
Using data from DNA copy-number and mRNA-expression profiles we infer vulnerability genes that are mutually exclusive to defects in six DDR genes ATM, BRCA1, BRCA2, CDH1, PTEN and TP53 across four cancers (total 3980 samples) – breast (2029), prostate (623), ovarian (828) and uterine (500) from The Cancer Genome Atlas. Interestingly, across the four cancers these vulnerability genes form the most combinations with BRCA2 (59.02%), followed by CDH1 (24.59%), PTEN (8.20%) and TP53 (8.19%) at p<0.01 (1-hypergeometric test), whereas these show distinct patterns within the individual cancers: combinations dominated by CDH1 (90%) in breast, PTEN (78.38%) and BRCA2 (16.82%) in prostate, and BRCA1 (71.94%) and TP53 (16.21%) in ovarian cancers. Validation using GARP (Gene Activity Rank Profile)-score data from essentiality screens [2] from ten breast cancer cell lines (HCC1143, HCC1187, HCC1395, HCC1419, HCC1428, HCC1500, HCC1806, HCC1954, HCC38, MCF7) which harbour defects in at least one of the six DDR genes shows remarkable agreement between the GARP rankings and our inferred vulnerabilities. Our inferred genes are significantly enriched (p<0.0001, χ² test) in the top quartile of the entire set of profiled (∼16000) essential genes in these screens. Moreover, Kaplan-Meier analysis using survival data from 1000 breast cancer patients shows considerable overexpression of these genes (e.g. TLK2 in 37% luminal cases) which correlates significantly (TLK2: p<0.0006; Grade 3 hazard ratio 2.5) with poor prognosis. Experimental validation of these genes using single- and double knockout with DDR in breast cancer cell lines is currently underway.
[1] Liu & Srihari et al., Nucl Acids Res 2014, 42(10):6106-27.
[2] Marcotte et al., Cancer Discov 2012, 2(2):172-89.
Citation Format: Srihari S, Singla J, Wong L, Simpson PT, Khanna KK, Ragan MA. Identifying genetic vulnerabilities in cancers driven by defects in DNA-damage response. [abstract]. In: Proceedings of the Thirty-Eighth Annual CTRC-AACR San Antonio Breast Cancer Symposium: 2015 Dec 8-12; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2016;76(4 Suppl):Abstract nr PD6-05.
|
29
|
Personalised pathway analysis reveals association between DNA repair pathway dysregulation and chromosomal instability in sporadic breast cancer. Mol Oncol 2016; 10:179-93. [PMID: 26456802 PMCID: PMC5528935 DOI: 10.1016/j.molonc.2015.09.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/17/2015] [Revised: 08/19/2015] [Accepted: 09/04/2015] [Indexed: 01/05/2023] Open
Abstract
The Homologous Recombination (HR) pathway is crucial for the repair of DNA double-strand breaks (DSBs) generated during DNA replication. Defects in HR repair have been linked to the initiation and development of a wide variety of human malignancies, and exploited in chemical, radiological and targeted therapies. In this study, we performed a personalised pathway analysis independently for four large sporadic breast cancer cohorts to investigate the status of HR pathway dysregulation in individual sporadic breast tumours, its association with HR repair deficiency and its impact on tumour characteristics. Specifically, we first manually curated a list of HR genes according to our recent review on this pathway (Liu et al., 2014), and then applied a personalised pathway analysis method named Pathifier (Drier et al., 2013) on the expression levels of the curated genes to obtain an HR score quantifying HR pathway dysregulation in individual tumours. Based on the score, we observed a great diversity in HR dysregulation between and within gene expression-based breast cancer subtypes, and by using two published HR-defect signatures, we found HR pathway dysregulation reflects HR repair deficiency. Furthermore, we identified a novel association between HR pathway dysregulation and chromosomal instability (CIN) in sporadic breast cancer. Although CIN has long been considered as a hallmark of most solid tumours, with recent extensive studies highlighting its importance in tumour evolution and drug resistance, the molecular basis of CIN in sporadic cancers remains poorly understood. Our results imply that HR pathway dysregulation might contribute to CIN in sporadic breast cancer.
|
30
|
Understanding the functional impact of copy number alterations in breast cancer using a network modeling approach. Mol Biosyst 2016; 12:963-72. [DOI: 10.1039/c5mb00655d] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 12/29/2022]
Abstract
We apply a network approach to identify genes associated in cis or in trans with copy-number alterations in breast cancer pathogenesis.
|
31
|
Inferring synthetic lethal interactions from mutual exclusivity of genetic events in cancer. Biol Direct 2015; 10:57. [PMID: 26427375 PMCID: PMC4590705 DOI: 10.1186/s13062-015-0086-1] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 06/16/2015] [Accepted: 09/23/2015] [Indexed: 12/21/2022] Open
Abstract
Background Synthetic lethality (SL) refers to the genetic interaction between two or more genes where only their co-alteration (e.g. by mutations, amplifications or deletions) results in cell death. In recent years, SL has emerged as an attractive therapeutic strategy against cancer: by targeting the SL partners of altered genes in cancer cells, these cells can be selectively killed while sparing the normal cells. Consequently, a number of studies have attempted prediction of SL interactions in human, a majority by extrapolating SL interactions inferred through large-scale screens in model organisms. However, these predicted SL interactions either do not hold in human cells or do not include genes that are (frequently) altered in human cancers, and are therefore not attractive in the context of cancer therapy. Results Here, we develop a computational approach to infer SL interactions directly from frequently altered genes in human cancers. It is based on the observation that pairs of genes that are altered in a (significantly) mutually exclusive manner in cancers are likely to constitute lethal combinations. Using genomic copy-number and gene-expression data from four cancers, breast, prostate, ovarian and uterine (total 3980 samples) from The Cancer Genome Atlas, we identify 718 genes that are frequently amplified or upregulated, and are likely to be synthetic lethal with six key DNA-damage response (DDR) genes in these cancers. By comparing with published data on gene essentiality (~16000 genes) from ten DDR-deficient cancer cell lines, we show that our identified genes are enriched among the top quartile of essential genes in these cell lines, implying that our inferred genes are highly likely to be (synthetic) lethal upon knockdown in these cell lines. Among the inferred targets are tousled-like kinase 2 (TLK2) and the deubiquitinating enzyme ubiquitin-specific-processing protease 7 (USP7) whose overexpression correlates with poor survival in cancers. 
Conclusion Mutual exclusivity between frequently occurring genetic events identifies synthetic lethal combinations in cancers. These identified genes are essential in cell lines, and are potential candidates for targeted cancer therapy. Availability: http://bioinformatics.org.au/tools-data/underMutExSL Reviewers This article was reviewed by Dr Michael Galperin, Dr Sebastian Maurer-Stroh and Professor Sanghyuk Lee.
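The notion of two genes being altered in a significantly mutually exclusive manner can be made concrete with a lower-tail hypergeometric test on the overlap between their altered samples. This is a common choice assumed here for illustration, since this abstract does not name the statistic used. The sample sets and function names below are hypothetical.

```python
from math import comb


def hypergeom_cdf(k, n_total, n_a, n_b):
    """P(overlap <= k) when n_a and n_b altered samples are drawn independently from n_total."""
    total = comb(n_total, n_b)
    return sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i) for i in range(k + 1)
    ) / total


def mutual_exclusivity_p(altered_a, altered_b, n_samples):
    """Small p-values mean the two genes co-occur less often than expected by chance."""
    overlap = len(altered_a & altered_b)
    return hypergeom_cdf(overlap, n_samples, len(altered_a), len(altered_b))


# Hypothetical cohort of 20 tumours: gene A altered in samples 0-9, gene B in 10-19.
altered_a = set(range(0, 10))
altered_b = set(range(10, 20))
p = mutual_exclusivity_p(altered_a, altered_b, n_samples=20)
```

With no overlap at all between the two sets, `p` is far below 0.01, so this hypothetical pair would be flagged as a candidate synthetic lethal combination. On real data the p-values would also need correction for the large number of gene pairs tested.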
|
32
|
The core root microbiome of sugarcanes cultivated under varying nitrogen fertilizer application. Environ Microbiol 2015; 18:1338-51. [PMID: 26032777 DOI: 10.1111/1462-2920.12925] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Received: 03/24/2015] [Revised: 05/26/2015] [Accepted: 05/26/2015] [Indexed: 12/01/2022]
Abstract
Diazotrophic bacteria potentially supply substantial amounts of biologically fixed nitrogen to crops, but their occurrence may be suppressed by high nitrogen fertilizer application. Here, we explored the impact of high nitrogen fertilizer rates on the presence of diazotrophs in field-grown sugarcane with industry-standard or reduced nitrogen fertilizer application. Despite large differences in soil microbial communities between test sites, a core sugarcane root microbiome was identified. The sugarcane root-enriched core taxa overlap with those of Arabidopsis thaliana raising the possibility that certain bacterial families have had long association with plants. Reduced nitrogen fertilizer application had remarkably little effect on the core root microbiome and did not increase the relative abundance of root-associated diazotrophs or nif gene counts. Correspondingly, low nitrogen fertilizer crops had lower biomass and nitrogen content, reflecting a lack of major input of biologically fixed nitrogen, indicating that manipulating nitrogen fertilizer rates does not improve sugarcane yields by enriching diazotrophic populations under the test conditions. Standard nitrogen fertilizer crops had improved biomass and nitrogen content, and corresponding soils had higher abundances of nitrification and denitrification genes. These findings highlight that achieving a balance in maximizing crop yields and minimizing nutrient pollution associated with nitrogen fertilizer application requires understanding of how microbial communities respond to fertilizer use.
|
33
|
Nitrogen fertilizer dose alters fungal communities in sugarcane soil and rhizosphere. Sci Rep 2015; 5:8678. [PMID: 25728892 PMCID: PMC5155403 DOI: 10.1038/srep08678] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 11/24/2014] [Accepted: 01/29/2015] [Indexed: 12/18/2022] Open
Abstract
Fungi play important roles as decomposers, plant symbionts and pathogens in soils. The structure of fungal communities in the rhizosphere is the result of complex interactions among selection factors that may favour beneficial or detrimental relationships. Using culture-independent fungal community profiling, we have investigated the effects of nitrogen fertilizer dosage on fungal communities in soil and rhizosphere of field-grown sugarcane. The results show that the concentration of nitrogen fertilizer strongly modifies the composition but not the taxon richness of fungal communities in soil and rhizosphere. Increased nitrogen fertilizer dosage has a potential negative impact on carbon cycling in soil and promotes fungal genera with known pathogenic traits, uncovering a negative effect of intensive fertilization.
|
34
|
Abstract
Background Differential expression analysis of (individual) genes is often used to study their roles in diseases. However, diseases such as cancer are a result of the combined effect of multiple genes. Gene products such as proteins seldom act in isolation, but instead constitute stable multi-protein complexes performing dedicated functions. Therefore, complexes aggregate the effect of individual genes (proteins) and can be used to gain a better understanding of cancer mechanisms. Here, we observe that complexes show considerable changes in their expression, in turn directed by the concerted action of transcription factors (TFs), across cancer conditions. We seek to gain novel insights into cancer mechanisms through a systematic analysis of complexes and their transcriptional regulation. Results We integrated large-scale protein-interaction (PPI) and gene-expression datasets to identify complexes that exhibit significant changes in their expression across different conditions in cancer. We devised a log-linear model to relate these changes to the differential regulation of complexes by TFs. The application of our model on two case studies involving pancreatic and familial breast tumour conditions revealed: (i) complexes in core cellular processes, especially those responsible for maintaining genome stability and cell proliferation (e.g. DNA damage repair and cell cycle) show considerable changes in expression; (ii) these changes include decrease and countering increase for different sets of complexes indicative of compensatory mechanisms coming into play in tumours; and (iii) TFs work in cooperative and counteractive ways to regulate these mechanisms. Such aberrant complexes and their regulating TFs play vital roles in the initiation and progression of cancer. Conclusions Complexes in core cellular processes display considerable decreases and countering increases in expression, strongly reflective of compensatory mechanisms in cancer. 
These changes are directed by the concerted action of cooperative and counteractive TFs. Our study highlights the roles of these complexes and TFs and presents several case studies of compensatory processes, thus providing novel insights into cancer mechanisms.
|
35
|
Breast cancer classification: linking molecular mechanisms to disease prognosis. Brief Bioinform 2014; 16:461-74. [PMID: 24950687 DOI: 10.1093/bib/bbu020] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 02/20/2014] [Accepted: 05/07/2014] [Indexed: 12/21/2022] Open
Abstract
Breast cancer was traditionally perceived as a single disease; however, recent advances in gene expression and genomic profiling have revealed that breast cancer is in fact a collection of diseases exhibiting distinct anatomical features, responses to treatment and survival outcomes. Consequently, a number of schemes have been proposed for subtyping of breast cancer to bring out the biological and clinically relevant characteristics of the subtypes. Although some of these schemes capture underlying molecular differences, others predict variations in response to treatment and survival patterns. However, despite this diversity in the approaches, it is clear that molecular mechanisms drive clinical outcomes, and therefore an effective scheme should integrate molecular as well as clinical parameters to enable deeper understanding of cancer mechanisms and allow better decision making in the clinic. Here, using a large cohort of ∼550 breast tumours from The Cancer Genome Atlas, we systematically evaluate a number of expression-based schemes including at least eight molecular pathways implicated in breast cancer and three prognostic signatures, across a variety of classification scenarios covering molecular characteristics, biomarker status, tumour stages and survival patterns. We observe that a careful combination of these schemes yields better classification results compared with using them individually, thus confirming that molecular mechanisms and clinical outcomes are related and that an effective scheme should therefore integrate both these parameters to enable a deeper understanding of the cancer.
|
36
|
A fine-scale dissection of the DNA double-strand break repair machinery and its implications for breast cancer therapy. Nucleic Acids Res 2014; 42:6106-27. [PMID: 24792170 PMCID: PMC4041457 DOI: 10.1093/nar/gku284] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 01/20/2014] [Revised: 03/21/2014] [Accepted: 03/26/2014] [Indexed: 02/06/2023] Open
Abstract
DNA-damage response machinery is crucial to maintain the genomic integrity of cells, by enabling effective repair of even highly lethal lesions such as DNA double-strand breaks (DSBs). Defects in specific genes acquired through mutations, copy-number alterations or epigenetic changes can alter the balance of these pathways, triggering cancerous potential in cells. Selective killing of cancer cells by sensitizing them to further DNA damage, especially by induction of DSBs, therefore requires careful modulation of DSB-repair pathways. Here, we review the latest knowledge on the two DSB-repair pathways, homologous recombination and non-homologous end joining in human, describing in detail the functions of their components and the key mechanisms contributing to the repair. Such an in-depth characterization of these pathways enables a more mechanistic understanding of how cells respond to therapies, and suggests molecules and processes that can be explored as potential therapeutic targets. One such avenue that has shown immense promise is via the exploitation of synthetic lethal relationships, for which the BRCA1-PARP1 relationship is particularly notable. Here, we describe how this relationship functions and the manner in which cancer cells acquire therapy resistance by restoring their DSB repair potential.
|
37
|
INsPeCT: INtegrative Platform for Cancer Transcriptomics. Cancer Inform 2014; 13:59-66. [PMID: 24653643 PMCID: PMC3956744 DOI: 10.4137/cin.s13630] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/01/2013] [Revised: 01/08/2014] [Accepted: 01/08/2014] [Indexed: 01/21/2023] Open
Abstract
The emergence of transcriptomics, fuelled by high-throughput sequencing technologies, has changed the nature of cancer research and resulted in a massive accumulation of data. Computational analysis, integration, and data visualization are now major bottlenecks in cancer biology and translational research. Although many tools have been brought to bear on these problems, their use remains unnecessarily restricted to computational biologists, as many tools require scripting skills, data infrastructure, and powerful computational facilities. New user-friendly, integrative, and automated analytical approaches are required to make computational methods more generally useful to the research community. Here we present INsPeCT (INtegrative Platform for Cancer Transcriptomics), which allows users with basic computer skills to perform comprehensive in-silico analyses of microarray, ChIP-seq, and RNA-seq data. INsPeCT supports the selection of interesting genes for advanced functional analysis. Included in its automated workflows are (i) a novel analytical framework, RMaNI (regulatory module network inference), which supports the inference of cancer subtype-specific transcriptional module networks and the analysis of modules; and (ii) WGCNA (weighted gene co-expression network analysis), which infers modules of highly correlated genes across microarray samples, associated with sample traits, e.g. survival time. INsPeCT is available free of cost from Bioinformatics Resource Australia-EMBL and can be accessed at http://inspect.braembl.org.au.
|
38
|
Abstract
From 1971 to 1985, Carl Woese and colleagues generated oligonucleotide catalogs of 16S/18S rRNAs from more than 400 organisms. Using these incomplete and imperfect data, Carl and his colleagues developed unprecedented insights into the structure, function, and evolution of the large RNA components of the translational apparatus. They recognized a third domain of life, revealed the phylogenetic backbone of bacteria (and its limitations), delineated taxa, and explored the tempo and mode of microbial evolution. For these discoveries to have stood the test of time, oligonucleotide catalogs must carry significant phylogenetic signal; they thus bear re-examination in view of the current interest in alignment-free phylogenetics based on k-mers. Here we consider the aims, successes, and limitations of this early phase of molecular phylogenetics. We computationally generate oligonucleotide sets (e-catalogs) from 16S/18S rRNA sequences, calculate pairwise distances between them based on D2 statistics, compute distance trees, and compare their performance against alignment-based and k-mer trees. Although the catalogs themselves were superseded by full-length sequences, this stage in the development of computational molecular biology remains instructive for us today.
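The abstract names D2 statistics without defining them. As a minimal sketch, assuming the usual definition of D2 as the number of k-mer matches between two sequences (the sum, over all k-mers, of the product of their occurrence counts), the score could be computed as follows. The function names and the cosine-style normalisation into a distance are illustrative assumptions, not the procedure used in the paper.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_score(x: str, y: str, k: int) -> int:
    """D2 statistic: sum over shared k-mers of the product of their counts."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items() if w in cy)


def d2_distance(x: str, y: str, k: int) -> float:
    """Hypothetical normalisation (an assumption for illustration):
    1 minus the D2 score scaled by the geometric mean of the self-scores."""
    sxx, syy = d2_score(x, x, k), d2_score(y, y, k)
    if sxx == 0 or syy == 0:
        # At least one sequence is shorter than k: treat as maximally distant.
        return 1.0
    return 1.0 - d2_score(x, y, k) / math.sqrt(sxx * syy)
```

A matrix of such pairwise distances is the kind of input a distance-based tree method (for example neighbour joining) would take; which normalisation and tree method suit a given dataset is left open here.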
|
39
|
Abstract
Motivation: Cancer is a heterogeneous progressive disease caused by perturbations of the underlying gene regulatory network that can be described by dynamic models. These dynamics are commonly modeled as Boolean networks or as ordinary differential equations. Their inference from data is computationally challenging, and at least partial knowledge of the regulatory network and its kinetic parameters is usually required to construct predictive models.
Results: Here, we construct Hopfield networks from static gene-expression data and demonstrate that cancer subtypes can be characterized by different attractors of the Hopfield network. We evaluate the clustering performance of the network and find that it is comparable with traditional methods but offers additional advantages including a dynamic model of the energy landscape and a unification of clustering, feature selection and network inference. We visualize the Hopfield attractor landscape and propose a pruning method to generate sparse networks for feature selection and improved understanding of feature relationships.
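To make the idea of attractors concrete, here is a minimal sketch of a textbook Hopfield network, assuming expression profiles have already been discretised to +1/-1 vectors. The Hebbian training rule, the asynchronous update schedule and all function names are assumptions chosen for illustration; the abstract does not specify the authors' construction, and this is not their implementation.

```python
from typing import List

Pattern = List[int]  # entries are +1 or -1


def train_hopfield(patterns: List[Pattern]) -> List[List[float]]:
    """Hebbian weights: W[i][j] = (1/n) * sum over patterns of p[i]*p[j], zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def energy(w: List[List[float]], state: Pattern) -> float:
    """Hopfield energy E = -1/2 * sum_ij W[i][j] * s[i] * s[j]."""
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def recall(w: List[List[float]], state: Pattern, max_sweeps: int = 20) -> Pattern:
    """Update units one at a time until a full sweep changes nothing (a fixed point)."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s
```

In this toy setting, a sample would be assigned to whichever stored pattern `recall` converges to, which is one way to read "subtypes as attractors"; with symmetric weights and a zero diagonal, each single-unit update cannot raise the energy.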
|
40
|
Evolution and Controllability of Cancer Networks: A Boolean Perspective. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:83-94. [PMID: 26355510 DOI: 10.1109/tcbb.2013.128] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Cancer forms a robust system capable of maintaining stable functioning (cell sustenance and proliferation) despite perturbations. Cancer progresses in stages over time, typically with increasing aggressiveness and worsening prognosis. Characterizing these stages and identifying the genes driving transitions between them is critical to understand cancer progression and to develop effective anti-cancer therapies. In this work, we propose a novel model for the 'cancer system' as a Boolean state space in which a Boolean network, built from protein-interaction and gene-expression data from different stages of cancer, transits between Boolean satisfiability states by "editing" interactions and "flipping" genes. Edits reflect rewiring of the PPI network, while flipping of genes reflects activation or silencing of genes between stages. We formulate a minimization problem, min flip, to identify these genes driving the transitions. The application of our model (called BoolSpace) on three case studies (pancreatic and breast tumours in human, and post spinal-cord injury (SCI) in rats) reveals valuable insights into the phenomenon of cancer progression: (i) interactions involved in core cell-cycle and DNA-damage repair pathways are significantly rewired in tumours, indicating significant impact on key genome-stabilizing mechanisms; (ii) several of the genes flipped are serine/threonine kinases which act as biological switches, reflecting cellular switching mechanisms between stages; and (iii) different sets of genes are flipped during the initial and final stages, indicating a pattern to tumour progression. Based on these results, we hypothesize that robustness of cancer partly stems from "passing of the baton" between genes at different stages: genes from different biological processes and/or cellular components are involved in different stages of tumour progression, thereby allowing tumour cells to evade targeted therapy, and therefore an effective therapy should target a "cover set" of these genes. A C/C++ implementation of BoolSpace is freely available at: http://www.bioinformatics.org.au/tools-data.
|
41
|
A new species of Burkholderia isolated from sugarcane roots promotes plant growth. Microb Biotechnol 2013; 7:142-54. [PMID: 24350979 PMCID: PMC3937718 DOI: 10.1111/1751-7915.12105] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.7] [Received: 09/15/2013] [Accepted: 11/07/2013] [Indexed: 01/21/2023] Open
Abstract
Sugarcane is a globally important food, biofuel and biomaterials crop. High nitrogen (N) fertilizer rates aimed at increasing yield often result in environmental damage because of excess and inefficient application. Inoculation with diazotrophic bacteria is an attractive option for reducing N fertilizer needs. However, the efficacy of bacterial inoculants is variable, and their effective formulation remains a knowledge frontier. Here, we take a new approach to investigating diazotrophic bacteria associated with roots using culture-independent microbial community profiling of a commercial sugarcane variety (Q208(A)) in a field setting. We first identified bacteria that were markedly enriched in the rhizosphere to guide isolation and then tested putative diazotrophs for the ability to colonize axenic sugarcane plantlets (Q208(A)) and promote growth in suboptimal N supply. One isolate readily colonized roots, fixed N2 and stimulated growth of plantlets, and was classified as a new species, Burkholderia australis sp. nov. Draft genome sequencing of the isolate confirmed the presence of nitrogen fixation. We propose that culture-independent identification and isolation of bacteria that are enriched in rhizosphere and roots, followed by systematic testing and confirming their growth-promoting capacity, is a necessary step towards designing effective microbial inoculants.
|
42
|
Abstract
Large quantities of information describing the mechanisms of biological pathways continue to be collected in publicly available databases. At the same time, experiments have increased in scale, and biologists increasingly use pathways defined in online databases to interpret the results of experiments and generate hypotheses. Emerging computational techniques that exploit the rich biological information captured in reaction systems require formal standardized descriptions of pathways to extract these reaction networks and avoid the alternative: time-consuming and largely manual literature-based network reconstruction. Here, we systematically evaluate the effects of commonly used knowledge representations on the seemingly simple task of extracting a reaction network describing signal transduction from a pathway database. We show that this process is in fact surprisingly difficult, and the pathway representations adopted by various knowledge bases have dramatic consequences for reaction network extraction, connectivity, capture of pathway crosstalk and in the modelling of cell-cell interactions. Researchers constructing computational models built from automatically extracted reaction networks must therefore consider the issues we outline in this review to maximize the value of existing pathway knowledge.
|
43
|
Phylogeny rather than ecology or lifestyle biases the construction of Escherichia coli-Shigella genetic exchange communities. Open Biol 2013; 2:120112. [PMID: 23091700 PMCID: PMC3472396 DOI: 10.1098/rsob.120112] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 07/25/2012] [Accepted: 08/20/2012] [Indexed: 11/12/2022] Open
Abstract
Genetic material can be transmitted not only vertically from parent to offspring, but also laterally (horizontally) from one bacterial lineage to another. Lateral genetic transfer is non-uniform; biases in its nature or frequency construct communities of genetic exchange. These biases have been proposed to arise from phylogenetic relatedness, shared ecology and/or common lifestyle. Here, we test these hypotheses using a graph-based abstraction of inferred genetic-exchange relationships among 27 Escherichia coli and Shigella genomes. We show that although barriers to inter-phylogenetic group lateral transfer are low, E. coli and Shigella are more likely to have exchanged genetic material with close relatives. We find little evidence of bias arising from shared environment or lifestyle. More than one-third of donor-recipient pairs in our analysis show some level of fragmentary gene transfer. Thus, within the E. coli-Shigella clade, intact genes and gene fragments have been disseminated non-uniformly and at appreciable frequency, constructing communities that transgress environmental and lifestyle boundaries.
|
44
|
Abstract
Pathway analysis is important in interpreting the functional implications of high-throughput experimental results, but robust comparison across platforms and species is problematic. A new approach, Pathprinting, provides a cross-platform, cross-species comparative analysis of pathway expression signatures. This method calculates pathway-level statistics from gene expression across nearly 180,000 microarrays in the Gene Expression Omnibus. Pathprinting can accurately retrieve phenotypically similar samples and identify sets of human and mouse genes that are prognostic in cancer. See related Research paper, http://genomemedicine.com/content/5/7/68
|
45
|
Biological Intuition in Alignment-Free Methods: Response to Posada. J Mol Evol 2013; 77:1-2. [DOI: 10.1007/s00239-013-9573-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/18/2013] [Accepted: 07/04/2013] [Indexed: 10/26/2022]
|
46
|
Yeast rises to the occasion. eLife 2013; 2:e00933. [PMID: 23795300 PMCID: PMC3687331 DOI: 10.7554/elife.00933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Genetic analyses of 15 species of yeast have shed new light on the divergence of gene regulation during evolution, with significant changes occurring after an event in which a whole genome was duplicated.
|
47
|
Abstract
Inference of gene regulatory networks from expression data is a challenging task. Many methods have been developed for this purpose, but a comprehensive evaluation that covers unsupervised, semi-supervised and supervised methods, and provides guidelines for their practical application, is lacking. We performed an extensive evaluation of inference methods on simulated and experimental expression data. The results reveal low prediction accuracies for unsupervised techniques, with the notable exception of the Z-SCORE method on knockout data. In all other cases, the supervised approach achieved the highest accuracies and, even in a semi-supervised setting with small numbers of only positive samples, outperformed the unsupervised techniques.
|
48
|
|
49
|
Clustering evolving proteins into homologous families. BMC Bioinformatics 2013; 14:120. [PMID: 23566217 PMCID: PMC3637521 DOI: 10.1186/1471-2105-14-120] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/17/2012] [Accepted: 03/27/2013] [Indexed: 11/20/2022] Open
Abstract
Background: Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches.
Results: Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment.
Conclusions: Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
|
50
|
Abstract
Thanks to advances in next-generation technologies, genome sequences are now being generated at breadth (e.g. across environments) and depth (thousands of closely related strains, individuals or samples) unimaginable only a few years ago. Phylogenomics, the study of evolutionary relationships based on comparative analysis of genome-scale data, has so far been developed as industrial-scale molecular phylogenetics, proceeding in the two classical steps: multiple alignment of homologous sequences, followed by inference of a tree (or multiple trees). However, the algorithms typically employed for these steps scale poorly with number of sequences, such that for an increasing number of problems, high-quality phylogenomic analysis is (or soon will be) computationally infeasible. Moreover, next-generation data are often incomplete and error-prone, and analysis may be further complicated by genome rearrangement, gene fusion and deletion, lateral genetic transfer, and transcript variation. Here we argue that next-generation data require next-generation phylogenomics, including so-called alignment-free approaches.
|