1
|
Enhancing Bioinformatics and Genomics Courses: Building Capacity and Skills via Lab Meeting Activities: Fostering a Culture of Critical Capacities to Read, Write, Communicate and Engage in Rigorous Scientific Exchanges. Bioessays 2020; 42:e2000134. [PMID: 32830345 DOI: 10.1002/bies.202000134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2020] [Revised: 07/08/2020] [Indexed: 11/08/2022]
Abstract
Reading, writing, publishing, and publicly presenting scientific works are vital for a young researcher's profile building and career development. Generally, the traditional educational curricula do not offer training possibilities to learn and practice how to prepare, write, and present scientific works. These are rather a part of lab meeting activities in research groups. The lack of such training is more critical in some developing countries because it compounds the scarcity of opportunities to discuss and engage with state-of-the-art scientific literature. Here the authors relate their experience in introducing a weekly 1-day lab meeting in the framework of two previously organized 3-month courses on "Bioinformatics and Genome Analyses". The main activities developed during these lab meetings include scientific literature follow-up as well as preparing and presenting oral and written scientific reviews. These activities prove useful for building students' self-confidence, for enhancing their active participation during the lectures and practical sessions, and for their positive impact on running the whole course program. Incorporating such lab meeting activities in the course program significantly improves the participants' capacity building, their analytical and critical reading of scientific literature, and their communication skills. This work shows how to proceed with the different steps involved in implementing lab meeting activities and recommends instituting them regularly in similar courses.
|
2
|
Abstract
Genome data, with underlying new knowledge, are accumulating at an exponential rate thanks to ever-improving sequencing technologies and the parallel development of dedicated, efficient Bioinformatics methods and tools. Advanced education in Bioinformatics and Genome Analyses is to a large extent not accessible to students in developing countries, where endeavors to set up Bioinformatics courses most often concern only basic levels. Here, we report a pioneering pilot experience concerning the design and implementation, from scratch, of a three-month advanced and extensive course in Bioinformatics and Genome Analyses at the Institut Pasteur de Tunis. Most significantly, the course upgraded the participants’ skills in Bioinformatics and Genome Analyses to recognized international standards. Here we detail the different steps involved in the implementation of this course as well as the topics covered in the program. The description of this pilot experience might be helpful for the implementation of other similar educational projects, notably in developing countries, aiming to go beyond basics and provide young researchers with high-level skills.
|
3
|
Genome Data Exploration Using Correspondence Analysis. Bioinform Biol Insights 2016; 10:59-72. [PMID: 27279736 PMCID: PMC4898644 DOI: 10.4137/bbi.s39614] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 03/03/2016] [Revised: 04/12/2016] [Accepted: 04/14/2016] [Indexed: 01/14/2023] Open
Abstract
Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly through the ability to relate individual patterns to their corresponding characteristic variables.
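As an illustration of the method summarized in this abstract, the following is a minimal sketch of textbook correspondence analysis computed through a singular value decomposition of the standardized residuals of a two-way table. It is not taken from the paper: the function name `correspondence_analysis` and the toy table are assumptions made for the example, and the sketch assumes that no row or column of the table sums to zero.

```python
import numpy as np

def correspondence_analysis(table):
    """Textbook CA of a two-way table of non-negative counts.

    Hypothetical helper for illustration; assumes every row and column
    has a non-zero total.
    Returns (row_coords, col_coords, inertia_per_factor).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: factors are linear combinations of the columns (rows).
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    inertia = sing ** 2                  # variability carried by each factor
    return row_coords, col_coords, inertia

# Toy table (made-up counts): rows could be species, columns categories.
rows, cols, inertia = correspondence_analysis(
    [[20, 5, 5], [5, 20, 5], [5, 5, 20]]
)
share = inertia / inertia.sum()          # fraction of inertia per factor
```

The total inertia equals the chi-square statistic of the table divided by the grand total, which is why retaining the first few factors keeps most of the association between rows and columns.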
|
4
|
Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016; 9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]
Abstract
With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties in assessing orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling, together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events, appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
|
5
|
Complete DNA sequence of Kuraishia capsulata illustrates novel genomic features among budding yeasts (Saccharomycotina). Genome Biol Evol 2014; 5:2524-39. [PMID: 24317973 PMCID: PMC3879985 DOI: 10.1093/gbe/evt201] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 01/10/2023] Open
Abstract
The numerous yeast genome sequences presently available provide a rich source of information for functional as well as evolutionary genomics but unequally cover the large phylogenetic diversity of extant yeasts. We present here the complete sequence of the nuclear genome of the haploid-type strain of Kuraishia capsulata (CBS1993T), a nitrate-assimilating Saccharomycetales of uncertain taxonomy, isolated from tunnels of insect larvae underneath coniferous barks and characterized by its copious production of extracellular polysaccharides. The sequence is composed of seven scaffolds, one per chromosome, totaling 11.4 Mb and containing 6,029 protein-coding genes, ∼13.5% of which are interrupted by introns. This GC-rich yeast genome (45.7%) appears phylogenetically related to the few other nitrate-assimilating yeasts sequenced so far, Ogataea polymorpha, O. parapolymorpha, and Dekkera bruxellensis, with which it shares a very reduced number of tRNA genes, a novel tRNA sparing strategy, and a common nitrate assimilation cluster, three features specific to this group of yeasts. Centromeres were recognized in GC-poor troughs of each scaffold. The strain bears MAT alpha genes at a single MAT locus and presents a significant degree of conservation with Saccharomyces cerevisiae genes, suggesting that it can perform sexual cycles in nature, although genes involved in meiosis were not all recognized. The complete absence of conservation of synteny between K. capsulata and any other yeast genome described so far, including the three other nitrate-assimilating species, validates the interest of this species for long-range evolutionary genomic studies among Saccharomycotina yeasts.
|
6
|
SuperPartitions: detection and classification of orthologs. Gene 2011; 492:199-211. [PMID: 22056699 DOI: 10.1016/j.gene.2011.10.027] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/06/2011] [Revised: 10/08/2011] [Accepted: 10/11/2011] [Indexed: 10/16/2022]
Abstract
The proper detection of orthologs is crucial for evolutionary studies of genes and species. Despite large efforts to solve this problem the methodological situation appears unsettled to a large extent and the "quest for orthologs" is still an ongoing task in large-scale genome comparisons. Here, we introduce a simple operational framework for the detection of orthologs and their classification. The operational framework relies on well-established principles, optimizing their implementation for the considered purposes, and chaining components in coherent procedures: 1) We take advantage of the efficiency and simplicity of the Reciprocal Best Hit (RBH) detections, remedying (by design) the drawback concerning the limitations in terms of 1:1 detections. The procedure is based on the partitioning of Reciprocal Best Hits, with the further merging of partitions including members of the same paralogous classes ("SuperPartition of Orthologs" (SPOs)). 2) We then resort to the conservation profiles of the obtained clusters, allowing simple detection of SPOs containing duplicated members. Based on accepted evolutionary principles, such members can be further tagged as in-paralogs (co-orthologs) or out-paralogs. The method is illustrated and validated by extensive genomic analyses. The performances of the overall approach are characterized in global terms for three sets of species (Chlamydiae, Mycobacteria, Aspergilli), showing that at least 75% of the sets of orthologs contain at most one protein from a given species. The sets including more than one protein from a given species are shown to contain in-paralogs in proportions varying from 28% to 58%. The characterizations also show that the large majority of SPOs are associated with ancestral motifs, and accordingly not prone to chaining effects that might be triggered by multi-domain proteins. Further the SPO formulation is compared to other similarity based ortholog detection methods. 
Beyond core common results, significant differences are observed between various methods, which can be accounted for to a large extent on conceptual grounds, relative to the different merging schemes involved. Such comparisons highlight a major advantage of the SPO approach concerning the proper clustering of associated paralogs, which appear to be often dispatched spuriously into distinct orthologous classes. Finally the perspectives for future applications and elaborations of SPO-based compositional analyses are discussed.
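The first step described in this abstract (Reciprocal Best Hit detection, partitioning, then merging of partitions that contain members of the same paralogous class) can be sketched as follows. This is one possible reading of the abstract, not the authors' implementation: partitions are taken here to be connected components of the RBH graph, and the `species|gene` naming convention, the `best_hit` mapping, and the function names are all hypothetical.

```python
from collections import defaultdict

def species(gene):
    # Hypothetical naming convention: "species|gene_id".
    return gene.split("|")[0]

def reciprocal_best_hits(best_hit):
    """best_hit maps (query_gene, target_species) -> best-scoring gene there.

    A pair (a, b) is kept only when each gene is the other's best hit.
    """
    pairs = set()
    for (a, _target), b in best_hit.items():
        if best_hit.get((b, species(a))) == a:
            pairs.add(frozenset((a, b)))
    return pairs

def super_partitions(rbh_pairs, paralog_classes=()):
    """Group genes linked by RBHs, then merge groups sharing a paralog class.

    Assumption: a partition is a connected component of the RBH graph.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for pair in rbh_pairs:
        a, b = sorted(pair)
        union(a, b)
    for cls in paralog_classes:            # merging step
        members = sorted(cls)
        for m in members[1:]:
            union(members[0], m)

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(g) for g in groups.values())

# Toy data: A|g4 has a best hit in B, but it is not reciprocated.
best_hit = {
    ("A|g1", "B"): "B|g1", ("B|g1", "A"): "A|g1",
    ("A|g2", "B"): "B|g2", ("B|g2", "A"): "A|g2",
    ("A|g3", "B"): "B|g3", ("B|g3", "A"): "A|g3",
    ("A|g4", "B"): "B|g3",
}
pairs = reciprocal_best_hits(best_hit)
spos = super_partitions(pairs, paralog_classes=[{"A|g1", "A|g2"}])
```

In the toy data, declaring `A|g1` and `A|g2` paralogous merges their two 1:1 partitions into a single group, which illustrates how the merging step addresses the 1:1 limitation of plain RBH detection mentioned in the abstract.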
|
7
|
Antigenic and genetic relationships between European very virulent infectious bursal disease viruses and an early West African isolate. Avian Pathol 2010. [DOI: 10.1080/03079459995028] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 10/17/2022]
|
8
|
|
9
|
Protection against Mycobacterium ulcerans lesion development by exposure to aquatic insect saliva. PLoS Med 2007; 4:e64. [PMID: 17326707 PMCID: PMC1808094 DOI: 10.1371/journal.pmed.0040064] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 03/10/2006] [Accepted: 01/02/2007] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND: Buruli ulcer is a severe human skin disease caused by Mycobacterium ulcerans. This disease is primarily diagnosed in West Africa with increasing incidence. Antimycobacterial drug therapy is relatively effective during the preulcerative stage of the disease, but surgical excision of lesions with skin grafting is often the ultimate treatment. The mode of transmission of this Mycobacterium species remains a matter of debate, and relevant interventions to prevent this disease lack (i) the proper understanding of the M. ulcerans life history traits in its natural aquatic ecosystem and (ii) immune signatures that could be correlates of protection. We previously set up a laboratory ecosystem with predatory aquatic insects of the family Naucoridae and laboratory mice and showed that (i) M. ulcerans-carrying aquatic insects can transmit the mycobacterium through bites and (ii) that their salivary glands are the only tissues hosting replicative M. ulcerans. Further investigation in natural settings revealed that 5%-10% of these aquatic insects captured in endemic areas have M. ulcerans-loaded salivary glands. In search of novel epidemiological features we noticed that individuals working close to aquatic environments inhabited by insect predators were less prone to developing Buruli ulcers than their relatives. Thus we set out to investigate whether those individuals might display any immune signatures of exposure to M. ulcerans-free insect predator bites, and whether those could correlate with protection.
METHODS AND FINDINGS: We took a two-pronged approach in this study, first investigating whether the insect bites are protective in a mouse model, and subsequently looking for possibly protective immune signatures in humans. We found that, in contrast to control BALB/c mice, BALB/c mice exposed to Naucoris aquatic insect bites or sensitized to Naucoris salivary gland homogenates (SGHs) displayed no lesion at the site of inoculation of M. ulcerans coated with Naucoris SGH components. Then using human serum samples collected in a Buruli ulcer-endemic area (in the Republic of Benin, West Africa), we assayed sera collected from either ulcer-free individuals or patients with Buruli ulcers for the titre of IgGs that bind to insect predator SGH, focusing on those molecules otherwise shown to be retained by M. ulcerans colonies. IgG titres were lower in the Buruli ulcer patient group than in the ulcer-free group.
CONCLUSIONS: These data will help structure future investigations in Buruli ulcer-endemic areas, providing a rationale for research into human immune signatures of exposure to predatory aquatic insects, with special attention to those insect saliva molecules that bind to M. ulcerans.
|
10
|
Reductive evolution and niche adaptation inferred from the genome of Mycobacterium ulcerans, the causative agent of Buruli ulcer. Genome Res 2007; 17:192-200. [PMID: 17210928 PMCID: PMC1781351 DOI: 10.1101/gr.5942807] [Citation(s) in RCA: 269] [Impact Index Per Article: 15.8] [Indexed: 11/24/2022]
Abstract
Mycobacterium ulcerans is found in aquatic ecosystems and causes Buruli ulcer in humans, a neglected but devastating necrotic disease of subcutaneous tissue that is rampant throughout West and Central Africa. Here, we report the complete 5.8-Mb genome sequence of M. ulcerans and show that it comprises two circular replicons, a chromosome of 5632 kb and a virulence plasmid of 174 kb. The plasmid is required for production of the polyketide toxin mycolactone, which provokes necrosis. Comparisons with the recently completed 6.6-Mb genome of Mycobacterium marinum revealed >98% nucleotide sequence identity and genome-wide synteny. However, as well as the plasmid, M. ulcerans has accumulated 213 copies of the insertion sequence IS2404, 91 copies of IS2606, 771 pseudogenes, two bacteriophages, and multiple DNA deletions and rearrangements. These data indicate that M. ulcerans has recently evolved via lateral gene transfer and reductive evolution from the generalist, more rapid-growing environmental species M. marinum to become a niche-adapted specialist. Predictions based on genome inspection for the production of modified mycobacterial virulence factors, such as the highly abundant phthiodiolone lipids, were confirmed by structural analyses. Similarly, 11 protein-coding sequences identified as M. ulcerans-specific by comparative genomics were verified as such by PCR screening a diverse collection of 33 strains of M. ulcerans and M. marinum. This work offers significant insight into the biology and evolution of mycobacterial pathogens and is an important component of international efforts to counter Buruli ulcer.
|
11
|
Evolution of proteomes: fundamental signatures and global trends in amino acid compositions. BMC Genomics 2006; 7:307. [PMID: 17147802 PMCID: PMC1764020 DOI: 10.1186/1471-2164-7-307] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.9] [Received: 08/31/2006] [Accepted: 12/05/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The evolutionary characterization of species and lifestyles at global levels is nowadays a subject of considerable interest, particularly with the availability of many complete genomes. Are there specific properties associated with lifestyles and phylogenies? What are the underlying evolutionary trends? One of the simplest analyses to address such questions concerns characterization of proteomes at the amino acid composition level.
RESULTS: In this work, amino acid compositions of a large set of 208 proteomes, with a significant number of representatives from the three phylogenetic domains and different lifestyles, are analyzed, resorting to an appropriate multidimensional method: correspondence analysis. The analysis reveals striking discrimination between eukaryotes, prokaryotic mesophiles and hyperthermophiles-thermophiles, following amino acid usage. In sharp contrast, no similar discrimination is observed for psychrophiles. The observed distributional properties are compared with various inferred chronologies for the recruitment of amino acids into the genetic code. Such comparisons reveal correlations between the observed segregations of species following amino acid usage, and the separation of amino acids following early or late recruitment.
CONCLUSION: A simple description of proteomes according to amino acid compositions reveals striking signatures, with sharp segregations or on the contrary non-discriminations following phylogenies and lifestyles. The distribution of species, following amino acid usage, exhibits a discrimination between [high GC]-[high optimal growth temperatures] and [low GC]-[moderate temperatures] characteristics. This discrimination appears to coincide closely with the separation of amino acids following their inferred early or late recruitment into the genetic code. Taken together, the various results provide a consistent picture for the evolution of proteomes, in terms of amino acid usage.
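The proteome-level description used in this abstract starts from a table of amino acid frequencies, one row per proteome. A minimal sketch of how such a row could be computed is given below; the function name `amino_acid_composition`, the choice to ignore non-standard residue codes, and the toy sequences are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard one-letter codes

def amino_acid_composition(proteins):
    """Relative frequency of each standard amino acid over a set of sequences.

    Hypothetical helper: residues outside the 20 standard codes
    (for example X or *) are ignored rather than counted.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# One row per proteome gives the species x amino acid table that a
# multidimensional method such as correspondence analysis takes as input.
toy_proteomes = {
    "species_1": ["MKKLAV", "MAAG"],
    "species_2": ["MGGRRP", "MEEV"],
}
table = {name: amino_acid_composition(seqs) for name, seqs in toy_proteomes.items()}
```

Each row sums to one, so proteomes of very different sizes are compared on the same scale.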
|
12
|
Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature 2006; 438:1151-6. [PMID: 16372009 DOI: 10.1038/nature04332] [Citation(s) in RCA: 989] [Impact Index Per Article: 54.9] [Received: 05/12/2005] [Accepted: 10/12/2005] [Indexed: 11/09/2022]
Abstract
Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus.
|
13
|
Abstract
The concept of the genome tree depends on the potential evolutionary significance in the clustering of species according to similarities in the gene content of their genomes. In this respect, genome trees have often been identified with species trees. With the rapid expansion of genome sequence data it becomes of increasing importance to develop accurate methods for grasping global trends for the phylogenetic signals that mutually link the various genomes. We therefore derive here the methodological concept of genome trees based on protein conservation profiles in multiple species. The basic idea in this derivation is that the multi-component “presence-absence” protein conservation profiles permit tracking of common evolutionary histories of genes across multiple genomes. We show that a significant reduction in informational redundancy is achieved by considering only the subset of distinct conservation profiles. Beyond these basic ideas, we point out various pitfalls and limitations associated with the data handling, paving the way for further improvements. As an illustration for the methods, we analyze a genome tree based on the above principles, along with a series of other trees derived from the same data and based on pair-wise comparisons (ancestral duplication-conservation and shared orthologs). In all trees we observe a sharp discrimination between the three primary domains of life: Bacteria, Archaea, and Eukarya. The new genome tree, based on conservation profiles, displays a significant correspondence with classically recognized taxonomical groupings, along with a series of departures from such conventional clusterings. Since Darwin's Origin of Species and Haeckel's Tree of Life, systematic biology has attempted to classify species into “family trees.” Genomics has provided a new framework permitting descriptions of sibling relations between species on the basis of their complete genetic blueprints. 
While trees based on single genes (rRNA), or limited numbers of genes have been useful, genome trees derived from complete genome comparisons should lead to more complete pictures of phylogenetic relations between various organisms. In order to reach such a global vision, procedures to establish sibling relationships should depend on an overall comparison that captures the evolutionary fates of proteins jointly in multiple genomes. This paper aims to establish a methodological basis to use genuine multidimensional procedures in the construction of genome trees. This approach completes the derivation of trees based on more classical techniques of pair-wise comparison between species. The authors survey classification schemes emerging from this approach, which either supports traditional views, such as the separation between the three phylogenetic domains Bacteria, Archaea, and Eukarya, or challenges them by suggesting, for example, intermingled clusterings of Proteobacteria with various other bacterial species.
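The "presence-absence" conservation profiles described in this abstract, and the reduction to the subset of distinct profiles, can be sketched as follows. The data layout (`homologs` mapping each protein to the species where a homolog was detected), the function names, and in particular the simple mismatch distance between species are assumptions for illustration. The abstract does not specify how the authors derive trees from the profiles.

```python
def conservation_profiles(homologs, species):
    """Map each protein to a 0/1 tuple: is a homolog present in each species?

    `homologs` (hypothetical layout): protein -> set of species names.
    """
    return {
        protein: tuple(int(s in found) for s in species)
        for protein, found in homologs.items()
    }

def distinct_profiles(profiles):
    # Many proteins share the same profile; keep each pattern once.
    return sorted(set(profiles.values()))

def species_distance(distinct, i, j):
    """Fraction of distinct profiles in which species i and j disagree.

    An assumed, deliberately simple distance, not the paper's method.
    """
    return sum(p[i] != p[j] for p in distinct) / len(distinct)

species = ["bacterium", "archaeon", "eukaryote"]
homologs = {
    "p1": {"bacterium", "archaeon", "eukaryote"},
    "p2": {"bacterium", "archaeon", "eukaryote"},
    "p3": {"bacterium"},
    "p4": {"archaeon", "eukaryote"},
    "p5": {"bacterium"},
}
profiles = conservation_profiles(homologs, species)
distinct = distinct_profiles(profiles)   # 5 proteins reduce to 3 patterns
```

A pairwise distance matrix built this way could then be passed to any standard clustering or tree-building routine.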
|
14
|
Abstract
Large-scale genome comparisons have shown that no gene sets are shared exclusively by both Aspergillus fumigatus and any other human pathogen sequenced to date, such as Candida or Cryptococcus species. By contrast, and in agreement with the environmental occurrence of this fungus in decaying vegetation, the enzymatic machinery required by a fungus to colonize plant substrates has been found in the A. fumigatus genome. In addition, the proteome of this fungus contains numerous efflux pumps, including >100 major facilitators that help the fungus to resist either natural aggressive molecules present in the environment or antifungal drugs in humans. Environment sensing, counteracting reactive oxidants, and retrieving essential nutriments from the environment are general metabolic traits that are associated with the growth of the saprotrophic mold A. fumigatus in an unfriendly environment such as its human host.
|
15
|
Abstract
Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications. Candida albicans is a commonly encountered fungal pathogen usually responsible for superficial infections (thrush and vaginitis). However, an estimated 30% of severe fungal infections, most due to Candida, result in death. Those who are most at risk include individuals taking immune-suppressive drugs following organ transplantation, people with HIV infection, premature infants, and cancer patients undergoing chemotherapy. 
Current therapies for this pathogen are made more difficult by the significant secondary effects of anti-fungal drugs that target proteins that are also found in the human host. Recent sequencing and assembly of the genome for the fungal pathogen C. albicans used simple automated procedures for the identification of putative genes. Here, we report a detailed annotation of the 6,354 genes that are present in the genome sequence of this organism, essentially writing the dictionary of the C. albicans genome. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that are absent from the human genome and whose products might be targeted for antifungal therapy. The results of these efforts will thus ensure that the Candida research community has uniform and comprehensive genomic information for medical research, for the development of functional genomic tools as well as for future diagnostic and therapeutic applications.
|
16
|
Abstract
CandidaDB is a database dedicated to the genome of the most prevalent systemic fungal pathogen of humans, Candida albicans. CandidaDB is based on an annotation of the Stanford Genome Technology Center C.albicans genome sequence data by the European Galar Fungail Consortium. CandidaDB Release 2.0 (June 2004) contains information pertaining to Assembly 19 of the genome of C.albicans strain SC5314. The current release contains 6244 annotated entries corresponding to 130 tRNA genes and 5917 protein-coding genes. For these, it provides tentative functional assignments along with numerous pre-run analyses that can assist the researcher in the evaluation of gene function for the purpose of specific or large-scale analysis. CandidaDB is based on GenoList, a generic relational data schema and a World Wide Web interface that has been adapted to the handling of eukaryotic genomes. The interface allows users to browse easily through genome data and retrieve information. CandidaDB also provides more elaborate tools, such as pattern searching, that are tightly connected to the overall browsing system. As the C.albicans genome is diploid and still incompletely assembled, CandidaDB provides tools to browse the genome by individual supercontigs and to examine information about allelic sequences obtained from complementary contigs. CandidaDB is accessible at http://genolist.pasteur.fr/CandidaDB.
|
17
|
Specific molecular features in the organization and biosynthesis of the cell wall of Aspergillus fumigatus. Med Mycol 2005; 43 Suppl 1:S15-22. [PMID: 16110787 DOI: 10.1080/13693780400029155] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.7] [Indexed: 10/25/2022] Open
Abstract
The cell wall of Aspergillus fumigatus is composed of a branched beta1,3 glucan covalently bound to chitin, beta1,3, beta1,4 glucans, and galactomannan, that is embedded in an amorphous cement composed of alpha1,3 glucan, galactomannan and polygalactosamine. The mycelial cell wall of A. fumigatus is very different from the yeast Saccharomyces cerevisiae cell wall, and in particular lacks beta1,6 glucans and proteins covalently bound to cell wall polysaccharides. The differences in cell wall composition between the mould A. fumigatus and the yeast S. cerevisiae are also reflected at the genomic level where unique features have been identified in A. fumigatus. A single gene codes for the glucan synthase catalytic subunit; this finding has led to the development of an RNAi methodology for the disruption of essential genes in A. fumigatus. In contrast to the glucan synthase, multiple genes have been found in the chitin synthase and the alpha glucan synthase families; in spite of homologous sequences, each gene in each family has a very different function. Similarly, homologous mannosyltransferase genes are found in yeast and moulds but they lead to the synthesis of very different N-mannan structures. This chemo-genomic comparative analysis has also suggested that GPI-anchored proteins do not act as linkers in the three-dimensional organization of the fungal cell wall.
|
18
|
Abstract
Integration of mitochondrial DNA fragments into nuclear chromosomes (giving rise to nuclear DNA sequences of mitochondrial origin, or NUMTs) is an ongoing process that shapes nuclear genomes. In yeast this process depends on double-strand-break repair. Since NUMTs lack amplification and specific integration mechanisms, they represent the prototype of exogenous insertions in the nucleus. From sequence analysis of the genome of Homo sapiens, followed by sampling humans from different ethnic backgrounds, and chimpanzees, we have identified 27 NUMTs that are specific to humans and must have colonized human chromosomes in the last 4–6 million years. Thus, we measured the fixation rate of NUMTs in the human genome. Six such NUMTs show insertion polymorphism and provide a useful set of DNA markers for human population genetics. We also found that during recent human evolution, Chromosomes 18 and Y have been more susceptible to colonization by NUMTs. Surprisingly, 23 out of 27 human-specific NUMTs are inserted in known or predicted genes, mainly in introns. Some individuals carry a NUMT insertion in a tumor-suppressor gene and in a putative angiogenesis inhibitor. Therefore in humans, but not in yeast, NUMT integrations preferentially target coding or regulatory sequences. This is indeed the case for novel insertions associated with human diseases and those driven by environmental insults. We thus propose a mutagenic phenomenon that may be responsible for a variety of genetic diseases in humans and suggest that genetic or environmental factors that increase the frequency of chromosome breaks provide the impetus for the continued colonization of the human genome by mitochondrial DNA. DNA from mitochondria has regularly inserted into the human nuclear genome. Some insertions are polymorphic, revealing that the invasion of the human genome is an ongoing process.
MESH Headings
- Algorithms
- Animals
- Base Sequence
- Biological Evolution
- Cell Lineage
- Cell Nucleus/metabolism
- Chromosomes, Human/ultrastructure
- Chromosomes, Human, Pair 18
- Chromosomes, Human, Y/genetics
- Computational Biology/methods
- DNA
- DNA Transposable Elements
- DNA, Mitochondrial/genetics
- Databases, Genetic
- Evolution, Molecular
- Gene Duplication
- Genetic Markers
- Genome
- Genome, Human
- Humans
- Models, Genetic
- Molecular Sequence Data
- Mutagenesis
- Pan troglodytes/genetics
- Phylogeny
- Polymorphism, Genetic
- Sequence Analysis, DNA
- Time Factors
|
19
|
Abstract
Identifying the mechanisms of eukaryotic genome evolution by comparative genomics is often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. The hemiascomycete yeasts, with their compact genomes, similar lifestyle and distinct sexual and physiological properties, provide a unique opportunity to explore such mechanisms. We present here the complete, assembled genome sequences of four yeast species, selected to represent a broad evolutionary range within a single eukaryotic phylum, that after analysis proved to be molecularly as diverse as the entire phylum of chordates. A total of approximately 24,200 novel genes were identified, the translation products of which were classified together with Saccharomyces cerevisiae proteins into about 4,700 families, forming the basis for interspecific comparisons. Analysis of chromosome maps and genome redundancies reveals that the different yeast lineages have evolved through a marked interplay between several distinct molecular mechanisms, including tandem gene repeat formation, segmental duplication, a massive genome duplication and extensive gene loss.
|
20
|
Abstract
We screened nearly one thousand random sequenced targets obtained by partial sequencing of 13 hemiascomycete genomes identified by higher amino acid sequence similarity to a non-Saccharomyces cerevisiae protein than to an S. cerevisiae protein. Among those sequences we have identified 36 novel phylogenetic clusters of putative transporters which, according to the Transport Commission system (TC-DB, 2002; http://tcdb.ucsd.edu/tcdb), do not belong to acknowledged S. cerevisiae protein families [De Hertogh et al.: Funct. Integr. Genomics 2002;2:154-170; http://cbi.labri.u-bordeaux.fr/Genolevures]. These novel hemiascomycete transporters comprise 3 channels, 23 secondary transporters, 5 primary transporters and 5 membrane proteins of unknown function.
|
21
|
A novel design of whole-genome microarray probes for Saccharomyces cerevisiae which minimizes cross-hybridization. BMC Genomics 2003; 4:38. [PMID: 14499002 PMCID: PMC239980 DOI: 10.1186/1471-2164-4-38] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Received: 07/09/2003] [Accepted: 09/22/2003] [Indexed: 12/19/2022]
Abstract
Background Numerous DNA microarray hybridization experiments have been performed in yeast over the last years using either synthetic oligonucleotides or PCR-amplified coding sequences as probes. The design and quality of the microarray probes are of critical importance for hybridization experiments as well as subsequent analysis of the data. Results We present here a novel design of Saccharomyces cerevisiae microarrays based on a refined annotation of the genome and with the aim of reducing cross-hybridization between related sequences. An effort was made to design probes of similar lengths, preferably located in the 3'-end of reading frames. The sequence of each gene was compared against the entire yeast genome and optimal sub-segments giving no predicted cross-hybridization were selected. A total of 5660 novel probes (more than 97% of the yeast genes) were designed. For the remaining 143 genes, cross-hybridization was unavoidable. Using a set of 18 deletant strains, we have experimentally validated our cross-hybridization procedure. Sensitivity, reproducibility and dynamic range of these new microarrays have been measured. Based on this experience, we have written a novel program to design long oligonucleotides for microarray hybridizations of complete genome sequences. Conclusions A validated procedure to predict cross-hybridization in microarray probe design was defined in this work. Subsequently, a novel Saccharomyces cerevisiae microarray (which minimizes cross-hybridization) was designed and constructed. Arrays are available at Eurogentec S. A. Finally, we propose a novel design program, OliD, which allows automatic oligonucleotide design for microarrays. The OliD program is available from authors.
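The probe-selection idea in this abstract (prefer a sub-segment near the 3' end that is not predicted to cross-hybridize with any other gene) can be illustrated with a toy sketch. This is not the OliD program: the function names, the exact shared k-mer criterion, and the parameter values below are assumptions chosen for illustration, and a real design would rely on alignment scores, melting temperature, and both strands.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pick_probe(gene_id, genes, probe_len=12, k=6):
    """Return (start, probe) for the 3'-most window of probe_len bases that
    shares no k-mer with any other gene, or None if no such window exists.

    Sharing an exact k-mer is used here as a crude, hypothetical stand-in
    for a real cross-hybridization prediction.
    """
    target = genes[gene_id]
    background = set()
    for other_id, seq in genes.items():
        if other_id != gene_id:
            background |= kmers(seq, k)
    # Scan from the 3' end towards the 5' end, so the first hit is 3'-most.
    for start in range(len(target) - probe_len, -1, -1):
        window = target[start:start + probe_len]
        if not (kmers(window, k) & background):
            return start, window
    return None


# Toy sequences (invented): g1 and g2 share their last 12 bases.
genes = {
    "g1": "ATGCGTACGTTAGCCGATAGGCTTAACCGG",
    "g2": "ATGAAACCCGGGTTTAGGCTTAACCGG",
}
probe_g1 = pick_probe("g1", genes)  # window upstream of the shared 3' end
identical = {"a": "ACGTACGTACGTAC", "b": "ACGTACGTACGTAC"}
no_probe = pick_probe("a", identical)  # None: cross-hybridization unavoidable
```

Genes for which the function returns None correspond, in spirit, to the cases the abstract describes as having unavoidable cross-hybridization.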
|
22
|
Expressed sequence tag analysis of the human pathogen Paracoccidioides brasiliensis yeast phase: identification of putative homologues of Candida albicans virulence and pathogenicity genes. Eukaryot Cell 2003; 2:34-48. [PMID: 12582121 PMCID: PMC141168 DOI: 10.1128/ec.2.1.34-48.2003] [Citation(s) in RCA: 141] [Impact Index Per Article: 6.7] [Received: 07/17/2002] [Accepted: 10/23/2002] [Indexed: 11/20/2022]
Abstract
Paracoccidioides brasiliensis, a thermodimorphic fungus, is the causative agent of the prevalent systemic mycosis in Latin America, paracoccidioidomycosis. We present here a survey of expressed genes in the yeast pathogenic phase of P. brasiliensis. We obtained 13,490 expressed sequence tags from both 5' and 3' ends. Clustering analysis yielded the partial sequences of 4,692 expressed genes that were functionally classified by similarity to known genes. We have identified several Candida albicans virulence and pathogenicity homologues in P. brasiliensis. Furthermore, we have analyzed the expression of some of these genes during the dimorphic yeast-mycelium-yeast transition by real-time quantitative reverse transcription-PCR. Clustering analysis of the mycelium-yeast transition revealed three groups: (i) RBT, hydrophobin, and isocitrate lyase; (ii) malate dehydrogenase, contigs Pb1067 and Pb1145, GPI, and alternative oxidase; and (iii) ubiquitin, delta-9-desaturase, HSP70, HSP82, and HSP104. The first two groups displayed high mRNA expression in the mycelial phase, whereas the third group showed higher mRNA expression in the yeast phase. Our results suggest the possible conservation of pathogenicity and virulence mechanisms among fungi, expand considerably gene identification in P. brasiliensis, and provide a broader basis for further progress in understanding its biological peculiarities.
|
23
|
Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene 2002; 297:51-60. [PMID: 12384285 DOI: 10.1016/s0378-1119(02)00871-5] [Citation(s) in RCA: 146] [Impact Index Per Article: 6.6] [Indexed: 11/26/2022]
Abstract
Can we infer the lifestyle of an organism from the characteristic properties of its genome? More precisely, what are the relations between easily quantifiable properties from genomic sequences, such as amino-acid compositions, and more subtle characteristics concerning for example lifestyles or evolutionary trends? Here, we seek a global picture for such properties, based on a large number (56) of complete genomes, including significant numbers of representatives from the three domains of life. We consider the amino acid compositions of the predicted proteomes, and we use correspondence analysis, as a multivariate method to extract the relevant information from the large-scale data. From these analyses we derive a series of conclusions, concerning lifestyles, as well as physico-chemical and evolutionary trends: (1) correspondence analysis of the amino acid compositions permits discrimination between the three known lifestyles (mesophily/thermophily/hyperthermophily). (2) For various organisms, amino-acid composition properties are essentially driven by GC content, and to a significantly lesser extent by growth temperatures associated with lifestyles. Roughly speaking, the respective contributions of these two components are 57 and 20%. It is notable that these proportions are essentially unchanged with respect to a previous analysis (Nature 393 (1998) 537), which involved only 15 genomes, available at the time. (3) In terms of amino acid compositional biases, two specific 'signatures' for thermophily (in a broad sense, including hyperthermophily) can be detected. First, thermophilic species display a relative abundance in glutamic acid (Glu), concomitantly with the depletion in glutamine. Second, in thermophilic species, the relative abundance in Glu (negative charge) is significantly correlated (Pearson correlation coefficient r=0.83 with P<0.0001), with the increase in the lumped 'pool' lysine+arginine (positive charges). 
This correlation (absent in mesophiles) could be interpreted on a physico-chemical basis, relevant to the thermostability of proteins. (4) Statistically significant differences are observed between the average lengths of the genes in the surveyed species, which follow their distribution between the three domains of life. Also a significant difference is observed between the average lengths of thermophilic (283.0+/-5.8) versus mesophilic (340+/-9.4) genes. It is thus possible that the 'general' shortening of the primary sequences in thermophilic proteins plays a role in thermostability. (5) Considering various combinations of conservation properties (genes conserved exclusively in eukaryotes, in archaea, in bacteria, in combinations of two domains, etc.) correspondence analysis reveals a trend towards thermophilic-hyperthermophilic profiles for the most conserved subset of genes (ancient genes). (6) When limited to the subset of species-specific genes, correspondence analysis leads to a different picture for the clustering of genomes following amino-acid compositions: for example, the 'core' specific part of a genome can bear lifestyle signatures different from those of the complete genome. Various results are discussed both on methodological and biological grounds. The evolutionary perspectives opened by our analyses are noted.
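The two basic quantities this abstract relies on, the amino acid composition of a predicted proteome and a Pearson correlation between the Glu fraction and the pooled Lys+Arg fraction across species, can be sketched as follows. The sketch does not implement correspondence analysis, and the toy proteomes are invented so that the arithmetic is easy to follow; they carry no biological meaning.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteins):
    """Relative frequency of each of the 20 standard amino acids in a
    (non-empty) list of protein sequences; other symbols are ignored."""
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Invented toy proteomes, one entry per hypothetical species.
toy_proteomes = {
    "species_a": ["MEEKRLE", "MKEERK"],
    "species_b": ["MEKQLA", "MAEKLQ"],
    "species_c": ["MQQLAAS", "MASLQN"],
}

glu, lys_arg = [], []
for proteins in toy_proteomes.values():
    comp = composition(proteins)
    glu.append(comp["E"])                  # negative charge
    lys_arg.append(comp["K"] + comp["R"])  # pooled positive charges

r = pearson(glu, lys_arg)
```

In the toy data the Glu fraction equals the Lys+Arg fraction in every species, so the correlation is perfect by construction; real proteomes would of course give a noisier value.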
|
24
|
Otoancorin, an inner ear protein restricted to the interface between the apical surface of sensory epithelia and their overlying acellular gels, is defective in autosomal recessive deafness DFNB22. Proc Natl Acad Sci U S A 2002; 99:6240-5. [PMID: 11972037 PMCID: PMC122933 DOI: 10.1073/pnas.082515999] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022]
Abstract
A 3,673-bp murine cDNA predicted to encode a glycosylphosphatidylinositol-anchored protein of 1,088 amino acids was isolated during a study aimed at identifying transcripts specifically expressed in the inner ear. This inner ear-specific protein, otoancorin, shares weak homology with megakaryocyte potentiating factor/mesothelin precursor. Otoancorin is located at the interface between the apical surface of the inner ear sensory epithelia and their overlying acellular gels. In the cochlea, otoancorin is detected at two attachment zones of the tectorial membrane, a permanent one along the top of the spiral limbus and a transient one on the surface of the developing greater epithelial ridge. In the vestibule, otoancorin is present on the apical surface of nonsensory cells, where they contact the otoconial membranes and cupulae. The identification of the mutation (IVS12+2T>C) in the corresponding gene OTOA in one consanguineous Palestinian family affected by nonsyndromic recessive deafness DFNB22 assigns an essential function to otoancorin. We propose that otoancorin ensures the attachment of the inner ear acellular gels to the apical surface of the underlying nonsensory cells.
|
25
|
The decaying genome of Mycobacterium leprae. Lepr Rev 2001; 72:387-98. [PMID: 11826475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
Everything that we need to know about Mycobacterium leprae, a close relative of the tubercle bacillus, is encrypted in its genome. Inspection of the 3.27 Mb genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus identified 1,605 genes encoding proteins and 50 genes for stable RNA species. Comparison with the genome sequence of Mycobacterium tuberculosis revealed an extreme case of reductive evolution, since less than half of the genome contains functional genes while inactivated or pseudogenes are highly abundant. The level of gene duplication was approximately 34% and, on classification of the proteins into families, the largest functional groups were found to be involved in the metabolism and modification of fatty acids and polyketides, transport of metabolites, cell envelope synthesis and gene regulation. Reductive evolution, gene decay and genome downsizing have eliminated entire metabolic pathways, together with their regulatory circuits and accessory functions, particularly those involved in catabolism. This may explain the unusually long generation time and account for our inability to culture the leprosy bacillus.
|
26
|
Transcript profiling in Candida albicans reveals new cellular functions for the transcriptional repressors CaTup1, CaMig1 and CaNrg1. Mol Microbiol 2001; 42:981-93. [PMID: 11737641 DOI: 10.1046/j.1365-2958.2001.02713.x] [Citation(s) in RCA: 179] [Impact Index Per Article: 7.8] [Indexed: 11/20/2022]
Abstract
The pathogenic fungus Candida albicans contains homologues of the transcriptional repressors ScTup1, ScMig1 and ScNrg1 found in budding yeast. In Saccharomyces cerevisiae, ScMig1 targets the ScTup1/ScSsn6 complex to the promoters of glucose-repressed genes to repress their transcription. ScNrg1 is thought to act in a similar manner at other promoters. We have examined the roles of their homologues in C. albicans by transcript profiling with an array containing 2002 genes, representing about one quarter of the predicted number of open reading frames (ORFs) in C. albicans. The data revealed that CaNrg1 and CaTup1 regulate a different set of C. albicans genes from CaMig1 and CaTup1. This is consistent with the idea that CaMig1 and CaNrg1 target the CaTup1 repressor to specific subsets of C. albicans genes. However, CaMig1 and CaNrg1 repress other C. albicans genes in a CaTup1-independent fashion. The targets of CaMig1 and CaNrg1 repression, and phenotypic analyses of nrg1/nrg1 and mig1/mig1 mutants, indicate that these factors play differential roles in the regulation of metabolism, cellular morphogenesis and stress responses. Hence, the data provide important information both about the modes of action of these transcriptional regulators and their cellular roles. The transcript profiling data are available at http://www.pasteur.fr/recherche/unites/RIF/transcriptdata/.
|
27
|
NRG1 represses yeast-hypha morphogenesis and hypha-specific gene expression in Candida albicans. EMBO J 2001; 20:4742-52. [PMID: 11532938 PMCID: PMC125592 DOI: 10.1093/emboj/20.17.4742] [Citation(s) in RCA: 331] [Impact Index Per Article: 14.4] [Received: 01/19/2001] [Revised: 07/03/2001] [Accepted: 07/09/2001] [Indexed: 11/12/2022]
Abstract
We have characterized CaNrg1 from Candida albicans, the major fungal pathogen in humans. CaNrg1 contains a zinc finger domain that is conserved in transcriptional regulators from fungi to humans. It is most closely related to ScNrg1, which represses transcription in a Tup1-dependent fashion in Saccharomyces cerevisiae. Inactivation of CaNrg1 in C.albicans causes filamentous and invasive growth, derepresses hypha-specific genes, increases sensitivity to some stresses and attenuates virulence. A tup1 mutant displays similar phenotypes. However, unlike tup1 cells, nrg1 cells can form normal hyphae, generate chlamydospores at normal rates and grow at 42 degrees C. Transcript profiling of 2002 C.albicans genes reveals that CaNrg1 represses a subset of CaTup1-regulated genes, which includes known hypha-specific genes and other virulence factors. Most of these genes contain an Nrg1 response element (NRE) in their promoter. CaNrg1 interacts specifically with an NRE in vitro. Also, deletion of two NREs from the ALS8 promoter releases it from Nrg1-mediated repression. Hence, CaNrg1 is a transcriptional repressor that appears to target CaTup1 to a distinct set of virulence-related functions, including yeast-hypha morphogenesis.
|
28
|
Genomic exploration of the hemiascomycetous yeasts: 3. Methods and strategies used for sequence analysis and annotation. FEBS Lett 2000; 487:17-30. [PMID: 11152878 DOI: 10.1016/s0014-5793(00)02274-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
Abstract
The primary analysis of the sequences for our Hemiascomycete random sequence tag (RST) project was performed using a combination of classical methods for sequence comparison and contig assembly, and of specifically written scripts and computer visualization routines. Comparisons were performed first against DNA and protein sequences from Saccharomyces cerevisiae, then against protein sequences from other completely sequenced organisms and, finally, against protein sequences from all other organisms. Blast alignments were individually inspected to help recognize genes within our random genomic sequences despite the fact that only parts of them were available. For each yeast species, validated alignments were used to infer the proper genetic code, to determine codon usage preferences and to calculate their degree of sequence divergence with S. cerevisiae. The quality of each genomic library was monitored from contig analysis of the DNA sequences. Annotated sequences were submitted to the EMBL database, and the general annotation tables produced served as a basis for our comparative description of the evolution, redundancy and function of the Hemiascomycete genomes described in other articles of this issue.
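One of the steps mentioned in this abstract, determining codon usage preferences from validated coding sequence, reduces to counting in-frame triplets. The following is a minimal sketch under the assumption that the input is already an in-frame coding fragment; the function name and the example sequence are hypothetical and are not taken from the project's own scripts.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored, which matters
    for partial sequences such as random sequence tags.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Invented fragment: ATG GCT GCT GCC TAA, plus one dangling base.
usage = codon_usage("ATGGCTGCTGCCTAAG")
```

Comparing such tables between species, or against the codon table expected under a candidate genetic code, is one simple way to look at the preferences the abstract refers to.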
|
29
|
Abstract
This paper reports the genomic analysis of strain CBS732 of Zygosaccharomyces rouxii, a homothallic diploid yeast. We explored the sequences of 4934 random sequencing tags of about 1 kb in size and compared them to the Saccharomyces cerevisiae gene products. Approximately 2250 nuclear genes, 57 tRNAs, the rDNA locus, the endogenous pSR1 plasmid and 15 mitochondrial genes were identified. According to 18S and 25S rRNA cladograms and to synteny analysis, Z. rouxii could be placed among the S. cerevisiae sensu lato yeasts.
|
30
|
Genomic exploration of the hemiascomycetous yeasts: 18. Comparative analysis of chromosome maps and synteny with Saccharomyces cerevisiae. FEBS Lett 2000; 487:101-12. [PMID: 11152893 DOI: 10.1016/s0014-5793(00)02289-4] [Citation(s) in RCA: 66] [Impact Index Per Article: 2.8] [Indexed: 10/18/2022]
Abstract
We have analyzed the evolution of chromosome maps of Hemiascomycetes by comparing gene order and orientation of the 13 yeast species partially sequenced in this program with the genome map of Saccharomyces cerevisiae. From the analysis of nearly 8000 situations in which two distinct genes having homologs in S. cerevisiae could be identified on the sequenced inserts of another yeast species, we have quantified the loss of synteny, the frequency of single gene deletion and the occurrence of gene inversion. Traces of ancestral duplications in the genome of S. cerevisiae could be identified from the comparison with the other species that do not entirely coincide with those identified from the comparison of S. cerevisiae with itself. From such duplications and from the correlation observed between gene inversion and loss of synteny, a model is proposed for the molecular evolution of Hemiascomycetes. This model, which can possibly be extended to other eukaryotes, is based on the reiteration of events of duplication of chromosome segments, creating transient merodiploids that are subsequently resolved by single gene deletion events.
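The kind of bookkeeping this abstract describes (pairs of neighbouring genes on a sequenced insert, each with an S. cerevisiae homologue, scored for conserved synteny, a single-gene gap, or inversion) can be sketched as below. The category names, the tuple layout, and the decision rules are illustrative assumptions, not the criteria actually used in the paper.

```python
from collections import Counter


def classify_pair(hom_a, hom_b, same_strand_in_insert):
    """Classify two neighbouring genes of another yeast by the map positions
    of their S. cerevisiae homologues.

    hom_a, hom_b: (chromosome, gene_index, strand) of each homologue, where
    gene_index is the rank of the gene along its chromosome (assumed layout).
    same_strand_in_insert: True if the two genes lie on the same strand of
    the sequenced insert.
    """
    chrom_a, idx_a, strand_a = hom_a
    chrom_b, idx_b, strand_b = hom_b
    gap = abs(idx_a - idx_b)
    if chrom_a != chrom_b or gap not in (1, 2):
        return "synteny_lost"
    if gap == 2:
        # One intervening gene in S. cerevisiae: consistent with a
        # single-gene deletion or insertion between the two neighbours.
        return "single_gene_gap"
    same_strand_in_reference = strand_a == strand_b
    if same_strand_in_reference != same_strand_in_insert:
        return "neighbours_with_inversion"
    return "conserved"


# Invented example pairs.
toy_pairs = [
    (("IV", 101, "+"), ("IV", 102, "+"), True),
    (("IV", 210, "+"), ("IV", 211, "-"), True),
    (("VII", 55, "-"), ("VII", 57, "-"), True),
    (("II", 12, "+"), ("XV", 300, "+"), False),
]
summary = Counter(classify_pair(*pair) for pair in toy_pairs)
```

Tallying these categories over thousands of pairs gives the sort of frequencies (synteny loss, single gene deletion, inversion) that the abstract reports having quantified.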
|
31
|
Abstract
Comparisons of the 6213 predicted Saccharomyces cerevisiae open reading frame (ORF) products with sequences from organisms of other biological phyla differentiate genes commonly conserved in evolution from 'maverick' genes which have no homologue in phyla other than the Ascomycetes. We show that a majority of the 'maverick' genes have homologues among other yeast species and thus define a set of 1892 genes that, from sequence comparisons, appear 'Ascomycetes-specific'. We estimate, retrospectively, that the S. cerevisiae genome contains 5651 actual protein-coding genes, 50 of which were identified for the first time in this work, and that the present public databases contain 612 predicted ORFs that are not real genes. Interestingly, the sequences of the 'Ascomycetes-specific' genes tend to diverge more rapidly in evolution than that of other genes. Half of the 'Ascomycetes-specific' genes are functionally characterized in S. cerevisiae, and a few functional categories are over-represented in them.
|
32
|
Abstract
We explored the biological diversity of hemiascomycetous yeasts using a set of 22000 newly identified genes in 13 species through BLASTX searches. Genes without clear homologue in Saccharomyces cerevisiae appeared to be conserved in several species, suggesting that they were recently lost by S. cerevisiae. They often identified well-known species-specific traits. Cases of gene acquisition through horizontal transfer appeared to occur very rarely if at all. All identified genes were ascribed to functional classes. Functional classes were differently represented among species. Species classification by functional clustering roughly paralleled rDNA phylogeny. Unequal distribution of rapidly evolving, ascomycete-specific, genes among species and functions was shown to contribute strongly to this clustering. A few cases of gene family amplification were documented, but no general correlation could be observed between functional differentiation of yeast species and variations of gene family sizes. Yeast biological diversity seems thus to result from limited species-specific gene losses or duplications, and for a large part from rapid evolution of genes and regulatory factors dedicated to specific functions.
|
33
|
Genomic exploration of the hemiascomycetous yeasts: 20. Evolution of gene redundancy compared to Saccharomyces cerevisiae. FEBS Lett 2000; 487:122-33. [PMID: 11152895 DOI: 10.1016/s0014-5793(00)02291-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.7] [Indexed: 11/23/2022]
Abstract
We have evaluated the degree of gene redundancy in the nuclear genomes of 13 hemiascomycetous yeast species. Saccharomyces cerevisiae singletons and gene families appear generally conserved in these species as singletons and families of similar size, respectively. Variations of the number of homologues with respect to that expected affect from 7 to less than 24% of each genome. Since S. cerevisiae homologues represent the majority of the genes identified in the genomes studied, the overall degree of gene redundancy seems conserved across all species. This is best explained by a dynamic equilibrium resulting from numerous events of gene duplication and deletion rather than by a massive duplication event occurring in some lineages and not in others.
|
34
|
Abstract
This paper reports the genomic analysis of strain CBS7064 of Pichia sorbitophila, a homothallic diploid yeast. We sequenced 4829 random sequence tags of about 1 kb and compared them to the Saccharomyces cerevisiae gene products. Approximately 1300 nuclear genes, 22 tRNAs, the rDNA locus, and six mitochondrial genes have been identified. Analysis of the rDNA genes places this organism close to the Candida species. The sequences are available from the EMBL databank under accession numbers AL414896 to AL419724.
|
35
|
Genomic exploration of the hemiascomycetous yeasts: 1. A set of yeast species for molecular evolution studies. FEBS Lett 2000; 487:3-12. [PMID: 11152876 DOI: 10.1016/s0014-5793(00)02272-9] [Citation(s) in RCA: 162] [Impact Index Per Article: 6.8] [Indexed: 10/18/2022]
Abstract
The identification of molecular evolutionary mechanisms in eukaryotes is approached by a comparative genomics study of a homogeneous group of species classified as Hemiascomycetes. This group includes Saccharomyces cerevisiae, the first eukaryotic genome entirely sequenced, back in 1996. A random sequencing analysis has been performed on 13 different species sharing a small genome size and a low frequency of introns. Detailed information is provided in the 20 following papers. Additional tables available on websites describe the ca. 20000 newly identified genes. This wealth of data, so far unique among eukaryotes, allowed us to examine the conservation of chromosome maps, to identify the 'yeast-specific' genes, and to review the distribution of gene families into functional classes. This project conducted by a network of seven French laboratories has been designated 'Génolevures'.
|
36
|
Abstract
Since its completion more than 4 years ago, the sequence of Saccharomyces cerevisiae has been extensively used and studied. The original sequence has received a few corrections, and the identification of genes has been completed, thanks in particular to transcriptome analyses and to specialized studies on introns, tRNA genes, transposons or multigene families. In order to undertake the extensive comparative sequence analysis of this program, we have entirely revisited the S. cerevisiae sequence using the same criteria for all 16 chromosomes and taking into account publicly available annotations for genes and elements that cannot be predicted. Comparison with the other yeast species of this program indicates the existence of 50 novel genes in segments previously considered as 'intergenic' and suggests extensions for 26 of the previously annotated genes.
|
37
|
Analysis of the proteome of Mycobacterium tuberculosis in silico. Tuber Lung Dis 2000; 79:329-42. [PMID: 10694977 DOI: 10.1054/tuld.1999.0220] [Citation(s) in RCA: 243] [Impact Index Per Article: 10.1] [Indexed: 01/29/2023]
Abstract
Novel bioinformatics routines have been used to provide a more detailed definition of the proteome of Mycobacterium tuberculosis H37Rv. Over half of the current proteins result from gene duplication or domain shuffling events while one-sixth show no similarity to polypeptides described in other organisms. Prominent among the genes that appear to have been duplicated on numerous occasions are those involved in fatty acid metabolism, regulation of gene expression, and the unusually glycine-rich PE and PPE proteins. Protein similarity analysis, coupled with inspection of the genetic neighbourhood, was used to explore possible functional relatedness. This uncovered four large mce operons whose proteins may mediate initial interactions between the tubercle bacillus and host cells, together with a cluster of genes that might encode components of a structure required for secretion of ESAT-6 like proteins. Close linkage of the mmpL genes, encoding large membrane proteins, with those required for fatty acid metabolism suggests involvement in lipid transport. Compared to free-living bacteria, M. tuberculosis has a significantly smaller transport protein repertoire and this may reflect its intracellular lifestyle.
|
38
|
Detection and genetic polymorphism of human herpes virus type 8 in endemic or epidemic Kaposi's sarcoma from West and Central Africa, and South America. Int J Cancer 2000. [DOI: 10.1002/(sici)1097-0215(20000115)85:2<166::aid-ijc3>3.0.co;2-l] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
|
39
|
Detection and genetic polymorphism of human herpes virus type 8 in endemic or epidemic Kaposi's sarcoma from West and Central Africa, and South America. Int J Cancer 2000; 85:166-70. [PMID: 10629072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
Kaposi's-sarcoma-associated herpesvirus(KSHV)/human-herpes-virus-8(HHV-8) sequences originally detected in AIDS-associated Kaposi's sarcoma have been found in almost every KS tested, whether endemic, classic, iatrogenic or epidemic. Most of the studies on African KS involved East African patients. We report herewith the study of 17 African or Guyanan KS patients, 3 with epidemic KS (EKS) from Central African Republic, 3 from Senegal (2 EKS and 1 endemic KS), 3 EKS from Cameroon and 8 from French Guiana (3 EKS and 5 endemic KS). Serum-specific antibodies directed against latent and/or lytic HHV-8 antigens were present in 16 of them (94%), detected either by immunofluorescence assay and/or by immunoperoxidase. Polymerase chain reaction (PCR), using specific primers for HHV-8 ORF26 (233 bp) and ORF75 (601 bp), was carried out on DNA extracted from KS cutaneous biopsies, clinically uninvolved skin biopsies and peripheral-blood mononuclear cells (PBMC). HHV-8 DNA was detected in 16 out of 16 (100%) KS biopsies, regardless of their origin or clinico-pathological sub-type, in 7 out of 15 (47%) normal skin samples and 7 out of 16 (44%) PBMC. Comparative PCR, carried out in 7 patients, regularly found a much higher viral load in KS biopsies than in autologous normal skin and PBMC samples. Sequencing of fragments of the ORF26 and of the ORF75 demonstrated that the 16 HHV-8 strains were of the A, B or C sub-type. Furthermore, sequences of the entire ORF K1 of 4 strains showed that these HHV-8 strains of African origin were of the A5 or the B sub-type.
|
40
|
Detection and genetic polymorphism of human herpes virus type 8 in endemic or epidemic Kaposi's sarcoma from West and Central Africa, and South America. Int J Cancer 2000. [DOI: 10.1002/(sici)1097-0215(20000115)85:2<166::aid-ijc3>3.0.co;2-l] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
|
41
|
Abstract
In this work detailed statistics on ancestral gene duplication and gene conservation in completely sequenced cellular genomes are presented. Analysis of open reading frame (ORF) products having simultaneous matches in several distinct organisms showed a significant correlation between duplication and conservation. Systematic comparisons of predicted proteomes of 23 organisms (including 20 that have been completely sequenced), have allowed us to quantify the degree of ancestral duplication within each genome and the level of conservation between genomes, using threshold values calculated for individual organisms. Statistical analysis of various gene proportions revealed interesting trends in gene structure and evolution, such as that (a) more than one-quarter (25%-66%) of the predicted ORF products of the surveyed organisms are not unique, indicating a high level of ancestral duplications; (b) levels of exclusive conservation within Bacteria are higher than those within the eukaryal or archaeal domains; and (c) at least one-half (47-99%) of the total predicted ORF products in the surveyed genomes have one or several highly significant matches in another genome. Significant matches are based on simulations taking into account the mean size of ORF products and the composition of each target organism's proteome. The methodology we have developed ensures stability and comparability of our results as the number of completely sequenced genomes increases.
|
42
|
The genomic tree as revealed from whole proteome comparisons. Genome Res 1999; 9:550-7. [PMID: 10400922 PMCID: PMC310764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/13/2023]
Abstract
The availability of a number of complete cellular genome sequences allows the development of organisms' classification, taking into account their genome content, the loss or acquisition of genes, and overall gene similarities as signatures of common ancestry. On the basis of correspondence analysis and hierarchical classification methods, a methodological framework is introduced here for the classification of the available 20 completely sequenced genomes and partial information for Schizosaccharomyces pombe, Homo sapiens, and Mus musculus. The outcome of such an analysis leads to a classification of genomes that we call a genomic tree. Although these trees are phenograms, they carry with them strong phylogenetic signatures and are remarkably similar to 16S-like rRNA-based phylogenies. Our results suggest that duplication and deletion events that took place through evolutionary time were globally similar in related organisms. The genomic trees presented here place the Archaea in the proximity of the Bacteria when the whole gene content of each organism is considered, and when ancestral gene duplications are eliminated. Genomic trees represent an additional approach for the understanding of evolution at the genomic level and may contribute to the proper assessment of the evolutionary relationships between extant species.
|
44
|
Seroepidemiological and molecular studies of human T cell lymphotropic virus type II, subtype b, in isolated groups of Mataco and Toba Indians of northern Argentina. AIDS Res Hum Retroviruses 1999; 15:407-17. [PMID: 10195750 DOI: 10.1089/088922299311150] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
We studied plasma samples from 2082 Mataco Indians living in 22 different communities in the western part of Formosa Province, northern Argentina. Samples were screened for HTLV-I/II antibodies by particle agglutination assay. All positive or borderline samples were then tested by an immunofluorescence assay (IFA) on C19 HTLV-II-producing cells. Western blot was used for confirmation of all IFA-positive plasma samples. The crude HTLV-II seroprevalence was 3.0% (62 of 2051), and 0.9% (5 of 588) in children less than 10 years old. The latter result suggests ongoing mother-to-child transmission, probably by breast feeding. There was a marked increase in HTLV-II seroprevalence with age (0.9%, 0-10 years; 1.6%, 11-20 years; 4.4%, 21-30 years; 3.4%, 31-40 years; 7.2%, 41-50 years; 5.7%, >50 years) in both male (p = 0.002) and female subjects (p = 0.00002). None of the 80 non-Indian inhabitants tested was HTLV-I/II seropositive. In a second study, among 105 Toba Indians from a village (Primavera) of the eastern part of this region, 23 were HTLV-II seropositive with a seroprevalence of 59% in those more than 40 years old. From seven of the Indians from Primavera, three others from neighboring regions (including two Tobas and one Pilaga), and one intravenous drug user (IVDU) from Rosario, DNA was extracted from peripheral blood mononuclear cells, and the gp21 transmembrane-encoding gene (590 bp) was amplified by PCR, cloned, and sequenced. LTR sequences were also obtained from the Pilaga, the IVDU, and one Toba. Molecular and phylogenetic analyses revealed that the Indians were all infected with closely related HTLV-II molecular strains belonging to the b subtype, while the IVDU was infected with an HTLV-II subtype a variant. Such data help to make a phylogenetic atlas of HTLV-II among Amerindian tribes and are crucial to gain new insights into the origin and modes of dissemination of this human retrovirus in the Americas.
|
45
|
Random exploration of the Kluyveromyces lactis genome and comparison with that of Saccharomyces cerevisiae. Nucleic Acids Res 1998; 26:5511-24. [PMID: 9826779 PMCID: PMC148010 DOI: 10.1093/nar/26.23.5511] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
The genome of the yeast Kluyveromyces lactis was explored by sequencing 588 short tags from two random genomic libraries (random sequenced tags, or RSTs), representing altogether 1.3% of the K. lactis genome. After systematic translation of the RSTs in all six possible frames and comparison with the complete set of proteins predicted from the Saccharomyces cerevisiae genomic sequence using an internally standardized threshold, 296 K. lactis genes were identified of which 292 are new. This corresponds to approximately 5% of the estimated genes of this organism and triples the total number of identified genes in this species. Of the novel K. lactis genes, 169 (58%) are homologous to S. cerevisiae genes of known or assigned functions, allowing tentative functional assignment, but 59 others (20%) correspond to S. cerevisiae genes of unknown function and previously without homolog among all completely sequenced genomes. Interestingly, a lower degree of sequence conservation is observed in this latter class. In nearly all instances in which the novel K. lactis genes have homologs in different species, sequence conservation is higher with their S. cerevisiae counterparts than with any of the other organisms examined. Conserved gene order relationships (synteny) between the two yeast species are also observed for half of the cases studied.
|
46
|
Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 1998. [DOI: 10.1038/24206] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]
|
47
|
Abstract
Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.
|
48
|
Molecular epidemiology of HTLV-II among United States blood donors and intravenous drug users: an age-cohort effect for HTLV-II RFLP type aO. Virology 1998; 242:425-34. [PMID: 9514966 DOI: 10.1006/viro.1997.9009] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Molecular subtyping was used to investigate the epidemiology of human T-lymphotropic virus type II (HTLV-II) in the United States. Nested polymerase chain reaction of the HTLV-II long terminal repeat region followed by restriction fragment length polymorphism (RFLP) analysis was performed on HTLV-II seropositive subjects including 97 U.S. blood donors without major risk factors for HTLV-II infection, 53 injection drug users (IDU), and 10 American Indian blood donors. Three new HTLV-II RFLP types were confirmed with DNA sequencing and phylogenetic analysis. HTLV-II RFLP type aO (Switzer classification) was associated with older age [adjusted odds ratio (OR) 1.06 per year of age, 95% confidence interval (CI) 1.02-1.09] and with Black (OR 5.24, 95% CI 1.90-14.47) and White (OR 4.43, 95% CI 1.67-11.75) race/ethnicity. These data are consistent with an age-cohort effect for HTLV-II RFLP type aO among older White and Black IDU and blood donors. This finding could be explained by an epidemic of non-aO HTLV-II RFLP types among younger persons of Hispanic and other race/ethnicity, superimposed upon endemic HTLV-II RFLP type aO among older Black and White persons.
|
49
|
Molecular epidemiology of 58 new African human T-cell leukemia virus type 1 (HTLV-1) strains: identification of a new and distinct HTLV-1 molecular subtype in Central Africa and in Pygmies. J Virol 1997; 71:1317-33. [PMID: 8995656 PMCID: PMC191187 DOI: 10.1128/jvi.71.2.1317-1333.1997] [Citation(s) in RCA: 105] [Impact Index Per Article: 3.9] [Indexed: 02/03/2023]
Abstract
To gain new insights on the origin, evolution, and modes of dissemination of human T-cell leukemia virus type I (HTLV-1), we performed a molecular analysis of 58 new African HTLV-1 strains (18 from West Africa, 36 from Central Africa, and 4 from South Africa) originating from 13 countries. Of particular interest were eight strains from Pygmies of remote areas of Cameroon and the Central African Republic (CAR), considered to be the oldest inhabitants of these regions. Eight long-term activated T-cell lines producing HTLV-1 gag and env antigens were established from peripheral blood mononuclear cell cultures of HTLV-1 seropositive individuals, including three from Pygmies. A fragment of the env gene encompassing most of the gp21 transmembrane region was sequenced for the 58 new strains, while the complete long terminal repeat (LTR) region was sequenced for 9 strains, including 4 from Pygmies. Comparative sequence analyses and phylogenetic studies performed on both the env and LTR regions by the neighbor-joining and DNA parsimony methods demonstrated that all 22 strains from West and South Africa belong to the widespread cosmopolitan subtype (also called HTLV-1 subtype A). Within or alongside the previously described Zairian cluster (HTLV-1 subtype B), we discovered a number of new HTLV-1 variants forming different subgroups corresponding mainly to the geographical origins of the infected persons, Cameroon, Gabon, and Zaire. Six of the eight Pygmy strains clustered together within this Central African subtype, suggesting a common origin. Furthermore, three new strains (two originating from Pygmies from Cameroon and the CAR, respectively, and one from a Gabonese individual) were particularly divergent and formed a distinct new phylogenetic cluster, characterized by specific mutations and occupying in most analyses a unique phylogenetic position between the large Central African genotype (HTLV-1 subtype B) and the Melanesian subtype (HTLV-1 subtype C). 
We have tentatively named this new HTLV-1 genotype HTLV-1 subtype D. While the HTLV-1 subtype D strains were not closely related to any known African strain of simian T-cell leukemia virus type 1 (STLV-1), other Pygmy strains and some of the new Cameroonian and Gabonese HTLV-1 strains were very similar (>98% nucleotide identity) to chimpanzee STLV-1 strains, reinforcing the hypothesis of interspecies transmission between humans and monkeys in Central Africa.
|
50
|
Cloning and characterisation of a gene from Plasmodium vivax and P. knowlesi: homology with valine-tRNA synthetase. Gene 1996; 173:137-45. [PMID: 8964490 DOI: 10.1016/0378-1119(96)00235-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/03/2023]
Abstract
We have previously described a λgt11 clone detected by immune screening with a monoclonal antibody (mAb) A12. This mAb is capable of completely blocking Plasmodium vivax transmission in the mosquito vector. An epitope recognised by A12 was mapped to six amino acids (aa) within the translated sequence of this clone. Here, we describe the complete sequence of the gene within which we mapped this epitope. Surprisingly, the translated sequence of the full-length open reading frame shows homology with that of valine-tRNA synthetases (Val-tRS) from other organisms. DNA cross-hybridisation with several of these species was observed by Southern blot. In addition, the corresponding gene has been obtained from the closely related simian malaria parasite, P. knowlesi. The two aa sequences show 66% identity and yet are very divergent from other Val-tRS sequences, apart from conserved blocks related to functional activity. Multiple sequence alignments reflect this dichotomy, as do predicted differences in antigenicity.
|