1
Loenenbach A, Pawlita M, Waterboer T, Harder T, Poethko-Müller C, Thamm M, Lachmann R, Deleré Y, Wichmann O, Wiese-Posselt M. Seroprevalence of mucosal and cutaneous human papillomavirus (HPV) types among children and adolescents in the general population in Germany. BMC Infect Dis 2022; 22:44. [PMID: 35012452 PMCID: PMC8751243 DOI: 10.1186/s12879-022-07028-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/08/2021] [Accepted: 12/28/2021] [Indexed: 11/30/2022]
Abstract
Background In Germany, HPV vaccination of adolescent girls was introduced in 2007. Nationally representative data on the distribution of vaccine-relevant HPV types in the pre-vaccination era are, however, only available for the adult population. To obtain data in children and adolescents, we assessed the prevalence and determinants of serological response to 16 different HPV types in a representative sample of 12,257 boys and girls aged 1–17 years living in Germany in 2003–2005. Methods Serum samples were tested for antibodies to nine mucosal and seven cutaneous HPV types. The samples had been collected during the nationally representative German Health Interview and Examination Survey for Children and Adolescents in 2003–2006. We calculated age- and gender-specific HPV seroprevalence. We used multivariable regression models to identify associations between demographic and behavioral characteristics and HPV seropositivity. Results We found low but non-zero seroprevalence for the majority of tested HPV types among children and adolescents in Germany. The overall seroprevalence of HPV-16 was 2.6%, with slightly higher values in adolescents. Seroprevalence of all mucosal types except HPV-6 ranged from 0.6% for HPV-33 to 6.4% for HPV-31 and did not differ by gender. We found a high overall seroprevalence of 24.8% for HPV-6. Cutaneous HPV type seroprevalence ranged from 4.0% for HPV-38 to 31.7% for HPV-1. In the majority of cutaneous types, seroprevalence did not differ between boys and girls, but increased sharply with age (e.g., HPV-1 from 1.5% in 1–3-year-olds to 45.1% in 10–11-year-olds). Associations between behavioral factors and type-specific HPV prevalence were heterogeneous. Conclusions We report the first nationally representative data of naturally acquired HPV antibody reactivity in the pre-HPV-vaccination era among children and adolescents living in Germany. These data can be used as baseline estimates for evaluating the impact of the current HPV vaccination strategy targeting 9–14-year-old boys and girls. Supplementary Information The online version contains supplementary material available at 10.1186/s12879-022-07028-8.
Affiliation(s)
- Anna Loenenbach
- Department for Infectious Disease Epidemiology, Immunization Unit, Robert Koch-Institute, Berlin, Germany
- Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Michael Pawlita
- Infections and Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Tim Waterboer
- Infections and Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Thomas Harder
- Department for Infectious Disease Epidemiology, Immunization Unit, Robert Koch-Institute, Berlin, Germany
- Michael Thamm
- Department of Epidemiology and Health Monitoring, Robert Koch-Institute, Berlin, Germany
- Raskit Lachmann
- Department for Infectious Disease Epidemiology, Immunization Unit, Robert Koch-Institute, Berlin, Germany
- Ole Wichmann
- Department for Infectious Disease Epidemiology, Immunization Unit, Robert Koch-Institute, Berlin, Germany
- Miriam Wiese-Posselt
- Department for Infectious Disease Epidemiology, Immunization Unit, Robert Koch-Institute, Berlin, Germany
- Institute of Hygiene and Environmental Medicine, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
2
A Puzzling Anomaly in the 4-Mer Composition of the Giant Pandoravirus Genomes Reveals a Stringent New Evolutionary Selection Process. J Virol 2019; 93:JVI.01206-19. [PMID: 31534042 DOI: 10.1128/jvi.01206-19] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/19/2019] [Accepted: 09/10/2019] [Indexed: 12/26/2022]
Abstract
Pandoraviridae is a rapidly growing family of giant viruses, all of which have been isolated using laboratory strains of Acanthamoeba. The genomes of 10 distinct strains have been fully characterized, reaching up to 2.5 Mb in size. These double-stranded DNA genomes encode the largest of all known viral proteomes and are propagated in oblate virions that are among the largest ever described (1.2 μm long and 0.5 μm wide). The evolutionary origin of these atypical viruses is the object of numerous speculations. Applying the chaos game representation to the pandoravirus genome sequences, we discovered that the tetranucleotide (4-mer) "AGCT" is totally absent from the genomes of 2 strains (Pandoravirus dulcis and Pandoravirus quercus) and strongly underrepresented in others. Given the amazingly low probability of such an observation in the corresponding randomized sequences, we investigated its biological significance through a comprehensive study of the 4-mer compositions of all viral genomes. Our results indicate that AGCT was specifically eliminated during the evolution of the Pandoraviridae and that none of the previously proposed host-virus antagonistic relationships could explain this phenomenon. Unlike the three other families of giant viruses (Mimiviridae, Pithoviridae, and Molliviridae) infecting the same Acanthamoeba host, the pandoraviruses exhibit a puzzling genomic anomaly suggesting a highly specific DNA editing in response to a new kind of strong evolutionary pressure. IMPORTANCE Recent years have seen the discovery of several families of giant DNA viruses infecting the ubiquitous amoebozoa of the genus Acanthamoeba. With double-stranded DNA (dsDNA) genomes reaching 2.5 Mb in length packaged in oblate particles the size of a bacterium, the pandoraviruses are currently the most complex and largest viruses known. 
In addition to their spectacular dimensions, the pandoraviruses encode the largest proportion of proteins without homologs in other organisms, which is thought to result from a de novo gene creation process. While using comparative genomics to investigate the evolutionary forces responsible for the emergence of such an unusual giant virus family, we discovered a unique bias in the tetranucleotide composition of the pandoravirus genomes that can result only from an undescribed evolutionary process not encountered in any other microorganism.
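As an illustration of the kind of observation this abstract reports, the following is a minimal sketch of counting 4-mers in a sequence and listing those that never occur. It is not the authors' pipeline: the abstract describes chaos game representation and comparison against randomized sequences, whereas this sketch uses a direct sliding-window count and a simple independent-base expectation. The function names `count_kmers`, `missing_kmers`, and `expected_count` are hypothetical. Because AGCT equals its own reverse complement, counting on one strand is sufficient for that particular word.

```python
from collections import Counter
from itertools import product


def count_kmers(seq, k=4):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def missing_kmers(seq, k=4):
    """Return the sorted list of k-mers over ACGT that never occur in seq."""
    counts = count_kmers(seq, k)
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(kmer for kmer in all_kmers if counts[kmer] == 0)


def expected_count(seq, kmer):
    """Expected occurrences of kmer if bases were drawn independently
    with the sequence's own base frequencies (a crude null model,
    assumed here for illustration only)."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    freq = {b: seq.count(b) / total for b in "ACGT"}
    p = 1.0
    for base in kmer:
        p *= freq[base]
    return p * (len(seq) - len(kmer) + 1)


if __name__ == "__main__":
    toy = "ACGTACGT"  # toy input, not a real genome
    print(count_kmers(toy))
    print("AGCT" in missing_kmers(toy))
    print(expected_count(toy, "AGCT"))
```

On a real genome, a word whose expected count under the null model is in the thousands but whose observed count is zero would be the sort of anomaly the abstract describes; a proper significance assessment would need a better null model than independent bases.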
3
Abstract
The secondary structure of an RNA molecule represents the base-pairing interactions within the molecule and fundamentally determines its overall structure. In this chapter, we overview the main approaches and existing tools for predicting RNA secondary structures, as well as methods for identifying noncoding RNAs from genomic sequences or RNA sequencing data. We then focus on the identification of a well-known class of small noncoding RNAs, namely microRNAs, which play very important roles in many biological processes by regulating gene expression post-transcriptionally and whose dysregulation has been shown to be involved in several human diseases.
Affiliation(s)
- Fariza Tahi
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France.
- IPS2, University of Paris-Saclay, 91190, Gif-sur-Yvette, France.
- Van Du T Tran
- Vital-IT group, SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Anouar Boucheham
- IBISC, UEVE/Genopole, 23 bv. de France, 91000, Evry, France
- College of NTIC, Constantine University 2, Constantine, Algeria
4
Boyce K, Sievers F, Higgins DG. Instability in progressive multiple sequence alignment algorithms. Algorithms Mol Biol 2015; 10:26. [PMID: 26457114 PMCID: PMC4599319 DOI: 10.1186/s13015-015-0057-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/30/2015] [Accepted: 09/29/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Progressive alignment is the standard approach used to align large numbers of sequences. As with all heuristics, this involves a tradeoff between alignment accuracy and computation time. RESULTS We examine this tradeoff and find that, because of a loss of information in the early steps of the approach, the alignments generated by the most common multiple sequence alignment programs are inherently unstable, and simply reversing the order of the sequences in the input file will cause a different alignment to be generated. Although this effect is more obvious with larger numbers of sequences, it can also be seen with data sets in the order of one hundred sequences. We also outline the means to determine the number of sequences in a data set beyond which the probability of instability will become more pronounced. CONCLUSIONS This has major ramifications for both the designers of large-scale multiple sequence alignment algorithms, and for the users of these alignments.
5
Waterman MS. Sequence alignments in the neighborhood of the optimum with general application to dynamic programming. Proc Natl Acad Sci U S A 1983; 80:3123-4. [PMID: 16593315 PMCID: PMC393987 DOI: 10.1073/pnas.80.10.3123] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
When applying dynamic programming techniques to obtain optimal sequence alignments, a set of weights must be assigned to mismatches, insertion/deletions, etc. These weights are not predetermined, although efforts are being made to deduce biologically meaningful values from data. In addition, there are sometimes unknown constraints on the sequences that cause the "true" alignment to disagree with the optimum (computer) solution. To assist in overcoming these difficulties, an algorithm has been developed to produce all alignments within a specified distance of the optimum. The distance can be chosen after the optimum is computed, and the algorithm can be repeated at will. Earlier algorithms to solve this problem were very complex and not practical for any case involving sequences with significant time or storage requirements. The algorithm presented here overcomes these difficulties and has application to general, discrete dynamic programming problems.
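The idea of listing every alignment within a chosen distance of the optimum can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the algorithm as published: it assumes a global alignment with unit mismatch and gap costs, and the names `suffix_costs` and `near_optimal_alignments` are hypothetical. A table of optimal costs for every pair of suffixes is computed once; a depth-first walk from the start then keeps only partial alignments that can still finish within `delta` of the optimum, so `delta` can be changed after the table is built.

```python
def suffix_costs(a, b, mismatch=1, gap=1):
    """D[i][j] = minimum cost of aligning a[i:] with b[j:]."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        D[i][m] = D[i + 1][m] + gap
    for j in range(m - 1, -1, -1):
        D[n][j] = D[n][j + 1] + gap
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            sub = 0 if a[i] == b[j] else mismatch
            D[i][j] = min(D[i + 1][j + 1] + sub,
                          D[i + 1][j] + gap,
                          D[i][j + 1] + gap)
    return D


def near_optimal_alignments(a, b, delta=0, mismatch=1, gap=1):
    """Return all alignments whose cost is at most optimum + delta,
    as (top_row, bottom_row, cost) tuples."""
    D = suffix_costs(a, b, mismatch, gap)
    bound = D[0][0] + delta
    results = []

    def walk(i, j, cost, top, bottom):
        # Prune: even the best completion from (i, j) would exceed the bound.
        if cost + D[i][j] > bound:
            return
        if i == len(a) and j == len(b):
            results.append(("".join(top), "".join(bottom), cost))
            return
        if i < len(a) and j < len(b):
            sub = 0 if a[i] == b[j] else mismatch
            walk(i + 1, j + 1, cost + sub, top + [a[i]], bottom + [b[j]])
        if i < len(a):
            walk(i + 1, j, cost + gap, top + [a[i]], bottom + ["-"])
        if j < len(b):
            walk(i, j + 1, cost + gap, top + ["-"], bottom + [b[j]])

    walk(0, 0, 0, [], [])
    return results


if __name__ == "__main__":
    for top, bottom, cost in near_optimal_alignments("AC", "AGC", delta=1):
        print(cost, top, bottom)
```

The number of near-optimal alignments can grow very quickly with `delta` and sequence length, and this sketch counts an insertion followed by a deletion separately from the reverse order, so it is only suitable for short sequences.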
Affiliation(s)
- M S Waterman
- Department of Mathematics, University of Southern California, Los Angeles, California 90089-1113
6
Nakato R, Gotoh O. Cgaln: fast and space-efficient whole-genome alignment. BMC Bioinformatics 2010; 11:224. [PMID: 20433723 PMCID: PMC2873541 DOI: 10.1186/1471-2105-11-224] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 01/02/2010] [Accepted: 04/30/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Whole-genome sequence alignment is an essential process for extracting valuable information about the functions, evolution, and peculiarities of genomes under investigation. As available genomic sequence data accumulate rapidly, there is great demand for tools that can compare whole-genome sequences within practical amounts of time and space. However, most existing genomic alignment tools can treat sequences that are only a few Mb long at once, and no state-of-the-art alignment program can align large sequences such as mammalian genomes directly on a conventional standalone computer. RESULTS We previously proposed the CGAT (Coarse-Grained AlignmenT) algorithm, which performs an alignment job in two steps: first at the block level and then at the nucleotide level. The former is "coarse-grained" alignment that can explore genomic rearrangements and reduce the sizes of the regions to be analyzed in the next step. The latter is detailed alignment within limited regions. In this paper, we present an update of the algorithm and the open-source program, Cgaln, that implements the algorithm. We compared the performance of Cgaln with those of other programs on whole genomic sequences of several bacteria and of some mammalian chromosome pairs. The results showed that Cgaln is several times faster and more memory-efficient than the best existing programs, while its sensitivity and accuracy are comparable to those of the best programs. Cgaln takes less than 13 hours to finish an alignment between the whole genomes of human and mouse in a single run on a conventional desktop computer with a single CPU and 2 GB memory. CONCLUSIONS Cgaln is not only fast and memory efficient but also effective in coping with genomic rearrangements. Our results show that Cgaln is very effective for comparison of large genomes, especially of intact chromosomal sequences. 
We believe that Cgaln provides a novel viewpoint for reducing computational complexity and will contribute to various fields of genome science.
Affiliation(s)
- Ryuichiro Nakato
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto-shi, Kyoto 606-8501, Japan
7
Xu D. Computational methods for protein sequence comparison and search. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2009; Chapter 2:2.1.1-2.1.27. [PMID: 19365790 DOI: 10.1002/0471140864.ps0201s56] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/06/2023]
Abstract
Protein sequence comparison and search has become commonplace not only for bioinformatics researchers but also for experimentalists in many cases. Because of the exponential growth in sequence data, sequence comparison in particular has become an increasingly important tool. Relating a new gene sequence to other known sequences often reveals its function, structure, and evolution. Many sequence comparison and search tools are available through public Web servers, and biologists can use them easily with little knowledge of computers or bioinformatics. This unit provides some theoretical background and describes popular tools for dot plot, sequence search against a database, multiple sequence alignments, protein tree construction, and protein family and motif search. Step-by-step examples are provided to illustrate how to use some of the most well-known tools. Finally, some general advice is given on combining different sequence analysis tools for biological inference.
Affiliation(s)
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, Missouri
8
Michaels G, Garian R. Computational methods for protein sequence analysis. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2008; Chapter 2:Unit2.1. [PMID: 18429149 DOI: 10.1002/0471140864.ps0201s00] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
This unit is presented as a guide to addressing the issue of what to do with a protein sequence once it is obtained. A theoretical background for protein sequence analysis is provided first, followed by a discussion of matrix methods for sequence comparison (Matrix Methods for Sequence Comparison: Dot Plots). Sequence similarity searching is then presented, including the BLAST and FASTA databases. Other aspects of protein sequence analysis covered here are alignment methods, scoring matrices, multiple alignments, cluster methods and trees, and identification of functional sites.
Affiliation(s)
- G Michaels
- George Mason University, Fairfax, Virginia, USA
9
Gotoh O. A space-efficient and accurate method for mapping and aligning cDNA sequences onto genomic sequence. Nucleic Acids Res 2008; 36:2630-8. [PMID: 18344523 PMCID: PMC2377433 DOI: 10.1093/nar/gkn105] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Indexed: 12/04/2022]
Abstract
The mapping and alignment of transcripts (cDNA, expressed sequence tag or amino acid sequences) onto a genomic sequence is a fundamental step for genome annotation, including gene finding and analyses of transcriptional activity, alternative splicing and nucleotide polymorphisms. As DNA sequence data of genomes and transcripts are accumulating at an unprecedented rate, steady improvement in the accuracy, speed and space requirements of computational tools for mapping/alignment is desired. We devised a multi-phase heuristic algorithm and implemented it in the stand-alone computer program Spaln (space-efficient spliced alignment). Spaln is reasonably fast and space efficient; it requires <1 Gb of memory to map and align >120 000 Unigene sequences onto the unmasked whole human genome with a conventional computer, finishing the job in <6 h. With artificially introduced noise of various levels, Spaln significantly outperforms other leading alignment programs currently available with respect to the accuracy of mapped exon–intron structures. This performance is achieved without extensive learning procedures to adjust parameter values to a particular organism. Owing to its handiness and accuracy, Spaln may be used in a wide range of genome analyses.
Affiliation(s)
- Osamu Gotoh
- Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto 606-8501, Japan.
10
García-Sancho M. Mapping and sequencing information: the social context for the genomics revolution. ENDEAVOUR 2007; 31:18-23. [PMID: 17336383 DOI: 10.1016/j.endeavour.2007.01.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/30/2006] [Revised: 01/25/2007] [Accepted: 01/31/2007] [Indexed: 05/14/2023]
Abstract
In 1983, after devoting some eight years of his life to the description of how a nematode worm develops from an embryo into an adult, molecular biologist John Sulston embarked on a remarkably different project: he decided to map the worm's genome. Sulston's impulsive desire to characterise this creature's DNA from start to finish offers only a partial explanation for this transition. Instead, a close examination of the wider social context for this 'moment' in molecular biology gives a more rewarding explanation of Sulston's intellectual leap. This reveals a world in which biotechnology gradually adapted to and integrated into an 'information society' increasingly dependent on the creation, distribution and manipulation of information. The application of computing to DNA during the first half of the 1980s was crucial for this integration, fostering the emergence of genomics and ultimately the Human Genome Project.
11
Bo X, Lou S, Sun D, Shu W, Yang J, Wang S. Selection of antisense oligonucleotides based on multiple predicted target mRNA structures. BMC Bioinformatics 2006; 7:122. [PMID: 16526963 PMCID: PMC1421440 DOI: 10.1186/1471-2105-7-122] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 07/12/2005] [Accepted: 03/09/2006] [Indexed: 01/31/2023]
Abstract
Background Local structures of target mRNAs play a significant role in determining the efficacies of antisense oligonucleotides (ODNs), but some structure-based target site selection methods are limited by uncertainties in RNA secondary structure prediction. If all the predicted structures of a given mRNA within a certain energy limit could be used simultaneously, target site selection would obviously be improved in both reliability and efficiency. In this study, some key problems in ODN target selection on the basis of multiple predicted target mRNA structures are systematically discussed. Results Two methods were considered for merging topologically different RNA structures into integrated representations. Several parameters were derived to characterize local target site structures. Statistical analysis on a dataset with 448 ODNs against 28 different mRNAs revealed 9 features quantitatively associated with efficacy. Features of structural consistency seemed to be more highly correlated with efficacy than indices of the proportion of bases in single-stranded or double-stranded regions. The local structures of the target site 5' and 3' termini were also shown to be important in target selection. Neural network efficacy predictors using these features, defined on integrated structures as inputs, performed well in "minus-one-gene" cross-validation experiments. Conclusion Topologically different target mRNA structures can be merged into integrated representations and then used in computer-aided ODN design. The results of this paper imply that some features characterizing multiple predicted target site structures can be used to predict ODN efficacy.
Affiliation(s)
- Xiaochen Bo
- Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China
- Shaoke Lou
- Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China
- Daochun Sun
- Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China
- Wenjie Shu
- Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China
- Jing Yang
- Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China
- Shengqi Wang
- Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China
12
Huang X, Yang SP, Chinwalla AT, Hillier LW, Minx P, Mardis ER, Wilson RK. Application of a superword array in genome assembly. Nucleic Acids Res 2006; 34:201-5. [PMID: 16397298 PMCID: PMC1325203 DOI: 10.1093/nar/gkj419] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.
Affiliation(s)
- Xiaoqiu Huang
- Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA.
13
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol 2004; 5:R12. [PMID: 14759262 PMCID: PMC395750 DOI: 10.1186/gb-2004-5-2-r12] [Citation(s) in RCA: 3677] [Impact Index Per Article: 183.9] [Received: 10/27/2003] [Revised: 12/15/2003] [Accepted: 12/17/2003] [Indexed: 11/29/2022]
Abstract
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.
Affiliation(s)
- Stefan Kurtz
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
- Adam Phillippy
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Arthur L Delcher
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Michael Smoot
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Current address: Department of Computer Science, University of Virginia, Charlottesville, VA 22904, USA
- Martin Shumway
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Corina Antonescu
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
- Steven L Salzberg
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA
14
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol 2004. [PMID: 14759262 DOI: 10.1186/gb-200-5-2-r12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2023]
Abstract
The newest version of MUMmer easily handles comparisons of large eukaryotic genomes at varying evolutionary distances, as demonstrated by applications to multiple genomes. Two new graphical viewing tools provide alternative ways to analyze genome alignments. The new system is the first version of MUMmer to be released as open-source software. This allows other developers to contribute to the code base and freely redistribute the code. The MUMmer sources are available at http://www.tigr.org/software/mummer.
Affiliation(s)
- Stefan Kurtz
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany
15
Abstract
Elucidation of interrelationships among sequence, structure, function, and evolution (FESS relationships) of a family of genes or gene products is a central theme of modern molecular biology. Multiple sequence alignment has proven to be a powerful tool in many fields of study, such as phylogenetic reconstruction, illumination of functionally important regions, and prediction of higher order structures of proteins and RNAs. However, it is far from trivial to automatically construct a multiple alignment from a set of related sequences. A variety of methods for solving this computationally difficult problem are reviewed. Several important applications of multiple alignment for elucidation of the FESS relationships are also discussed. For a long period, progressive methods have been the only practical means to solve a multiple alignment problem of appreciable size. This situation is now changing with the development of new techniques including several classes of iterative methods. Today's progress in multiple sequence alignment methods has been made by the multidisciplinary endeavors of mathematicians, computer scientists, and biologists in various fields, biophysicists in particular. The ideas also originate from various backgrounds: pure algorithmics, statistics, thermodynamics, and others. The outcomes are now enjoyed by researchers in many fields of biological sciences. In the near future, generalized multiple alignment may play a central role in studies of FESS relationships. The organized mixture of knowledge from multiple fields will ferment to develop fruitful results which would be hard to obtain within each area. I hope this review provides a useful information resource for future development of theory and practice in this rapidly expanding area of bioinformatics.
Affiliation(s)
- O Gotoh
- Saitama Cancer Center Research Institute, Japan
16
Sohail M, Akhtar S, Southern EM. The folding of large RNAs studied by hybridization to arrays of complementary oligonucleotides. RNA (NEW YORK, N.Y.) 1999; 5:646-55. [PMID: 10334335 PMCID: PMC1369792 DOI: 10.1017/s1355838299982195] [Citation(s) in RCA: 45] [Impact Index Per Article: 1.8] [Indexed: 05/18/2023]
Abstract
Folding pathways of large RNAs are poorly understood. We have addressed this question by hybridizing in vitro transcripts, which varied in size, to an array of antisense oligonucleotides. All transcripts included a common sequence and all but one shared the same start-point; the other had a small deletion of the 5' end. Minimal free energy calculations predicted quite different folds for these transcripts. However, hybridization to the array showed predominant features that were shared by transcripts of all lengths, though some oligonucleotides that hybridized strongly to the short transcripts gave weak interaction with longer transcripts. A full-length RNA fragment that had been denatured by heating and allowed to cool slowly gave the same hybridization result as a shorter transcript. Taken together, these results support theories that RNA folding creates local stable states that are trapped early in the transcription or folding process. As the transcript elongates, interactions are added between regions that are transcribed early and those transcribed late. The method here described helps in identifying regions in the transcripts that take part in long-range interactions.
Affiliation(s)
- M Sohail
- Department of Biochemistry, University of Oxford, England, United Kingdom.
17
McNaughton JC, Hughes G, Jones WA, Stockwell PA, Klamut HJ, Petersen GB. The evolution of an intron: analysis of a long, deletion-prone intron in the human dystrophin gene. Genomics 1997; 40:294-304. [PMID: 9119397 DOI: 10.1006/geno.1996.4543] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.7] [Indexed: 02/04/2023]
Abstract
The sequence of a 112-kb region of the human dystrophin (DMD/BMD) gene encompassing the deletion-prone intron 7 (110 kb) and the much shorter intron 8 (1.1 kb) has been determined. Recognizable insertion sequences account for approximately 40% of intron 7. LINE-1 and THE-1/LTR sequences occur in intron 7 with significantly higher frequency than would be expected statistically, while Alu sequences are underrepresented. Intron 7 also contains numerous mammalian-wide interspersed repeats, a diverse range of medium reiteration repeats of unknown origin, and a sequence derived from a mariner transposon. By contrast, the shorter intron 8 contains no detectable insertion sequences. Dating of the L1 and Alu sequences suggests that intron 7 has approximately doubled in size within the past 130 million years, and comparison with the corresponding intron from the pufferfish (Fugu rubripes) suggests that the intron has expanded some 44-fold over a period of 400 million years. The possible contribution of the insertion elements to the instability of intron 7 is discussed.
Affiliation(s)
- J C McNaughton
- Department of Biochemistry, University of Otago, Dunedin, New Zealand
|
18
|
Frischer ME, Floriani PJ, Nierzwicki-Bauer SA. Differential sensitivity of 16S rRNA targeted oligonucleotide probes used for fluorescence in situ hybridization is a result of ribosomal higher order structure. Can J Microbiol 1996; 42:1061-71. [PMID: 8890483 DOI: 10.1139/m96-136] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.5] [Indexed: 02/02/2023]
Abstract
The use of 16S rRNA targeted gene probes for the direct analysis of microbial communities has revolutionized the field of microbial ecology, yet a comprehensive approach for the design of such probes does not exist. The development of 16S rRNA targeted oligonucleotide probes for use with fluorescence in situ hybridization (FISH) procedures has been especially difficult as a result of the complex nature of the rRNA target molecule. In this study a systematic comparison of 16S rRNA targeted oligonucleotide gene probes was conducted to determine if target location influences the hybridization efficiency of oligonucleotide probes when used with in situ hybridization protocols for the detection of whole microbial cells. Five unique universal 12-mer oligonucleotide sequences, located at different regions of the 16S rRNA molecule, were identified by a computer-aided sequence analysis of over 1000 partial and complete 16S rRNA sequences. The complements of these oligomeric sequences were chemically synthesized for use as probes and end labeled with either [gamma-32P]ATP or the fluorescent molecule tetramethylrhodamine-5/-6. Hybridization sensitivity for each of the probes was determined by hybridization to heat-denatured RNA immobilized on blots or to formaldehyde fixed whole cells. All of the probes hybridized with equal efficiency to denatured RNA. However, the probes exhibited a wide range of sensitivity (from none to very strong) when hybridized with whole cells using a previously developed FISH procedure. Differential hybridization efficiencies against whole cells could not be attributed to cell wall type, since the relative probe efficiency was preserved when either Gram-negative or -positive cells were used. These studies represent one of the first attempts to systematically define criteria for 16S rRNA targeted probe design for use against whole cells and establish target site location as a critical parameter in probe design.
Affiliation(s)
- M E Frischer
- Department of Biology, MRC 306 Rensselaer Polytechnic Institute, Troy, NY 12180-3590, USA
|
19
|
López-Nieto CE, Nigam SK. Selective amplification of protein-coding regions of large sets of genes using statistically designed primer sets. Nat Biotechnol 1996; 14:857-61. [PMID: 9631010 DOI: 10.1038/nbt0796-857] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.2] [Indexed: 02/07/2023]
Abstract
We describe a novel approach to design a set of primers selective for large groups of genes. This method is based on the distribution frequency of all nucleotide combinations (octa- to decanucleotides), and the combined ability of primer pairs, based on these oligonucleotides, to detect genes. By analyzing 1000 human mRNAs, we found that a surprisingly small subset of octanucleotides is shared by a high proportion of human protein-coding region sense strands. By computer simulation of polymerase chain reactions, a set based on only 30 primers was able to detect approximately 75% of known (and presumably unknown) human protein-coding regions. To validate the method and provide experimental support for the feasibility of the more ambitious goal of targeting human protein-coding regions, we sought to apply the technique to a large protein family: G-protein coupled receptors (GPCRs). Our results indicate that there is sufficient low level homology among human coding regions to allow design of a limited set of primer pairs that can selectively target coding regions in general, as well as genomic subsets (e.g., GPCRs). The approach should be generally applicable to human coding regions, and thus provide an efficient method for analyzing much of the transcriptionally active human genome.
Affiliation(s)
- C E López-Nieto
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
|
20
|
Kolchanov NA, Titov II, Vlassova IE, Vlassov VV. Chemical and computer probing of RNA structure. PROGRESS IN NUCLEIC ACID RESEARCH AND MOLECULAR BIOLOGY 1996; 53:131-96. [PMID: 8650302 PMCID: PMC7133174 DOI: 10.1016/s0079-6603(08)60144-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
Ribonucleic acids (RNAs) are one of the most important types of biopolymers. RNAs play key roles in the storage and multiplication of genetic information. They are important in catalysis and RNA splicing and take part in the most important steps of translation. This chapter describes experimental methods for probing RNA structure and theoretical methods allowing the prediction of thermodynamically favorable RNA folding. These methods are complementary and together they provide a powerful approach to determine the structure of RNAs. The three-dimensional (tertiary) structure of RNA is formed by hydrogen-bonding among functional groups of nucleosides in different regions of the molecule, by coordination of polyvalent cations, and by stacking between the double-stranded regions present in the RNA. The tertiary structures of only some small RNAs have been determined by high-resolution X-ray crystallographic analysis and nuclear magnetic resonance analysis. The most widely used approach for the investigation of RNA structure is chemical and enzymatic probing, in combination with theoretical methods and phylogenetic studies allowing the prediction of variants of RNA folding. Investigations of RNA structures with different enzymatic and chemical probes can provide detailed data allowing the identification of double-stranded regions of the molecules and nucleotides involved in tertiary interactions.
Affiliation(s)
- N A Kolchanov
- Institute of Cytology and Genetics, Siberian Division of Russian Academy of Sciences, Novosibirsk, Russia
|
21
|
Abstract
Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.
Affiliation(s)
- R M Idury
- Department of Mathematics, University of Southern California, Los Angeles 90089-1113, USA
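The input that sequencing by hybridization works from, the set of all subwords of a fixed length k, can be illustrated with a minimal Python sketch. The function name `spectrum` and the example sequence are illustrative assumptions; this shows only the k-mer set described in the abstract, not the assembly algorithm the paper proposes.

```python
def spectrum(seq: str, k: int) -> set[str]:
    """Return the set of all length-k subwords (k-mers) of seq.

    Hypothetical helper for illustration: an idealized SBH experiment
    reports which k-mers occur, without positions or multiplicities.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k over the sequence; a set drops repeats.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    print(sorted(spectrum("ATGGCGT", 3)))
```

Because the result is a set, repeated k-mers collapse into one element, which is one reason reconstruction from such data can be ambiguous.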
|
22
|
Kister A, Magarshak Y, Malinsky J. The theoretical analysis of the process of RNA molecule self-assembly. Biosystems 1993; 30:31-48. [PMID: 7690610 DOI: 10.1016/0303-2647(93)90060-p] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023]
Abstract
A kinetic approach to the problem of RNA structure prediction, based on the analysis of the molecule's self-formation, is proposed. Restructuring that occurs during processing is described in terms of Markov processes. A new formalism designating nucleotides by complex numbers is proposed, leading to the complex unitary space of nucleic vectors. Properties of structure and transition matrices are discussed in relation to the analysis of RNA structural formation processes. The non-linear dynamic behavior of secondary structure transition is analyzed. Soliton-like oscillations of RNA and DNA tertiary structures are predicted. Monte Carlo simulation of RNA structure self-formation is used to calculate the ensemble of secondary structures of the tRNA(Ala) precursor from Bombyx mori formed during processing.
Affiliation(s)
- A Kister
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115
|
23
|
Molecular phylogeny of three platyrrhine primates, capuchin monkey, spider monkey and owl monkey, as inferred from nucleotide sequences of the ψη-globin gene. J Hum Evol 1992. [DOI: 10.1016/0047-2484(92)90087-p] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/22/2022]
|
24
|
Abstract
Multiple sequence comparison refers to the search for similarity in three or more sequences. This article presents a survey of the exhaustive (optimal) and heuristic (possibly sub-optimal) methods developed for the comparison of multiple macromolecular sequences. Emphasis is given to the different approaches of the heuristic methods. Four distance measures derived from information engineering and genetic studies are introduced for the comparison between two alignments of sequences. The use of entropy, which plays a central role in information theory as a measure of information, choice and uncertainty, is proposed as a simple measure for the evaluation of the optimality of an alignment in the absence of any a priori knowledge about the structures of the sequences being compared. This article also gives two examples of comparison between alternative alignments of the same set of 5S RNAs as obtained by several different heuristic methods.
Affiliation(s)
- S C Chan
- Department of Systems Design Engineering, University of Waterloo, Canada
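One way to read the entropy criterion in this abstract is as Shannon entropy computed per alignment column and summed over columns, so that a lower total indicates more conserved columns. The abstract does not give the exact formula, so the Python sketch below is an assumption: the function names are hypothetical, and treating the gap character as an ordinary symbol is an illustrative choice.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(rows: list[str]) -> float:
    """Sum of column entropies; assumes all rows have equal length.

    Gap characters such as '-' are counted like any other symbol,
    which is an assumption made for this sketch.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(column_entropy("".join(col)) for col in zip(*rows))


if __name__ == "__main__":
    # Two alternative alignments of the same three toy sequences.
    print(alignment_entropy(["ACG-U", "ACGAU", "ACG-U"]))
    print(alignment_entropy(["ACG-U", "ACGAU", "AC-GU"]))
```

Under this reading, comparing the two totals ranks alternative alignments of the same sequences without any structural knowledge.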
|
25
|
Montpetit ML, Cassol S, Salas T, O'Shaughnessy MV. OLIGSCAN: a computer program to assist in the design of PCR primers homologous to multiple DNA sequences. J Virol Methods 1992; 36:119-28. [PMID: 1556160 DOI: 10.1016/0166-0934(92)90143-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022]
Abstract
OLIGSCAN (oligonucleotide scanner) is a computer program for IBM-PC-compatible computers that allows the user to scan up to 200 DNA sequences for homology to oligonucleotide sequences of interest. Once a core sequence longer than the user-defined minimum length is found, the remainder of the oligonucleotide is compared to the corresponding positions of the larger sequence to identify matches or mismatches flanking the core region. This algorithm results in identification of the longest possible homologous regions first. The program was originally designed to assist in the identification of potential annealing sites for polymerase chain reaction (PCR) primers in the genomic DNA of related strains of viruses. However, it may also be used for more general pattern-identification purposes, including scanning for various sequence motifs of functional importance. We present the analysis of homology to an oligonucleotide primer in 16 complete genomic sequences of the human and simian immunodeficiency viruses.
Affiliation(s)
- M L Montpetit
- Federal Centre for AIDS, Health and Welfare Canada, Ottawa, Ontario
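The core-then-flank search described in this abstract can be sketched roughly as follows. This is not the OLIGSCAN source code: the function name `scan_oligo`, its return format, and the decision to stop at the first hit for the longest core are assumptions made for illustration.

```python
def scan_oligo(oligo: str, target: str, min_core: int):
    """Find the longest exact core shared by oligo and target.

    Returns (target_start, core_length, mismatches) for the first hit,
    where target_start is where position 0 of the oligo aligns, or None
    if no core of at least min_core bases is found. Hypothetical sketch.
    """
    m = len(oligo)
    # Try the longest cores first, so the longest homologous region wins.
    for core_len in range(m, min_core - 1, -1):
        for off in range(m - core_len + 1):
            core = oligo[off:off + core_len]
            pos = target.find(core)
            while pos != -1:
                start = pos - off
                # Only accept hits where the whole oligo fits on the target.
                if start >= 0 and start + m <= len(target):
                    window = target[start:start + m]
                    # Compare the flanks position by position.
                    mismatches = sum(a != b for a, b in zip(oligo, window))
                    return start, core_len, mismatches
                pos = target.find(core, pos + 1)
    return None


if __name__ == "__main__":
    print(scan_oligo("ACGTAC", "TTACGTTCGG", 3))
```

A fuller tool would presumably report every hit across many target sequences rather than only the first one.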
|
26
|
Le SY, Shapiro BA, Chen JH, Nussinov R, Maizel JV. RNA pseudoknots downstream of the frameshift sites of retroviruses. GENETIC ANALYSIS, TECHNIQUES AND APPLICATIONS 1991; 8:191-205. [PMID: 1663382 PMCID: PMC7128882 DOI: 10.1016/1050-3862(91)90013-h] [Citation(s) in RCA: 29] [Impact Index Per Article: 0.9] [Received: 04/15/1991] [Revised: 07/30/1991] [Accepted: 07/30/1991] [Indexed: 12/28/2022]
Abstract
RNA pseudoknot structural motifs could have implications for a wide range of biological processes of RNAs. In this study, the potential RNA pseudoknots just downstream from the known and suspected retroviral frameshift sites were predicted in the Rous sarcoma virus, primate immunodeficiency viruses (HIV-1, HIV-2, and SIV), equine infectious anemia virus, visna virus, bovine leukemia virus, human T-cell leukemia virus (types I and II), mouse mammary tumor virus, Mason-Pfizer monkey virus, and simian SRV-1 type-D retrovirus. Also, putative RNA pseudoknots were detected in the gag-pol overlaps of two retrotransposons of Drosophila, 17.6 and gypsy, and the mouse intracisternal A particle. For each sequence, the thermodynamic stability and statistical significance of the secondary structure involved in the predicted tertiary structure were assessed and compared. Our results show that the stem-loop structures in the pseudoknots are both thermodynamically highly stable and statistically significant relative to other such configurations that potentially occur in the gag-pol or gag-pro and pro-pol junction domains of these viruses (300 nucleotides upstream and downstream from the possible frameshift sites are included). Moreover, the predicted pseudoknots following the frameshift site of pro-pol overlaps of the HTLV-1 and HTLV-2 retroviruses are structurally well conserved. The occurrence of eight compensatory base changes in the tertiary interaction of the two related sequences allows the conservation of their tertiary structures in spite of the sequence divergence. The results support the possible control mechanism for frameshifting proposed by Brierley et al. and Jacks et al.
Affiliation(s)
- S Y Le
- Institute of Biological Sciences, National Research Council of Canada, Ottawa
|
27
|
Sala-Rovira M, Geraud ML, Caput D, Jacques F, Soyer-Gobillard MO, Vernet G, Herzog M. Molecular cloning and immunolocalization of two variants of the major basic nuclear protein (HCc) from the histone-less eukaryote Crypthecodinium cohnii (Pyrrhophyta). Chromosoma 1991; 100:510-8. [PMID: 1764969 DOI: 10.1007/bf00352201] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.3] [Indexed: 12/28/2022]
Abstract
Two clones that encode variants (HCc1 and HCc2) of the major basic nuclear protein of the dinoflagellate Crypthecodinium cohnii, were identified by immunoscreening of a cDNA expression library. The first clone carries a full-length cDNA with an open reading frame (HCc1) encoding 113 amino acids. The cDNA from the second clone lacks some of the 5' end, and the coding sequence is only 102 residues. The two proteins display 77% sequence similarity and their NH2-ends are homologous to the NH2-peptide of the HCc protein determined by P. Rizzo. The amino acid composition, which confirms the basic nature of lysine-rich HCc proteins, differs markedly from other known DNA-binding proteins such as histones, HMGs or prokaryotic histone-like proteins. No convincing homology was found with other proteins. HCc antigens were localized on C. cohnii by immunofluorescence, and by electron microscopy (EM) with immunogold labelling. HCc proteins are mainly detected at the periphery of the permanently condensed chromosomes, where active chromatin is located, as well as in the nucleolar organizing region (NOR). This suggests that these basic, non-histone proteins, with a moderate affinity for DNA, are involved at some level in the regulation of gene expression.
Affiliation(s)
- M Sala-Rovira
- Département de Biologie Cellulaire et Moléculaire, Université de Paris, VI CNRS UA 117, Banyuls-sur-Mer, France
|
28
|
Griffais R, André PM, Thibon M. K-tuple frequency in the human genome and polymerase chain reaction. Nucleic Acids Res 1991; 19:3887-91. [PMID: 1861980 PMCID: PMC328479 DOI: 10.1093/nar/19.14.3887] [Citation(s) in RCA: 58] [Impact Index Per Article: 1.8] [Indexed: 12/29/2022] Open
Abstract
The frequencies of occurrence of K-tuples (overlapping sequences of defined length, K) were computed from the known human genome sequences. The significance of these frequencies for the whole human genome was tested by polymerase chain reaction (PCR). A computer program based on these results was written to choose primers to amplify DNA target sequences, either of human genes or of human infectious agents. The software also gave nested primer sequences which were used to synthesize non-radioactive probes by PCR. We applied these two methods, primer selection and non-radioactive probes, to easily and quickly set up very efficient PCR sets to work in the human genome context.
Affiliation(s)
- R Griffais
- Laboratoire des Chlamydiales et des Rickettsiales, Institut Pasteur, Paris, France
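Counting overlapping K-tuples, as described in this abstract, amounts to sliding a window of width K along a sequence one base at a time. The Python sketch below illustrates that counting step only; the function names are hypothetical and it is not the authors' primer-selection software.

```python
from collections import Counter


def ktuple_counts(seq: str, k: int) -> Counter:
    """Count every overlapping K-tuple (window of length k) in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of each K-tuple; empty if seq is shorter than k."""
    counts = ktuple_counts(seq, k)
    total = sum(counts.values())
    return {kt: c / total for kt, c in counts.items()} if total else {}


if __name__ == "__main__":
    print(ktuple_counts("ACACA", 2))
    print(ktuple_frequencies("ACACA", 2))
```

One plausible use, stated here as an assumption, is to prefer candidate primers whose K-tuples are rare in the background genome, since common K-tuples suggest more unintended priming sites.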
|
29
|
Abstract
A new approach is proposed for determining common RNA secondary structures within a set of homologous RNAs. The approach is a combination of phylogenetic and thermodynamic methods which is based on the prediction of optimal and suboptimal secondary structures, topological similarity searches and phylogenetic comparative analysis. The optimal and suboptimal RNA secondary structures are predicted by energy minimization. Structural comparison of the predicted RNA secondary structures is used to find conserved structures that are topologically similar in all these homologous RNAs. The validity of the conserved structural elements found is then checked by phylogenetic comparison of the sequences. This procedure is used to predict common structures of ribonuclease P (RNAase P) RNAs.
Affiliation(s)
- S Y Le
- Institute for Biological Sciences, National Research Council of Canada, Ottawa, Ontario
|
30
|
Tyler EC, Horton MR, Krause PR. A review of algorithms for molecular sequence comparison. COMPUTERS AND BIOMEDICAL RESEARCH, AN INTERNATIONAL JOURNAL 1991; 24:72-96. [PMID: 2004526 DOI: 10.1016/0010-4809(91)90014-n] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
Abstract
Computers have recently become an essential component of research in molecular biology. Most computer analyses of nucleic acid and protein sequences depend on comparisons between sequences. These comparisons, depending on their purpose, may differ not only in the kinds of comparisons that are done, but also in the way the results of the comparison are used by molecular biologists or by other computer programs. This paper reviews algorithms currently in use to solve comparison problems in molecular biology. Each algorithm is explained in detail and discussed in terms of the molecular biology problems it is most suited to solve.
Affiliation(s)
- E C Tyler
- Division of Computer Research and Technology, National Institutes of Health
|
31
|
Stückle EE, Emmrich C, Grob U, Nielsen PJ. Statistical analysis of nucleotide sequences. Nucleic Acids Res 1990; 18:6641-7. [PMID: 2251125 PMCID: PMC332623 DOI: 10.1093/nar/18.22.6641] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022] Open
Abstract
In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not equal to 0 as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites.
Affiliation(s)
- E E Stückle
- Max-Planck-Institut für Immunbiologie, Freiburg, FRG
|
32
|
Abstract
Pairwise optimal alignments between three or more sequences are not necessarily consistent as a whole, but consistent and inconsistent residues are usually distributed in clusters. An efficient method has been developed for locating consistent regions when each pairwise alignment is given in the form of a "skeletal representation" (Bull. math. Biol. 52, 359-373). This method is further extended so that the combination of pairwise alignments that gives the greatest consistency is found when possibly many alignments are equally optimal for each pairwise comparison. A method for acceleration of simultaneous multiple sequence alignment is proposed in which consistent regions serve as "anchor points" limiting application of direct multi-way alignment to the rest of "inconsistent" regions.
Affiliation(s)
- O Gotoh
- Graduate School of Biomedical Sciences, University of Texas, Houston 77225
|
33
|
Dakka N, Puigserver A, Wicker C. Regulation by a protein-free carbohydrate-rich diet of rat pancreatic mRNAs encoding trypsin and elastase isoenzymes. Biochem J 1990; 268:471-4. [PMID: 2363685 PMCID: PMC1131456 DOI: 10.1042/bj2680471] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
Abstract
The levels of mRNAs coding for trypsin and elastase isoenzymic forms were determined in the pancreatic tissue of rats fed a high-carbohydrate protein-free diet for a 0-5-day period. No change in the amounts of mRNAs coding for the two isoelastases was observed, although previous results showed that the biosynthesis of anionic elastase was markedly increased, whereas the biosynthesis of cationic elastase decreased, suggesting the existence of a translational-control mechanism in response to nutritional substrates. In contrast, the levels of mRNAs specific for the three isotrypsins were significantly enhanced, possibly as a result of transcriptional regulation and/or a change in messenger stability. In combination with earlier observations of an overall decrease in cationic trypsin biosynthesis during the same nutritional manipulation, these results suggest that formation of this enzyme is also subject to translational control.
Affiliation(s)
- N Dakka
- Centre de Biochimie et de Biologie Moléculaire du CNRS, Marseille, France
|
34
|
Abstract
The simplest dynamic algorithm for planar RNA folding searches for the maximum number of base pairs. The algorithm uses O(n^3) steps. The more general case, where different weights (energies) are assigned to stacked base pairs and to the various types of single-stranded region topologies, requires a considerably longer computation time because of the partial backtracking involved. Limiting the loop size reduces the running time back to O(n^3). Reduction in the number of steps in the calculations of the various RNA topologies has recently been suggested, thereby improving the time behavior. Here we show how a "jumping" procedure can be used to speed up the computation, not only for the maximal number of base pairs algorithm, but for the minimal energy algorithm as well.
Affiliation(s)
- R Nussinov
- Sackler Institute for Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv, Israel
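The base-pair maximization algorithm mentioned at the start of this abstract is commonly written as a dynamic program over subsequences, with three nested loops that give the O(n^3) step count. The Python sketch below is a textbook-style version under stated assumptions: only Watson-Crick pairs are allowed, and the `min_loop` parameter is a hypothetical addition. It does not implement the "jumping" speed-up that the paper proposes.

```python
def max_base_pairs(seq: str, min_loop: int = 0) -> int:
    """Maximum number of nested (planar) base pairs in an RNA sequence.

    dp[i][j] holds the best pair count for seq[i..j]. Only Watson-Crick
    pairs are allowed here (an assumption of this sketch), and min_loop
    is the minimum number of unpaired bases enclosed by any pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):              # subsequence length minus one
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]           # case 1: base j stays unpaired
            # case 2: base j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC", min_loop=3))
```

The loops over `span`, `i`, and `k` are each bounded by n, which is where the cubic step count comes from.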
|
35
|
Claverie JM, Sauvaget I, Bougueleret L. K-tuple frequency analysis: from intron/exon discrimination to T-cell epitope mapping. Methods Enzymol 1990; 183:237-52. [PMID: 1690334 DOI: 10.1016/0076-6879(90)83017-4] [Citation(s) in RCA: 58] [Impact Index Per Article: 1.7] [Indexed: 12/28/2022]
|
36
|
Abstract
The FASTA program can search the NBRF protein sequence library (2.5 million residues) in less than 20 min on an IBM-PC microcomputer and unambiguously detect proteins that shared a common ancestor billions of years in the past. FASTA is both fast and selective because it initially considers only amino acid identities. Its sensitivity is increased not only by using the PAM250 matrix to score and rescore regions with large numbers of identities but also by joining initial regions. The results of searches with FASTA compare favorably with results using NWS-based programs that are 100 times slower. FASTA is slightly less sensitive but considerably more selective. It is not clear that NWS-based programs would be more successful in finding distantly related members of the G-protein-coupled receptor family. The joining step by FASTA to calculate the initn score is especially useful for sequences that share regions of sequence similarity that are separated by variable-length loops. FASTP and FASTA were designed to identify protein sequences that have descended from a common ancestor, and they have proved very useful for this task. In many cases, a FASTA sequence search will result in a list of high scoring library sequences that are homologous to the query sequence, or the search will result in a list of sequences with similarity scores that cannot be distinguished from the bulk of the library. In either case, the question of whether there are sequences in the library that are clearly related to the query sequence has been answered unambiguously. Unfortunately, the results often will not be so clear-cut, and careful analysis of similarity scores, statistical significance, the actual aligned residues, and the biological context are required. 
In the course of analyzing the G-protein-coupled receptor family, several proteins were found that, because of a high initn score and a low init1 score that increased almost 2-fold with optimization, appeared to be members of this family which were not previously recognized. RDF2 analysis showed borderline z values, and only a careful examination of the sequence alignments that focused on the conserved residues provided convincing evidence that the high scores were fortuitous. As sequence comparison methods become more powerful by becoming more sensitive, they become more likely to mislead, and even greater care is required.
|
37
|
Gautheret D, Major F, Cedergren R. Computer modeling and display of RNA secondary and tertiary structures. Methods Enzymol 1990; 183:318-30. [PMID: 1690337 DOI: 10.1016/0076-6879(90)83021-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022]
|
38
|
Le SY, Nussinov R, Maizel JV. Tree graphs of RNA secondary structures and their comparisons. COMPUTERS AND BIOMEDICAL RESEARCH, AN INTERNATIONAL JOURNAL 1989; 22:461-73. [PMID: 2776449 DOI: 10.1016/0010-4809(89)90039-6] [Citation(s) in RCA: 109] [Impact Index Per Article: 3.1] [Indexed: 01/02/2023]
Abstract
To facilitate comparison of RNA secondary structures, each structure is represented as an ordered labeled tree. Several alternate secondary structures yielding a set of trees can be computed for any given RNA molecule (sequence). Frequently recurring subtrees are searched in this set of trees. The consensus structure motifs are then selected and used to construct a secondary structure model of the RNA. Given the difficulties involved in RNA secondary structure calculations, this procedure may significantly improve our predictive capabilities. In addition, the change of secondary structures between two different RNA sequences is described as a transformation of ordered trees. The transferable ratio of tree A from tree B is defined as a proportion of the largest common subtrees in trees A and B occurring in tree A. The method is applied to the study of the mechanism of human alpha 1 globin pre-mRNA splicing. In the study, two tentative splicing mechanisms, A and B, with different orders of intron excision from alpha 1 globin pre-mRNA have been simulated. A possible relationship between the structural features of the secondary structures and the order of intron excision in the pathway of precursor splicing of human alpha 1 globin is discussed.
Affiliation(s)
- S Y Le
- Division of Cancer Biology and Diagnosis, National Cancer Institute, Frederick, Maryland 21701
|
39
|
Benedetti G, De Santis P, Morosetti S. A new method to find a set of energetically optimal RNA secondary structures. Nucleic Acids Res 1989; 17:5149-61. [PMID: 2474795 PMCID: PMC318102 DOI: 10.1093/nar/17.13.5149] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023] Open
Abstract
We present a computer method to determine nucleic acid secondary structures. It is based on three steps: 1) a search for all possible helical regions that relies on a mathematical approach derived from the convolution theorem and uses a tetradimensional complex vector representation of the bases along the sequence; 2) a 'tree' search for a set of minimum free energy structures, with the aid of an approximate energy evaluation to reduce the computer time requirements; 3) the exact calculation and refinement of the energies. A method to introduce experimental data and reach an arrangement between them and the free energy minimization criterion is shown. To demonstrate the reliability of the program, a test on four RNA sequences is performed. The method has a computer time requirement proportional to N^2, where N is the length of the sequence, and retrieves a set of optimal free energy structures.
Affiliation(s)
- G Benedetti
- Department of Chemistry, University of Rome, Italy
|
40
|
Abstract
An algorithm and a computer program have been prepared for determining RNA secondary structures within any prescribed increment of the computed global minimum free energy. The mathematical problem of determining how well defined a minimum energy folding is can now be solved. All predicted base pairs that can participate in suboptimal structures may be displayed and analyzed graphically. Representative suboptimal foldings are generated by selecting these base pairs one at a time and computing the best foldings that contain them. A distance criterion that ensures that no two structures are "too close" is used to avoid multiple generation of similar structures. Thermodynamic parameters, including free-energy increments for single-base stacking at the ends of helices and for terminal mismatched pairs in interior and hairpin loops, are incorporated into the underlying folding model of the above algorithm.
Affiliation(s)
- M Zuker
- Division of Biological Sciences, National Research Council of Canada, Ottawa, Ontario
|
41
|
Nussinov R. The ordering of the nucleotides in DNA: computational problems in molecular biology. Comput Biol Med 1989; 19:269-81. [PMID: 2478335 DOI: 10.1016/0010-4825(89)90014-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Indexed: 01/01/2023]
Abstract
Viewing DNA, RNA and proteins as strings of letters, various algorithms designed for their optimal alignments or their secondary structures have been developed. The results emanating from such sequence editing algorithms are often not correlated with the physico-chemical computations for calculating the detailed atomic coordinates of these molecules. These two aspects are often viewed as separate research entities. Here I attempt to relate various computational aspects of modern molecular biology. In particular, I attempt to put these (along with complementary experimental data) in the framework of a very basic biological question: what fixes the order of the bases in the DNA.
Affiliation(s)
- R Nussinov
- Sackler Institute of Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv, Israel
|
42
|
|
43
|
Felenbok B, Sequeval D, Mathieu M, Sibley S, Gwynne DI, Davies RW. The ethanol regulon in Aspergillus nidulans: characterization and sequence of the positive regulatory gene alcR. Gene 1988; 73:385-96. [PMID: 3072264 DOI: 10.1016/0378-1119(88)90503-3] [Citation(s) in RCA: 64] [Impact Index Per Article: 1.8] [Indexed: 01/04/2023]
Abstract
The regulatory gene, alcR, of Aspergillus nidulans, encodes a protein that induces the expression of the alcA and aldA genes. The alcR gene is inducible, autoregulated, and subject to carbon catabolite repression. We report the complete nucleotide sequence of the alcR gene and its 5' and 3' non-coding regions. In the 5' flanking region of the alcR gene, several repeats and inverted repeats were found, and small sequence similarities were also found with the 5' flanking regions of the alcA and aldA genes. One intron of small size interrupts the open reading frame. The start point of transcription was mapped 50 nucleotides upstream from the putative start codon, and a sequence CAATG was found 5' to the polyadenylation site of the transcript that could play a role in selection of the polyadenylation site. The putative alcR-encoded protein was identified in vivo as an inducible polypeptide of 96 kDa in a transformant carrying multiple copies of the alcR gene.
Affiliation(s)
- B Felenbok
- Institut de Microbiologie (Laboratoire associé au CNRS 136), Université Paris-Sud, Orsay, France
|
44
|
Abstract
An approach for performing multiple alignments of large numbers of amino acid or nucleotide sequences is described. The method is based on first deriving a phylogenetic tree from a matrix of all pairwise sequence similarity scores, obtained using a fast pairwise alignment algorithm. Then the multiple alignment is achieved from a series of pairwise alignments of clusters of sequences, following the order of branching in the tree. The method is sufficiently fast and economical with memory to be easily implemented on a microcomputer, and yet the results obtained are comparable to those from packages requiring mainframe computer facilities.
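The ordering step described above can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the published program: the pairwise score is a crude shared k-tuple fraction standing in for the fast pairwise alignment score, and the tree is built by average-linkage clustering, which is one common choice that the abstract does not name. The function names `ktuple_similarity`, `average_similarity` and `guide_tree_order` are hypothetical. The alignment of the clusters themselves is omitted.

```python
from itertools import combinations


def ktuple_similarity(a, b, k=2):
    """Fraction of distinct k-tuples shared by a and b (a crude, fast stand-in score)."""
    tuples_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    tuples_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not tuples_a or not tuples_b:
        return 0.0
    return len(tuples_a & tuples_b) / min(len(tuples_a), len(tuples_b))


def average_similarity(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sim[frozenset((i, j))] for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def guide_tree_order(seqs, k=2):
    """Return the sequence of cluster merges, most similar clusters first.

    Each merge is a pair of tuples of sequence indices. A progressive
    aligner would align the two clusters of each merge in this order.
    """
    sim = {
        frozenset((i, j)): ktuple_similarity(seqs[i], seqs[j], k)
        for i, j in combinations(range(len(seqs)), 2)
    }
    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: average_similarity(pair[0], pair[1], sim),
        )
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges
```

With four sequences in two obvious families, for example `["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]`, the two within-family merges come first and the merge of the two families comes last, which mirrors the "order of branching in the tree" in the abstract.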
Affiliation(s)
- D G Higgins
- Department of Genetics, Trinity College, Dublin, Ireland
|
45
|
Abstract
We have used a rapid computer dot-matrix comparison method to identify all DNA regions which have been evolutionarily conserved between the completely sequenced chloroplast genomes of tobacco and a liverwort. Analysis of these regions reveals 74 homologous open reading frames (ORFs) which have been conserved as to length and amino acid sequence; these ORFs also have an excess of nucleotide substitutions at silent sites of codons. Since the nonfunctional parts of these genomes have become saturated with mutations and show no sequence similarity whatsoever, the homologous ORFs are almost certainly functional. A further four pairs of ORFs show homology limited to only a short part of their putative gene products. Amino acid sequence identities range between 50 and 99%; some chloroplast proteins are seen to be among the most slowly evolving of all known proteins. A search of the nucleotide and amino acid sequence databanks has revealed several previously unidentified genes in chloroplast sequences from other species, but no new homologies to prokaryotic genes.
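As an illustration of the dot-matrix idea mentioned above, the sketch below places a dot wherever two windows of equal length agree at enough positions. It is a generic, unoptimized version, not the rapid method used in the study. The function name `dot_matrix` and the parameters `window` and `min_matches` are assumptions.

```python
def dot_matrix(a, b, window=5, min_matches=4):
    """Return (i, j) positions where a[i:i+window] and b[j:j+window]
    agree at min_matches or more sites."""
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.append((i, j))
    return dots
```

A conserved region shows up as a run of dots along one diagonal, that is, with a constant value of `j - i`. Raising `window` and `min_matches` suppresses chance matches at the cost of missing short or diverged regions.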
Affiliation(s)
- K H Wolfe
- Department of Genetics, Trinity College, Dublin, Ireland
|
46
|
Toda T, Cameron S, Sass P, Wigler M. SCH9, a gene of Saccharomyces cerevisiae that encodes a protein distinct from, but functionally and structurally related to, cAMP-dependent protein kinase catalytic subunits. Genes Dev 1988; 2:517-27. [PMID: 3290050 DOI: 10.1101/gad.2.5.517] [Citation(s) in RCA: 161] [Impact Index Per Article: 4.5] [Indexed: 01/05/2023]
Abstract
A new gene, SCH9, was isolated from Saccharomyces cerevisiae by its ability to complement a cdc25ts mutation. Sequence analysis indicates that it encodes a 90,000-dalton protein with a carboxy-terminal domain homologous to yeast and mammalian cAMP-dependent protein kinase catalytic subunits. In addition to suppressing loss of CDC25 function, multicopy plasmids containing SCH9 suppress the growth defects of strains lacking the RAS genes, the CYR1 gene, which encodes adenylyl cyclase, and the TPK genes, which encode the cAMP-dependent protein kinase catalytic subunits. Cells lacking SCH9 grow slowly and have a prolonged G1 phase of the cell cycle. This defect is suppressed by activation of the cAMP effector pathway. We propose that SCH9 encodes a protein kinase that is part of a growth control pathway which is at least partially redundant with the cAMP pathway.
Affiliation(s)
- T Toda
- Cold Spring Harbor Laboratory, New York 11724
|
47
|
Abstract
We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
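The shuffling idea behind the significance test described above can be sketched as follows. This is an illustration under stated assumptions, not the RDF2 program: residues are shuffled only within consecutive windows so that each window keeps its composition, and the similarity score is a toy count of identical positions rather than the score computed by FASTA. The names `local_shuffle`, `identity_score` and `z_score`, and all default parameter values, are hypothetical.

```python
import random
import statistics


def local_shuffle(seq, window=10, rng=random):
    """Shuffle residues within consecutive windows, so each window keeps its composition."""
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


def identity_score(a, b):
    """Toy similarity score: identical positions in an ungapped comparison from the left end."""
    return sum(x == y for x, y in zip(a, b))


def z_score(query, target, n_shuffles=200, window=10, seed=0):
    """Express the observed score in standard deviations above the mean
    score of the query against locally shuffled copies of the target."""
    rng = random.Random(seed)
    observed = identity_score(query, target)
    scores = [
        identity_score(query, local_shuffle(target, window, rng))
        for _ in range(n_shuffles)
    ]
    spread = statistics.pstdev(scores)
    if spread == 0:
        return 0.0  # every shuffle scored the same; no basis for a z-value
    return (observed - statistics.mean(scores)) / spread
```

A large z-value suggests that the observed score depends on the order of the residues and not only on local composition. The window length controls how local the preserved composition is.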
Affiliation(s)
- W R Pearson
- Department of Biochemistry, University of Virginia, Charlottesville 22908
|
48
|
Abstract
A multiple approach to the study of RNA secondary structure is described which provides for the independent drawing of structures using base-pairing lists, for the generation of local structures in the form of hairpins, and for the generation of global structures by both Monte Carlo and dynamic programming methodologies. User-adjustable parameters provide for limiting the size of hairpin loops, bulges and inner loops, and constraints can be imposed relative to position-dependent base pairing.
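To make the dynamic programming route to a global structure concrete, here is a minimal sketch that maximizes the number of nested base pairs (a Nussinov-style recursion). It is a deliberately simplified stand-in and not the method of the program described above, which is not specified in the abstract. The only loop constraint shown is a hypothetical `min_loop` parameter giving the minimum number of unpaired bases in a hairpin; the function name `max_base_pairs` and the inclusion of G-U pairs are also assumptions.

```python
# Watson-Crick pairs plus the G-U wobble pair (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return (count, pairs): the maximum number of nested base pairs and one
    structure achieving it, with at least min_loop unpaired bases per hairpin."""
    n = len(seq)
    if n == 0:
        return 0, []
    # best[i][j] = maximum number of pairs within seq[i..j]; entries with
    # j - i <= min_loop stay 0 because no pair fits there.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i + 1][j]  # case 1: base i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = best[k + 1][j] if k < j else 0
                    score = max(score, best[i + 1][k - 1] + 1 + right)
            best[i][j] = score

    # Trace back one optimal structure.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = best[k + 1][j] if k < j else 0
                if best[i][j] == best[i + 1][k - 1] + 1 + right:
                    pairs.append((i, k))
                    stack.append((i + 1, k - 1))
                    stack.append((k + 1, j))
                    break
    return best[0][n - 1], sorted(pairs)
```

For the toy hairpin `GGGAAAUCCC` the maximum is three pairs. Several structures can reach the same count, and the traceback returns only one of them. A realistic program would score loops and stacks with free energies and would expose further limits, such as the bulge and inner loop sizes mentioned in the abstract.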
Affiliation(s)
- H M Martinez
- Department of Biochemistry and Biophysics, University of California, San Francisco 94143
|
49
|
Complementary DNA cloning of cytochrome P-450s related to P-450(M-1) from the complementary DNA library of female rat livers. Predicted primary structures for P-450f, PB-1, and PB-1-related protein with a bizarre replacement block and their mode of transcriptional expression. J Biol Chem 1988. [DOI: 10.1016/s0021-9258(19)35409-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022] Open
|
50
|
Seidel U, Bober E, Winter B, Lenz S, Lohse P, Arnold HH. The complete nucleotide sequences of cDNA clones coding for human myosin light chains 1 and 3. Nucleic Acids Res 1987; 15:4989. [PMID: 3601661 PMCID: PMC305934 DOI: 10.1093/nar/15.12.4989] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023] Open
|