1
|
Pockrandt C, Alzamel M, Iliopoulos CS, Reinert K. GenMap: ultra-fast computation of genome mappability. Bioinformatics 2020; 36:3687-3692. [PMID: 32246826 PMCID: PMC7320602 DOI: 10.1093/bioinformatics/btaa222] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 12/14/2019] [Revised: 03/23/2020] [Accepted: 03/31/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging. However, it is crucial for many biological applications such as the design of guide RNA for CRISPR experiments. More formally, the uniqueness or (k, e)-mappability can be described for every position as the reciprocal value of how often this k-mer occurs approximately in the genome, i.e. with up to e mismatches.
RESULTS: We present a fast method GenMap to compute the (k, e)-mappability. We extend the mappability algorithm, such that it can also be computed across multiple genomes where a k-mer occurrence is only counted once per genome. This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. GenMap supports different formats such as binary output, wig and bed files as well as csv files to export the location of all approximate k-mers for each genomic position.
AVAILABILITY AND IMPLEMENTATION: GenMap can be installed via bioconda. Binaries and C++ source code are available on https://github.com/cpockrandt/genmap.
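The (k, e)-mappability definition in the abstract above can be illustrated with a small brute-force sketch. This is not GenMap's algorithm or API. It is a naive quadratic-time illustration of the definition only, and the function names `within_mismatches` and `mappability` are hypothetical. It assumes a single sequence, forward strand only, and Hamming distance (mismatches, no indels).

```python
def within_mismatches(a: str, b: str, e: int) -> bool:
    """Return True if equal-length strings a and b differ in at most e positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > e:
                return False
    return True


def mappability(genome: str, k: int, e: int) -> list[float]:
    """Naive (k, e)-mappability: for each start position, the reciprocal of the
    number of k-mers in the genome that match the k-mer at that position with
    up to e mismatches (the k-mer itself is always counted)."""
    n = len(genome) - k + 1
    if k <= 0 or n <= 0:
        return []
    kmers = [genome[i:i + k] for i in range(n)]
    values = []
    for query in kmers:
        # Count approximate occurrences of this k-mer over all positions.
        count = sum(1 for other in kmers if within_mismatches(query, other, e))
        values.append(1.0 / count)
    return values


if __name__ == "__main__":
    # The 4-mer ACGT occurs twice, so its two positions get mappability 0.5.
    print(mappability("ACGTACGT", k=4, e=0))
```

A value of 1.0 marks a k-mer that is unique under the chosen mismatch budget, and smaller values mark increasingly repetitive positions. The sketch compares every pair of positions, so it is only practical for toy inputs.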
Affiliation(s)
- Christopher Pockrandt
- Center for Computational Biology, School of Medicine; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science and Mathematics, Freie Universität Berlin; Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Mai Alzamel
- Department of Informatics, King's College London, London, UK; Department of Computer Science, King Saud University, Riyadh, Saudi Arabia
- Knut Reinert
- Department of Computer Science and Mathematics, Freie Universität Berlin; Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
2
|
Affiliation(s)
- R. A. McIntosh
- University of Sydney; Plant Breeding Institute; Castle Hill, N.S.W. Australia
- Jane E. Cusick
- University of Sydney; Plant Breeding Institute; Castle Hill, N.S.W. Australia
|
3
|
Deumling B. Sequence arrangement of a highly methylated satellite DNA of a plant, Scilla: A tandemly repeated inverted repeat. Proc Natl Acad Sci U S A 1981; 78:338-42. [PMID: 16592953 PMCID: PMC319048 DOI: 10.1073/pnas.78.1.338] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022]
Abstract
G+C-rich satellite DNA, representing about 19% of total nuclear DNA, was isolated from various tissues of the monocotyledonous plant, Scilla siberica, by using Ag⁺-Cs₂SO₄ gradient techniques. This satellite DNA had an unusually high melting point and a high methylcytosine (m⁵C) content (approximately 25% of total bases; m⁵C/cytosine ratio approximately 1.5) and was localized, by in situ hybridization, in the heterochromatin regions of the chromosomes. Digestion with restriction endonuclease Hae III yielded a series of fragments ranging from 35 to several hundred nucleotide pairs. The major fragments, I-IV (35, 50, 59, and 69 nucleotide pairs, respectively), were isolated, and their nucleotide sequences were determined. The dominant fragment I was a highly symmetrical molecule, with a basically palindromic arrangement. This sequence represented the basic unit of Scilla satellite DNA and was tandemly repeated many times, with some base substitutions and multiple successive insertions of the tetranucleotide G-T-C-C. The dinucleotide CpG was the commonest nearest-neighbor sequence. Thin layer chromatography, DNA sequence analysis, and gas chromatography combined with mass spectrometry showed the high m⁵C content (m⁵C/Cyt = 2.2 and 2.8, respectively, for fragments II and III). Identical cleavage fragments were found in satellite DNAs from two other species of this genus (S. amoena and S. ingridae), which suggests that this constitutively methylated sequence is evolutionarily stable. The sequence arrangement of this plant satellite DNA is compared with those reported for several animal satellite DNAs.
Affiliation(s)
- B Deumling
- Department of Membrane Biology and Biochemistry, Institute of Cell and Tumor Biology, German Cancer Research Center, D-6900 Heidelberg, Federal Republic of Germany
|
4
|
Nayak A, Nair J, Hegde M, Ranjekar P, Pant U. Genome analysis of two mosquito species. Insect Biochem 1991. [DOI: 10.1016/0020-1790(91)90122-u] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
|
5
|
Determination of 5-methylcytosine content of four Cucurbitaceae species using high-performance liquid chromatography. J Chromatogr A 1991. [DOI: 10.1016/s0021-9673(01)88829-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
6
|
Kumar LS, Hendre RR, Ranjekar PK. 5-Methylcytosine content and methylation status in six millet DNAs. J Biosci 1990. [DOI: 10.1007/bf02704712] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
7
|
Ranade SA, Lagu MD, Patankar SM, Dabak MM, Dhar MS, Gupta VS, Ranjekar PK. Identification of a dispersed MboI repeat family in five higher plant genomes. Biosci Rep 1988; 8:435-41. [PMID: 3233342 DOI: 10.1007/bf01121641] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023]
Abstract
Digestion of nuclear DNAs of five plants, namely Cucurbita maxima (red gourd), Trichosanthes anguina (snake gourd), Cucumis sativus (cucumber), Cajanus cajan (pigeon pea) and Phaseolus vulgaris (french bean) with the restriction endonuclease MboI yielded discrete size classes with molecular weights in the range of 0.5 to 5 kbp. The MboI digestion pattern of Cot 0.1 DNA in french bean is comparable with that of total DNA, indicating that these bands represented highly repeated DNA sequences. Cleavage of the DNAs with varying amounts of MboI indicated the dispersed nature of the repeat families. Southern hybridization studies using french bean highly repetitive DNA as a probe indicated more homology with repeats of pigeon pea and less homology with red gourd, snake gourd and cucumber repeats.
Affiliation(s)
- S A Ranade
- Division of Biochemical Sciences, National Chemical Laboratory, Pune, India
|
8
|
Sivaraman L, Gupta VS, Ranjekar PK. DNA sequence organization in the genomes of three related millet plant species. Plant Mol Biol 1986; 6:375-388. [PMID: 24307416 DOI: 10.1007/bf00027131] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.1] [Received: 06/19/1985] [Revised: 02/18/1986] [Accepted: 03/04/1986] [Indexed: 06/02/2023]
Abstract
A major portion of the genomes of three millet species, namely, barn yard millet, fox tail millet and little millet, has been shown to consist of interspersed repeat and single copy DNA sequences. The interspersed repetitive DNA sequences are both short (0.15-1.0 kilo base pairs, 62-64%) and long (>1.5 kilo base pairs, 36-38%) in barn yard millet and little millet, while in fox tail millet only long interspersed repeats (>1.5 kilo base pairs) are present. The length of the interspersed single copy DNA sequences varies in the range of 1.6-2.6 kilo base pairs in all three species. The repetitive duplexes isolated after renaturation of 1.5 kilo base pairs and 20 kilo base pairs long DNA fragments exhibit a high thermal stability, with Tms either equal to or greater than those of the corresponding native DNAs. The S1 nuclease resistant repetitive DNA duplexes are also thermally stable and reveal the presence of only 1-2% sequence divergence. The present data on the modes of sequence arrangement in millets substantiate the proposed trend in plants, namely, that plants with a 1C nuclear DNA content of less than 5 picograms have diverse patterns of sequence organization, while those with a 1C nuclear DNA content greater than 5 picograms have predominantly a short period interspersion pattern.
Affiliation(s)
- L Sivaraman
- Biochemistry Division, National Chemical Laboratory, Pune, 411 008, India
|
9
|
Chakrabarti T, Subrahmanyam NC. Characterization and localization of cryptic satellite DNAs in barley (Hordeum vulgare). Theor Appl Genet 1986; 73:31-39. [PMID: 24240744 DOI: 10.1007/bf00273715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/1985] [Accepted: 04/15/1986] [Indexed: 06/02/2023]
Abstract
Three satellites on the heavy side of the main band and two satellites on the light side were isolated in pure form by preparative ultracentrifugation of H. vulgare DNA in Ag⁺/Cs₂SO₄ density gradients. The satellites were characterised in terms of their buoyant densities in CsCl and their thermal dissociation temperature in both native and reassociated forms to Cot 4. In CsCl gradients, heavy satellites formed a single peak whereas light satellites resolved into more than one component. Thermal transitions of some satellites indicated the presence of more than one molecular species. The multicomponent nature of thermal denaturation profiles was evident on differential analysis. Radioactive RNAs complementary to the three heavy satellites of H. vulgare were localised by in situ hybridization onto its nuclei and chromosomes. One heavy satellite (H3) was found to be distributed on all chromosomes, although one pair showed less hybridization compared to the others. The other satellite (H1) appeared to be present in a much lower amount on the chromosomes.
Affiliation(s)
- T Chakrabarti
- Department of Genetics, Research School of Biological Sciences, The Australian National University, 2601, Canberra, A.C.T., Australia
|
10
|
Chakrabarti T, Subrahmanyam NC, Doy CH. Analysis and in situ hybridization of cryptic satellites in Hordeum arizonicum. Theor Appl Genet 1986; 73:40-46. [PMID: 24240745 DOI: 10.1007/bf00273716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/1985] [Accepted: 04/15/1986] [Indexed: 06/02/2023]
Abstract
Three satellites, one (H1) on the heavy side of the main band of Hordeum arizonicum DNA and two (L1, L2) on the lighter side, were purified using preparative silver-cesium sulphate density gradients. The native and the reassociated satellite DNAs were analysed in terms of buoyant densities and thermal dissociation. In cesium chloride gradients the H1 and L1 satellites formed single peaks corresponding to buoyant densities of 1.700 and 1.701 g·cm⁻³ respectively, while the L2 satellite gave two peaks (1.680 and 1.661 g·cm⁻³). The H1 satellite showed three thermal components (Tm = 82.5 °C, 87 °C and 91.5 °C) while the L1 and L2 had three (86.5, 92, 97.5 °C) and two (86, 95 °C) respectively. The H1 satellite was localized on the nuclei and chromosomes. The distribution of H1 onto approximately one third of the complement may reflect the genome specific origin of this satellite.
Affiliation(s)
- T Chakrabarti
- Department of Genetics, Research School of Biological Sciences, The Australian National University, 2600, Canberra, A.C.T., Australia
|
11
|
|
12
|
Molecular analysis of Cucurbitaceae genomes: I. Comparison of DNA reassociation kinetics in six plant species. Plant Sci Lett 1984. [DOI: 10.1016/0304-4211(84)90002-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
|
13
|
Mehra U, Ranjekar PK. Analysis of bovidae genomes: Arrangement of repeated and single copy DNA sequences in bovine, goat and sheep. J Biosci 1982. [DOI: 10.1007/bf02702587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
14
|
|
15
|
|
16
|
Dennis ES, Gerlach WL, Peacock WJ. Identical polypyrimidine-polypurine satellite DNAs in wheat and barley. Heredity (Edinb) 1980. [DOI: 10.1038/hdy.1980.33] [Citation(s) in RCA: 90] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
|
17
|
|
18
|
Rimpau J, Smith DB, Flavell RB. Sequence organisation in barley and oats chromosomes revealed by interspecies DNA/DNA hybridisation. Heredity (Edinb) 1980. [DOI: 10.1038/hdy.1980.12] [Citation(s) in RCA: 36] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
|
19
|
Flavell R, O'Dell M, Smith D. Repeated sequence DNA comparisons between Triticum and Aegilops species. Heredity (Edinb) 1979. [DOI: 10.1038/hdy.1979.34] [Citation(s) in RCA: 37] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
|
20
|
Wimpee CF, Rawson JR. Characterization of the nuclear genome of pearl millet. Biochim Biophys Acta 1979; 562:192-206. [PMID: 444525 DOI: 10.1016/0005-2787(79)90165-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
The nuclear genome of pearl millet has been characterized with respect to its size, buoyant density in CsCl equilibrium density gradients, melting temperature, reassociation kinetics and sequence organization. The genome size is 0.22 pg. The mol percent G + C of the DNA is calculated from the buoyant density and the melting temperature to be 44.9 and 49.7%, respectively. The reassociation kinetics of fragments of DNA 300 nucleotides long reveals three components: a rapidly renaturing fraction composed of highly repeated and/or foldback DNA, middle repetitive DNA and single copy DNA. The single copy DNA consists of 17% of the genome. 80% of the repetitive sequences are at least 5000 nucleotide pairs in length. Thermal denaturation profiles of the repetitive DNA sequences show high Tm values implying a high degree of sequence homogeneity. About half of the single copy DNA is short (750-1400 nucleotide pairs) and interspersed with long repetitive DNA sequences. The remainder of the single copy sequences vary in size from 1400 to 8600 nucleotide pairs.
|
21
|
|
22
|
Ranjekar PK, Pallotta D, Lafontaine JG. Analysis of plant genomes. V. Comparative study of molecular properties of DNAs of seven Allium species. Biochem Genet 1978; 16:957-70. [PMID: 743197 DOI: 10.1007/bf00483747] [Citation(s) in RCA: 29] [Impact Index Per Article: 0.6] [Indexed: 12/24/2022]
Abstract
The genomes of seven plant species belonging to the genus Allium and exhibiting a threefold variation in their nuclear DNA content were analyzed by studying their reassociation kinetics, equilibrium centrifugation behavior in neutral CsCl gradients, and melting properties. The reassociation kinetics experiments revealed the presence of 44-65% repeated DNA sequences. A comparison between DNA contents and the proportion of repeated DNA sequences indicated that, in Allium, increase in the genome size is not exclusively due to variations in the proportions of repetitive DNA. The total DNA as well as the various repetitive DNA fractions in all the Allium species examined exhibited, in spite of a few differences, a gross similarity in their behavior in neutral CsCl gradients and in their melting properties.
|
23
|
Ranjekar PK, Pallotta D, Lafontaine JG. Analysis of plant genomes. III. Denaturation and reassociation properties of cryptic satellite DNAs in barley (Hordeum vulgare) and wheat (Triticum aestivum). Biochim Biophys Acta 1978; 520:103-10. [PMID: 698222 DOI: 10.1016/0005-2787(78)90011-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
A cryptic satellite fraction was isolated from barley and wheat by preparative ultracentrifugation of total DNA in Ag⁺-Cs₂SO₄ density gradients and was characterized by studying its denaturation-reassociation properties. Wheat satellite DNA underwent thermal denaturation as a single component with a Tm of 81 °C, while barley satellite DNA consisted of one major (Tm = 82.5 °C) and one minor (Tm = 91 °C) component. When the barley and wheat satellites were reassociated and then melted, the Tm values were found to be 6-7 °C lower than those of the corresponding native DNA preparations. Examination of the C0t curves of these two satellite DNAs revealed the presence of a major, fast reassociating and a minor, slow reassociating fraction. The fast reassociating DNA fraction of barley was found to have a complexity of 9.7 × 10⁵ daltons while that of wheat satellite was 5.8 × 10⁵ daltons. Since these satellites reassociated with about 4-5% base mismatching, as judged by their ΔTm (6-7 °C), they each appear to consist of rather similar base sequences.
|
24
|
Rimpau J, Smith D, Flavell R. Sequence organisation analysis of the wheat and rye genomes by interspecies DNA/DNA hybridisation. J Mol Biol 1978; 123:327-59. [PMID: 691051 DOI: 10.1016/0022-2836(78)90083-9] [Citation(s) in RCA: 75] [Impact Index Per Article: 1.6] [Indexed: 12/24/2022]
|
25
|
Biochemical evidence that leghaemoglobin genes are present in the soybean but not in Rhizobium genome. Nature 1978. [DOI: 10.1038/273558a0] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
|
26
|
|
27
|
Lurquin PF. Integration versus degradation of exogenous DNA in plants: an open question. Prog Nucleic Acid Res Mol Biol 1977; 20:161-207. [PMID: 333511 DOI: 10.1016/s0079-6603(08)60473-0] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]
|