1
Rather MA, Agarwal D, Bhat TA, Khan IA, Zafar I, Kumar S, Amin A, Sundaray JK, Qadri T. Bioinformatics approaches and big data analytics opportunities in improving fisheries and aquaculture. Int J Biol Macromol 2023; 233:123549. [PMID: 36740117 DOI: 10.1016/j.ijbiomac.2023.123549] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/15/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/05/2023]
Abstract
Aquaculture has grown rapidly over the last two decades and offers huge potential to provide nutritional as well as livelihood security. Genomic research has contributed significantly toward the development of beneficial technologies for aquaculture. Existing high-throughput technologies, such as next-generation sequencing, generate vast amounts of data that require extensive analysis using appropriate tools. Bioinformatics is a rapidly evolving science that integrates gene-based information and computational technology to produce new knowledge for the benefit of aquaculture. Bioinformatics provides new opportunities as well as challenges for information and data processing in new-generation aquaculture. Rapid technical advancements have opened up a world of possibilities for using current genomics to improve aquaculture performance. Understanding the genes that govern economically relevant characteristics requires a significant amount of additional research. The data sources include next-generation DNA sequencing, protein sequencing, RNA sequencing, gene expression profiles, metabolic pathways, molecular markers, and so on. Appropriate bioinformatics tools are developed to mine biologically relevant and commercially useful results. The purpose of this scoping review is to present the various arms of diverse bioinformatics tools, with special emphasis on practical translation to the aquaculture industry.
Affiliation(s)
- Mohd Ashraf Rather
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Deepak Agarwal
  - Institute of Fisheries Post Graduation Studies OMR Campus, Vaniyanchavadi, Chennai, India
- Irfan Ahamd Khan
  - Division of Fish Genetics and Biotechnology, Faculty of Fisheries Ganderbal, Sher-e-Kashmir University of Agricultural Science and Technology, Kashmir, India
- Imran Zafar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Sujit Kumar
  - Department of Bioinformatics and Computational Biology, Virtual University Punjab, Pakistan
- Adnan Amin
  - Postgraduate Institute of Fisheries Education and Research Kamdhenu University, Gandhinagar-India University of Kurasthra, India; Department of Aquatic Environmental Management, Faculty of Fisheries Rangil- Ganderbel -SKUAST-K, India
- Jitendra Kumar Sundaray
  - ICAR-Central Institute of Freshwater Aquaculture, Kausalyaganga, Bhubaneswar, Odisha 751002, India
- Tahiya Qadri
  - Division of Food Science and Technology, SKUAST-K, Shalimar, India
2
Bhardwaj A, Bag SK. PLANET-SNP pipeline: PLants based ANnotation and Establishment of True SNP pipeline. Genomics 2019; 111:1066-1077. [PMID: 31533899 DOI: 10.1016/j.ygeno.2018.07.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/02/2017] [Revised: 06/10/2018] [Accepted: 07/02/2018] [Indexed: 12/30/2022]
Abstract
Accurate prediction of SNPs (single nucleotide polymorphisms) from high-throughput sequencing data is a challenging problem with the potential to reveal variation within plant species. To extract useful information from bulk data, machine learning (ML) can yield accurate models by learning from prior information. We performed state-of-the-art, in-depth learning on six different plant species. A comparative evaluation of five algorithms showed that Random Forest substantially outperformed the others in selecting potential SNPs, with markedly improved prediction accuracy under 10-fold cross-validation, and it was integrated into a system known as PLANET-SNP. We present an accurate method to extract potential SNPs with user-specific customizable parameters. It will facilitate the identification of efficient and functional SNPs in an easy and intuitive way. The PLANET-SNP pipeline is very flexible in terms of data input and output formats. It is available at http://www.ncgd.nbri.res.in/PLANET-SNP-Pipeline.aspx.
Affiliation(s)
- Archana Bhardwaj
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-NBRI Campus, Lucknow, India; Computational Biology Lab, Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, Uttar Pradesh 226001, India
- Sumit K Bag
  - Academy of Scientific and Innovative Research (AcSIR), CSIR-NBRI Campus, Lucknow, India; Computational Biology Lab, Council of Scientific and Industrial Research - National Botanical Research Institute (CSIR-NBRI), Rana Pratap Marg, Lucknow, Uttar Pradesh 226001, India
3
Abstract
An integrated database with a variety of Web-based systems named WheatGenome.info hosting wheat genome and genomic data has been developed to support wheat research and crop improvement. The resource includes multiple Web-based applications, which are implemented as a variety of Web-based systems. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This portal provides links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.
4
May A, Abeln S, Buijs MJ, Heringa J, Crielaard W, Brandt BW. NGS-eval: NGS Error analysis and novel sequence VAriant detection tooL. Nucleic Acids Res 2015; 43:W301-5. [PMID: 25878034 PMCID: PMC4489229 DOI: 10.1093/nar/gkv346] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/30/2015] [Accepted: 04/03/2015] [Indexed: 02/04/2023]
Abstract
Massively parallel sequencing of microbial genetic markers (MGMs) is used to uncover the species composition in a multitude of ecological niches. These sequencing runs often contain a sample with known composition that can be used to evaluate the sequencing quality or to detect novel sequence variants. With NGS-eval, the reads from such (mock) samples can be used to (i) explore the differences between the reads and their references and to (ii) estimate the sequencing error rate. This tool maps these reads to references and calculates as well as visualizes the different types of sequencing errors. Clearly, sequencing errors can only be accurately calculated if the reference sequences are correct. However, even with known strains, it is not straightforward to select the correct references from databases. We previously analysed a pyrosequencing dataset from a mock sample to estimate sequencing error rates and detected sequence variants in our mock community, allowing us to obtain an accurate error estimation. Here, we demonstrate the variant detection and error analysis capability of NGS-eval with Illumina MiSeq reads from the same mock community. While tailored towards the field of metagenomics, this server can be used for any type of MGM-based reads. NGS-eval is available at http://www.ibi.vu.nl/programs/ngsevalwww/.
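The error-rate estimate described in this abstract can be illustrated with a small sketch. It assumes the mock-sample reads have already been aligned to their references and are given as gapped string pairs (`-` marks a gap); the function name `error_profile` and the toy alignments are hypothetical and are not taken from NGS-eval itself.

```python
from collections import Counter

def error_profile(aligned_pairs):
    """Count substitutions, insertions and deletions in gapped (reference, read) pairs."""
    counts = Counter()
    for ref, read in aligned_pairs:
        if len(ref) != len(read):
            raise ValueError("aligned sequences must have equal length")
        for r, q in zip(ref, read):
            if r == "-" and q == "-":
                continue  # padding column, carries no information
            counts["columns"] += 1
            if r == "-":
                counts["insertion"] += 1      # base present only in the read
            elif q == "-":
                counts["deletion"] += 1       # base missing from the read
            elif r != q:
                counts["substitution"] += 1
    errors = counts["insertion"] + counts["deletion"] + counts["substitution"]
    rate = errors / counts["columns"] if counts["columns"] else 0.0
    return dict(counts), rate

# Toy alignments (assumed data, for illustration only).
pairs = [
    ("ACGTACGT", "ACGTACGT"),   # perfect read
    ("ACGTACGT", "ACGAACGT"),   # one substitution
    ("ACG-TACG", "ACGGTACG"),   # one insertion
    ("ACGTACGT", "ACG-ACGT"),   # one deletion
]
profile, rate = error_profile(pairs)
print(profile, rate)
```

Separating the three error types matters because, as the abstract notes, an apparent error may actually be a true sequence variant in the mock community; a per-type profile makes such systematic differences easier to spot than a single pooled rate.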
Affiliation(s)
- Ali May
  - Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands; Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands
- Sanne Abeln
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands
- Mark J Buijs
  - Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
  - Centre for Integrative Bioinformatics (IBIVU), VU University Amsterdam, Amsterdam, The Netherlands; AIMMS Amsterdam Institute for Molecules Medicines and Systems, VU University Amsterdam, Amsterdam, The Netherlands
- Wim Crielaard
  - Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
- Bernd W Brandt
  - Department of Preventive Dentistry, Academic Centre for Dentistry Amsterdam (ACTA), University of Amsterdam and VU University Amsterdam, Amsterdam, The Netherlands
5
Ruperao P, Edwards D. Bioinformatics: identification of markers from next-generation sequence data. Methods Mol Biol 2015; 1245:29-47. [PMID: 25373747 DOI: 10.1007/978-1-4939-1966-6_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 05/03/2023]
Abstract
Next-generation sequencing (NGS) technology has dramatically revolutionized plant genomics. NGS technology combined with new software tools enables the discovery, validation, and assessment of genetic markers on a large scale. Among the different marker systems, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) are the markers of choice for genetics and plant breeding. SSR markers have been a choice for large-scale characterization of germplasm collections, construction of genetic maps, and QTL identification. Similarly, SNPs are the most abundant genetic variations, with higher frequencies throughout the genome of plant species. This chapter discusses various tools available for genome assembly and focuses on SSR and SNP marker discovery.
Affiliation(s)
- Pradeep Ruperao
  - School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
6
Patel DA, Zander M, Dalton-Morgan J, Batley J. Advances in plant genotyping: where the future will take us. Methods Mol Biol 2015; 1245:1-11. [PMID: 25373745 DOI: 10.1007/978-1-4939-1966-6_1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 12/12/2022]
Abstract
Genetic diversity between individuals can be tracked and monitored using a range of molecular markers. These markers can detect variation ranging in scale from a single base pair up to duplications and translocations of entire chromosomal regions. The genotyping of individuals allows the detection of this variation and it has been successfully applied in plant science for many years. The increasing amounts of sequence data able to be generated using next-generation sequencing (NGS) technologies have produced a vast expansion in the rate of discovery of polymorphisms, with single nucleotide polymorphisms (SNPs) predominating as the marker of choice. This increase in polymorphic marker resources through efficient discovery, coupled with the utility of SNPs, has enabled the shift to high-throughput genotyping assays and these methods are reviewed and discussed here, alongside the recent innovations allowing increased throughput.
Affiliation(s)
- Dhwani A Patel
  - School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, Australia
7
Abstract
Molecular genetic markers represent one of the most powerful tools for the analysis of variation between plant genomes. Molecular marker technology has developed rapidly over the last decade, with the introduction of new DNA sequencing methods and the development of high-throughput genotyping methods. Single nucleotide polymorphisms (SNPs) now dominate applications in modern plant genetic analysis. The reducing cost of DNA sequencing and increasing availability of large sequence data sets permit the mining of this data for large numbers of SNPs. These may then be used in applications such as genetic linkage analysis and trait mapping, diversity analysis, association studies, and marker-assisted selection. Here we describe automated methods for the discovery of SNP molecular markers and new technologies for high-throughput, low-cost molecular marker genotyping. Examples include SNP discovery using autoSNPdb and wheatgenome.info as well as SNP genotyping using Illumina's GoldenGate™ and Infinium™ methods.
8
Lorenc MT, Hayashi S, Stiller J, Lee H, Manoli S, Ruperao P, Visendi P, Berkman PJ, Lai K, Batley J, Edwards D. Discovery of Single Nucleotide Polymorphisms in Complex Genomes Using SGSautoSNP. Biology (Basel) 2012; 1:370-82. [PMID: 24832230 PMCID: PMC4009776 DOI: 10.3390/biology1020370] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 07/12/2012] [Revised: 08/09/2012] [Accepted: 08/10/2012] [Indexed: 01/01/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are becoming the dominant form of molecular marker for genetic and genomic analysis. The advances in second generation DNA sequencing provide opportunities to identify very large numbers of SNPs in a range of species. However, SNP identification remains a challenge for large and polyploid genomes due to their size and complexity. We have developed a pipeline for the robust identification of SNPs in large and complex genomes using Illumina second generation DNA sequence data and demonstrated this by the discovery of SNPs in the hexaploid wheat genome. We have developed a SNP discovery pipeline called SGSautoSNP (Second-Generation Sequencing AutoSNP) and applied this to discover more than 800,000 SNPs between four hexaploid wheat cultivars across chromosomes 7A, 7B and 7D. All SNPs are presented for download and viewing within a public GBrowse database. Validation suggests an accuracy of greater than 93% of SNPs represent polymorphisms between wheat cultivars and hence are valuable for detailed diversity analysis, marker assisted selection and genotyping by sequencing. The pipeline produces output in GFF3, VCF, Flapjack or Illumina Infinium design format for further genotyping diverse populations. As well as providing an unprecedented resource for wheat diversity analysis, the method establishes a foundation for high resolution SNP discovery in other large and complex genomes.
Affiliation(s)
- Michał T Lorenc
  - Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- Satomi Hayashi
  - Centre for Integrative Legume Research, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- Jiri Stiller
  - CSIRO Plant Industry, Brisbane, QLD 4072, Australia
- Hong Lee
  - Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- Sahana Manoli
  - Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- Pradeep Ruperao
  - Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- Paul Visendi
  - Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- Kaitao Lai
  - Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- Jacqueline Batley
  - Centre for Integrative Legume Research, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
- David Edwards
  - Australian Centre for Plant Functional Genomics, School of Agriculture and Food Science, University of Queensland, Brisbane, QLD 4072, Australia
9
Hayward A, Mason AS, Dalton-Morgan J, Zander M, Edwards D, Batley J. SNP discovery and applications in Brassica napus. J Plant Biotechnol 2012. [DOI: 10.5010/jpb.2012.39.1.049] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022]
10
Lai K, Berkman PJ, Lorenc MT, Duran C, Smits L, Manoli S, Stiller J, Edwards D. WheatGenome.info: an integrated database and portal for wheat genome information. Plant Cell Physiol 2012; 53:e2. [PMID: 22009731 DOI: 10.1093/pcp/pcr141] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 05/31/2023]
Abstract
Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.
Affiliation(s)
- Kaitao Lai
  - School of Agriculture and Food Sciences and Australian Centre for Plant Functional Genomics, University of Queensland, Brisbane, QLD 4072, Australia
11
Paux E, Sourdille P, Mackay I, Feuillet C. Sequence-based marker development in wheat: advances and applications to breeding. Biotechnol Adv 2011; 30:1071-88. [PMID: 21989506 DOI: 10.1016/j.biotechadv.2011.09.015] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 08/16/2011] [Revised: 08/24/2011] [Accepted: 09/25/2011] [Indexed: 01/04/2023]
Abstract
In the past two decades, the wheat community has made remarkable progress in developing molecular resources for breeding. A wide variety of molecular tools has been established to accelerate genetic and physical mapping and to facilitate the efficient identification of molecular markers linked to genes and QTL of agronomic interest. Already, wheat breeders are benefiting from a wide range of techniques to follow the introgression of the most favorable alleles in elite material and develop improved varieties. Breeders will soon be able to take advantage of new technological developments based on Next Generation Sequencing. In this paper, we review the molecular toolbox available to wheat scientists and breeders for performing fundamental genomic studies and breeding. Special emphasis is placed on the production and detection of single nucleotide polymorphisms (SNPs), which should enable a step change in saturating the wheat genome for more efficient genetic studies and for the development of new selection methods. The perspectives offered by access to an ordered full genome sequence for further marker development and enhanced precision breeding are also discussed. Finally, we discuss the advantages and limitations of marker-assisted selection for supporting wheat improvement.
Affiliation(s)
- Etienne Paux
  - INRA-UBP 1095, Genetics Diversity and Ecophysiology of Cereals, 234 Avenue du Brézet, Clermont-Ferrand, France
12
Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman PJ, Clark T, McKenzie M, Appleby N, Batley J, Basford K, Edwards D. Future tools for association mapping in crop plants. Genome 2011; 53:1017-23. [PMID: 21076517 DOI: 10.1139/g10-057] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 01/29/2023]
Abstract
Association mapping currently relies on the identification of genetic markers. Several technologies have been adopted for genetic marker analysis, with single nucleotide polymorphisms (SNPs) being the most popular where a reasonable quantity of genome sequence data are available. We describe several tools we have developed for the discovery, annotation, and visualization of molecular markers for association mapping. These include autoSNPdb for SNP discovery from assembled sequence data; TAGdb for the identification of gene specific paired read Illumina GAII data; CMap3D for the comparison of mapped genetic and physical markers; and BAC and Gene Annotator for the online annotation of genes and genomic sequences.
Affiliation(s)
- Chris Duran
  - University of Queensland, Australian Centre for Plant Functional Genomics, School of Land, Crop and Food Sciences, Brisbane, Australia
13
Orsini L, Jansen M, Souche EL, Geldof S, De Meester L. Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna. BMC Genomics 2011; 12:309. [PMID: 21668940 PMCID: PMC3146954 DOI: 10.1186/1471-2164-12-309] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 06/21/2010] [Accepted: 06/13/2011] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well-known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology are intensively studied, little is known about the functional genomic underpinnings of phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in the presence of environmental stressors, and to target such genes for single nucleotide polymorphism (SNP) marker development.
RESULTS: We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as nonsynonymous genetic variation. We validated the developed SNPs in six natural populations of D. magna distributed at regional scale.
CONCLUSIONS: A large proportion (47%) of the produced ESTs are Daphnia lineage-specific genes, which are potentially involved in responses to environmental stress rather than in general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic study is important for identifying ecologically responsive genes in D. magna.
Affiliation(s)
- Luisa Orsini
  - Laboratory of Aquatic Ecology and Evolutionary Biology, K.U. Leuven, Ch. Deberiotstraat 32, 3000 Leuven, Belgium
14
Guerrero D, Bautista R, Villalobos DP, Cantón FR, Claros MG. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences. Algorithms Mol Biol 2010; 5:24. [PMID: 20525162 PMCID: PMC2902484 DOI: 10.1186/1748-7188-5-24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/14/2010] [Accepted: 06/02/2010] [Indexed: 01/09/2023]
Abstract
Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses.
Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly".
Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.
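To make the idea of entropy-based scoring of divergent regions concrete, here is a minimal sketch. It uses plain per-column Shannon entropy as a stand-in and should not be read as AlignMiner's actual Entropy score; the function names, the `threshold` and `min_length` parameters, and the toy alignment are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def divergent_regions(alignment, threshold=0.8, min_length=2):
    """Return per-column scores and (start, end) half-open runs with entropy >= threshold.

    threshold is assumed to be positive so that the trailing sentinel closes any open run.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = [column_entropy(col) for col in zip(*alignment)]
    regions, start = [], None
    for i, score in enumerate(scores + [0.0]):   # sentinel closes a trailing region
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    return scores, regions

# Toy alignment (assumed data): conserved flanks around two variable columns.
alignment = [
    "ACGTACGTAC",
    "ACGTTTGTAC",
    "ACGTGAGTAC",
    "ACGTCCGTAC",
]
scores, regions = divergent_regions(alignment)
print(regions)
```

Raising `threshold` or `min_length` makes the selection more restrictive, which loosely mirrors the abstract's point that different scoring methods offer different levels of selectivity.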
15
Wang J, Zou Q, Guo MZ. Mining SNPs from EST sequences using filters and ensemble classifiers. Genet Mol Res 2010; 9:820-34. [PMID: 20449815 DOI: 10.4238/vol9-2gmr765] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/03/2022]
Abstract
Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in publicly available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included for the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in the cross validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and will help reduce the cost of biological analyses.
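The candidate-selection step, in which SNP candidates are obtained from aligned EST clusters by a redundancy measure, might look roughly like the sketch below. It is an illustration under stated assumptions rather than SNPDigger's method: the rule "at least two alleles, each seen `min_minor_count` times" is a common redundancy filter chosen here for simplicity, and the function name and toy cluster are hypothetical.

```python
from collections import Counter

def snp_candidates(aligned_ests, min_minor_count=2):
    """Yield (column, allele_counts) where two or more alleles each occur min_minor_count times."""
    for pos, column in enumerate(zip(*aligned_ests)):
        # Ignore gaps and ambiguous bases when counting alleles.
        counts = Counter(base for base in column if base in "ACGT")
        supported = {b: n for b, n in counts.items() if n >= min_minor_count}
        if len(supported) >= 2:
            yield pos, supported

# Toy EST cluster (assumed data), already aligned to equal length.
cluster = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACATTAGC",
    "ACATTAGC",
    "ACGTTCGC",   # singleton C at column 5: more likely a sequencing error
]
candidates = list(snp_candidates(cluster))
print(candidates)
```

A filter like this only produces candidates. In the abstract, further features and an ensemble classifier are then used to separate true SNPs from artifacts.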
Affiliation(s)
- J Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, P.R. China
16
Vasemägi A, Gross R, Palm D, Paaver T, Primmer CR. Discovery and application of insertion-deletion (INDEL) polymorphisms for QTL mapping of early life-history traits in Atlantic salmon. BMC Genomics 2010; 11:156. [PMID: 20210987 PMCID: PMC2838853 DOI: 10.1186/1471-2164-11-156] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Received: 04/06/2009] [Accepted: 03/08/2010] [Indexed: 01/17/2023]
Abstract
Background: For decades, linkage mapping has been one of the most powerful and widely used approaches for elucidating the genetic architecture of phenotypic traits of medical, agricultural and evolutionary importance. However, successful mapping of Mendelian and quantitative phenotypic traits depends critically on the availability of fast and preferably high-throughput genotyping platforms. Several array-based single nucleotide polymorphism (SNP) genotyping platforms have been developed for genetic model organisms during recent years but most of these methods become prohibitively expensive for screening large numbers of individuals. Therefore, inexpensive, simple and flexible genotyping solutions that enable rapid screening of intermediate numbers of loci (~75-300) in hundreds to thousands of individuals are still needed for QTL mapping applications in a broad range of organisms.
Results: Here we describe the discovery and application of insertion-deletion (INDEL) polymorphisms for cost-efficient medium throughput genotyping that enables analysis of >75 loci in a single automated sequencer electrophoresis column with standard laboratory equipment. Genotyping of INDELs requires low start-up costs, includes few standard sample handling steps and is applicable to a broad range of species for which expressed sequence tag (EST) collections are available. As a proof of principle, we generated a partial INDEL linkage map in Atlantic salmon (Salmo salar) and rapidly identified a number of quantitative trait loci (QTLs) affecting early life-history traits that are expected to have important fitness consequences in the natural environment.
Conclusions: The INDEL genotyping enabled fast coarse-mapping of chromosomal regions containing QTL, thus providing an efficient means for characterization of genetic architecture in multiple crosses and large pedigrees. This enables not only the discovery of a larger number of QTLs with relatively smaller phenotypic effect but also provides a cost-effective means for evaluation of the frequency of segregating QTLs in outbred populations, which is important for further understanding how genetic variation underlying phenotypic traits is maintained in the wild.
Affiliation(s)
- Anti Vasemägi
- Department of Biology, 20014, University of Turku, Finland.
|
17
|
Yang CH, Chuang LY, Cheng YH, Wen CH, Chang HW. Dynamic programming for single nucleotide polymorphism ID identification in systematic association studies. Kaohsiung J Med Sci 2010; 25:165-76. [PMID: 19502133 DOI: 10.1016/s1607-551x(09)70057-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) play an important role in personalized medicine. However, the SNP data reported in many association studies provide only the SNP nucleotide/amino acid position, without providing the SNP ID recorded in National Center for Biotechnology Information databases. A tool with the ability to provide SNP ID identification, with a user-friendly interface, is needed. In this paper, a dynamic programming algorithm was used to compare homologs when the processed input sequence is aligned with the SNP FASTA database. Our novel system provides a web-based tool that uses the National Center for Biotechnology Information dbSNP database, which provides SNP sequence identification and SNP FASTA formats. Freely selectable sequence formats for alignment can be used, including general sequence formats (ACGT, [dNTP1/dNTP2] or IUPAC formats) and orientation with bidirectional sequence matching. In contrast to the National Center for Biotechnology Information SNP-BLAST, the proposed system always provides the correct targeted SNP ID (SNP hit), as well as nearby SNPs (flanking hits), arranged in their chromosomal order and contig positions. The system also solves problems inherent in SNP-BLAST, which cannot always provide the correct SNP ID for a given input sequence. Therefore, this system constitutes a novel application which uses dynamic programming to identify SNP IDs from the literature and keyed-in sequences for systematic association studies. It is freely available at http://bio.kuas.edu.tw/SNPosition/.
Affiliation(s)
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
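The abstract above describes aligning an input sequence, possibly written with IUPAC ambiguity codes, against SNP FASTA records by dynamic programming. The sketch below is not the authors' implementation. It is a minimal semi-global alignment under assumed scoring values (`match`, `mismatch`, `gap` are illustrative), searching the forward strand only, and the function names are hypothetical.

```python
# IUPAC nucleotide codes mapped to the set of bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def compatible(a, b):
    """True when two IUPAC symbols share at least one possible base."""
    return bool(set(IUPAC[a]) & set(IUPAC[b]))


def semiglobal_align(query, ref, match=1, mismatch=-1, gap=-2):
    """Align all of `query` somewhere inside `ref`.

    Returns (best score, end position in ref, counted from 1).
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    query, ref = query.upper(), ref.upper()
    prev = [0] * (len(ref) + 1)  # row 0: starting anywhere in ref is free
    for i in range(1, len(query) + 1):
        curr = [prev[0] + gap]  # query bases left of ref cost one gap each
        for j in range(1, len(ref) + 1):
            s = match if compatible(query[i - 1], ref[j - 1]) else mismatch
            curr.append(max(prev[j - 1] + s,   # align the two symbols
                            prev[j] + gap,     # gap in ref
                            curr[j - 1] + gap))  # gap in query
        prev = curr
    # Ending anywhere in ref is also free: take the best cell of the last row.
    best_end = max(range(len(prev)), key=prev.__getitem__)
    return prev[best_end], best_end
```

Here the ambiguity code `R` (A or G) in the query is treated as matching an `A` in the reference, which is one simple way to handle the `[dNTP1/dNTP2]` style of SNP notation mentioned in the abstract. Matching the reverse complement as well would require a second call on the reversed, complemented query.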
|
18
|
Kim C, Yoon U, Lee G, Park S, Seol YJ, Lee H, Hahn J. An integrated database to enhance the identification of SNP markers for rice. Bioinformation 2009; 4:269-70. [PMID: 20975922 PMCID: PMC2951715 DOI: 10.6026/97320630004269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2009] [Accepted: 12/15/2009] [Indexed: 11/23/2022] Open
Abstract
The National Academy of Agricultural Science (NAAS) has developed a web-based marker database to provide information about SNP markers in rice. The database consists of three major functional categories: map viewing, marker searching and gene annotation. It provides information on 12,829 SNP markers, including gene location information on the 12 rice chromosomes. The annotation of each SNP marker provides information such as marker name, EST number, gene definition and general marker information. Users are assisted in tracing any new structures of the chromosomes and gene positional functions using specific SNP markers. Availability: The database is available for free at http://nabic.niab.go.kr/SNP/
Affiliation(s)
- Changkug Kim
- Genomics Division, National Academy of Agricultural Science (NAAS), Suwon 441-707, Korea
|
19
|
"PolyMin": software for identification of the minimum number of polymorphisms required for haplotype and genotype differentiation. BMC Bioinformatics 2009; 10:176. [PMID: 19515225 PMCID: PMC2707369 DOI: 10.1186/1471-2105-10-176] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/16/2009] [Accepted: 06/10/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Analysis of allelic variation for relevant genes and monitoring chromosome segment transmission during selection are important approaches in plant breeding and ecology. Minimizing the number of molecular markers required for this purpose is crucial because of cost and time constraints. To date, software for identification of the minimum number of required markers has been optimized for human genetics and only partly matches the needs of plant scientists and breeders. In addition, different software packages with insufficient interoperability need to be combined to extract this information from available allele sequence data, resulting in an error-prone multi-step process of data handling. Results: PolyMin, a computer program combining the detection of a minimum set of single nucleotide polymorphisms (SNPs) and/or insertions/deletions (INDELs) necessary for allele differentiation with the subsequent genotype differentiation in plant populations, has been developed. Its efficiency in finding minimum sets of polymorphisms is comparable to other available program packages. Conclusion: A computer program detecting the minimum number of SNPs for haplotype discrimination and subsequent genotype differentiation has been developed, and its performance compared to other relevant software. The main advantage of PolyMin, especially for plant scientists, is the integration of procedures from sequence analysis to polymorphism selection within a single program, including both haplotype and genotype differentiation.
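The selection problem described above, finding the fewest polymorphic sites that still tell every haplotype apart, can be illustrated with a small exhaustive search. This is a sketch only and does not reproduce PolyMin's actual algorithm: it assumes haplotypes are given as equal-length strings of alleles at already-identified polymorphic sites, and the function name is hypothetical. Exhaustive search is exact but only practical for small numbers of sites.

```python
from itertools import combinations


def minimal_marker_set(haplotypes):
    """Return the smallest tuple of site indices that separates all haplotypes.

    `haplotypes` maps a haplotype name to a string of alleles, one character
    per polymorphic site (an assumed input format). Subsets are tried in
    order of increasing size, so the first hit is a true minimum.
    """
    names = list(haplotypes)
    n_sites = len(haplotypes[names[0]])
    for k in range(1, n_sites + 1):
        for sites in combinations(range(n_sites), k):
            # Project every haplotype onto the chosen sites.
            signatures = {tuple(haplotypes[n][s] for s in sites) for n in names}
            if len(signatures) == len(names):
                return sites
    return None  # at least two haplotypes are identical at every site


haps = {"H1": "ACGT", "H2": "ACGA", "H3": "TCGT", "H4": "TCGA"}
print(minimal_marker_set(haps))  # (0, 3)
```

In this toy input no single site separates four haplotypes, while sites 0 and 3 together do, so two markers suffice instead of four.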
|
20
|
Jayashree B, Bhanuprakash A, Jami A, Reddy PS, Nayak S, Varshney RK. Perl module and PISE wrappers for the integrated analysis of sequence data and SNP features. BMC Res Notes 2009; 2:92. [PMID: 19463194 PMCID: PMC2694205 DOI: 10.1186/1756-0500-2-92] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/02/2009] [Accepted: 05/24/2009] [Indexed: 11/13/2022] Open
Abstract
Background: There is a need for software scripts and modules for format parsing, data manipulation, statistical analysis and annotation, especially for tasks related to marker identification from sequence data and sequence diversity analysis. Results: Here we present several new Perl scripts and a module for sequence data diversity analysis. To enable the use of this software with other public domain tools, we also make available PISE (Pasteur Institute Software Environment) wrappers for these Perl scripts and module. This enables the user to generate pipelines for automated analysis, since PISE is a web interface generator for bioinformatics programmes. Conclusion: A new set of modules and scripts for diversity statistic calculation, format parsing and data manipulation is available with PISE wrappers that enable pipelining of these scripts with commonly used contig assembly and sequence feature prediction software, to answer specific sequence diversity related questions.
Affiliation(s)
- B Jayashree
- Bioinformatics Unit, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502 324, Andhra Pradesh, India.
|
21
|
|
22
|
Mining SNPs from DNA sequence data; computational approaches to SNP discovery and analysis. Methods Mol Biol 2009; 578:73-91. [PMID: 19768587 DOI: 10.1007/978-1-60327-411-1_4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation and are the basis for most molecular markers. Before these SNPs can be used for direct sequence-based SNP detection or in a derived SNP assay, they need to be identified. For those regions or species where no validated SNPs are available in the public databases, a good alternative is to mine them from DNA sequences. The alignment of multiple sequence fragments originating from different genotypes representing the same region on the genome will allow for the discovery of sequence variants. The corresponding nucleotide mismatches are likely to be SNPs or insertions/deletions. A large amount of sequence data to be mined is present in the public databases (both expressed sequence tags and genomic sequences) and is free to use without having to do large-scale sequencing oneself. However, with the appearance of the next-generation sequencing machines (Roche GS/454, Illumina GA/Solexa, SOLiD), high-throughput sequencing is becoming widely available. This will allow for the sequencing of polymorphic genotypes on specific target areas and consequent SNP identification. In this paper we discuss the bioinformatics tools required to analyze DNA sequence data for SNP mining. A general approach for the consecutive steps in the mining process is described and commonly used SNP discovery pipelines are presented.
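The core idea above, that mismatches in columns of aligned fragments from different genotypes are candidate SNPs, can be sketched in a few lines. This is an illustration rather than any of the pipelines the chapter reviews: it assumes the reads are already aligned to equal length with `-` for gaps, and the `min_support` threshold (each allele must be seen at least that many times, a simple redundancy filter against sequencing errors) is an assumed parameter.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_support=2):
    """Return [(column, {base: count})] for columns with >= 2 supported bases.

    `aligned_reads` is a list of equal-length strings; '-' and 'N' are
    ignored. A base counts as supported when at least `min_support` reads
    carry it, so a variant seen in a single read is treated as a likely error.
    """
    results = []
    for col, column in enumerate(zip(*aligned_reads)):
        counts = Counter(b for b in column if b in "ACGT")
        supported = {b: n for b, n in counts.items() if n >= min_support}
        if len(supported) >= 2:
            results.append((col, supported))
    return results


reads = [
    "ACGTAC",
    "ACGTAC",
    "ACATAC",
    "ACATAC",
    "ACGTTC",  # the T at column 4 appears once only and is filtered out
]
print(candidate_snps(reads))  # [(2, {'G': 3, 'A': 2})]
```

Real pipelines additionally weigh base quality and distinguish paralogues from alleles, which this sketch does not attempt.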
|
23
|
Appleby N, Edwards D, Batley J. New technologies for ultra-high throughput genotyping in plants. Methods Mol Biol 2009; 513:19-39. [PMID: 19347650 DOI: 10.1007/978-1-59745-427-8_2] [Citation(s) in RCA: 88] [Impact Index Per Article: 5.9] [Indexed: 12/04/2022]
Abstract
Molecular genetic markers represent one of the most powerful tools for the analysis of plant genomes and the association of heritable traits with underlying genetic variation. Molecular marker technology has developed rapidly over the last decade, with the development of high-throughput genotyping methods. Two forms of sequence-based marker, simple sequence repeats (SSRs), also known as microsatellites, and single nucleotide polymorphisms (SNPs), now predominate applications in modern plant genetic analysis, alongside anonymous marker systems such as amplified fragment length polymorphisms (AFLPs) and diversity array technology (DArT). The falling cost of DNA sequencing and the increasing availability of large sequence data sets permit the mining of these data for large numbers of SSRs and SNPs. These may then be used in applications such as genetic linkage analysis and trait mapping, diversity analysis, association studies and marker-assisted selection. Here, we describe automated methods for the discovery of molecular markers and new technologies for high-throughput, low-cost molecular marker genotyping. Genotyping examples include multiplexing of SSRs using Multiplex-Ready marker technology (MRT); DArT genotyping; SNP genotyping using the Invader assay, the single base extension (SBE), oligonucleotide ligation assay (OLA) SNPlex system, and Illumina GoldenGate and Infinium methods.
Affiliation(s)
- Nikki Appleby
- Australian Centre for Plant Functional Genomics, Institute for Molecular Biosciences and School of Land, Crop and Food Sciences, University of Queensland, Brisbane, Australia
|
24
|
Abstract
Molecular genetic markers represent one of the most powerful tools for the analysis of genomes and the association of heritable traits with underlying genetic variation. The development of high-throughput methods for the detection of single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) has led to a revolution in their use as molecular markers. The availability of large sequence data sets permits mining for these molecular markers, which may then be used for applications such as genetic trait mapping, diversity analysis and marker assisted selection in agriculture. Here we describe web-based automated methods for the discovery of SSRs using SSR taxonomy tree, the discovery of SNPs from sequence data using SNPServer and the identification of validated SNPs from within the dbSNP database. SSR taxonomy tree identifies pre-determined SSR amplification primers for virtually all species represented within the GenBank database. SNPServer uses a redundancy based approach to identify SNPs within DNA sequences. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms. The NCBI dbSNP database is a catalogue of molecular variation, hosting validated SNPs for several species within a public-domain archive.
Affiliation(s)
- Jacqueline Batley
- Australian Centre for Plant Functional Genomics, School of Land, Crop and Food Sciences, University of Queensland, Brisbane, Australia
|
25
|
Duran C, Appleby N, Clark T, Wood D, Imelfort M, Batley J, Edwards D. AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants. Nucleic Acids Res 2008; 37:D951-3. [PMID: 18854357 PMCID: PMC2686484 DOI: 10.1093/nar/gkn650] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 11/15/2022] Open
Abstract
Single nucleotide polymorphisms (SNPs) may be considered the ultimate genetic marker as they represent the finest resolution of a DNA sequence (a single nucleotide), are generally abundant in populations and have a low mutation rate. Analysis of assembled EST sequence data provides a cost-effective means to identify large numbers of SNPs associated with functional genes. We have developed an integrated SNP discovery pipeline, which identifies SNPs from assembled EST sequences. The results are maintained in a custom relational database along with EST source and annotation information. The current database hosts data for the important crops rice, barley and Brassica. Users may rapidly identify polymorphic sequences of interest through BLAST sequence comparison, keyword searches of annotations derived from UniRef90 and GenBank comparisons, GO annotations or in genes corresponding to syntenic regions of reference genomes. In addition, SNPs between specific varieties may be identified for targeted mapping and association studies. SNPs are viewed using a user-friendly graphical interface. The database is freely accessible at http://autosnpdb.qfab.org.au/.
Affiliation(s)
- Chris Duran
- Australian Centre for Plant Functional Genomics, Brisbane, School of Land, Crop and Food Sciences, University of Queensland, Brisbane, QLD 4072, Australia
|
26
|
Tang J, Leunissen JAM, Voorrips RE, van der Linden CG, Vosman B. HaploSNPer: a web-based allele and SNP detection tool. BMC Genet 2008; 9:23. [PMID: 18307806 PMCID: PMC2288614 DOI: 10.1186/1471-2156-9-23] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 10/22/2007] [Accepted: 02/28/2008] [Indexed: 01/16/2023] Open
Abstract
Background: Single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels) are the most common type of polymorphisms and are frequently used for molecular marker development. Such markers have become very popular for all kinds of genetic analysis, including haplotype reconstruction. Haplotypes can be reconstructed for whole chromosomes but also for specific genes, based on the SNPs present. Haplotypes in the latter context represent the different alleles of a gene. The computational approach to SNP mining is becoming increasingly popular because of the continuously increasing number of sequences deposited in databases, which allows a more accurate identification of SNPs. Several software packages have been developed for SNP mining from databases. Of these, QualitySNP is the only tool that combines SNP detection with the reconstruction of alleles, which results in a lower number of false positive SNPs and also works much faster than other programs. We have built a web-based SNP discovery and allele detection tool (HaploSNPer) based on QualitySNP. Results: HaploSNPer is a flexible web-based tool for detecting SNPs and alleles in user-specified input sequences from both diploid and polyploid species. It includes BLAST for finding homologous sequences in public EST databases, CAP3 or PHRAP for aligning them, and QualitySNP for discovering reliable allelic sequences and SNPs. All possible and reliable alleles are detected by a mathematical algorithm using potential SNP information. Reliable SNPs are then identified based on the reconstructed alleles and on sequence redundancy. Conclusion: Thorough testing of HaploSNPer (and the underlying QualitySNP algorithm) has shown that EST information alone is sufficient for the identification of alleles and that reliable SNPs can be found efficiently. Furthermore, HaploSNPer supplies a user-friendly interface for visualization of SNPs and alleles. HaploSNPer is available from http://www.bioinformatics.nl/tools/haplosnper/.
Affiliation(s)
- Jifeng Tang
- Laboratory of Bioinformatics, Wageningen University, PO Box 8128, 6700 ET Wageningen, The Netherlands.
|
27
|
Sullivan JC, Reitzel AM, Finnerty JR. Upgrades to StellaBase facilitate medical and genetic studies on the starlet sea anemone, Nematostella vectensis. Nucleic Acids Res 2007; 36:D607-11. [PMID: 17982171 PMCID: PMC2238866 DOI: 10.1093/nar/gkm941] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022] Open
Abstract
The starlet sea anemone, Nematostella vectensis, is a basal metazoan organism that has recently emerged as an important model system in developmental biology and evolutionary genomics. StellaBase, the Nematostella Genomics Database (http://stellabase.org), was developed in 2005 as a resource to support the Nematostella research community. Recently, it has become apparent that Nematostella may be a particularly useful system for studying (i) microevolutionary variation in natural populations, and (ii) the functional evolution of human disease genes. We have developed two new databases that will foster such studies: StellaBase Disease (http://stellabase.org/disease) is a relational database that houses 155 904 invertebrate homologous isoforms of human disease genes from four leading genomic model systems (fly, worm, yeast and Nematostella), including 14 874 predicted genes from the sea anemone itself. StellaBase SNP (http://stellabase.org/SNP) is a relational database that describes the location and underlying type of mutation for 20 063 single nucleotide polymorphisms.
|
28
|
Pumpernik D, Oblak B, Borstnik B. Replication slippage versus point mutation rates in short tandem repeats of the human genome. Mol Genet Genomics 2007; 279:53-61. [PMID: 17926066 DOI: 10.1007/s00438-007-0294-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 03/27/2007] [Accepted: 09/19/2007] [Indexed: 10/22/2022]
Abstract
Short tandem repeats (STRs) are subjected to two kinds of mutational modifications: point mutations and replication slippages. The latter is found to be the more frequent cause of STR modifications, but a satisfactory quantitative measure of the ratio of the two processes has yet to be determined. The comparison of entire genome sequences of closely enough related species enables one to obtain sufficient statistics by counting the differences in the STR regions. We analyzed human-chimpanzee DNA sequence alignments to obtain the counts of point mutations and replication slippage modifications. The results were compared with the results of a computer simulation, and the parameters quantifying the replication slippage probability as well as the probabilities of point mutations within the repeats were determined. It was found that within the STRs with repeated units consisting of one, two or three nucleotides, point mutations occur approximately twice as frequently as one would expect on the basis of the 1.2% difference between the human and chimpanzee genomes. As expected, the replication slippage probability is negligible below a 10-bp threshold and grows above this level. The replication slippage events outnumber the point mutations by one or two orders of magnitude, but are still lower by one order of magnitude relative to the mutability of the markers that are used for genotyping purposes.
Affiliation(s)
- Danilo Pumpernik
- National Institute of Chemistry, Hajdrihova 19, 1001, Ljubljana, Slovenia
|
29
|
Mitchell RAC, Castells-Brooke N, Taubert J, Verrier PJ, Leader DJ, Rawlings CJ. Wheat Estimated Transcript Server (WhETS): a tool to provide best estimate of hexaploid wheat transcript sequence. Nucleic Acids Res 2007; 35:W148-51. [PMID: 17439966 PMCID: PMC1933201 DOI: 10.1093/nar/gkm220] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/01/2022] Open
Abstract
Wheat biologists face particular problems because of the lack of genomic sequence and the three homoeologous genomes which give rise to three very similar forms for many transcripts. However, over 1.3 million available public-domain Triticeae ESTs (of which ∼850 000 are wheat) and the full rice genomic sequence can be used to estimate likely transcript sequences present in any wheat cDNA sample to which PCR primers may then be designed. Wheat Estimated Transcript Server (WhETS) is designed to do this in a convenient form, and to provide information on the number of matching EST and high quality cDNA (hq-cDNA) sequences, tissue distribution and likely intron position inferred from rice. Triticeae EST and hq-cDNA sequences are mapped onto rice loci and stored in a database. The user selects a rice locus (directly or via Arabidopsis) and the matching Triticeae sequences are assembled according to user-defined filter and stringency settings. Assembly is achieved initially with the CAP3 program and then with a single nucleotide polymorphism (SNP)-analysis algorithm designed to separate homoeologues. Alignment of the resulting contigs and singlets against the rice template sequence is then displayed. Sequences and assembly details are available for download in fasta and ace formats, respectively. WhETS is accessible at http://www4.rothamsted.bbsrc.ac.uk/whets.
Affiliation(s)
- Rowan A C Mitchell
- Biomathematics and Bioinformatics Division, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK.
|
30
|
Batley J, Jewell E, Edwards D. Automated discovery of single nucleotide polymorphism and simple sequence repeat molecular genetic markers. Methods Mol Biol 2007; 406:473-94. [PMID: 18287708 DOI: 10.1007/978-1-59745-535-0_23] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/29/2023]
Abstract
Molecular genetic markers represent one of the most powerful tools for the analysis of genomes. Molecular marker technology has developed rapidly over the last decade, and two forms of sequence-based markers, simple sequence repeats (SSRs), also known as microsatellites, and single nucleotide polymorphisms (SNPs), now predominate applications in modern genetic analysis. The availability of large sequence data sets permits mining for SSRs and SNPs, which may then be applied to genetic trait mapping and marker-assisted selection. Here, we describe Web-based automated methods for the discovery of these SSRs and SNPs from sequence data. SSRPrimer enables the real-time discovery of SSRs within submitted DNA sequences, with the concomitant design of PCR primers for SSR amplification. Alternatively, users may browse the SSR Taxonomy Tree to identify predetermined SSR amplification primers for any species represented within the GenBank database. SNPServer uses a redundancy-based approach to identify SNPs within DNA sequence data. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences, and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms.
Affiliation(s)
- Jacqueline Batley
- Australian Centre for Plant Functional Genomics, School of Land, Crop and Food Sciences and ARC Centre of Excellence for Integrative Legume Research, CILR, The University of Queensland, Brisbane, Australia
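The SSR discovery step mentioned in the abstract above (finding simple sequence repeats in submitted DNA sequences) can be approximated with a regular expression. The following is a rough sketch, not SSRPrimer's method: the motif length range and the minimum repeat count are assumed defaults, only perfect repeats are reported, and the function name is hypothetical.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return [(start, motif, repeat_count)] for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first at
    each position, so 'CACACACA' is reported as (CA)4 rather than (CACA)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGCACACACACATT"))   # [(2, 'CA', 5)]
print(find_ssrs("ATGATGATGATGCC"))   # [(0, 'ATG', 4)]
```

Start positions are 0-based. Primer design around each hit, which the abstract describes as a concomitant step, is outside the scope of this sketch.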
|
31
|
Abstract
Genomics and bioinformatics have great potential to help address numerous topics in ecology and evolution. Expressed sequence tags (ESTs) can bridge genomics and molecular ecology because they can provide a means of accessing the gene space of almost any organism. We review how ESTs have been used in molecular ecology research in the last several years by providing sequence data for the design of molecular markers, genome-wide studies of gene expression and selection, the identification of candidate genes underlying adaptation, and the basis for studies of gene family and genome evolution. Given the tremendous recent advances in inexpensive sequencing technologies, we predict that molecular ecologists will increasingly be developing and using EST collections in the years to come. With this in mind, we close our review by discussing aspects of EST resource development of particular relevance for molecular ecologists.
Affiliation(s)
- Amy Bouck
- Department of Biology, Box 90338, Duke University, Durham, NC 27708, USA.
|